Sorting the signal from the noise – Q&A with Dave Lawie
Dave, what is geoscientific data and why is it important?
Simply put, geoscientific data are any facts, statistics or information obtained from measurement of the earth to determine, for example, rock type, degree of alteration, azimuth or concentration. This data is vital for accurate and cost-effective decision making from exploration through to production.
How do we get geoscientific data?
Today we are fortunate to have an increasing range of sensing technologies to help us obtain this valuable data. Importantly for the industry, these new sensing technologies are being pushed closer to the point of sample generation.
How can geoscientific data vary?
The data varies in size and content. It may also be incomplete, vary in measurement precision with time and location, contain artefacts (closure) and be collected at vastly different spatial scales – from continental surveys to discrete metre-long sample increments, to continuous downhole logs.
How is the analysis of data changing for Geoscientists?
Analytics has been defined as ‘the discovery and communication of meaningful patterns in data’ and ‘often favours data visualisation to communicate insight’. Geoscientists have been doing this for decades – but what has changed is access to computing power, storage and ease of access to algorithms.
Are there any challenges with these tools?
These tools certainly make analysis and interpretation of data easier; however, it is important to be mindful of the nature of the input data and the original purpose for collecting the data, in order to clearly define what is the signal and what is the noise.
What do you mean by ‘noise’?
Noise is anything present in the data that obscures the signal; however, the definitions of signal and noise may change with the use to which the data is put, which itself may vary with time. For example, a multi-element assay used for exploration targeting purposes could, years later, be used to determine the mineralogical deportment of arsenic in a production environment.
What other issues and opportunities do you face with geoscientific data analysis?
Analytics in geoscience needs to adapt to some peculiar requirements. Our data is typically not large, but is often incomplete. Analytics in exploration models out the background to find the outliers; the ‘sensors’ need to be moved to the (remote) point of data collection and the analytics done at the ‘edge’, not months later. Conversely, production seeks to eliminate outliers to better model the background, and also needs to link spatial and temporal data.
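The idea of modelling out background to find outliers can be sketched with a simple robust statistic. The snippet below is an illustrative sketch only, not any specific industry workflow: it flags anomalous assay values using the modified z-score (median and median absolute deviation), a common robust approach; the function name, sample values and threshold are assumptions for the example.

```python
import statistics

def robust_outliers(values, threshold=3.5):
    """Flag values that stand out from background using the modified
    z-score (median / MAD). Illustrative sketch; 3.5 is a commonly
    used cut-off, not a recommendation for any particular dataset."""
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    if mad == 0:
        # No spread around the median: nothing can be flagged robustly.
        return [False] * len(values)
    # 0.6745 scales the MAD to be comparable with a standard deviation.
    return [abs(0.6745 * (v - med) / mad) > threshold for v in values]

# Hypothetical background assay values with one anomalous sample:
assays = [12.0, 11.5, 12.3, 11.8, 12.1, 48.0, 11.9]
print(robust_outliers(assays))
# → [False, False, False, False, False, True, False]
```

A median-based statistic is used rather than the mean precisely because exploration data contains the outliers of interest, which would distort a mean-and-standard-deviation background model.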
The historical emphasis of analytics in Geoscience has been on exploratory data analysis. Moving to machine learning requires high-quality nominal data for training, which is typically unavailable or determined by imperfect ‘human’ sensors.
These are issues, but there are also immense opportunities for the industry should the Data Scientists of the future be carefully guided by the Geoscientists once the novelty of the approach has worn off.