Workshop
Machine Learning for Geophysical & Geochemical Signals
Laura Pyrak-Nolte · James Rustad · Richard Baraniuk

Fri Dec 07 05:00 AM -- 03:30 PM (PST) @ Room 515
Event URL: http://www.physics.purdue.edu/MLGGS

Motivation
The interpretation of Earth's subsurface evolution from full waveform analysis requires a method to identify the key signal components related to the evolution in physical properties from changes in stress, fluids, geochemical interactions and other natural and anthropogenic processes. The analysis of seismic waves and other geophysical/geochemical signals remains for the most part a tedious task that geoscientists perform by visual inspection of the available seismograms. The complexity and noisy nature of a broad array of geoscience signals, combined with sparse and irregular sampling, make this analysis difficult and imprecise. In addition, many signal components are ignored in tomographic imaging and continuous signal analysis, which may prevent the discovery of previously unrevealed signals that point to new physics.

Ideally, a detailed interpretation of the geometric contents of these data sets would provide valuable prior information for the solution of corresponding inverse problems. This unsatisfactory state of affairs is indicative of a lack of effective and robust algorithms for the computational parsing and interpretation of seismograms (and other geoscience data sets). Indeed, the limited frequency content, strong nonlinearity, and temporally scattered nature of these signals make their analysis with standard signal processing techniques difficult and insufficient.

Once important seismic phases are identified, the next challenge is determining the link between a remotely-measured geophysical response and a characteristic property (or properties) of the fractures and fracture system. While a strong laboratory-based foundation has established a link between the mechanical properties of simple fracture systems (i.e. single fractures, parallel sets of fractures) and elastic wave scattering, bridging to the field scale faces additional complexity and a range of length scales that cannot be achieved from laboratory insight alone. This fundamental knowledge gap at the critical scale for long-term monitoring and risk assessment can only be narrowed or closed with the development of appropriate mathematical and numerical representations at each scale and across scales using multiphysics models that traverse spatial and temporal scales.

Topic
Major breakthroughs in bridging the knowledge gaps in geophysical sensing are anticipated as more researchers turn to machine learning (ML) techniques; however, owing to the inherent complexity of machine learning methods, they are prone to misapplication, may produce uninterpretable models, and are often insufficiently documented. This combination of attributes hinders both reliable assessment of model validity and consistent interpretation of model outputs. By providing documented datasets and challenging teams to apply fully documented workflows for ML approaches, we expect to accelerate progress in the application of data science to longstanding research issues in geophysics.

The goals of this workshop are to:
(1) bring together experts from different fields of ML and geophysics to explore the use of ML techniques related to the identification of the physics contained in geophysical and chemical signals, as well as from images of geologic materials (minerals, fracture patterns, etc.); and
(2) announce a set of geophysics machine learning challenges to the community that address earthquake detection and the physics of rupture and the timing of earthquakes.

Target Audience
We aim to elicit new connections among these diverse fields, identify novel tools and models that can be transferred from one to the other, and explore novel applications that stand to benefit from the ML paradigm. We believe that a successful workshop will lead to new research directions in a variety of areas and will also inspire the development of novel theories and tools.

Fri 5:30 a.m. - 5:40 a.m.

Introductory comments by organizers

Laura Pyrak-Nolte, Jim Rustad, Richard Baraniuk
Fri 5:40 a.m. - 6:05 a.m.

Probing Earthquake Fault Slip using Machine Learning

Earthquakes take place when two juxtaposed fault blocks are stressed sufficiently to overcome the frictional force holding them in place and they abruptly slip relative to each other. Earthquake faults exhibit a continuum of behaviors ranging from stick slip associated with strong shaking, to slow slip which is primarily aseismic, to very slow slip that is aseismic and can take place over hours to months. We are characterizing faulting physics by using machine learning to analyze continuous acoustic data streams in the laboratory and continuous seismic data streams in Earth. As labels, we use characteristics of the measured fault slip behavior in the laboratory such as the fault friction, shear displacement and fault thickness. In Earth, we use surface displacement as determined by Global Positioning Systems (GPS). Other data such as InSAR can be used as well. We find that the laboratory acoustic data and the Earth seismic data are a type of Rosetta Stone revealing fault characteristics at all times and fault displacements. This is a surprising observation because previously we believed most or much of the signal was noise. Here we give an overview of recent work in this area and also describe recent efforts on parallel problems such as volcanoes and geysers.

Paul A Johnson
Fri 6:05 a.m. - 6:30 a.m.

Deep Learning of Earthquake Signals

Gregory C. Beroza, S. Mostafa Mousavi, and Weiqiang Zhu

Diverse algorithms have been developed for efficient earthquake signal detection and processing. These algorithms are becoming increasingly important as seismologists strive to extract as much insight as possible from exponentially increasing volumes of continuous seismic data. Waveform similarity search, based on the premise that adjacent earthquakes generate similar waveforms, is now widely and effectively used to detect earthquakes too small to appear routinely in earthquake catalogs. Machine learning has the potential to generalize this similarity search from strict waveform similarity to waveforms that have similar characteristics. Convolutional and recurrent networks have each been shown to be promising tools for earthquake signal detection, and we have developed a deep convolutional-recurrent network to combine the advantages of each. This architecture is well-suited to learn both the spectral and temporal characteristics of earthquake signals. We have applied it to different, but inter-related tasks in earthquake analysis, including: earthquake detection, classification of continuous seismic data into P-waves, S-waves, and noise, and the problem of de-noising of earthquake signals. In our presentation we demonstrate the performance of deep learning applied to seismic signals for each of these tasks.
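
As a minimal illustration of the waveform similarity search mentioned above (not the authors' method or network), the sketch below slides a template over a continuous trace and keeps the windows whose normalized cross-correlation exceeds a threshold. The function names, the synthetic trace and the 0.9 threshold are assumptions chosen for illustration only.

```python
import math

def normalized_xcorr(template, window):
    """Correlation coefficient between two equal-length sequences."""
    n = len(template)
    mt = sum(template) / n
    mw = sum(window) / n
    num = sum((a - mt) * (b - mw) for a, b in zip(template, window))
    den = math.sqrt(sum((a - mt) ** 2 for a in template) *
                    sum((b - mw) ** 2 for b in window))
    # A flat window has zero variance; treat it as "no match".
    return num / den if den > 0 else 0.0

def detect(trace, template, threshold=0.9):
    """Return start indices where the trace matches the template."""
    m = len(template)
    return [i for i in range(len(trace) - m + 1)
            if normalized_xcorr(template, trace[i:i + m]) >= threshold]

# Hypothetical data: one wavelet buried twice in a flat trace,
# the second time at 30% amplitude (a "smaller" repeat of the event).
template = [0.0, 1.0, -2.0, 1.5, -0.5]
trace = [0.0] * 40
for start, scale in ((5, 1.0), (25, 0.3)):
    for k, v in enumerate(template):
        trace[start + k] += scale * v

detections = detect(trace, template)
print(detections)
```

Because the correlation coefficient is insensitive to amplitude, the weaker repeat is found as readily as the full-size one, which is why this kind of search can recover events too small for routine catalogs. It still requires a known template, which is the restriction the learned detectors described above aim to relax.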

Greg C Beroza
Fri 6:30 a.m. - 6:55 a.m.
Maarten de Hoop (Unsupervised Learning for Identification of Seismic Signals)
Maarten V. de Hoop
Fri 6:55 a.m. - 7:20 a.m.

Towards data-driven earthquake detection: Extracting weak seismic signals with locality-sensitive hashing

Extracting weak earthquake signals from continuous waveform data recorded by sensors in a seismic network is a fundamental and challenging task in seismology. In this talk, I will present Fingerprint and Similarity Thresholding (FAST; Yoon et al., 2015), a computationally efficient method for large-scale earthquake detection. FAST adapts technology used for rapid audio identification to the problem of extracting weak earthquake signals in continuous seismic data. FAST uses locality-sensitive hashing, a data mining technique for efficiently identifying similar items in large data sets, to detect similar waveforms (candidate earthquakes) in continuous seismic data. A distinguishing feature of our approach is that FAST is an unsupervised detector; FAST can discover new sources without any template waveforms or waveform characteristics available as training data – a common situation for seismic data sets. In our recent work, we have extended FAST to enable earthquake detection using data from multiple sensors spaced tens or hundreds of kilometers apart (Bergen and Beroza, 2018), and optimized the FAST software for detection at scale (Rong et al., 2018). FAST can now detect earthquakes with previously unknown sources in 10-year, multi-sensor seismic data sets without training data – a capability that was not previously available for seismic data analysis.
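
To make the idea of locality-sensitive hashing concrete, here is a small sketch that uses MinHash signatures split into bands, a common LSH construction for sets. It is not the FAST software: the fingerprints are hypothetical sets of active bit indices, and the hash family, the number of hash functions and the banding parameters are all assumptions made for illustration.

```python
import random
from collections import defaultdict
from itertools import combinations

PRIME = 2_147_483_647  # prime modulus for the hash family (a*x + b) mod PRIME

def minhash_signature(active_bits, coeffs):
    """MinHash signature of a set of active fingerprint bit indices."""
    return tuple(min((a * x + b) % PRIME for x in active_bits)
                 for a, b in coeffs)

def lsh_candidate_pairs(fingerprints, n_hashes=24, n_bands=8, seed=0):
    """Return index pairs whose signatures agree on at least one band."""
    rng = random.Random(seed)
    coeffs = [(rng.randrange(1, PRIME), rng.randrange(0, PRIME))
              for _ in range(n_hashes)]
    rows = n_hashes // n_bands
    buckets = defaultdict(list)
    for idx, bits in enumerate(fingerprints):
        sig = minhash_signature(bits, coeffs)
        for band in range(n_bands):
            # Windows landing in the same bucket become candidate pairs.
            key = (band, sig[band * rows:(band + 1) * rows])
            buckets[key].append(idx)
    pairs = set()
    for members in buckets.values():
        pairs.update(combinations(members, 2))
    return pairs

# Hypothetical fingerprints for three waveform windows.
fingerprints = [
    {3, 17, 42, 58, 91},      # window containing event A
    {200, 210, 220, 230},     # unrelated noise window
    {3, 17, 42, 58, 91},      # later repeat of event A
]
pairs = lsh_candidate_pairs(fingerprints)
print(sorted(pairs))
```

Only windows that share a bucket are ever compared, so the search avoids the all-pairs comparison that makes brute-force similarity search expensive on long records. No template or label is supplied at any point, which mirrors the unsupervised character described above.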

Karianne Bergen
Fri 7:20 a.m. - 7:40 a.m.

Poster Spotlight

*Mauricio Araya-Polo, Stuart Farris and Manuel Florez, Combining Unsupervised and Supervised Deep Learning approaches for Seismic Tomography. Signals from inner earth, seismic waveforms, are heavily manipulated before human interpreters have a chance to figure out the subsurface structures. That manipulation adds modeling biases and is limited by methodological shortcomings. Alternatively, using waveforms directly is becoming possible thanks to current Deep Learning (DL) advances such as (Araya-Polo et al., 2017 and 2018; Lin et al., 2017). Further extending that work, we present a DL approach that takes realistic raw seismic waveforms as inputs and produces subsurface velocity models as output. When insufficient data is used for training, DL algorithms tend to either over-fit or fail completely. Gathering large amounts of labeled and standardized seismic data sets is not straightforward. We address this shortage of quality data by building a Generative Adversarial Network (GAN) to augment our original training data set, which is then used by the DL seismic tomography as input.

*Yuxing Ben, Chris James, Dingzhou Cao, Drilling State Classification with Machine Learning. The sensors on drilling rigs and production sites are leading oil and gas companies to mine so-called big data. Leveraging historical time series data and real-time drilling data can help drilling engineers improve rig and well delivery efficiencies; however, it can also help geoscientists understand the geophysical properties of the reservoir. In this case study, we describe how to use machine learning to classify drilling states. We investigated several machine learning methods and architectures including Random Forest tree models, Convolutional Neural Networks, and Recurrent Neural Networks, which were then tested against 15 million rows of real, labeled drilling time-series data. We found that machine learning models were superior to rule-based models. For wells drilled in two different onshore basins, the accuracies of our in-house rule-based models were 70% and 90% respectively, while the accuracies of machine learning models were over 99%. The best identified machine learning model has been deployed in a drilling analytics platform and used to automatically detect the drilling state in real time for use by Drilling Engineers to evaluate and analyze well performance.

*Jorge Guevara, Blanca Zadrozny, Alvaro Buoro, Ligang Lu, John Tolle, Jan Limbeck, Mingqi Wu, Detlef Hohl, An Interpretable Machine Learning Methodology for Well Data Integration and Sweet Spotting Identification. The huge amount of heterogeneous data provided by the petroleum industry brings opportunities and challenges for applying machine learning methodologies. For instance, petrophysical data recorded in well logs, completions datasets and well production data constitute good examples of data for training machine learning models with the aim of automating procedures and giving data-driven solutions to problems arising in the petroleum industry. In this work, we present a machine learning methodology for oil exploration that 1) opens the possibility of integrating heterogeneous data such as completion, engineering, and well production data, as well as petrophysical feature estimation from petrophysical data from horizontal and vertical wells; 2) enables the discovery of new locations with high potential for production by using predictive modeling for sweet spotting identification; 3) facilitates the analysis of the effect, role, and impact of some engineering decisions on production by means of interpretable machine learning modeling, allowing model validation; 4) allows the incorporation of prior/expert knowledge by using Shape Constraint Additive Models (SCAMs); and 5) enables the construction of hypothetical "what-if" scenarios for production prediction. Among the results, it is important to highlight that 1) performance improves by including prior knowledge via SCAMs (for example, we observe a percentage change of 24% between the best RMSE result from black-box ML models and that of a model that incorporates prior knowledge); 2) we were able to construct hypothetical what-if scenarios based on actual petrophysical data and hypothetical completion and engineering values; and 3) we were able to assess the validity of ML models through effect analysis via conditional plots.

*Ping Lu, Hunter Danque, Jianxiong Chen, Seth Brazell, and Mostafa Karimi, Enhanced Seismic Imaging with Predictive Neural Networks for Geophysics. Full-waveform inversion (FWI) has become a popular method to estimate elastic earth properties from seismic data, and it has great utility in seismic velocity model building and seismic reflectivity imaging in areas of complex salt. FWI is a non-linear data-fitting procedure that matches the predicted to observed waveform data given an initial guess of the subsurface parameters. The velocity model parameters are updated to reduce the misfit between the observed and predicted data until the misfit is sufficiently small. Sharp velocity boundaries such as between salt and sediment are often updated manually for each iteration based on the seismic reflectivity images. Here, we propose a predictive neural network architecture as a potential alternative to the complex FWI workflow. An unsupervised learning model for predicting future frames in a video sequence is explored to simulate direct inversion procedures for seismic data. Such neural network architectures comprise two main components: an encoder based on convolutional neural networks (CNNs), and a recurrent neural network (RNN) for iteratively predicting geophysical velocity models. Both of the proposed networks are able to robustly train individual layers and make a layer-specific prediction, which is compared with a target to produce an error term. This error term is then propagated to the subsequent network layers. With a few iterative training steps, the networks are capable of learning internal representations decoded from latent parameters of seismic wave propagation, which control how FWI velocity modelling converges. These representations learned from one dataset could be transferred to predict the future velocity model of a brand-new area where the shape of the salt body is not well imaged or known. Altogether, experimental results generated from real Gulf of Mexico seismic data suggest that the prediction represents a powerful framework for unsupervised learning, which provides an alternative approach to the FWI procedure to generate a high-resolution velocity model including an accurate salt model and ultimately a sharp subsalt image.

*Zachary Ross, PhaseLink: A Deep Learning Approach to Seismic Phase Association. We present PhaseLink, a deep learning approach to seismic phase association. Seismic phase association is a fundamental task in seismology that pertains to linking together phase detections on different sensors that originate from a common earthquake. This task can be challenging because the number of sources is unknown, events frequently overlap in time, or can occur simultaneously in different parts of a network. Our PhaseLink approach has many desirable properties. First, it is trained entirely on synthetic simulated data (i.e., "sim-to-real"), and is thus easily portable to any tectonic regime. Second, it is straightforward to tune PhaseLink by simply adding examples of problematic cases to the training dataset -- whereas conventional approaches require laborious adjusting of ad hoc hyperparameters. Third, we empirically demonstrate state-of-the-art performance in a wide range of settings. For instance, PhaseLink can precisely associate P- and S-picks to events that are separated by ~12 seconds in origin time. We expect PhaseLink to substantially improve many aspects of seismic analysis, including the resolution of seismicity catalogs, real-time seismic monitoring, and streamlined processing of large seismic datasets.

*Timothy Draelos, Stephen Heck, Jennifer Galasso, and Ronald Brogan, Seismic Phase Identification with a Merged Deep Neural Network. Seismic signals are composed of the seismic waves (phases) that reach a sensor, similar to the way speech signals are composed of phonemes that reach a listener’s ear. We leverage ideas from speech recognition for the classification of seismic phases at a seismic sensor. Seismic Phase ID is challenging due to the varying paths and distances an event takes to reach a sensor, but there is consistent structure and ordering of the different phases arriving at the sensor. Together with scalar value measurements of seismic signal detections (horizontal slowness, amplitude, Signal-to-Noise Ratio (SNR), and the time since the previous signal detection), we use the seismogram and its spectrogram of detection waveforms as inputs to a merged deep neural network (DNN) with convolutional (CNN) and recurrent (LSTM) layers to learn the frequency structure over time of different phases. The binary classification performance of First-P phases versus non-First-P (95.6% class average accuracy) suggests a potentially significant impact on the reduction of false and missed events in seismic signal processing pipelines. Other applications include discrimination between noise and non-noise detections for induced seismicity networks and for early warning of large hazards.

*Ben Moseley, Andrew Markham, and Tarje Nissen-Meyer, Fast Approximate Simulation of Seismic Waves with Deep Learning. The simulation of seismic waves is a core task in many geophysical applications, yet it is computationally expensive. As an alternative approach, we simulate acoustic waves in horizontally layered media using a deep neural network. In contrast to traditional finite-difference (FD) modelling, our network is able to directly approximate the recorded seismic response at multiple receiver locations in a single inference step, without needing to iteratively model the seismic wavefield through time. This results in an order of magnitude reduction in simulation time, from the order of 1 s for FD modelling to the order of 0.1 s using our approach. Such a speed improvement could lead to real-time seismic simulation applications and benefit seismic inversion algorithms based on forward modelling, such as full waveform inversion. Our network design is inspired by the WaveNet network originally used for speech synthesis. We train our network using 50,000 synthetic examples of seismic waves propagating through different horizontally layered velocity models. We are also able to alter our WaveNet architecture to carry out seismic inversion directly on the dataset, which offers a fast inversion algorithm.

*Men-Andrin Meier, Zachary Ross, Anshul Ramachandran, Ashwin Balakrishna, Suraj Nair, Peter Kundzicz, Zefeng Li, Egill Hauksson, Jennifer Andrews, Reliable Real-Time Signal/Noise Discrimination with Deep and Shallow Machine Learning Classifiers. In Earthquake Early Warning (EEW), every sufficiently impulsive signal is potentially the first evidence for an unfolding large earthquake. More often than not, however, impulsive signals are mere nuisance signals. One of the most fundamental - and difficult - tasks in EEW is to rapidly and reliably discriminate between real local earthquake signals and any other kind of signal. Current EEW systems struggle to avoid discrimination errors, and suffer from false and missed alerts. In this study we show how machine learning classifiers can strongly improve real-time signal/noise discrimination. We develop and compare a series of non-linear classifiers with variable architecture depths, including random forests, fully connected, convolutional (CNN, Figure 1) and recurrent neural networks, and a generative adversarial network (GAN). We train all classifiers on the same waveform data set that includes 374k 3-component local earthquake records with magnitudes M3.0-9.1, and 946k impulsive noise signals. We find that the deep architectures significantly outperform the simpler ones. Using 3 s long waveform snippets, the CNN and the GAN classifiers both reach 99.5% precision and 99.3% recall on an independent validation data set. Our results suggest that machine learning classifiers can strongly improve the reliability and speed of EEW alerts.

*Mathieu Chambefort, Nicolas Salaun, Emillie Chautru, Stephan Clemencon, Guillaume Poulain, Signal and Noise Detection using Recurrent Autoencoders on Seismic Marine Data. In order to meet the industrial constraints in the Big Data era, i.e. processing more and more seismic data (more than 10^6 shot points per marine seismic survey from [Belz and Dolymnyj, 2018]) in a more timely, reliable and efficient manner (i.e. with a better signal enhancement, [Martin et al., 2015]), we develop a deep learning approach based on recurrent LSTM ([Wong and Luo, 2018]) to the processing of seismic time series, so as to separate the signal from the noise based on the encoded information. This contribution provides empirical evidence that the representation provided by the internal layers of the deployed autoencoder encodes the original information well. More precisely, the focus here is on the linear noise possibly blurring marine seismic data, which is mainly due to the tug and motor of the boat but can also be caused by bad weather or other elements, rigs and other boats in the area ([Elboth et al., 2009]). The data under study are composed of massive synthetic shot points. The goal pursued is to design an autoencoder capable of detecting the possible occurrence of linear noise in the data. The encoded information is next classified and the results obtained are compared with those of a traditional technique that essentially consists of directly applying a K-NN algorithm to the envelope of the analytic signal, as if the whole dataset came from the same area.

*Xiaojin Tan and Eldad Haber, Semantic Segmentation for Geophysical Data. Segmentation of geophysical data is the process of dividing a geophysical image into multiple geological units. This process is typically done manually by experts; it is time-consuming and inefficient. In recent years, machine learning techniques such as Convolutional Neural Networks (CNNs) have been used for semantic segmentation. Semantic segmentation is the process that associates each pixel in a natural image with a labeled class. When attempting to use similar technology to automatically segment geophysical data there are a number of challenges to consider, in particular data inconsistency, scarcity and complexity. To overcome these challenges, we develop a new process that we call geophysical semantic segmentation (GSS). This process addresses the pre-processing of geophysical data in order to enable learning, the enrichment of the data set (data augmentation) by using a geo-statistical technique referred to as Multiple-Point Simulations (MPS), and finally the training on such a data set based on a new neural network architecture called inverse Convolution Neural Networks (iCNN) that is specifically developed to identify patterns. As demonstrated by the results on a field magnetic data set, this approach is competitive with human segmentation and shows promising results.

*B Ravi Kiran and Stefan Milz, Aerial LiDAR reconstruction using Conditional GANs. Recently, aerial LiDAR data has opened many new opportunities for research disciplines such as macroscopic geophysical analysis or archaeological investigations. However, LiDAR measurements are expensive and the data is not widely distributed or accessible. We propose a novel method for image-to-image translation performing HD-LiDAR reconstruction using RGB input images based on conditional GANs. The conditional mapping function of the generator G : [c; z] -> y is transformed to G : [x; z] -> y, where y represents the reconstructed LiDAR map and c represents the condition. c is replaced by the aligned aerial camera image x, and z represents the noise. Our approach is able to reconstruct LiDAR data as elevation maps based on small-scale training data, which includes RGB and LiDAR sample pairs based on 256 × 256 image matrices. The model offers the opportunity to complete geophysical LiDAR databases where measurements are missing. The method is validated on the ISPRS dataset with an overall rRMSE of 14.53%.

*Zheng Zhou, Youzuo Lin, Zhongping Zhang, Zan Wang, Robert Dilmore and George Guthrie, CO2 and Brine Leakage Detection Using Multi-Physics-Informed Convolutional Neural Networks. In carbon capture and sequestration, it is crucial to build effective monitoring techniques to detect both brine and CO2 leakage from legacy wells into underground sources of drinking water. The CO2 and brine leakage detection methods rely on geophysical observations from different physical domains. Most of the current detection methods are built on physical models, and the leakage masses of CO2 and brine are detected separately. However, those physics-driven methods can be computationally demanding and yield low detection accuracy. In this paper, we develop a novel end-to-end data-driven detection method, called multi-physics-informed convolutional neural network (Multi-physics CNN), which directly learns a mapping relationship between physical measurements and leakage mass. Our Multi-physics CNN takes simulated reflection seismic and pressure data as inputs, and captures different patterns in the leakage process. In particular, we capture two types of multi-physical features from seismic and pressure data, respectively. With those features, we can further detect the CO2 and brine leakage mass simultaneously. We evaluate our method on the CO2 and brine leakage mass detection task using simulated multi-physical datasets generated with the Kimberlina 1.2 model. Our results show that our Multi-physics CNN yields promising results in detecting the leakage mass of both CO2 and brine.

Fri 7:40 a.m. - 7:40 a.m.

Enhanced Seismic Imaging with Predictive Neural Networks for Geophysics

Ping Lu, Yanyan Zhang, Jianxiong Chen, Seth Brazell, Mostafa Karimi Anadarko Petroleum Corporation, Houston, and Texas A&M University--College Station

We propose a predictive neural network architecture that can be utilized to update reference velocity models as inputs to full waveform inversion (FWI). Deep learning models are explored to augment velocity model building workflows during 3D seismic volume reprocessing in salt-prone environments. Specifically, a neural network architecture, with 3D convolutional, de-convolutional layers, and 3D max-pooling, is designed to take standard amplitude 3D seismic volumes as an input. Enhanced data augmentations through generative adversarial networks and a weighted loss function enable the network to train with few sparsely annotated slices. Batch normalization is also applied for faster convergence. Moreover, a 3D probability cube for salt bodies is generated through ensembles of predictions from multiple models in order to reduce variance. Velocity models inferred from the proposed networks provide opportunities for FWI forward models to converge faster with an initial condition closer to the true model. In each iteration step, the probability cubes of salt bodies inferred from the proposed networks can be used as a regularization term in FWI forward modelling, which may result in an improved velocity model estimation while the output of seismic migration can be utilized as an input of the 3D neural network for subsequent iterations.
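
The ensembling step described above can be pictured with a small sketch: several models each output a per-voxel salt probability, and the ensemble cube is their voxel-wise mean. This is only an assumed reading of "ensembles of predictions" (simple averaging), and the tiny 1 × 2 × 2 volumes, the model names and the 0.5 cut-off are hypothetical.

```python
def ensemble_probability(cubes):
    """Voxel-wise mean of equally shaped 3D probability cubes (nested lists)."""
    n = len(cubes)
    nz, ny, nx = len(cubes[0]), len(cubes[0][0]), len(cubes[0][0][0])
    return [[[sum(c[z][y][x] for c in cubes) / n
              for x in range(nx)]
             for y in range(ny)]
            for z in range(nz)]

# Three hypothetical model outputs on a tiny 1 x 2 x 2 volume.
model_a = [[[0.9, 0.2], [0.6, 0.1]]]
model_b = [[[0.8, 0.4], [0.3, 0.0]]]
model_c = [[[1.0, 0.0], [0.9, 0.2]]]

salt_prob = ensemble_probability([model_a, model_b, model_c])

# Assumed 0.5 cut-off to turn the averaged probabilities into a salt mask.
salt_mask = [[[p >= 0.5 for p in row] for row in plane] for plane in salt_prob]
print(salt_mask)
```

Averaging damps the disagreement between individual models (here, the voxel where one model predicts 0.3 and the others 0.6 and 0.9), which is the variance reduction the abstract refers to.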

Ping Lu
Fri 7:40 a.m. - 7:40 a.m.

Mauricio Araya-Polo, Stuart Farris and Manuel Florez Stanford University and Shell International Exploration & Production Inc.

Combining Unsupervised and Supervised Deep Learning approaches for Seismic Tomography

Signals from inner earth, seismic waveforms, are heavily manipulated before human interpreters have a chance to figure out the subsurface structures. That manipulation adds modeling biases and is limited by methodological shortcomings. Alternatively, using waveforms directly is becoming possible thanks to current Deep Learning (DL) advances such as (Araya-Polo et al., 2017 and 2018; Lin et al., 2017). Further extending that work, we present a DL approach that takes realistic raw seismic waveforms as inputs and produces subsurface velocity models as output. When insufficient data is used for training, DL algorithms tend to either over-fit or fail completely. Gathering large amounts of labeled and standardized seismic data sets is not straightforward. We address this shortage of quality data by building a Generative Adversarial Network (GAN) to augment our original training data set, which is then used by the DL seismic tomography as input.

Mauricio Araya
Fri 7:45 a.m. - 7:45 a.m.

Jorge Guevara, Blanca Zadrozny, Alvaro Buoro, Ligang Lu, John Tolle, Jan Limbeck, Mingqi Wu, Detlef Hohl IBM Research and Shell Inc.

An Interpretable Machine Learning Methodology for Well Data Integration and Sweet Spotting Identification. The huge amount of heterogeneous data provided by the petroleum industry brings opportunities and challenges for applying machine learning methodologies aimed at optimizing and automating processes and procedures in this area. For instance, petrophysical data recorded in well logs, completions datasets and well production data constitute good examples of data for training machine learning models with the aim of automating procedures and giving data-driven solutions to problems arising in the petroleum industry. In this work, we present a machine learning methodology for oil exploration that 1) integrates heterogeneous well data such as completions, engineering values, well production data and petrophysical data; 2) performs feature engineering of petrophysical data from horizontal and vertical wells using Gaussian Process Regression (Kriging); 3) enables the discovery of new locations with high potential for production by using machine learning modeling for sweet spotting identification; 4) facilitates the analysis of the effect, role, and impact of some engineering decisions on production by means of interpretable machine learning modeling; 5) allows the incorporation of prior/expert knowledge by using Shape Constraint Additive Models (SCAMs); and 6) enables the construction of hypothetical "what-if" scenarios for production prediction, by means of conditional plots based on residual plots analysis. We validated this methodology using real well production data. We used nested leave-one-out cross-validation for assessing the generalization power of models. Among the results, it is important to highlight that 1) performance improves by including prior knowledge via SCAMs (for example, we observe a percentage change of 24% between the best RMSE result from black-box ML models and that of a model that incorporates prior knowledge); 2) we were able to construct hypothetical what-if scenarios based on actual petrophysical data and hypothetical completion and engineering values; and 3) we were able to assess the validity of ML models through effect analysis via conditional plots.
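
The nested leave-one-out cross-validation mentioned above can be sketched as follows. This is a generic illustration rather than the authors' pipeline: a 1-D k-nearest-neighbour regressor stands in for their models, the `wells` data are made up, and the grid of `k` values is an assumption. The outer loop holds out one well to estimate generalization error; the inner loop, run only on the remaining wells, chooses the hyperparameter.

```python
import math

def knn_predict(train, x, k):
    """Average target of the k training points nearest to x (1-D feature)."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)

def loo_rmse(data, k):
    """Leave-one-out RMSE of k-NN regression on data (inner loop)."""
    sq = [(knn_predict(data[:i] + data[i + 1:], x, k) - y) ** 2
          for i, (x, y) in enumerate(data)]
    return math.sqrt(sum(sq) / len(sq))

def nested_loo_rmse(data, k_grid=(1, 2, 3)):
    """Outer LOO estimates generalization error; inner LOO picks k per fold."""
    sq = []
    for i, (x, y) in enumerate(data):
        train = data[:i] + data[i + 1:]
        # The held-out point never influences the choice of k.
        best_k = min(k_grid, key=lambda k: loo_rmse(train, k))
        sq.append((knn_predict(train, x, best_k) - y) ** 2)
    return math.sqrt(sum(sq) / len(sq))

# Hypothetical (feature, production) pairs for six wells.
wells = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 8.1), (5.0, 9.8), (6.0, 12.2)]
score = nested_loo_rmse(wells)
print(round(score, 3))
```

Keeping model selection inside the outer fold avoids the optimistic bias that arises when the same held-out points are used both to tune and to score a model, which matters for small well datasets.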

Jorge Guevara Diaz
Fri 7:45 a.m. - 7:45 a.m. [iCal]

Drilling State Classification with Machine Learning

Yuxing Ben, Chris James, Dingzhou Cao

Advanced Analytics and Emerging Technology, Anadarko Petroleum Corporation

The sensors on drilling rigs and production sites are leading oil and gas companies to mine so-called big data. Leveraging historical time series data and real-time drilling data can help drilling engineers improve rig and well delivery efficiencies; it can also help geoscientists understand the geophysical properties of the reservoir. In this case study, we describe how to use machine learning to classify drilling states. We investigated several machine learning methods and architectures, including Random Forest models, Convolutional Neural Networks, and Recurrent Neural Networks, which were then tested against 15 million rows of real, labeled drilling time-series data. We found that machine learning models were superior to rule-based models. For wells drilled in two different onshore basins, the accuracies of our in-house rule-based models were 70% and 90% respectively, while the accuracies of machine learning models were over 99%. The best machine learning model identified has been deployed in a drilling analytics platform and is used to automatically detect the drilling state in real time, so that drilling engineers can evaluate and analyze well performance.

Yuxing Ben
Fri 7:50 a.m. - 7:50 a.m. [iCal]

Seismic Phase Identification with a Merged Deep Neural Network

Timothy J. Draelos, Stephen Heck, Jennifer Galasso, Ronald Brogan Sandia National Laboratories & ENSCO, Inc.

Seismic signals are composed of the seismic waves (phases) that reach a sensor, similar to the way speech signals are composed of phonemes that reach a listener’s ear. We leverage ideas from speech recognition for the classification of seismic phases at a seismic sensor. Seismic Phase ID is challenging due to the varying paths and distances an event takes to reach a sensor, but there is consistent structure and ordering of the different phases arriving at the sensor. Together with scalar value measurements of seismic signal detections (horizontal slowness, amplitude, Signal-to-Noise Ratio (SNR), and the time since the previous signal detection), we use the seismogram and its spectrogram of detection waveforms as inputs to a merged deep neural network (DNN) with convolutional (CNN) and recurrent (LSTM) layers to learn the frequency structure over time of different phases. The binary classification performance of First-P phases versus non-First-P (95.6% class average accuracy) suggests a potentially significant impact on the reduction of false and missed events in seismic signal processing pipelines. Other applications include discrimination between noise and non-noise detections for induced seismicity networks and for early warning of large hazards.

Tim Draelos
Fri 7:50 a.m. - 7:50 a.m. [iCal]

PhaseLink: A Deep Learning Approach to Seismic Phase Association

Zachary Ross

California Institute of Technology

We present PhaseLink, a deep learning approach to seismic phase association. Seismic phase association is a fundamental task in seismology that pertains to linking together phase detections on different sensors that originate from a common earthquake. This task can be challenging because the number of sources is unknown, events frequently overlap in time, or can occur simultaneously in different parts of a network. Our PhaseLink approach has many desirable properties. First, it is trained entirely on synthetic simulated data (i.e., "sim-to-real"), and is thus easily portable to any tectonic regime. Second, it is straightforward to tune PhaseLink by simply adding examples of problematic cases to the training dataset -- whereas conventional approaches require laborious adjusting of ad hoc hyperparameters. Third, we empirically demonstrate state-of-the-art performance in a wide range of settings. For instance, PhaseLink can precisely associate P- and S-picks to events that are separated by ~12 seconds in origin time. We expect PhaseLink to substantially improve many aspects of seismic analysis, including the resolution of seismicity catalogs, real-time seismic monitoring, and streamlined processing of large seismic

Zachary Ross
Fri 7:55 a.m. - 7:55 a.m. [iCal]

Fast approximate simulation of seismic waves with deep learning

Ben Moseley, Andrew Markham, and Tarje Nissen-Meyer

Centre for Doctoral Training in Autonomous Intelligent Machines and Systems, University of Oxford, UK & Department of Earth Sciences, University of Oxford, UK

The simulation of seismic waves is a core task in many geophysical applications, yet it is computationally expensive. As an alternative approach, we simulate acoustic waves in horizontally layered media using a deep neural network. In contrast to traditional finite-difference (FD) modelling, our network is able to directly approximate the recorded seismic response at multiple receiver locations in a single inference step, without needing to iteratively model the seismic wavefield through time. This results in an order of magnitude reduction in simulation time, from the order of 1 s for FD modelling to the order of 0.1 s using our approach. Such a speed improvement could lead to real-time seismic simulation applications and benefit seismic inversion algorithms based on forward modelling, such as full waveform inversion. Our network design is inspired by the WaveNet network originally used for speech synthesis. We train our network using 50,000 synthetic examples of seismic waves propagating through different horizontally layered velocity models. We are also able to alter our WaveNet architecture to carry out seismic inversion directly on the dataset, which offers a fast inversion algorithm.
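
For context, the iterative time stepping that finite-difference modelling requires can be sketched in one dimension. This is an illustrative toy only, not the authors' FD code or their network: the function name, the two-layer velocity model, the grid spacing and the Gaussian initial condition are all assumptions. Each recorded sample needs a full update of the wavefield, which is the cost a single-inference-step network avoids.

```python
import math

def simulate_trace(velocity, dx, dt, nt, src_idx, rec_idx):
    """Second-order finite-difference solution of u_tt = c(x)^2 u_xx in 1-D.

    Returns the wavefield sampled at rec_idx after each of nt time steps.
    """
    n = len(velocity)
    if max(velocity) * dt / dx > 1.0:
        raise ValueError("CFL condition violated: reduce dt or increase dx")
    # Initial condition: Gaussian displacement centred on the source with zero
    # initial velocity (approximated by setting the previous field equal to u).
    u = [math.exp(-0.5 * ((i - src_idx) / 3.0) ** 2) for i in range(n)]
    u[0] = u[-1] = 0.0
    u_prev = list(u)
    trace = []
    for _ in range(nt):
        u_next = [0.0] * n  # end points stay fixed at zero
        for i in range(1, n - 1):
            r2 = (velocity[i] * dt / dx) ** 2
            u_next[i] = (2.0 * u[i] - u_prev[i]
                         + r2 * (u[i + 1] - 2.0 * u[i] + u[i - 1]))
        u_prev, u = u, u_next
        trace.append(u[rec_idx])
    return trace

if __name__ == "__main__":
    # Hypothetical two-layer model: 1500 m/s over 2500 m/s, 5 m grid spacing.
    velocity = [1500.0 if i < 100 else 2500.0 for i in range(200)]
    trace = simulate_trace(velocity, dx=5.0, dt=0.001, nt=300,
                           src_idx=20, rec_idx=60)
    print(len(trace), max(abs(v) for v in trace))
```

The cost of this loop grows with the number of time steps times the number of grid points, which is why a direct approximation of the receiver response is attractive when many forward simulations are needed.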

Ben Moseley
Fri 7:55 a.m. - 7:55 a.m. [iCal]

Reliable Real-Time Signal/Noise Discrimination with Deep and Shallow Machine Learning Classifiers

Men-Andrin Meier, Zachary Ross, Anshul Ramachandran, Ashwin Balakrishna, Suraj Nair, Peter Kundzicz, Zefeng Li, Egill Hauksson, Jennifer Andrews

California Institute of Technology

In Earthquake Early Warning (EEW), every sufficiently impulsive signal is potentially the first evidence for an unfolding large earthquake. More often than not, however, impulsive signals are mere nuisance signals. One of the most fundamental - and difficult - tasks in EEW is to rapidly and reliably discriminate between real local earthquake signals and any other kind of signal. Current EEW systems struggle to avoid discrimination errors, and suffer from false and missed alerts. In this study we show how machine learning classifiers can strongly improve real-time signal/noise discrimination. We develop and compare a series of non-linear classifiers with variable architecture depths, including random forests, fully connected, convolutional (CNN, Figure 1) and recurrent neural networks, and a generative adversarial network (GAN). We train all classifiers on the same waveform data set that includes 374k 3-component local earthquake records with magnitudes M3.0-9.1, and 946k impulsive noise signals. We find that the deep architectures significantly outperform the simpler ones. Using 3 s long waveform snippets, the CNN and the GAN classifiers both reach 99.5% precision and 99.3% recall on an independent validation data set. Our results suggest that machine learning classifiers can strongly improve the reliability and speed of EEW alerts.
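
The precision and recall figures quoted above follow the standard definitions, which can be sketched as below. The function name and the toy labels are assumptions made for illustration; they are not taken from the study.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

if __name__ == "__main__":
    # Toy labels: 1 = earthquake signal, 0 = impulsive noise (invented).
    y_true = [1, 1, 1, 0, 0, 0]
    y_pred = [1, 1, 0, 1, 0, 0]
    print(precision_recall(y_true, y_pred))
```

In the EEW setting, low precision corresponds to false alerts and low recall to missed alerts, so both numbers matter.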

Men-Andrin Meier
Fri 8:00 a.m. - 8:00 a.m. [iCal]

Signal and Noise Detection using Recurrent Autoencoders on Seismic Marine Data

Mathieu Chambefort, Nicolas Salaun, Emilie Chautru, Stephan Clémençon and Guillaume Poulain

MINES ParisTech - PSL University Centre de Géosciences, CGG, and Telecom ParisTech, LTCI, Université Paris Saclay

In the Big Data era, geophysicists are faced with new industrial constraints such as processing more and more seismic data (more than 10^6 shot points per marine seismic survey [Belz and Dolymnyj, 2018]) in a more timely, reliable and efficient manner (improving signal enhancement, [Martin et al., 2015]). To deal with these challenges, we develop a deep learning approach based on recurrent LSTM ([Wong and Luo, 2018]) for the processing of seismic time series; it separates the signal from the noise based on the encoded information. This contribution provides empirical evidence that the representation provided by the internal layers of the deployed autoencoder encodes the original information well. More precisely, the focus here is on the linear noise that can blur marine seismic data ([Elboth et al., 2009]). The data under study consist of a massive set of synthetic shot points. The goal is to design an autoencoder capable of detecting the possible occurrence of linear noise in the data. Next, the encoded information is classified. The results obtained are compared with those of a traditional technique, which essentially consists of directly applying a K-NN algorithm to the envelope of the analytic signal, as if all the dataset came from the same area.

Mathieu Chambefort
Fri 8:00 a.m. - 8:00 a.m. [iCal]

Semantic Segmentation for Geophysical Data

Xiaojin Tan and Eldad Haber The University of British Columbia, Vancouver, BC, Canada

Segmentation of geophysical data is the process of dividing a geophysical image into multiple geological units. This process is typically done manually by experts, which is time consuming and inefficient. In recent years, machine learning techniques such as Convolutional Neural Networks (CNNs) have been used for semantic segmentation, the process that associates each pixel in a natural image with a labeled class. Attempts to use similar technology to automatically segment geophysical data face a number of challenges, in particular data inconsistency, scarcity and complexity. To overcome these challenges, we develop a new process that we call geophysical semantic segmentation (GSS). This process addresses the pre-processing of geophysical data to enable learning; the enrichment of the data set (data augmentation) using a geostatistical technique referred to as Multiple-Point Simulations (MPS); and finally, training on such a data set with a new neural network architecture called inverse Convolution Neural Networks (iCNN), which is specifically developed to identify patterns. As demonstrated by the results on a field magnetic data set, this approach is competitive with human segmentation and shows promising results.

Xiaojin Tan
Fri 8:05 a.m. - 8:05 a.m. [iCal]

Aerial LiDAR reconstruction using conditional GANs

Isabelle Leang, B Ravi Kiran and Stefan Milz

Recently, aerial LiDAR data has opened many new opportunities for research disciplines such as macroscopic geophysical analysis and archaeological investigation. However, LiDAR measurements are expensive and the data is not widely distributed or accessible. We propose a novel method for image-to-image translation that performs HD-LiDAR reconstruction from RGB input images using conditional GANs. The conditional mapping function of the generator G : [c; z] -> y is transformed to G : [x; z] -> y, where y represents the reconstructed LiDAR map, c represents the condition, which is replaced by the aligned aerial camera image x, and z represents the noise. Our approach is able to reconstruct LiDAR data as elevation maps from a small training set of paired RGB and LiDAR samples of 256 x 256 pixels. The model offers the opportunity to complete geophysical LiDAR databases where measurements are missing. The method is validated on the ISPRS dataset with an overall rRMSE of 14.53%.

Isabelle Leang
Fri 8:05 a.m. - 8:05 a.m. [iCal]

CO2 and Brine Leakage Detection Using Multi-Physics-Informed Convolutional Neural Networks

Zheng Zhou, Youzuo Lin, Zhongping Zhang, Zan Wang, Robert Dilmore and George Guthrie

Electrical Engineering Department at State University of New York at Buffalo, Los Alamos National Laboratory, and National Energy Technology Laboratory, United States Department of Energy, Pittsburgh, PA 15236.

In carbon capture and sequestration, it is crucial to build effective monitoring techniques to detect both brine and CO2 leakage from legacy wells into underground sources of drinking water. CO2 and brine leakage detection methods rely on geophysical observations from different physical domains. Most current detection methods are built on physical models, and the leakage masses of CO2 and brine are detected separately. However, those physics-driven methods can be computationally demanding and yield low detection accuracy. In this paper, we develop a novel end-to-end data-driven detection method, called the multi-physics-informed convolutional neural network (Multi-physics CNN), which directly learns a mapping between physical measurements and leakage mass. Our Multi-physics CNN takes simulated reflection seismic and pressure data as inputs and captures different patterns in the leakage process. In particular, we capture two types of multi-physical features, from seismic and pressure data respectively. With those features, we can detect the CO2 and brine leakage masses simultaneously. We evaluate our method on the CO2 and brine leakage mass detection task using simulated multi-physical datasets generated with the Kimberlina 1.2 model. Our results show that our Multi-physics CNN yields promising results in detecting the leakage mass of both CO2 and brine.

Youzuo Lin
Fri 8:10 a.m. - 8:10 a.m. [iCal]

Deep Semi-Supervised Learning Approach in Characterizing Salt on Seismic Images

Licheng Zhang, Zhenzhen Zhong, Meng Zhang, Tianxia Zhao, Varun Tyagi, Cheng Zhan

Salt body characterization is crucial in exploration and drilling. Because of its mobility, salt can move extensively to create diapirs, which generate significant traps for hydrocarbons. At the same time, salt bodies present drilling hazards, as salt intrusion distorts the stress field and makes wellbore stability challenging in geomechanical models. Here we use deep learning to identify salt bodies in seismic images. Many techniques from the domains of geophysics and data science have been successfully incorporated into the workflow. The seismic images are produced from various locations. We use a convolutional neural network as the main method for image segmentation. The underlying architecture is dedicated to restoring pixel position. In addition, the highlight here is semi-supervised learning: we use the large unlabeled test set to gain more understanding of the data distribution, with pseudo labels for the unlabeled test set taken from model predictions. The metric used is Intersection over Union (IoU), which measures how much of the predicted salt body area overlaps the true answer. Our IoU score is 0.849, which is equivalent to 95% of the predicted salt body being correct. Challenges remain, as geology varies across locations and the corresponding features might not share similar statistical properties.
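
The Intersection over Union metric named in this abstract can be computed on binary masks as in the sketch below. The function name, the convention for two empty masks and the tiny masks are assumptions for illustration, not the authors' evaluation code.

```python
def iou(pred_mask, true_mask):
    """Intersection over Union of two binary masks given as nested lists."""
    intersection = 0
    union = 0
    for pred_row, true_row in zip(pred_mask, true_mask):
        for p, t in zip(pred_row, true_row):
            if p and t:
                intersection += 1
            if p or t:
                union += 1
    # Convention assumed here: two empty masks count as a perfect match.
    return intersection / union if union else 1.0

if __name__ == "__main__":
    # Toy 2 x 3 masks: 1 = salt, 0 = not salt (invented values).
    pred = [[1, 1, 0],
            [0, 1, 0]]
    true = [[1, 0, 0],
            [0, 1, 1]]
    print(iou(pred, true))
```

Because the union appears in the denominator, IoU penalizes both missed salt pixels and falsely predicted ones.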

Cheng Zhan
Fri 8:10 a.m. - 8:10 a.m. [iCal]

Tremor Generative Adversarial Networks: A Deep Generative Model Approach for Geophysical Signal Generation

Inspired by the recent success of Generative Adversarial Networks (GANs) for images, we propose to employ GANs to generate realistic geophysical signals from labeled data. Signals here include seismicity, sedimentary sequences, geological models, etc. We present a preliminary application of a GAN to generate tremors: synthetic tremors are generated by one of our GANs, trained with data collected in Mexico. Studying the trained GANs facilitates our understanding of the data-generating process. These GANs can also be inverted into inference algorithms that capture intrinsic properties of the generating process. GAN-generated tremors can be used as templates to help detect additional tremors and may result in better generalization to new sensor signals.

Fri 8:15 a.m. - 9:00 a.m. [iCal]
Poster Session
Fri 8:15 a.m. - 8:15 a.m. [iCal]

Data Challenge: Machine Learning for Earthquake Detection and Rupture Timing

Laura Pyrak-Nolte, Richard Baraniuk, Greg Beroza, Maarten de Hoop, Brad Hager, Eugene Ilton, Paul Johnson, Steve Laubach, Alan Levander, Semechah Lui, Joe Morris, Beatrice Rivera, James Rustad

Affiliations: Purdue University, Rice University, Stanford University, MIT, PNNL, LANL, Bureau of Economic Geology, University of Toronto, LLNL, Department of Energy-Basic Energy Sciences

Major breakthroughs and discoveries in geophysics are anticipated because of increases in computational power, massive sensor deployments that yield massive datasets, and advancements in machine learning algorithms. However, owing to the inherent complexity, machine learning methods are prone to misapplication, lack of transparency, and often do not attempt to produce interpretable models. Moreover, due to the flexibility in specifying machine learning models, results are often insufficiently documented in research articles, hindering both reliable assessment of model validity and consistent interpretation of model outputs. By providing documented datasets and challenging teams to apply fully documented workflows for machine learning approaches, we expect to accelerate progress in the application of data science to longstanding research issues in geophysics.

In this poster presentation, the guidelines for a challenge problem will be given. Challenge 1 will address the physics of rupture and timing of earthquakes (from laboratory data collected during shearing of gouge-filled faults). Teams using the challenge data set are expected to report the following information on the supervised and unsupervised machine learning components:
● the architecture of the machine-learning approach and why it was chosen;
● the loss function and learning rule;
● preprocessing designed and applied as appropriate;
● description of and choice of the set of hyperparameters;
● description of featurization or feature learning.
We expect the design of the machine learning approach to be an iterative process and seek a description of this. Because ground-truth data are generally lacking and this is a physics-based challenge, we invite participants to propose metrics to validate and compare the performance of the different results. Information will be provided on how to obtain the data, timelines for completion of the challenge, and reporting of results.

Acknowledgment: US Department of Energy, Office of Basic Energy Sciences, Chemical Sciences, Geosciences and Biosciences Division.

Laura Pyrak-Nolte
Fri 9:00 a.m. - 11:00 a.m. [iCal]
Lunch
Fri 11:00 a.m. - 11:20 a.m. [iCal]

Estimating the State of Faults from the Full Continuous Seismic Data Using Machine Learning

Nearly all aspects of earthquake rupture are controlled by the friction along the fault, which progressively increases with tectonic forcing but in general cannot be directly measured. Using machine learning, we show that instantaneous statistical characteristics of the seismic data are a fingerprint of the fault zone frictional state in laboratory experiments. Using a similar methodology in Earth, where we rely on other geophysical datasets as labels in order to extract informative signals from raw seismic waves, we show that subduction zones are continuously broadcasting a tremor-like signal that precisely informs of fault displacement rate throughout their slow earthquake slip cycle. We posit that this signal provides indirect, real-time access to frictional properties of megathrusts and may ultimately reveal a connection between slow slip and megaquakes.

Bertrand Rouet-Leduc
Fri 11:20 a.m. - 11:40 a.m. [iCal]

Geometric Deep Learning for Many-Particle and non-Euclidean Systems

Across many areas of science, one is required to process data defined on irregular and non-Euclidean domains. For example, in particle physics, measurements in the LHC are highly variable particle collisions with cylindrical calorimeters, whereas the IceCube detector looks for neutrinos using an irregular 3d array of sensors. Despite such non-Euclidean structure, many of these tasks satisfy essential geometric priors, such as stability to deformations. In this talk, I will describe a broad family of neural architectures that leverage such geometric priors to learn efficient models with provable stability. I will also describe recent and current progress on several applications including particle physics and inverse problems.

Joan Bruna
Fri 11:40 a.m. - 12:00 p.m. [iCal]

Machine Learning Reveals the Coupling Between Slow Slips and Major Earthquakes

The potential connection between slow slips and earthquakes of large magnitude in subduction zones remains an open question in seismology. Slow slips (earthquakes releasing energy over long periods of time, up to several months) have been observed preceding major earthquake ruptures, suggesting that they may couple to or evolve into a megaquake.

We rely on supervised machine learning algorithms to analyze vast amounts of continuous seismic data, with the goal of identifying hidden signals preceding earthquakes. We find that continuous seismic signals identified in our previous studies of slow slip events carry information about the timing of impending earthquakes of large magnitude. Our results suggest that large earthquakes occur almost systematically in the same phase of the slow slip cycle, and point to a systematic, large-scale coupling between slow slip events and major earthquakes.

Claudia Hulbert
Fri 12:00 p.m. - 12:30 p.m. [iCal]
Coffee Break (Break)
Fri 12:30 p.m. - 12:50 p.m. [iCal]

I will present a new learning-based approach to ill-posed inverse problems. Instead of directly learning the ill-posed inverse mapping, we learn an ensemble of simpler mappings from the data to the projections of the unknown model into random low-dimensional subspaces. We choose structured subspaces of piecewise-constant images on random Delaunay triangulations. With this choice, the projected inverse maps are simpler to learn in terms of robustness and generalization error. We form the reconstruction by combining the estimated subspace projections. This allows us to address inverse problems with extremely sparse data and still get good reconstructions of the unknown geometry; it also makes our method robust against arbitrary data corruptions not seen during training. Further, it marginalizes the role of the training dataset, which is essential for applications in geophysics where ground-truth datasets are exceptionally scarce.

Ivan Dokmanić
Fri 12:50 p.m. - 1:10 p.m. [iCal]

Towards Realtime Hydraulic Fracture Monitoring using Machine Learning and Distributed Fiber Sensing

Joseph Morris, Christopher Sherman, Robert Mellors, Frederick Ryerson, Charles Yu, Michael Messerly

Abstract: Hydraulic fracturing operations (“pumping jobs”) are typically planned well in advance and do not allow for on-the-fly modification of control parameters, such as pumping rate and viscosity enhancement, that can be used to optimize the efficacy of the operation. Monitoring technologies, such as microseismic, have enabled an iterative cycle where observations of one pumping job may influence the selection of parameters of subsequent jobs. However, the significant time lag introduced by data processing and interpretation means that the iterative cycle may take weeks. We seek to enable a future where data collected during a job enables actionable, realtime decision making. Recent advances in distributed acoustic sensor (DAS) technology have produced a source of abundant new data for monitoring processes in the subsurface. Because of the massive dataset size (TB per day), developing a machine learning approach for interpreting DAS data is essential for effective use, such as in operational situations, which require near-realtime results. In our work, we use the massively parallel multi-physics code GEOS to generate a catalog of synthetic DAS measurements that are typical of those recorded during the stimulation of a hydraulic fracture. We then relate physical observables in the model such as the extents of the generated fractures, fluid flow, and interactions with pre-existing rock fractures to the DAS. These data quantify the potential of DAS measurements for revealing subsurface processes in realtime. Determining how best to construct and train a neural network is challenging. We will present our specific approach to building a deep neural network, including the nature of the training data and subsequent success of the network in identifying features. This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under contract DE-AC52-07NA27344.

Joe Morris
Fri 1:10 p.m. - 1:30 p.m. [iCal]

Accurate and Efficient Seismic Waveform-Inversion with Convolutional Neural Networks

Seismic full-waveform inversion has become a promising tool for velocity estimation in complex geological structures. Traditional seismic full-waveform inversion problems are usually posed as nonlinear optimization problems. Solving full-waveform inversion can be computationally challenging for two major reasons: one is the expensive computational cost and the other is the issue of local minima. In this work, we develop an end-to-end data-driven inversion technique, called “InversionNet”, to learn a regression relationship from seismic waveform datasets to subsurface models. Specifically, we build a novel deep convolutional neural network with an encoder-decoder structure, where the encoder learns an abstract representation of the seismic data, which is then used by the decoder to produce a subsurface model. We further incorporate atrous convolutions in our network structure to account for contextual information from the subsurface model. We evaluate the performance of our InversionNet with synthetic seismic waveform data. The experimental results demonstrate that our InversionNet not only yields accurate inversion results but also produces almost real-time inversion.
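
The atrous (dilated) convolution mentioned above can be illustrated in one dimension. The sketch is not InversionNet code; the function name, the toy signal and the kernel are assumptions. It shows how spacing the kernel taps apart widens the receptive field without adding weights, which is how such layers take in more context.

```python
def atrous_conv1d(signal, kernel, dilation=1):
    """'Valid' 1-D dilated cross-correlation.

    Kernel taps are spaced `dilation` samples apart, so a kernel of length k
    covers (k - 1) * dilation + 1 input samples.
    """
    span = (len(kernel) - 1) * dilation + 1  # receptive field of one output
    out = []
    for start in range(len(signal) - span + 1):
        out.append(sum(w * signal[start + j * dilation]
                       for j, w in enumerate(kernel)))
    return out

if __name__ == "__main__":
    signal = [1, 2, 3, 4, 5, 6]   # toy input, invented
    kernel = [1, 0, -1]           # simple difference filter
    print(atrous_conv1d(signal, kernel, dilation=1))
    print(atrous_conv1d(signal, kernel, dilation=2))
```

With `dilation=2` the same three weights span five input samples instead of three, so each output mixes information from a wider window.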

Youzuo Lin
Fri 1:30 p.m. - 2:30 p.m. [iCal]
Panel Discussion
Richard Baraniuk, Maarten V. de Hoop, Paul A Johnson

Author Information

Laura Pyrak-Nolte (Purdue University)

Dr. Laura J. Pyrak-Nolte is a Distinguished Professor of Physics & Astronomy, in the College of Science, at Purdue University. She holds courtesy appointments in the Lyle School of Civil Engineering and in the Department of Earth, Atmospheric and Planetary Sciences, also in the College of Science. Dr. Pyrak-Nolte holds a B.S. in Engineering Science from the State University of New York at Buffalo, an M.S. in Geophysics from Virginia Polytechnic Institute and State University, and a Ph.D. in Materials Science and Mineral Engineering from the University of California at Berkeley, where she studied with Dr. Neville G. W. Cook. Her interests include applied geophysics, experimental and theoretical seismic wave propagation, laboratory rock mechanics, micro-fluidics, particle swarms, and fluid flow through Earth materials. In 1995, Dr. Pyrak-Nolte received the Schlumberger Lecture Award from the International Society of Rock Mechanics. In 2013, she was made a Fellow of the American Rock Mechanics Association (ARMA). Currently she is the President of the American Rock Mechanics Association, and president-elect of the International Society for Porous Media.

Jim Rustad (University of California Davis)

Jim received his Ph.D. in Geophysics from the University of Minnesota in 1992. He worked as a research scientist at Pacific Northwest National Laboratory (1992-2003), professor at the University of California, Davis (2003-2010), and research associate at Corning Incorporated (2010-2015). Currently retired from the University of California, his ongoing research focuses on aqueous interfacial chemistry, isotope geochemistry, and earth materials.

Richard Baraniuk (Rice University)
