Workshop
The Symbiosis of Deep Learning and Differential Equations
Luca Celotti · Kelly Buchanan · Jorge Ortiz · Patrick Kidger · Stefano Massaroli · Michael Poli · Lily Hu · Ermal Rrapaj · Martin Magill · Thorsteinn Jonsson · Animesh Garg

Tue Dec 14 05:00 AM -- 01:45 PM (PST)
Event URL: https://dl-de.github.io/

Deep learning can solve differential equations, and differential equations can model deep learning. What have we learned and where to next?

The focus of this workshop is on the interplay between deep learning (DL) and differential equations (DEs). In recent years, there has been a rapid increase of machine learning applications in computational sciences, with some of the most impressive results at the interface of DL and DEs. These successes have widespread implications, as DEs are among the most well-understood tools for the mathematical analysis of scientific knowledge, and they are fundamental building blocks for mathematical models in engineering, finance, and the natural sciences. This relationship is mutually beneficial. DL techniques have been used in a variety of ways to dramatically enhance the effectiveness of DE solvers and computer simulations. Conversely, DEs have also been used as mathematical models of the neural architectures and training algorithms arising in DL.

This workshop will aim to bring together researchers from each discipline to encourage intellectual exchanges and cultivate relationships between the two communities. The scope of the workshop will include important topics at the intersection of DL and DEs.

Tue 5:00 a.m. - 5:15 a.m.
Introduction and opening remarks (Introduction)
Tue 5:15 a.m. - 6:00 a.m.
Neha Yadav - Deep learning methods for solving differential equations (Invited Talk)
Tue 6:00 a.m. - 6:15 a.m.
Contributed Talk 1 (Contributed Talk)
Tue 6:15 a.m. - 6:20 a.m.
Coffee Break (Break)
Tue 6:20 a.m. - 6:30 a.m.
Poster Spotlights 1 (Poster Spotlights)
Tue 6:30 a.m. - 7:15 a.m.
Poster Session 1 (Poster Session)
Tue 7:15 a.m. - 8:00 a.m.
Philipp Grohs - The Theory-to-Practice Gap in Deep Learning (Invited Talk)
Philipp Grohs
Tue 8:00 a.m. - 8:05 a.m.
Coffee Break (Break)
Tue 8:05 a.m. - 9:15 a.m.
Panel Discussion
Tue 9:15 a.m. - 10:15 a.m.
Lunch Break (Break)
Tue 10:15 a.m. - 11:00 a.m.
Weinan E - Maximum principle-based algorithm for deep learning (Invited Talk)
Weinan E
Tue 11:00 a.m. - 11:15 a.m.
Contributed Talk 2 (Contributed Talk)
Tue 11:15 a.m. - 11:20 a.m.
Coffee Break (Break)
Tue 11:20 a.m. - 11:30 a.m.
Poster Spotlights 2 (Poster Spotlights)
Tue 11:30 a.m. - 12:15 p.m.
Poster Session 2 (Poster Session)
Tue 12:15 p.m. - 1:00 p.m.
Anima Anandkumar - Neural operator: A new paradigm for learning PDEs (Invited Talk)
Animashree Anandkumar
Tue 1:00 p.m. - 1:15 p.m.
Contributed Talk 3 (Contributed Talk)
Tue 1:15 p.m. - 1:30 p.m.
Contributed Talk 4 (Contributed Talk)
Tue 1:30 p.m. - 1:45 p.m.
Final Remarks
-

We present Graph Neural Diffusion (GRAND), a model that approaches deep learning on graphs as a continuous diffusion process and treats Graph Neural Networks (GNNs) as discretisations of an underlying PDE. In our model, the layer structure and topology correspond to the discretisation choices of temporal and spatial operators. Our approach allows a principled development of a broad new class of GNNs that are able to address the common plights of graph learning models such as depth, oversmoothing, and bottlenecks. Key to the success of our models is stability with respect to perturbations in the data, which is addressed for both implicit and explicit discretisation schemes. We develop linear and nonlinear versions of GRAND, achieving competitive results on many standard graph benchmarks.

Benjamin Chamberlain · · Maria Gorinova · Stefan Webb · Emanuele Rossi · Michael Bronstein
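As a rough illustration of the idea in the abstract above, the sketch below runs a linear graph diffusion dx/dt = (A - I)x with explicit Euler steps, so that each step plays the role of one layer. It is not the GRAND model itself: the uniform row-stochastic weights, the step size `tau`, and all function names are assumptions made for this example, whereas the abstract describes weights and discretisations chosen as part of the model.

```python
# Linear graph diffusion dx/dt = (A - I) x, discretised with explicit Euler.
# Assumption: A holds fixed uniform weights over each node's neighbours;
# this is a stand-in for whatever diffusivity a trained model would use.

def row_normalised_adjacency(edges, n):
    """Uniform row-stochastic weights over each node's neighbours (undirected graph)."""
    nbrs = [[] for _ in range(n)]
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)
    return [{j: 1.0 / len(ns) for j in ns} for ns in nbrs]

def euler_diffusion_step(x, weights, tau):
    """One explicit Euler step: x_i <- x_i + tau * (sum_j a_ij x_j - x_i)."""
    return [xi + tau * (sum(a * x[j] for j, a in w.items()) - xi)
            for xi, w in zip(x, weights)]

def diffuse(x, weights, tau, steps):
    """Apply `steps` Euler steps; the number of steps acts like network depth."""
    for _ in range(steps):
        x = euler_diffusion_step(x, weights, tau)
    return x

edges = [(0, 1), (1, 2), (2, 3)]          # path graph on 4 nodes
weights = row_normalised_adjacency(edges, 4)
x0 = [1.0, 0.0, 0.0, 0.0]                 # one scalar feature per node
x_few = diffuse(x0, weights, tau=0.5, steps=2)
x_many = diffuse(x0, weights, tau=0.5, steps=200)
```

With many steps the node features collapse to a common value, which is a small-scale picture of the oversmoothing issue the abstract mentions.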
-

Neural network-based methods for solving differential equations have been gaining traction. They work by reducing the differential equation residuals of a neural network on a sample of points in each iteration. However, most of them employ standard sampling schemes, such as uniform sampling or perturbing equally spaced points. We present a novel sampling scheme which samples points adversarially to maximize the loss of the current solution estimate. A sampler architecture is described along with the loss terms used for training. Finally, we demonstrate that this scheme outperforms pre-existing schemes by comparing both on a number of problems.

Kshitij Parwani · Pavlos Protopapas
-

Physics-informed neural networks (PINNs) promise to significantly speed up partial differential equation (PDE) solvers. However, most PINNs can only solve deterministic PDEs. Here, we consider stochastic PDEs that contain partially unknown parameters. We aim to quickly quantify the impact of uncertain parameters on the solution of a PDE; that is, we want to perform fast uncertainty propagation. Classical uncertainty propagation methods such as Monte Carlo sampling, stochastic Galerkin, collocation, or discrete projection methods become computationally too expensive with an increasing number of stochastic parameters. For example, the well-known spectral or polynomial chaos expansions separate the spatiotemporal and probabilistic domains and offer theoretical guarantees and fast computation of stochastic summaries (e.g., mean), but can be computationally expensive to form. Our Spectral-PINNs approximate the underlying spectral coefficients with a neural network and reduce the computational cost of the spectral expansion while maintaining guarantees. We derive the method for partial differential equations, discuss runtime, demonstrate initial results on the convection-diffusion equation, and provide steps towards convergence guarantees.

Björn Lütjens · Mark Veillette · Dava Newman
-

Uncertainty quantification (UQ) helps to make trustworthy predictions based on collected observations and uncertain domain knowledge. With increased usage of deep learning in various applications, the need for efficient UQ methods that can make deep models more reliable has increased as well. Among applications that can benefit from effective handling of uncertainty are the deep learning based differential equation (DE) solvers. We adapt several state-of-the-art UQ methods to get the predictive uncertainty for DE solutions and show the results on four different DE types.

Olga Graf · Pablo Flores · Pavlos Protopapas
-

Time series analysis is a widespread task in Natural Sciences, Social Sciences and Engineering. A fundamental problem is finding an expressive yet efficient-to-compute representation of the input time series to use as a starting point to perform arbitrary downstream tasks. In this paper, we build upon recent work using the signature of a path as a feature map and investigate a computationally efficient technique to approximate these features based on linear random projections. We present several theoretical results to justify our approach, and we analyze and showcase its empirical performance on the task of learning a mapping between the input controls of a Stochastic Differential Equation (SDE) and its corresponding solution. Our results show that the representational power of the proposed random features allows the aforementioned mapping to be learned efficiently.

Enea Monzio Compagnoni · Luca Biggio · Antonio Orvieto
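For readers unfamiliar with the signature feature map mentioned in the abstract above, the sketch below computes its first two levels exactly for a piecewise-linear path, using Chen's identity to append one segment at a time. It does not implement the random-projection approximation the abstract proposes; the function name `signature_level2` and the example path are assumptions made for this illustration.

```python
def signature_level2(path):
    """Levels 1 and 2 of the signature of a piecewise-linear path.

    path: list of points, each a list of d floats.
    Returns (s1, s2): s1[i] is the total increment in coordinate i and
    s2[i][j] is the iterated integral of dx^i dx^j over the path.
    """
    d = len(path[0])
    s1 = [0.0] * d
    s2 = [[0.0] * d for _ in range(d)]
    for p, q in zip(path, path[1:]):
        delta = [b - a for a, b in zip(p, q)]
        # Chen's identity for concatenating the path so far with one linear segment.
        for i in range(d):
            for j in range(d):
                s2[i][j] += s1[i] * delta[j] + 0.5 * delta[i] * delta[j]
        for i in range(d):
            s1[i] += delta[i]
    return s1, s2

# Example path: move right by 1, then up by 1.
path = [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]]
s1, s2 = signature_level2(path)
levy_area = 0.5 * (s2[0][1] - s2[1][0])   # antisymmetric part of level 2
```

The number of level-k terms grows as d^k, which is the cost that motivates cheaper approximations of these features.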
-

Multilayer Perceptrons (MLPs) define a fundamental model class that forms the backbone of many modern deep learning architectures. Despite their universality guarantees, practical training via stochastic gradient descent often struggles to attain theoretical error bounds due to issues including (but not limited to) frequency bias, vanishing gradients, and stiff gradient flows. In this work we postulate that many such issues originate in the initialization of the network's parameters. While the initialization schemes proposed by Glorot et al. and He et al. have become the de-facto choices among practitioners, their goal to preserve the variance of forward- and backward-propagated signals is mainly achieved by assumptions on linearity, while the presence of nonlinear activation functions may partially destroy these efforts. Here, we revisit the initialization of MLPs from a dynamical systems viewpoint to explore why and how, under these classical schemes, the MLP could still fail even at the beginning. Drawing inspiration from classical numerical methods for differential equations that leverage orthogonal feature representations, we propose a novel initialization scheme that promotes orthogonality in the features of the last hidden layer, ultimately leading to more diverse and localized features. Our results demonstrate that network initialization alone can be sufficient in mitigating frequency bias and yields competitive results for high-frequency function approximation and image regression tasks, without any additional modifications to the network architecture or activation functions.

Hanwen Wang · Paris Perdikaris
-

We present a data-driven approach to iteratively solve the discrete heterogeneous Helmholtz equation at high wavenumbers. We combine multigrid ingredients with convolutional neural networks (CNNs) to form a preconditioner which is applied within a Krylov solver. Two types of preconditioners are proposed: 1) U-Net as a coarse grid solver, and 2) U-Net as a deflation operator with shifted Laplacian V-cycles. The resulting CNN preconditioner can generalize over residuals and a relatively general set of wave slowness models. On top of that, we offer an encoder-solver framework where an "encoder" network generalizes over the medium and sends context vectors to another "solver" network, which generalizes over the right-hand-sides. We show that this option is more efficient than the stand-alone variant. Lastly, we suggest a mini-retraining procedure to improve the solver after the model is known. We demonstrate the efficiency and generalization abilities of our approach on a variety of 2D problems.

Yael Azulay · Eran Treister
-

Score-based (denoising diffusion) generative models have recently achieved considerable success in generating realistic and diverse data. These approaches define a forward diffusion process for transforming data to noise and generate data by reversing it. Unfortunately, current score-based models generate data very slowly due to the sheer number of score network evaluations required by numerical SDE solvers. In this work, we aim to accelerate this process by devising a more efficient SDE solver. Our solver requires only two score function evaluations per step, rarely rejects samples, and leads to high-quality samples. Our approach generates data 2 to 10 times faster than Euler-Maruyama (EM) while achieving better or equal sample quality. For high-resolution images, our method leads to significantly higher quality samples than all other methods tested. Our SDE solver has the benefit of requiring no step size tuning.

Alexia Jolicoeur-Martineau · Ke Li · Rémi Piché-Taillefer · Tal Kachman · Ioannis Mitliagkas
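To make the baseline in the abstract above concrete, here is a minimal sketch of a fixed-step Euler-Maruyama integrator for a scalar SDE dx = drift(x, t) dt + diffusion(t) dW. It is the reference scheme the abstract compares against, not the faster adaptive solver it proposes. The toy drift and diffusion (an Ornstein-Uhlenbeck process rather than a learned score model) and all names are assumptions made for this example.

```python
import math
import random

def euler_maruyama(drift, diffusion, x0, t0, t1, n_steps, rng):
    """Fixed-step Euler-Maruyama for dx = drift(x, t) dt + diffusion(t) dW.

    Uses one drift evaluation per step; `rng` is a random.Random instance.
    """
    h = (t1 - t0) / n_steps
    x, t = x0, t0
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(h))   # Brownian increment over one step
        x = x + drift(x, t) * h + diffusion(t) * dw
        t += h
    return x

# Toy example (assumption): Ornstein-Uhlenbeck process dx = -x dt + 0.5 dW.
rng = random.Random(0)
samples = [euler_maruyama(lambda x, t: -x, lambda t: 0.5, 1.0, 0.0, 1.0, 100, rng)
           for _ in range(1000)]
sample_mean = sum(samples) / len(samples)   # should lie near exp(-1), up to sampling noise
```

In a score-based generative model the drift would call the score network, so the number of steps directly sets the number of network evaluations.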
-

Quantization of Convolutional Neural Networks (CNNs) is a common approach to ease the computational burden involved in the deployment of CNNs. However, fixed-point arithmetic is not natural to the type of computations involved in neural networks. In our work, we consider symmetric and stable variants of common CNNs for image classification, and Graph Convolutional Networks (GCNs) for graph node-classification. We demonstrate through several experiments that the property of forward stability preserves the action of a network under different quantization rates, allowing stable quantized networks to behave similarly to their non-quantized counterparts while using fewer parameters. We also find that at times, stability aids in improving accuracy. These properties are of particular interest for sensitive, resource-constrained or real-time applications.

Ido Ben-Yair · Moshe Eliasof · Eran Treister
-

Multigrid (MG) methods are effective at solving numerical PDEs in linear complexity. In this work we present a multigrid-in-channels (MGIC) approach that tackles the quadratic growth of the number of parameters with respect to the number of channels in standard convolutional neural networks (CNNs). Indeed, lightweight CNNs can achieve comparable accuracy to standard CNNs with fewer parameters; however, the number of weights still scales quadratically with the CNN's width. Our MGIC architectures replace each CNN block with an MGIC counterpart that utilizes a hierarchy of nested grouped convolutions of small group size to address this. Hence, our proposed architectures scale linearly with respect to the network's width while retaining full coupling of the channels as in standard CNNs. Our extensive experiments on image classification, segmentation, and point cloud classification show that applying this strategy to different architectures reduces the number of parameters while obtaining similar or better accuracy.

Moshe Eliasof · Jonathan Ephrath · Lars Ruthotto · Eran Treister
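The scaling argument in the abstract above can be checked with simple arithmetic. The sketch below counts the weights of a standard convolution and of a single grouped convolution with a fixed small group size. It covers only this one building block, not the nested MGIC hierarchy; the kernel size, the group size of 8, and the function names are assumptions chosen for illustration.

```python
def conv_params(c_in, c_out, k=3):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out

def grouped_conv_params(channels, group_size, k=3):
    """Weights in a grouped k x k convolution with equal input and output width.

    Each of the channels // group_size groups is a dense
    group_size x group_size block of k x k kernels.
    """
    assert channels % group_size == 0
    groups = channels // group_size
    return groups * k * k * group_size * group_size

# Doubling the width quadruples the standard count but only doubles the grouped one.
for c in (64, 128, 256):
    print(c, conv_params(c, c), grouped_conv_params(c, group_size=8))
```

A single grouped convolution on its own leaves the groups uncoupled, which is why the abstract emphasises retaining full channel coupling through the hierarchy.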
-

Many types of physics-informed neural network models have been proposed in recent years as approaches for learning solutions to differential equations. When a particular task requires solving a differential equation at multiple parameterizations, this requires either re-training the model, or expanding its representation capacity to include the parameterization, both solutions that increase its computational cost. We propose the HyperPINN, which uses hypernetworks to learn to generate neural networks that can solve a differential equation from a given parameterization. We demonstrate with experiments on both a PDE and an ODE that this type of model can lead to neural network solutions to differential equations that maintain a small size, even when learning a family of solutions over a parameter space.

· Fei Sha
-

In many areas, such as the physical sciences, life sciences, and finance, control approaches are used to achieve a desired goal in complex dynamical systems governed by differential equations. In this work we formulate the problem of controlling stochastic partial differential equations (SPDEs) as a reinforcement learning problem. We present a learning-based, distributed control approach for online control of a system of SPDEs with a high-dimensional state-action space using the deep deterministic policy gradient method. We tested the performance of our method on the problem of controlling the stochastic Burgers’ equation, describing a turbulent fluid flow in an infinitely large domain.

Erfan Pirmorad · Farnam Mansouri · Amir-massoud Farahmand
-

We introduce a recently developed framework, PDE Acceleration, which is a variational approach to accelerated optimization with partial differential equations (PDEs), in the context of optimization of deep networks. We derive the PDE evolution equations for optimization of general loss functions using this variational approach. We propose discretizations of these PDEs based on numerical PDE discretizations, and establish a mapping between these discretizations and stochastic gradient descent (SGD). We show that our framework can give rise to new PDEs that can be mapped to new optimization algorithms, and thus theoretical insights from the PDE domain can be used to analyze optimization algorithms. We show an example by introducing a new PDE with diffusion that naturally arises from the viscosity solution, which translates to a novel extension of SGD. We analyze the stability and convergence using von Neumann analysis. We apply the proposed extension to optimization of convolutional neural networks (CNNs). We empirically validate the theory and evaluate our new extension on image classification, showing empirical improvement over SGD.

Yuxin Sun · Dong Lao · Ganesh Sundaramoorthi · Anthony Yezzi
-

We present Shape-Tailored Deep Neural Networks (ST-DNN). ST-DNN are deep networks formulated through the use of partial differential equations (PDE) to be defined on arbitrarily shaped regions. This is natural for problems in computer vision such as segmentation, where descriptors should describe regions (e.g., of objects) that have diverse shape. We formulate ST-DNNs through the Poisson PDE, which can be used to generalize convolution to arbitrary regions. We stack multiple PDE layers to generalize a deep CNN to arbitrarily shaped regions. We show that ST-DNN are provably covariant to translations and rotations and robust to domain deformations, which are important properties for computer vision tasks. We show proof-of-concept empirical validation.

Naeemullah Khan · Angira Sharma · Philip Torr · Ganesh Sundaramoorthi
-

Deep learning-based reduced order models (DL-ROMs) have been recently proposed to overcome common limitations shared by conventional ROMs - built, e.g., through proper orthogonal decomposition (POD) - when applied to nonlinear time-dependent parametrized PDEs. Although extremely efficient at testing time, when evaluating the PDE solution for any new testing-parameter instance, DL-ROMs require an expensive training stage. To avoid this, a prior dimensionality reduction through POD and a multi-fidelity pretraining stage are introduced, yielding the POD-DL-ROM framework, which allows time-dependent PDEs to be solved even faster than real time. Equipped with LSTM networks, the resulting POD-LSTM-ROMs better grasp the time evolution of the PDE system, ultimately allowing long-term prediction of complex systems’ evolution, with respect to the training window, for unseen input parameter values.

Stefania Fresca · Federico Fatone · Andrea Manzoni
-

High-fidelity large-eddy simulations (LES) of high Reynolds number flows are essential to design low-carbon footprint energy conversion devices. The two-level Taylor-Galerkin (TTGC) finite-element method (FEM) has remained the workhorse of modern industrial-scale combustion LES. In this work, we propose an improved FEM termed ML-TTGC that introduces locally tunable parameters in the TTGC scheme, whose values are provided by a graph neural network (GNN). We show that ML-TTGC outperforms TTGC in solving the convection problem in both irregular and regular meshes over a wide range of initial conditions. We train the GNN using parameter values that (i) minimize a weighted loss function of the dispersion and dissipation error and (ii) enforce them to be numerically stable. As a result, no additional ad-hoc dissipation is necessary for numerical stability or to damp spurious waves, amortizing the additional cost of running the GNN.

Luciano DROZDA
-

A range of applications require learning image generation models whose latent space effectively captures the high-level factors of variation in the data distribution, which can be judged by its ability to interpolate between images smoothly. However, most generative models mapping a fixed prior to the generated images lead to interpolation trajectories lacking smoothness and images of reduced quality. We propose a novel generative model that learns a flexible non-parametric prior over interpolation trajectories, conditioned on a pair of source and target images. Instead of relying on deterministic interpolation methods like linear or spherical interpolation in latent space, we devise a framework that learns a distribution of trajectories between two given images using Latent Second-Order Neural Ordinary Differential Equations. Through a hybrid combination of reconstruction and adversarial losses, the generator is trained to map the sampled points from these trajectories to sequences of realistic images of improved quality that smoothly transition from the source to the target image.

Avinandan Bose · Aniket Das · Yatin Dandi · Piyush Rai
-

Neural networks can learn local interactions to faithfully reproduce large-scale dynamics in important physical systems. Trained on PDE integrations or noisy observations, these emulators can assimilate data, tune parameters and learn sub-grid process representations. However, implicit integration schemes cannot be expressed as local feedforward computations. We therefore introduce linear implicit layers (LILs), which learn and solve linear systems with locally computed coefficients. LILs use diagonal dominance to ensure parallel solver convergence and support efficient backward mode differentiation. As a challenging test case, we train emulators on semi-implicit integration of 2D shallow-water equations with closed boundaries. LIL networks learned compact representations of the local interactions controlling the 30,000 degrees of freedom of this discretized system of PDEs. This enabled accurate and stable LIL-based emulation over many time steps where feedforward networks failed.

Marcel Nonnenmacher · David Greenberg
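The role of diagonal dominance in the abstract above can be illustrated with the classical Jacobi iteration, a parallel-friendly solver that is guaranteed to converge on strictly diagonally dominant systems. The sketch below is only that textbook scheme on a small tridiagonal system; whether LILs use Jacobi specifically is not stated in the abstract, and the fixed coefficients and function names here are assumptions (in a LIL the coefficients would be computed locally by a network).

```python
def jacobi_solve(diag, lower, upper, b, iters=100):
    """Jacobi iteration for a tridiagonal system A x = b.

    diag[i] = A[i][i], lower[i] = A[i][i-1], upper[i] = A[i][i+1];
    lower[0] and upper[-1] are unused. Each update of x[i] only reads the
    previous iterate at its neighbours, so all entries could be updated in parallel.
    """
    n = len(b)
    x = [0.0] * n
    for _ in range(iters):
        new_x = []
        for i in range(n):
            s = b[i]
            if i > 0:
                s -= lower[i] * x[i - 1]
            if i < n - 1:
                s -= upper[i] * x[i + 1]
            new_x.append(s / diag[i])
        x = new_x
    return x

# Strictly diagonally dominant example: |4| > |-1| + |-1| in every row.
n = 5
diag, lower, upper = [4.0] * n, [-1.0] * n, [-1.0] * n
b = [1.0, 2.0, 3.0, 2.0, 1.0]
x = jacobi_solve(diag, lower, upper, b)

# Largest entry of |A x - b| as a convergence check.
residual = max(
    abs(diag[i] * x[i]
        + (lower[i] * x[i - 1] if i > 0 else 0.0)
        + (upper[i] * x[i + 1] if i < n - 1 else 0.0)
        - b[i])
    for i in range(n)
)
```

Here the error shrinks by at least a factor of two per sweep, because the off-diagonal row sums are half the diagonal entry.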
-

Measurement noise is an inherent part of collecting data from physical processes. Thus, noise removal is necessary to draw conclusions from these data and is essential to construct dynamic models using these data. This work discusses a methodology for learning dynamic models using noisy measurements and simultaneously obtaining denoised data. In our methodology, the main innovation can be seen in integrating deep neural networks with a numerical integration method. Precisely, we aim at learning a neural network that implicitly represents the data and an additional neural network that models the vector fields of the dependent variables. We combine these two networks by enforcing the constraint that the data at the next time-step can be obtained by following a numerical integration scheme. The proposed framework to identify a model predicting the vector field is effective under noisy measurements and provides denoised data. We demonstrate the effectiveness of the proposed method to learn models using a differential equation and present a comparison with the neural ODE approach.

Pawan Goyal · Peter Benner
-

Synthesizing optimal controllers for dynamical systems in practice involves solving real-time optimization problems with hard time constraints. These constraints restrict the class of numerical methods that can be applied; indeed, computationally expensive but accurate numerical routines often have to be replaced with fast and inaccurate methods, trading inference time for worse theoretical guarantees on solution accuracy. This paper proposes a novel methodology to accelerate numerical optimization of optimal control policies via hypersolvers, hybrids of a base solver and a neural network. In particular, we apply low-order explicit numerical methods for the ordinary differential equation (ODE) associated with the numerical optimal control problem, augmented with an additional parametric approximator trained to reduce local truncation errors introduced by the base solver. Given a target system to control, we first pre-train hypersolvers to approximate base solver residuals by sampling plausible control inputs. Then, we use the trained hypersolver to obtain fast and accurate solutions of the target system during optimization of the controller. The performance of our approach is evaluated in direct and model predictive optimal control settings, where we show consistent Pareto improvements in terms of solution accuracy and control performance.

Federico Berto · Stefano Massaroli · Michael Poli · Jinkyoo Park
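The hypersolver idea in the abstract above, a base solver plus a learned correction for its local truncation error, can be sketched on a toy problem. The example below assumes an Euler base solver with the update x + h f(x) + h^2 g(x) and replaces the neural network with a one-parameter linear model fitted by least squares. The uncontrolled dynamics x' = -x, the use of the exact flow to generate residual targets, and all names are assumptions for illustration only.

```python
import math

def f(x):
    return -x  # toy dynamics x' = -x (assumption; no control input here)

def euler_step(x, h):
    return x + h * f(x)

def exact_step(x, h):
    return x * math.exp(-h)  # reference flow used to build training targets

def fit_residual_model(samples, h):
    """Least-squares fit of g(x) = c * x to the scaled Euler residuals.

    Stands in for pre-training a neural network on base solver residuals.
    """
    targets = [(exact_step(x, h) - euler_step(x, h)) / h ** 2 for x in samples]
    c = sum(x * r for x, r in zip(samples, targets)) / sum(x * x for x in samples)
    return lambda x: c * x

def hypersolver_step(x, h, g):
    """Base Euler step plus the learned correction of its local error."""
    return euler_step(x, h) + h ** 2 * g(x)

def rollout(step, x0, n):
    x = x0
    for _ in range(n):
        x = step(x)
    return x

h, n = 0.1, 10
g = fit_residual_model([0.5, 1.0, 1.5, 2.0], h)
x_true = math.exp(-h * n)
err_euler = abs(rollout(lambda x: euler_step(x, h), 1.0, n) - x_true)
err_hyper = abs(rollout(lambda x: hypersolver_step(x, h, g), 1.0, n) - x_true)
```

On this linear problem the fitted correction is exact, so the corrected rollout matches the true solution to rounding error; for nonlinear systems a trained approximator would only reduce the residual rather than remove it.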
-

Neural differential equations (neural DEs) are yet to see success in their application as interpretable autoencoders/descriptors, where they directly model a population of signals with the learned vector field. In this manuscript, we show that there is a threshold to which these models capture the dynamics of a population of signals produced under the same monitoring protocol. This threshold is computed by taking the derivative at each time point and analyzing the variance of its dynamics. In addition, we show that this can be tackled by projecting a highly variant population to a lower dynamically variant space, where the model is able to capture dynamics, and similarly projecting the modelled signal back to the original space.

David Calhas · Rui Henriques
-

Machine learning methods have made substantial advances in various aspects of physics. In particular, multiple deep-learning methods have emerged as efficient ways of numerically solving differential equations arising commonly in physics. DeepONets [1] are one of the most prominent ideas in this theme, which entails an optimization over a space of inner-products of neural nets. In this work we study the training dynamics of DeepONets for solving the pendulum problem to bring to light some of their intriguing properties. We demonstrate that, contrary to usual expectations, test error here has its first local minimum at the interpolation threshold, i.e., when model size $\approx$ training data size. Secondly, as opposed to the average end-point error, the best test error over iterations has better dependence on model size, in that it shows only a very mild double-descent. Lastly, we show evidence that triple-descent [2] is unlikely to occur for DeepONets.

[1] Lu Lu, Pengzhan Jin, and George Em Karniadakis. DeepONet: Learning nonlinear operators for identifying differential equations based on the universal approximation theorem of operators. 2020. arXiv:1910.03193 [cs.LG].
[2] Ben Adlam and Jeffrey Pennington. The Neural Tangent Kernel in High Dimensions: Triple Descent and a Multi-Scale Theory of Generalization. 2020. arXiv:2008.06786 [stat.ML].

Pulkit Gopalani · Anirbit Mukherjee
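The phrase "inner-products of neural nets" in the abstract above refers to the DeepONet output being a dot product between a branch network, which encodes the input function sampled at fixed sensor points, and a trunk network, which encodes the query location. The sketch below shows that structure with small untrained networks; the layer sizes, sensor count, tanh activations, and helper names are assumptions, and no training or pendulum data is included.

```python
import math
import random

def mlp(sizes, rng):
    """Randomly initialised tanh MLP; returns a function mapping a list to a list."""
    layers = [([[rng.gauss(0.0, 1.0) / math.sqrt(m) for _ in range(m)] for _ in range(n)],
               [0.0] * n)
              for m, n in zip(sizes, sizes[1:])]

    def forward(x):
        for idx, (W, b) in enumerate(layers):
            x = [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]
            if idx < len(layers) - 1:      # no activation on the output layer
                x = [math.tanh(v) for v in x]
        return x
    return forward

rng = random.Random(0)
m_sensors, p = 10, 8                       # hypothetical sensor count and latent width
branch = mlp([m_sensors, 16, p], rng)      # encodes the input function u at the sensors
trunk = mlp([1, 16, p], rng)               # encodes the query location y

def deeponet(u_samples, y):
    """Operator output G(u)(y) as the inner product of branch and trunk features."""
    b = branch(u_samples)
    t = trunk([y])
    return sum(bk * tk for bk, tk in zip(b, t))

sensors = [i / (m_sensors - 1) for i in range(m_sensors)]
u = [math.sin(2 * math.pi * s) for s in sensors]
value = deeponet(u, 0.5)
```

Model size in the sense of the abstract would correspond to the total number of weights in the two networks.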
-

Training deep neural networks (DNNs) can be difficult due to vanishing or exploding gradients during weight optimization through backpropagation. To address this problem, we propose a general class of Hamiltonian DNNs (H-DNNs) that stems from the discretization of continuous-time Hamiltonian systems. Our main result is that a broad set of H-DNNs ensures non-vanishing gradients by design for an arbitrary network depth. This is obtained by proving that, using a semi-implicit Euler discretization scheme, the backward sensitivity matrices involved in gradient computations are symplectic.

Clara Galimberti · Luca Furieri
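The symplecticity property the abstract above relies on can be verified by hand on a toy system. The sketch below applies the semi-implicit Euler scheme to the harmonic oscillator H(q, p) = (q^2 + p^2)/2 and checks that the Jacobian of one step has determinant 1, which in two dimensions is equivalent to being symplectic. This is not an H-DNN layer; the Hamiltonian, step size, and function names are assumptions for this example.

```python
def semi_implicit_euler_step(q, p, h):
    """One semi-implicit (symplectic) Euler step for H(q, p) = (q**2 + p**2) / 2."""
    p_new = p - h * q        # momentum update uses the old position
    q_new = q + h * p_new    # position update uses the new momentum
    return q_new, p_new

def explicit_euler_step(q, p, h):
    """Fully explicit Euler step, shown for comparison."""
    return q + h * p, p - h * q

def step_jacobian(step, h):
    """Jacobian of a linear step map, read off from its action on unit vectors."""
    q_q, p_q = step(1.0, 0.0, h)
    q_p, p_p = step(0.0, 1.0, h)
    return [[q_q, q_p], [p_q, p_p]]

def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

h = 0.1
det_symplectic = det2(step_jacobian(semi_implicit_euler_step, h))   # equals 1
det_explicit = det2(step_jacobian(explicit_euler_step, h))          # equals 1 + h**2
```

Because the determinant is exactly 1, a product of many such Jacobians cannot shrink all directions at once, which gives some intuition for the non-vanishing gradient claim.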
-

Traditionally, we provide technical parameters for ODE solvers, such as the order, the stepsize and the local error threshold. However, there is no for performance metrics that users care about, such as the time consumption and the global error. In this paper, we provide such a user-oriented by using neural networks to fit the complex relationship between the technical parameters and performance metrics. The form of the neural network is carefully designed to incorporate the prior knowledge from time complexity analysis of ODE solvers, which has better performance than purely data-driven approaches. We test our strategy on some parametrized ODE problems, and experimental results show that the fitted model can achieve high accuracy, thus providing error for fixed methods and time for adaptive stepsize methods.

Feng Zhao · Xiang Chen · Jun Wang · Zuoqiang Shi · Shao-Lun Huang
-

Neural Ordinary Differential Equations (NODEs) use a neural network to model the instantaneous rate of change in the state of a system. However, despite their apparent suitability for dynamics-governed time-series, NODEs present a few disadvantages. First, they are unable to adapt to incoming data-points, a fundamental requirement for real-time applications imposed by the natural direction of time. Second, time-series are often composed of a sparse set of measurements, which could be explained by many possible underlying dynamics. NODEs do not capture this uncertainty. To this end, we introduce Neural ODE Processes (NDPs), a new class of stochastic processes determined by a distribution over Neural ODEs. By maintaining an adaptive data-dependent distribution over the underlying ODE, we show that our model can successfully capture the dynamics of low-dimensional systems from just a few data-points. At the same time, we demonstrate that NDPs scale up to challenging high-dimensional time-series with unknown latent dynamics such as rotating MNIST digits. Code is available online at https://github.com/crisbodnar/ndp.

Alexander Norcliffe · Cristian Bodnar · Ben Day · Jacob Moss · Pietro Lió
-

In Norcliffe et al. [13], we discussed and systematically analysed how Neural ODEs (NODEs) can learn higher-order dynamics. In particular, we focused on second-order dynamic behaviour and analysed Augmented NODEs (ANODEs), showing that they can learn second-order dynamics with only a few augmented dimensions, but are unable to correctly model the velocity (first derivative). In response, we proposed Second Order NODEs (SONODEs), which build on top of ANODEs but explicitly take into account the second-order physics-based inductive biases. These biases, besides making them more efficient and noise-robust when modelling second-order dynamics, make them more interpretable than ANODEs, and therefore more suitable in many real-world scientific modelling applications.

Alexander Norcliffe · Cristian Bodnar · Ben Day · Nikola Simidjievski · Pietro Lió
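The second-order inductive bias described in the abstract above amounts to writing x'' = a(x, v) as the first-order system (x, v)' = (v, a(x, v)), so that the second state component is the velocity by construction. The sketch below shows this rewriting with a classical RK4 integrator; the harmonic-oscillator acceleration stands in for a learned network, and the function names and step size are assumptions.

```python
import math

def second_order_as_first_order(accel):
    """Turn x'' = accel(x, v) into the first-order field (x, v)' = (v, accel(x, v))."""
    def field(state):
        x, v = state
        return (v, accel(x, v))
    return field

def rk4_step(field, state, h):
    """One classical fourth-order Runge-Kutta step for an autonomous system."""
    def shifted(s, k, c):
        return tuple(si + c * ki for si, ki in zip(s, k))
    k1 = field(state)
    k2 = field(shifted(state, k1, h / 2))
    k3 = field(shifted(state, k2, h / 2))
    k4 = field(shifted(state, k3, h))
    return tuple(s + h / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

# Hypothetical stand-in for a learned acceleration network: harmonic oscillator.
field = second_order_as_first_order(lambda x, v: -x)
state = (1.0, 0.0)                 # initial position and velocity
h, n = 0.01, 100
for _ in range(n):
    state = rk4_step(field, state, h)
expected = (math.cos(1.0), -math.sin(1.0))   # exact solution at t = 1
```

An augmented first-order model has no such constraint on its extra dimensions, which is consistent with the abstract's remark that ANODEs may fit the trajectory without recovering the velocity.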
-

In scientific machine learning, neural networks recently have become a popular tool for learning the solutions of differential equations. However, practical results often conflict with the existing theoretical predictions in that observed convergence stagnates early. A substantial improvement can be achieved by the presented multilevel scheme, which decomposes the considered problem into easier-to-train sub-problems, resulting in a sequence of neural networks. The efficacy of the approach is demonstrated for high-dimensional parametric elliptic PDEs that are common benchmark problems in uncertainty quantification. Moreover, a theoretical analysis of the expressivity of the developed neural networks is devised.

Cosmas Heiß · Ingo Gühring · Martin Eigel
-

The backpropagation algorithm is indispensable for training modern residual networks (ResNets) and usually tends to be time-consuming due to its inherent algorithmic lockings. Auxiliary-variable methods, e.g., the penalty and augmented Lagrangian (AL) methods, have attracted much interest lately due to their ability to exploit layer-wise parallelism. However, we find that large communication overhead and the lack of data augmentation are two key challenges of these approaches, which may lead to low speedup and accuracy drop. Inspired by the continuous-time formulation of ResNets, we propose a novel serial-parallel hybrid (SPH) training strategy to enable the use of data augmentation during training, together with downsampling (DS) filters to reduce the communication cost. This strategy first trains the network by solving a succession of independent sub-problems in parallel and then improves the trained network through a full serial forward-backward propagation of data. We validate our methods on modern ResNets across benchmark datasets, achieving speedup over backpropagation while maintaining comparable accuracy.

Qi Sun · Hexin Dong · Zewei Chen · WeiZhen Dian · Jiacheng Sun · Yitong Sun · Zhenguo Li · Bin Dong
-

We frame the problem of learning stochastic differential equations (SDEs) from noisy observations as an inference problem and aim to maximize the marginal likelihood of the observations in a joint model of the latent paths and the noisy observations. As this problem is intractable, we derive an approximate (variational) inference algorithm and propose a novel parameterization of the approximate distribution over paths using a sparse Markovian Gaussian process. The approximation is efficient in storage and computation, allowing the usage of well-established optimizing algorithms such as natural gradient descent. We demonstrate the capability of the proposed method on the Ornstein-Uhlenbeck process.

Prakhar Verma · Vincent ADAM · Arno Solin
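To make the setting of the abstract above concrete, here is a minimal sketch of the data-generating side only: an Ornstein-Uhlenbeck path simulated with the Euler-Maruyama scheme, then corrupted with Gaussian observation noise. It does not implement the variational inference algorithm. The function names, parameter names, and all numeric values are assumptions made for illustration.

```python
import math
import random


def simulate_ou(theta, mu, sigma, x0, dt, n_steps, rng):
    """Euler-Maruyama discretization of dX = theta * (mu - X) dt + sigma dW."""
    path = [x0]
    for _ in range(n_steps):
        x = path[-1]
        dw = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment ~ N(0, dt)
        path.append(x + theta * (mu - x) * dt + sigma * dw)
    return path


def observe(path, obs_std, rng):
    """Add independent Gaussian noise to every point of the latent path."""
    return [x + rng.gauss(0.0, obs_std) for x in path]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
latent = simulate_ou(theta=1.0, mu=2.0, sigma=0.5, x0=0.0, dt=0.01, n_steps=500, rng=rng)
noisy = observe(latent, obs_std=0.1, rng=rng)
```

An inference method of the kind described would receive only `noisy` and try to recover the drift and the latent path.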
-

In this paper, we study the statistical limits of deep learning techniques for solving elliptic partial differential equations (PDEs) from random samples using the Deep Ritz Method (DRM) and Physics-Informed Neural Networks (PINNs). To simplify the problem, we focus on a prototype elliptic PDE: the Schrödinger equation on a hypercube with zero Dirichlet boundary condition, which is applied in quantum-mechanical systems. We establish upper and lower bounds for both methods, which improve upon concurrently developed upper bounds for this problem via a fast rate generalization bound. We discover that the current Deep Ritz Method is sub-optimal and propose a modified version of it. We also prove that PINN and the modified version of DRM can achieve minimax optimal bounds over Sobolev spaces. Empirically, following recent work which has shown that deep model accuracy improves with growing training sets according to a power law, we supply computational experiments that show a similar dimension-dependent power law for deep PDE solvers.

Yiping Lu · Haoxuan Chen · Jianfeng Lu · Lexing Ying · Jose Blanchet
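The power-law behaviour referred to in the abstract above is usually checked by fitting a straight line in log-log coordinates. The sketch below shows that fit under stated assumptions: the helper name `fit_power_law` is hypothetical, and the sample sizes and errors are synthetic values generated from a known law, not results from the paper.

```python
import math


def fit_power_law(sizes, errors):
    """Least-squares fit of log(error) = log(c) + alpha * log(n).

    Returns (c, alpha) so that error is approximately c * n ** alpha.
    """
    xs = [math.log(n) for n in sizes]
    ys = [math.log(e) for e in errors]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    alpha = sxy / sxx
    c = math.exp(mean_y - alpha * mean_x)
    return c, alpha


# Synthetic data following error = 3 * n ** -0.5 (illustration only).
sizes = [100, 200, 400, 800, 1600]
errors = [3.0 * n ** -0.5 for n in sizes]
c, alpha = fit_power_law(sizes, errors)
```

The fitted exponent `alpha` is the quantity one would compare across input dimensions to look for dimension dependence.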
-
We develop a machine learning model to effectively solve high-dimensional nonlinear parabolic partial differential equations (PDEs). We use the Feynman-Kac formula to reformulate the PDE as an equivalent stochastic control problem governed by a Backward Stochastic Differential Equation (BSDE) system. Our model is designed to maximally exploit the Markovian property of the BSDE system and utilizes an Actor-Critic network architecture, which is novel in the high-dimensional PDE literature. We show that our algorithm design leads to a significant speedup with a higher accuracy level compared to other neural network solvers. Our model advances the state-of-the-art machine learning PDE solvers in a few aspects: 1) the number of trainable parameters is reduced by a factor of $N$, where $N$ is the number of steps used to discretize the PDE in time, 2) the model convergence rate is an order of magnitude faster, 3) our model has fewer hyperparameters to tune. We demonstrate the performance improvements by solving six equations, including the Hamilton-Jacobi-Bellman equation, the Allen-Cahn equation and the Black-Scholes equation, all with dimensions on the order of 100. These equations in high dimensions have wide applications in control theory, materials science and quantitative finance.
Xiaohan Zhang
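As background for the Feynman-Kac reformulation named in the abstract above, here is a minimal sketch of the linear, one-dimensional case only. It is not the BSDE or Actor-Critic method of the paper. For the backward heat equation u_t + 0.5·u_xx = 0 with terminal condition u(T, x) = g(x), the formula gives u(t, x) = E[g(x + W_{T-t})], which can be estimated by Monte Carlo. The function name and the test problem g(x) = x² are assumptions chosen because the exact solution x² + (T - t) is known.

```python
import math
import random


def feynman_kac_heat(g, t, x, T, n_samples, rng):
    """Monte Carlo estimate of u(t, x) = E[g(x + W_{T-t})].

    This u solves u_t + 0.5 * u_xx = 0 with terminal condition u(T, .) = g.
    """
    std = math.sqrt(T - t)  # standard deviation of the Brownian increment
    total = 0.0
    for _ in range(n_samples):
        total += g(x + rng.gauss(0.0, std))
    return total / n_samples


# For g(x) = x**2 the exact value at (t, x) = (0, 1) with T = 1 is 1 + 1 = 2.
estimate = feynman_kac_heat(lambda y: y * y, t=0.0, x=1.0, T=1.0,
                            n_samples=20000, rng=random.Random(0))
```

The nonlinear, high-dimensional setting in the abstract replaces this plain expectation with a BSDE system, but the idea of trading a PDE for sampled stochastic paths is the same.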
-

We consider the question of whether the time evolution of controlled differential equations on general state spaces can be arbitrarily well approximated by (regularized) regressions on features generated themselves through randomly chosen dynamical systems of moderately high dimension. On the one hand this is motivated by paradigms of reservoir computing, on the other hand by ideas from rough path theory and compressed sensing. Appropriately interpreted, this yields provable approximation and generalization results for generic dynamical systems by regressions on states of random, otherwise untrained dynamical systems, which usually are approximated by recurrent or LSTM networks. The results have important implications for transfer learning and energy efficiency of training. We apply methods from rough path theory, convenient analysis, non-commutative algebra and the Johnson-Lindenstrauss Lemma to prove the approximation results.

Lukas Gonon · Josef Teichmann
-
Recent experiments have shown that deep networks can approximate solutions to high-dimensional PDEs, seemingly escaping the curse of dimensionality. However, questions regarding the theoretical basis for such approximations, including the required network size, remain open. In this paper, we investigate the representational power of neural networks for approximating solutions to linear elliptic PDEs with Dirichlet boundary conditions. We prove that when a PDE's coefficients are representable by small neural networks, the parameters required to approximate its solution scale polynomially with the input dimension $d$ and proportionally to the parameter counts of the coefficient networks. To this end, we develop a proof technique that simulates gradient descent (in an appropriate Hilbert space) by growing a neural network architecture whose iterates each participate as sub-networks in their (slightly larger) successors, and converge to the solution of the PDE. We bound the size of the solution, showing a polynomial dependence on $d$ and no dependence on the volume of the domain.
Tanya Marwah · Zachary Lipton · Andrej Risteski
-

Recently, physics-informed neural networks (PINNs) have offered a powerful new paradigm for solving forward and inverse problems relating to differential equations. Whilst promising, a key limitation to date is that PINNs struggle to accurately solve problems with large domains and/or multi-scale solutions, which is crucial for their real-world application. In this work we propose a new approach called finite basis physics-informed neural networks (FBPINNs). FBPINNs combine PINNs with domain decomposition and separate subdomain normalisation to address the issues related to scaling PINNs to large domains, namely the increasing complexity of the underlying optimisation problem and the spectral bias of neural networks. Our experiments show that FBPINNs are more effective than PINNs in solving problems with large domains and/or multi-scale solutions, potentially paving the way to the application of PINNs on large, real-world problems.

Ben Moseley · Andrew Markham
-

We propose a novel class of graph neural networks based on the discretized Beltrami flow, a non-Euclidean diffusion PDE. In our model, node features are supplemented with positional encodings derived from the graph topology and jointly evolved by the Beltrami flow, simultaneously producing continuous feature learning and topology evolution. The resulting model generalizes many popular graph neural networks and achieves state-of-the-art results on several benchmarks.

Benjamin Chamberlain · · Francesco Di Giovanni · Davide Eynard · Xiaowen Dong · Michael Bronstein
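The idea of evolving node features by a discretized diffusion PDE, as in the abstract above, can be sketched with the simplest possible case. The code below performs explicit Euler steps of plain graph heat diffusion, dx_i/dt = Σ_j (x_j - x_i) over neighbours j. It is not the Beltrami flow, uses no positional encodings, and has no learned weights; the function names and the three-node path graph are assumptions made for illustration.

```python
def diffusion_step(features, neighbors, tau):
    """One explicit Euler step of dx_i/dt = sum_j (x_j - x_i) over neighbors j."""
    return [
        x_i + tau * sum(features[j] - x_i for j in neighbors[i])
        for i, x_i in enumerate(features)
    ]


def diffuse(features, neighbors, tau, n_steps):
    """Apply n_steps diffusion steps and return the final node features."""
    for _ in range(n_steps):
        features = diffusion_step(features, neighbors, tau)
    return features


# Undirected path graph 0 - 1 - 2 with all the "heat" initially on node 0.
neighbors = {0: [1], 1: [0, 2], 2: [1]}
smoothed = diffuse([3.0, 0.0, 0.0], neighbors, tau=0.25, n_steps=200)
```

On an undirected graph each step preserves the sum of the features, and for a small enough step size `tau` the features converge to their average, which is the smoothing behaviour that diffusion-based graph networks build on.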

Author Information

Luca Celotti (Université de Sherbrooke)
Kelly Buchanan (Columbia University)
Jorge Ortiz (Rutgers University)
Patrick Kidger (University of Oxford)
Stefano Massaroli (The University of Tokyo)
Michael Poli (Stanford University)

My work spans topics in deep learning, dynamical systems, variational inference and numerical methods. I am broadly interested in ensuring the successes achieved by deep learning methods in computer vision and natural language are extended to other engineering domains.

Lily Hu (Google Research)
Ermal Rrapaj (University of California, Berkeley)
Martin Magill (Ontario Tech University)
Thorsteinn Jonsson (University of Guelph)
Animesh Garg (University of Toronto, Nvidia, Vector Institute)

I am a CIFAR AI Chair Assistant Professor of Computer Science at the University of Toronto, a Faculty Member at the Vector Institute, and Sr. Researcher at Nvidia. My current research focuses on machine learning for perception and control in robotics.

More from the Same Authors