Deep learning can solve differential equations, and differential equations can model deep learning. What have we learned and where to next?
The focus of this workshop is on the interplay between deep learning (DL) and differential equations (DEs). In recent years, there has been a rapid increase of machine learning applications in the computational sciences, with some of the most impressive results at the interface of DL and DEs. These successes have widespread implications, as DEs are among the most well-understood tools for the mathematical analysis of scientific knowledge, and they are fundamental building blocks for mathematical models in engineering, finance, and the natural sciences. This relationship is mutually beneficial. DL techniques have been used in a variety of ways to dramatically enhance the effectiveness of DE solvers and computer simulations. Conversely, DEs have also been used as mathematical models of the neural architectures and training algorithms arising in DL.
This workshop will aim to bring together researchers from each discipline to encourage intellectual exchanges and cultivate relationships between the two communities. The scope of the workshop will include important topics at the intersection of DL and DEs.
Tue 3:45 a.m. - 4:00 a.m.

Introduction and opening remarks
(Introduction)
Tue 4:00 a.m. - 4:45 a.m.

Weinan E - Machine Learning and PDEs
(Invited Talk)
Weinan E
Tue 4:45 a.m. - 5:00 a.m.

NeurInt-Learning Interpolation by Neural ODEs
(Spotlight Talk)
A range of applications require learning image generation models whose latent space effectively captures the high-level factors of variation in the data distribution, which can be judged by its ability to interpolate between images smoothly. However, most generative models mapping a fixed prior to the generated images lead to interpolation trajectories lacking smoothness and images of reduced quality. We propose a novel generative model that learns a flexible nonparametric prior over interpolation trajectories, conditioned on a pair of source and target images. Instead of relying on deterministic interpolation methods like linear or spherical interpolation in latent space, we devise a framework that learns a distribution of trajectories between two given images using Latent Second-Order Neural Ordinary Differential Equations. Through a hybrid combination of reconstruction and adversarial losses, the generator is trained to map the sampled points from these trajectories to sequences of realistic images of improved quality that smoothly transition from the source to the target image.
Tue 5:00 a.m. - 5:15 a.m.

Neural ODE Processes: A Short Summary
(Spotlight Talk)
Neural Ordinary Differential Equations (NODEs) use a neural network to model the instantaneous rate of change in the state of a system. However, despite their apparent suitability for dynamics-governed time series, NODEs present a few disadvantages. First, they are unable to adapt to incoming data points, a fundamental requirement for real-time applications imposed by the natural direction of time. Second, time series are often composed of a sparse set of measurements, which could be explained by many possible underlying dynamics. NODEs do not capture this uncertainty. To this end, we introduce Neural ODE Processes (NDPs), a new class of stochastic processes determined by a distribution over Neural ODEs. By maintaining an adaptive data-dependent distribution over the underlying ODE, we show that our model can successfully capture the dynamics of low-dimensional systems from just a few data points. At the same time, we demonstrate that NDPs scale up to challenging high-dimensional time series with unknown latent dynamics such as rotating MNIST digits. Code is available online at https://github.com/crisbodnar/ndp.
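The basic Neural ODE idea summarised above, a network that models the instantaneous rate of change and a solver that integrates it, can be sketched in a few lines. This is a minimal illustration in plain Python, not the authors' implementation (which lives at the repository linked above). The names `make_vector_field` and `odeint_euler`, the fixed weights, and the choice of forward Euler are all assumptions made for the example.

```python
import math


def make_vector_field(W1, b1, W2, b2):
    """Return f(h) = W2 * tanh(W1 * h + b1) + b2, a tiny stand-in for a trained network."""
    def f(h):
        hidden = [math.tanh(sum(w * x for w, x in zip(row, h)) + b)
                  for row, b in zip(W1, b1)]
        return [sum(w * z for w, z in zip(row, hidden)) + b
                for row, b in zip(W2, b2)]
    return f


def odeint_euler(f, h0, t0, t1, n_steps):
    """Integrate dh/dt = f(h) from t0 to t1 with forward Euler; return every state."""
    dt = (t1 - t0) / n_steps
    h = list(h0)
    trajectory = [list(h)]
    for _ in range(n_steps):
        dh = f(h)
        h = [x + dt * d for x, d in zip(h, dh)]
        trajectory.append(list(h))
    return trajectory


# Hypothetical weights for a 2-D state and 3 hidden units (illustration only).
W1 = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.2]]
b1 = [0.0, 0.1, -0.1]
W2 = [[0.3, -0.5, 0.2], [0.1, 0.4, -0.3]]
b2 = [0.0, 0.0]

f = make_vector_field(W1, b1, W2, b2)
trajectory = odeint_euler(f, h0=[1.0, 0.0], t0=0.0, t1=1.0, n_steps=100)
final_state = trajectory[-1]
```

A single set of weights gives a single deterministic trajectory, which is the limitation the abstract points to: nothing in this sketch represents uncertainty over the underlying dynamics.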
Tue 5:15 a.m. - 6:00 a.m.

Neha Yadav - Deep learning methods for solving differential equations
(Invited Talk)
Neha Yadav
Tue 6:00 a.m. - 6:15 a.m.

Coffee Break
(Break)
Tue 6:15 a.m. - 6:30 a.m.

GRAND: Graph Neural Diffusion
(Spotlight Talk)
We present Graph Neural Diffusion (GRAND), a model that approaches deep learning on graphs as a continuous diffusion process and treats Graph Neural Networks (GNNs) as discretisations of an underlying PDE. In our model, the layer structure and topology correspond to the discretisation choices of temporal and spatial operators. Our approach allows a principled development of a broad new class of GNNs that are able to address the common plights of graph learning models such as depth, over-smoothing, and bottlenecks. Key to the success of our models is stability with respect to perturbations in the data, which is addressed for both implicit and explicit discretisation schemes. We develop linear and nonlinear versions of GRAND, achieving competitive results on many standard graph benchmarks.
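To make the "layers as time steps" reading above concrete, here is a minimal sketch of explicit Euler steps for a linear graph diffusion equation, dx/dt = (A - I) x. It is not the GRAND implementation: the matrix A is a fixed, hand-built row-stochastic matrix standing in for whatever diffusivity a model would learn, and the helper names, the path graph, and the step size `tau` are assumptions chosen for illustration.

```python
def row_normalised_adjacency(neighbours):
    """Build a row-stochastic matrix A with uniform weights over each node's neighbours."""
    n = len(neighbours)
    A = [[0.0] * n for _ in range(n)]
    for i, nbrs in enumerate(neighbours):
        for j in nbrs:
            A[i][j] = 1.0 / len(nbrs)
    return A


def diffusion_step(x, A, tau):
    """One explicit Euler step of dx/dt = (A - I) x for scalar node features."""
    Ax = [sum(a * xj for a, xj in zip(row, x)) for row in A]
    return [xi + tau * (axi - xi) for xi, axi in zip(x, Ax)]


def diffuse(x0, A, tau, n_layers):
    """Apply n_layers Euler steps; each step plays the role of one network layer."""
    x = list(x0)
    for _ in range(n_layers):
        x = diffusion_step(x, A, tau)
    return x


# Path graph 0 - 1 - 2 with one scalar feature per node.
neighbours = [[1], [0, 2], [1]]
A = row_normalised_adjacency(neighbours)
x_shallow = diffuse([0.0, 0.0, 3.0], A, tau=0.5, n_layers=2)
x_deep = diffuse([0.0, 0.0, 3.0], A, tau=0.5, n_layers=50)
```

With two steps the features are still distinct; with fifty they have collapsed to a single common value. For this plain linear diffusion, that collapse is the over-smoothing behaviour that deep graph models have to contend with.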
Tue 6:30 a.m. - 6:45 a.m.

Neural Solvers for Fast and Accurate Numerical Optimal Control
(Spotlight Talk)
Synthesizing optimal controllers for dynamical systems in practice involves solving real-time optimization problems with hard time constraints. These constraints restrict the class of numerical methods that can be applied; indeed, computationally expensive but accurate numerical routines often have to be replaced with fast and inaccurate methods, trading inference time for worse theoretical guarantees on solution accuracy. This paper proposes a novel methodology to accelerate numerical optimization of optimal control policies via hypersolvers, hybrids of a base solver and a neural network. In particular, we apply low-order explicit numerical methods for the ordinary differential equation (ODE) associated with the numerical optimal control problem, augmented with an additional parametric approximator trained to reduce local truncation errors introduced by the base solver. Given a target system to control, we first pre-train hypersolvers to approximate base solver residuals by sampling plausible control inputs. Then, we use the trained hypersolver to obtain fast and accurate solutions of the target system during optimization of the controller. The performance of our approach is evaluated in direct and model predictive optimal control settings, where we show consistent Pareto improvements in terms of solution accuracy and control performance.
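The hypersolver idea above, a cheap base solver plus a learned term that cancels its local truncation error, can be illustrated on a toy problem. The sketch below is an assumption-laden stand-in rather than the paper's method: the system is the scalar ODE dx/dt = -x with no control input, the "network" `corrector` is a hand-written function equal to the leading Euler error term for this system, and scaling the correction by `eps ** 2` is a choice made for this example.

```python
import math


def f(x):
    """Vector field of the toy system dx/dt = -x."""
    return -x


def euler_step(x, eps):
    """Base solver: one forward Euler step of size eps."""
    return x + eps * f(x)


def corrector(x):
    """Hand-written stand-in for a trained network: the leading term of Euler's
    local truncation error for dx/dt = -x, divided by eps ** 2."""
    return 0.5 * x


def hypersolver_step(x, eps):
    """Base Euler step plus a correction scaled by eps ** 2."""
    return euler_step(x, eps) + eps ** 2 * corrector(x)


def rollout(step, x0, eps, n_steps):
    """Apply the given one-step method n_steps times."""
    x = x0
    for _ in range(n_steps):
        x = step(x, eps)
    return x


exact = math.exp(-1.0)  # true solution x(1) for x(0) = 1
euler_error = abs(rollout(euler_step, 1.0, 0.1, 10) - exact)
hyper_error = abs(rollout(hypersolver_step, 1.0, 0.1, 10) - exact)
```

Both rollouts take the same number of steps and evaluate the vector field the same number of times; the corrected one ends much closer to the exact value, which is the kind of accuracy-for-cost trade the abstract describes.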
Tue 6:45 a.m. - 7:30 a.m.

Poster Session 1
(Poster Session)
https://eventhosts.gather.town/app/pyTVekMlogztZr5d/dldeposterroom
Tue 6:45 a.m. - 7:30 a.m.

GRAND: Graph Neural Diffusion
(Poster)
Benjamin Chamberlain · James Rowbottom · Maria Gorinova · Stefan Webb · Emanuele Rossi · Michael Bronstein
Tue 6:45 a.m. - 7:30 a.m.

Empirics on the expressiveness of Randomized Signature
(Poster)
Time series analysis is a widespread task in Natural Sciences, Social Sciences and Engineering. A fundamental problem is finding an expressive yet efficient-to-compute representation of the input time series to use as a starting point to perform arbitrary downstream tasks. In this paper, we build upon recent work using the signature of a path as a feature map and investigate a computationally efficient technique to approximate these features based on linear random projections. We present several theoretical results to justify our approach, and we analyze and showcase its empirical performance on the task of learning a mapping between the input controls of a Stochastic Differential Equation (SDE) and its corresponding solution. Our results show that the representational power of the proposed random features allows the aforementioned mapping to be learned efficiently.
Enea Monzio Compagnoni · Luca Biggio · Antonio Orvieto
Tue 6:45 a.m. - 7:30 a.m.

Enhancing the trainability and expressivity of deep MLPs with globally orthogonal initialization
(Poster)
Multilayer Perceptrons (MLPs) define a fundamental model class that forms the backbone of many modern deep learning architectures. Despite their universality guarantees, practical training via stochastic gradient descent often struggles to attain theoretical error bounds due to issues including (but not limited to) frequency bias, vanishing gradients, and stiff gradient flows. In this work we postulate that many such issues originate in the initialization of the network's parameters. While the initialization schemes proposed by Glorot et al. and He et al. have become the de facto choices among practitioners, their goal of preserving the variance of forward- and backward-propagated signals is mainly achieved under assumptions of linearity, while the presence of nonlinear activation functions may partially destroy these efforts. Here, we revisit the initialization of MLPs from a dynamical systems viewpoint to explore why and how, under these classical schemes, an MLP can still fail even at the beginning. Drawing inspiration from classical numerical methods for differential equations that leverage orthogonal feature representations, we propose a novel initialization scheme that promotes orthogonality in the features of the last hidden layer, ultimately leading to more diverse and localized features. Our results demonstrate that network initialization alone can be sufficient to mitigate frequency bias and yields competitive results for high-frequency function approximation and image regression tasks, without any additional modifications to the network architecture or activation functions.
Hanwen Wang · Paris Perdikaris
Tue 6:45 a.m. - 7:30 a.m.

Gotta Go Fast with Score-Based Generative Models
(Poster)
Score-based (denoising diffusion) generative models have recently gained a lot of success in generating realistic and diverse data. These approaches define a forward diffusion process for transforming data to noise and generate data by reversing it. Unfortunately, current score-based models generate data very slowly due to the sheer number of score network evaluations required by numerical SDE solvers. In this work, we aim to accelerate this process by devising a more efficient SDE solver. Our solver requires only two score function evaluations per step, rarely rejects samples, and leads to high-quality samples. Our approach generates data 2 to 10 times faster than Euler-Maruyama (EM) while achieving better or equal sample quality. For high-resolution images, our method leads to significantly higher quality samples than all other methods tested. Our SDE solver has the benefit of requiring no step size tuning.
Alexia Jolicoeur-Martineau · Ke Li · Rémi Piché-Taillefer · Tal Kachman · Ioannis Mitliagkas
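For readers unfamiliar with the EM baseline named above, the following is a minimal sketch of the fixed-step Euler-Maruyama scheme, applied to a forward diffusion that carries a single "data" value towards noise. It is not the authors' solver, which per the abstract needs no step size tuning. The Ornstein-Uhlenbeck process, the starting value 2.0, the step size, and the function name `euler_maruyama` are assumptions picked for illustration.

```python
import math
import random


def euler_maruyama(drift, diffusion, x0, dt, n_steps, rng):
    """Simulate dX = drift(X) dt + diffusion(X) dW with fixed-step Euler-Maruyama."""
    x = x0
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment ~ N(0, dt)
        x = x + drift(x) * dt + diffusion(x) * dw
    return x


rng = random.Random(0)  # fixed seed so the run is reproducible

# Ornstein-Uhlenbeck process dX = -X dt + sqrt(2) dW; its stationary law is N(0, 1).
# Every path starts at the same "data" value 2.0 and is run to time T = 5.
samples = [
    euler_maruyama(lambda x: -x, lambda x: math.sqrt(2.0), 2.0, 0.01, 500, rng)
    for _ in range(1000)
]
mean = sum(samples) / len(samples)
variance = sum((s - mean) ** 2 for s in samples) / len(samples)
```

Each path costs one drift evaluation per step, 500 in total here. In a score-based model every such evaluation is a forward pass through the score network, which is why the number of solver steps dominates sampling time.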
Tue 6:45 a.m. - 7:30 a.m.

Quantized convolutional neural networks through the lens of partial differential equations
(Poster)
Quantization of Convolutional Neural Networks (CNNs) is a common approach to ease the computational burden involved in the deployment of CNNs. However, fixed-point arithmetic is not natural to the type of computations involved in neural networks. In our work, we consider symmetric and stable variants of common CNNs for image classification, and Graph Convolutional Networks (GCNs) for graph node classification. We demonstrate through several experiments that the property of forward stability preserves the action of a network under different quantization rates, allowing stable quantized networks to behave similarly to their non-quantized counterparts while using fewer parameters. We also find that at times, stability aids in improving accuracy. These properties are of particular interest for sensitive, resource-constrained or real-time applications.
Ido Ben-Yair · Moshe Eliasof · Eran Treister
Tue 6:45 a.m. - 7:30 a.m.

MGIC: Multigrid-in-Channels Neural Network Architectures
(Poster)
Multigrid (MG) methods are effective at solving numerical PDEs in linear complexity. In this work we present a multigrid-in-channels (MGIC) approach that tackles the quadratic growth of the number of parameters with respect to the number of channels in standard convolutional neural networks (CNNs). Indeed, lightweight CNNs can achieve comparable accuracy to standard CNNs with fewer parameters; however, the number of weights still scales quadratically with the CNN's width. Our MGIC architectures replace each CNN block with an MGIC counterpart that utilizes a hierarchy of nested grouped convolutions of small group size to address this. Hence, our proposed architectures scale linearly with respect to the network's width while retaining full coupling of the channels as in standard CNNs. Our extensive experiments on image classification, segmentation, and point cloud classification show that applying this strategy to different architectures reduces the number of parameters while obtaining similar or better accuracy.
Moshe Eliasof · Jonathan Ephrath · Lars Ruthotto · Eran Treister
Tue 6:45 a.m. - 7:30 a.m.

HyperPINN: Learning parameterized differential equations with physics-informed hypernetworks
(Poster)
Many types of physics-informed neural network models have been proposed in recent years as approaches for learning solutions to differential equations. When a particular task requires solving a differential equation at multiple parameterizations, this requires either retraining the model or expanding its representation capacity to include the parameterization, both solutions that increase its computational cost. We propose the HyperPINN, which uses hypernetworks to learn to generate neural networks that can solve a differential equation from a given parameterization. We demonstrate with experiments on both a PDE and an ODE that this type of model can lead to neural network solutions to differential equations that maintain a small size, even when learning a family of solutions over a parameter space.
· Fei Sha
Tue 6:45 a.m. - 7:30 a.m.

Learning Dynamics from Noisy Measurements using Deep Learning with a Runge-Kutta Constraint
(Poster)
Measurement noise is an integral part of collecting data on physical processes. Thus, noise removal is necessary to draw conclusions from these data and is essential to construct dynamic models using these data. This work discusses a methodology for learning dynamic models using noisy measurements and simultaneously obtaining denoised data. In our methodology, the main innovation can be seen in integrating deep neural networks with a numerical integration method. Precisely, we aim at learning a neural network that implicitly represents the data and an additional neural network that models the vector fields of the dependent variables. We combine these two networks by enforcing the constraint that the data at the next time step can be obtained by following a numerical integration scheme. The proposed framework to identify a model predicting the vector field is effective under noisy measurements and provides denoised data. We demonstrate the effectiveness of the proposed method to learn models using a differential equation and present a comparison with the neural ODE approach.
Pawan Goyal · Peter Benner
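The constraint described above, that the state at the next time step should follow from the current one through a numerical integration scheme, can be written as a residual loss. The sketch below assumes the classical fourth-order Runge-Kutta scheme (suggested by the title) and replaces both neural networks with fixed scalar functions, so it shows only the shape of the constraint, not the authors' training setup. The names `rk4_step` and `rk_constraint_loss` are hypothetical.

```python
import math


def rk4_step(f, x, h):
    """One classical fourth-order Runge-Kutta step for dx/dt = f(x)."""
    k1 = f(x)
    k2 = f(x + 0.5 * h * k1)
    k3 = f(x + 0.5 * h * k2)
    k4 = f(x + h * k3)
    return x + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)


def rk_constraint_loss(f, xs, h):
    """Mean squared mismatch between each state and the RK4 prediction from its predecessor."""
    residuals = [(x_next - rk4_step(f, x, h)) ** 2
                 for x, x_next in zip(xs[:-1], xs[1:])]
    return sum(residuals) / len(residuals)


def true_field(x):
    """Stand-in for the learned vector field: dx/dt = -x."""
    return -x


h = 0.1
# Clean samples of x(t) = exp(-t), and a copy with a deterministic +/-0.05 perturbation.
clean = [math.exp(-h * k) for k in range(20)]
noisy = [x + 0.05 * (-1) ** k for k, x in enumerate(clean)]

loss_clean = rk_constraint_loss(true_field, clean, h)
loss_noisy = rk_constraint_loss(true_field, noisy, h)
```

The loss is close to zero on the clean sequence and large on the perturbed one, so minimising it over a data representation pushes that representation towards a trajectory consistent with the vector field.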
Tue 6:45 a.m. - 7:30 a.m.

Investigating the Role of Overparameterization While Solving the Pendulum with DeepONets
(Poster)
Machine learning methods have made substantial advances in various aspects of physics. In particular, multiple deep-learning methods have emerged as efficient ways of numerically solving differential equations arising commonly in physics. DeepONets [1] are one of the most prominent ideas in this theme, which entails an optimization over a space of inner products of neural nets. In this work we study the training dynamics of DeepONets for solving the pendulum to bring to light some of their intriguing properties. We demonstrate that, contrary to usual expectations, test error here has its first local minimum at the interpolation threshold, i.e., when model size $\approx$ training data size. Secondly, as opposed to the average end-point error, the best test error over iterations has better dependence on model size, in that it shows only a very mild double-descent. Lastly, we show evidence that triple-descent [2] is unlikely to occur for DeepONets.
[1] Lu Lu, Pengzhan Jin, and George Em Karniadakis. DeepONet: Learning nonlinear operators for identifying differential equations based on the universal approximation theorem of operators. 2020. arXiv:1910.03193 [cs.LG].
[2] Ben Adlam and Jeffrey Pennington. The Neural Tangent Kernel in High Dimensions: Triple Descent and a Multi-Scale Theory of Generalization. 2020. arXiv:2008.06786 [stat.ML].
Pulkit Gopalani · Anirbit Mukherjee
Tue 6:45 a.m. - 7:30 a.m.

Non Vanishing Gradients for Arbitrarily Deep Neural Networks: a Hamiltonian System Approach
(Poster)
Training Deep Neural Networks (DNNs) can be difficult due to vanishing or exploding gradients during weight optimization through backpropagation. To address this problem, we propose a general class of Hamiltonian DNNs (HDNNs) that stems from the discretization of continuous-time Hamiltonian systems. Our main result is that a broad set of HDNNs ensures non-vanishing gradients by design for an arbitrary network depth. This is obtained by proving that, using a semi-implicit Euler discretization scheme, the backward sensitivity matrices involved in gradient computations are symplectic.
Clara Galimberti · Luca Furieri
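The symplecticity that the result above relies on can be checked by hand on a toy Hamiltonian system. The sketch below applies semi-implicit Euler to the harmonic oscillator H(q, p) = (q^2 + p^2) / 2 and compares it with explicit Euler. It illustrates only the property of the discretization scheme, not the authors' networks or their backward sensitivity matrices; the oscillator, the step size, and the function names are assumptions.

```python
def symplectic_euler_step(q, p, h):
    """Semi-implicit Euler for H = (q**2 + p**2) / 2: update p first, then q with the new p."""
    p_new = p - h * q
    q_new = q + h * p_new
    return q_new, p_new


def explicit_euler_step(q, p, h):
    """Explicit Euler for the same system: both updates use the old state."""
    return q + h * p, p - h * q


def energy(q, p):
    """Hamiltonian of the harmonic oscillator."""
    return 0.5 * (q * q + p * p)


def rollout(step, q, p, h, n_steps):
    """Apply the given one-step map n_steps times."""
    for _ in range(n_steps):
        q, p = step(q, p, h)
    return q, p


h = 0.1
# Jacobian of the semi-implicit map with respect to (q, p) is [[1 - h*h, h], [-h, 1]].
# In two dimensions a linear map is symplectic exactly when its determinant is 1.
jacobian_det = (1.0 - h * h) * 1.0 - h * (-h)

energy_symplectic = energy(*rollout(symplectic_euler_step, 1.0, 0.0, h, 1000))
energy_explicit = energy(*rollout(explicit_euler_step, 1.0, 0.0, h, 1000))
```

After 1000 steps the semi-implicit scheme keeps the energy near its initial value of 0.5, while explicit Euler multiplies it by (1 + h^2) at every step. A map that neither contracts nor inflates phase-space volume is the intuition behind gradients that do not vanish with depth.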
Tue 6:45 a.m. - 7:30 a.m.

Performance-Guaranteed ODE Solvers with Complexity-Informed Neural Networks
(Poster)
Traditionally, we provide technical parameters for ODE solvers, such as the order, the step size and the local error threshold. However, there is no guarantee for performance metrics that users care about, such as the time consumption and the global error. In this paper, we provide such a user-oriented guarantee by using neural networks to fit the complex relationship between the technical parameters and performance metrics. The form of the neural network is carefully designed to incorporate the prior knowledge from time complexity analysis of ODE solvers, which has better performance than purely data-driven approaches. We test our strategy on some parametrized ODE problems, and experimental results show that the fitted model can achieve high accuracy, thus providing an error guarantee for fixed methods and a time guarantee for adaptive step-size methods.
Feng Zhao · Xiang Chen · Jun Wang · Zuoqiang Shi · Shao-Lun Huang
Tue 6:45 a.m. - 7:30 a.m.

Neural ODE Processes: A Short Summary
(Poster)
Alexander Norcliffe · Cristian Bodnar · Ben Day · Jacob Moss · Pietro Lió
Tue 6:45 a.m. - 7:30 a.m.

On Second Order Behaviour in Augmented Neural ODEs: A Short Summary
(Poster)
In Norcliffe et al. [13], we discussed and systematically analysed how Neural ODEs (NODEs) can learn higher-order dynamics. In particular, we focused on second-order dynamic behaviour and analysed Augmented NODEs (ANODEs), showing that they can learn second-order dynamics with only a few augmented dimensions, but are unable to correctly model the velocity (first derivative). In response, we proposed Second Order NODEs (SONODEs), which build on top of ANODEs but explicitly take into account the second-order physics-based inductive biases. These biases, besides making them more efficient and noise-robust when modelling second-order dynamics, make them more interpretable than ANODEs, and therefore more suitable in many real-world scientific modelling applications.
Alexander Norcliffe · Cristian Bodnar · Ben Day · Nikola Simidjievski · Pietro Lió
Tue 6:45 a.m. - 7:30 a.m.

Layer-Parallel Training of Residual Networks with Auxiliary Variables
(Poster)
The backpropagation algorithm is indispensable for training modern residual networks (ResNets) and tends to be time-consuming because of its inherent algorithmic locking. Auxiliary-variable methods, e.g., the penalty and augmented Lagrangian (AL) methods, have attracted much interest lately due to their ability to exploit layer-wise parallelism. However, we find that large communication overhead and the lack of data augmentation are two key challenges of these approaches, which may lead to low speedup and an accuracy drop. Inspired by the continuous-time formulation of ResNets, we propose a novel serial-parallel hybrid (SPH) training strategy to enable the use of data augmentation during training, together with downsampling (DS) filters to reduce the communication cost. This strategy first trains the network by solving a succession of independent sub-problems in parallel and then improves the trained network through a full serial forward-backward propagation of data. We validate our methods on modern ResNets across benchmark datasets, achieving speedup over backpropagation while maintaining comparable accuracy.
Qi Sun · Hexin Dong · Zewei Chen · Wei-Zhen Dian · Jiacheng Sun · Yitong Sun · Zhenguo Li · Bin Dong
Tue 6:45 a.m. - 7:30 a.m.

Statistical Numerical PDE: Fast Rate, Neural Scaling Law and When it’s Optimal
(Poster)
In this paper, we study the statistical limits of deep learning techniques for solving elliptic partial differential equations (PDEs) from random samples using the Deep Ritz Method (DRM) and Physics-Informed Neural Networks (PINNs). To simplify the problem, we focus on a prototype elliptic PDE: the Schrödinger equation on a hypercube with zero Dirichlet boundary condition, which is applied in quantum-mechanical systems. We establish upper and lower bounds for both methods, which improve upon concurrently developed upper bounds for this problem via a fast rate generalization bound. We discover that the current Deep Ritz Method is sub-optimal and propose a modified version of it. We also prove that PINN and the modified version of DRM can achieve minimax optimal bounds over Sobolev spaces. Empirically, following recent work which has shown that deep model accuracy improves with growing training sets according to a power law, we supply computational experiments showing a similar dimension-dependent power law for deep PDE solvers.
Yiping Lu · Haoxuan Chen · Jianfeng Lu · Lexing Ying · Jose Blanchet
Tue 6:45 a.m. - 7:30 a.m.

Actor-Critic Algorithm for High-dimensional PDEs
(Poster)
We develop a machine learning model to effectively solve high-dimensional nonlinear parabolic partial differential equations (PDEs). We use the Feynman-Kac formula to reformulate the PDE into the equivalent stochastic control problem governed by a Backward Stochastic Differential Equation (BSDE) system. Our model is designed to maximally exploit the Markovian property of the BSDE system and utilizes an Actor-Critic network architecture, which is novel in the high-dimensional PDE literature. We show that our algorithm design leads to a significant speedup with a higher accuracy level compared to other neural network solvers. Our model advances the state-of-the-art machine learning PDE solvers in a few aspects: 1) the number of trainable parameters is reduced by a factor of $N$, where $N$ is the number of steps to discretize the PDE in time, 2) the model convergence rate is an order of magnitude faster, 3) our model has fewer tuning hyperparameters. We demonstrate the performance improvements by solving six equations including the Hamilton-Jacobi-Bellman equation, the Allen-Cahn equation and the Black-Scholes equation, all with dimensions on the order of 100. These equations in high dimensions have wide applications in control theory, material science and quantitative finance.
Xiaohan Zhang
Tue 6:45 a.m. - 7:30 a.m.

Scaling physics-informed neural networks to large domains by using domain decomposition
(Poster)
Recently, physics-informed neural networks (PINNs) have offered a powerful new paradigm for solving forward and inverse problems relating to differential equations. Whilst promising, a key limitation to date is that PINNs struggle to accurately solve problems with large domains and/or multi-scale solutions, which is crucial for their real-world application. In this work we propose a new approach called finite basis physics-informed neural networks (FBPINNs). FBPINNs combine PINNs with domain decomposition and separate subdomain normalisation to address the issues related to scaling PINNs to large domains, namely the increasing complexity of the underlying optimisation problem and the spectral bias of neural networks. Our experiments show that FBPINNs are more effective than PINNs in solving problems with large domains and/or multi-scale solutions, potentially paving the way to the application of PINNs on large, real-world problems.
Ben Moseley · Andrew Markham
Tue 7:30 a.m. - 8:15 a.m.

Philipp Grohs - The Theory-to-Practice Gap in Deep Learning
(Invited Talk)
Philipp Grohs
Tue 8:15 a.m. - 10:45 a.m.

Lunch Break
(Break)
Tue 10:45 a.m. - 11:00 a.m.

Deep Reinforcement Learning for Online Control of Stochastic Partial Differential Equations
(Spotlight Talk)
In many areas, such as the physical sciences, life sciences, and finance, control approaches are used to achieve a desired goal in complex dynamical systems governed by differential equations. In this work we formulate the problem of controlling stochastic partial differential equations (SPDEs) as a reinforcement learning problem. We present a learning-based, distributed control approach for online control of a system of SPDEs with a high-dimensional state-action space using the deep deterministic policy gradient method. We tested the performance of our method on the problem of controlling the stochastic Burgers’ equation, describing a turbulent fluid flow in an infinitely large domain.
Tue 11:00 a.m. - 11:15 a.m.

Statistical Numerical PDE: Fast Rate, Neural Scaling Law and When it’s Optimal
(Spotlight Talk)
Tue 11:15 a.m. - 11:30 a.m.

Coffee Break
(Break)
Tue 11:30 a.m. - 12:15 p.m.

Poster Session 2
(Poster Session)
https://eventhosts.gather.town/pyTVekMlogztZr5d/dldeposterroom
Tue 11:30 a.m. - 12:15 p.m.

Adversarial Sampling for Solving Differential Equations with Neural Networks
(Poster)
Neural network-based methods for solving differential equations have been gaining traction. They work by improving the differential equation residuals of a neural network on a sample of points in each iteration. However, most of them employ standard sampling schemes such as uniform sampling or perturbing equally spaced points. We present a novel sampling scheme which samples points adversarially to maximize the loss of the current solution estimate. A sampler architecture is described along with the loss terms used for training. Finally, we demonstrate that this scheme outperforms pre-existing schemes by comparing both on a number of problems.
Kshitij Parwani · Pavlos Protopapas
Tue 11:30 a.m. - 12:15 p.m.

Spectral PINNs: Fast Uncertainty Propagation with Physics-Informed Neural Networks
(Poster)
Physics-informed neural networks (PINNs) promise to significantly speed up partial differential equation (PDE) solvers. However, most PINNs can only solve deterministic PDEs. Here, we consider stochastic PDEs that contain partially unknown parameters. We aim to quickly quantify the impact of uncertain parameters on the solution of a PDE, that is, we want to perform fast uncertainty propagation. Classical uncertainty propagation methods such as Monte Carlo sampling, stochastic Galerkin, collocation, or discrete projection methods become computationally too expensive with an increasing number of stochastic parameters. For example, the well-known spectral or polynomial chaos expansions separate the spatiotemporal and probabilistic domains and offer theoretical guarantees and fast computation of stochastic summaries (e.g., mean), but can be computationally expensive to form. Our Spectral-PINNs approximate the underlying spectral coefficients with a neural network and reduce the computational cost of the spectral expansion while maintaining guarantees. We derive the method for partial differential equations, discuss runtime, demonstrate initial results on the convection-diffusion equation, and provide steps towards convergence guarantees.
Björn Lütjens · Mark Veillette · Dava Newman
Tue 11:30 a.m. - 12:15 p.m.

Uncertainty Quantification in Neural Differential Equations
(Poster)
Uncertainty quantification (UQ) helps to make trustworthy predictions based on collected observations and uncertain domain knowledge. With increased usage of deep learning in various applications, the need for efficient UQ methods that can make deep models more reliable has increased as well. Among applications that can benefit from effective handling of uncertainty are deep-learning-based differential equation (DE) solvers. We adapt several state-of-the-art UQ methods to get the predictive uncertainty for DE solutions and show the results on four different DE types.
Olga Graf · Pablo Flores · Pavlos Protopapas
Tue 11:30 a.m. - 12:15 p.m.

Multigrid-augmented deep learning preconditioners for the Helmholtz equation
(Poster)
We present a data-driven approach to iteratively solve the discrete heterogeneous Helmholtz equation at high wavenumbers. We combine multigrid ingredients with convolutional neural networks (CNNs) to form a preconditioner which is applied within a Krylov solver. Two types of preconditioners are proposed: 1) U-Net as a coarse grid solver, and 2) U-Net as a deflation operator with shifted Laplacian V-cycles. The resulting CNN preconditioner can generalize over residuals and a relatively general set of wave slowness models. On top of that, we offer an encoder-solver framework where an
Yael Azulay · Eran Treister
Tue 11:30 a.m. - 12:15 p.m.

Deep Reinforcement Learning for Online Control of Stochastic Partial Differential Equations
(Poster)
In many areas, such as the physical sciences, life sciences, and finance, control approaches are used to achieve a desired goal in complex dynamical systems governed by differential equations. In this work we formulate the problem of controlling stochastic partial differential equations (SPDEs) as a reinforcement learning problem. We present a learning-based, distributed control approach for online control of a system of SPDEs with a high-dimensional state-action space using the deep deterministic policy gradient method. We tested the performance of our method on the problem of controlling the stochastic Burgers’ equation, describing a turbulent fluid flow in an infinitely large domain.
Erfan Pirmorad · Farnam Mansouri · Amirmassoud Farahmand
Tue 11:30 a.m. - 12:15 p.m.

Accelerated PDEs for Construction and Theoretical Analysis of an SGD Extension
(Poster)
We introduce a recently developed framework, PDE Acceleration, which is a variational approach to accelerated optimization with partial differential equations (PDEs), in the context of optimization of deep networks. We derive the PDE evolution equations for optimization of general loss functions using this variational approach. We propose discretizations of these PDEs based on numerical PDE discretizations, and establish a mapping between these discretizations and stochastic gradient descent (SGD). We show that our framework can give rise to new PDEs that can be mapped to new optimization algorithms, and thus theoretical insights from the PDE domain can be used to analyze optimization algorithms. We show an example by introducing a new PDE with diffusion that naturally arises from the viscosity solution, which translates to a novel extension of SGD. We analyze its stability and convergence analytically using von Neumann analysis. We apply the proposed extension to optimization of convolutional neural networks (CNNs). We empirically validate the theory and evaluate our new extension on image classification, showing empirical improvement over SGD.
Yuxin Sun · Dong Lao · Ganesh Sundaramoorthi · Anthony Yezzi
Tue 11:30 a.m. - 12:15 p.m.

Shape-Tailored Deep Neural Networks With PDEs
(Poster)
We present Shape-Tailored Deep Neural Networks (ST-DNN). ST-DNN are deep networks formulated through the use of partial differential equations (PDEs) to be defined on arbitrarily shaped regions. This is natural for problems in computer vision such as segmentation, where descriptors should describe regions (e.g., of objects) that have diverse shapes. We formulate ST-DNNs through the Poisson PDE, which can be used to generalize convolution to arbitrary regions. We stack multiple PDE layers to generalize a deep CNN to arbitrarily shaped regions. We show that ST-DNN are provably covariant to translations and rotations and robust to domain deformations, which are important properties for computer vision tasks. We show proof-of-concept empirical validation.
Naeemullah Khan · Angira Sharma · Philip Torr · Ganesh Sundaramoorthi
Tue 11:30 a.m. - 12:15 p.m.

Long-time prediction of nonlinear parametrized dynamical systems by deep learning-based ROMs
(Poster)
Deep learning-based reduced order models (DL-ROMs) have been recently proposed to overcome common limitations shared by conventional ROMs, built e.g. through proper orthogonal decomposition (POD), when applied to nonlinear time-dependent parametrized PDEs. Although extremely efficient at testing time, when evaluating the PDE solution for any new testing-parameter instance, DL-ROMs require an expensive training stage. To avoid the latter, a prior dimensionality reduction through POD and a multi-fidelity pretraining stage are introduced, yielding the POD-DL-ROM framework, which allows solving time-dependent PDEs even faster than in real time. Equipped with LSTM networks, the resulting POD-LSTM-ROMs better grasp the time evolution of the PDE system, ultimately allowing long-term prediction of complex systems’ evolution, with respect to the training window, for unseen input parameter values.
Stefania Fresca · Federico Fatone · Andrea Manzoni
Tue 11:30 a.m. - 12:15 p.m.

Data-driven Taylor-Galerkin finite-element scheme for convection problems
(Poster)
High-fidelity large-eddy simulations (LES) of high Reynolds number flows are essential to design low-carbon footprint energy conversion devices. The two-level Taylor-Galerkin (TTGC) finite-element method (FEM) has remained the workhorse of modern industrial-scale combustion LES. In this work, we propose an improved FEM termed ML-TTGC that introduces locally tunable parameters in the TTGC scheme, whose values are provided by a graph neural network (GNN). We show that ML-TTGC outperforms TTGC in solving the convection problem on both irregular and regular meshes over a wide range of initial conditions. We train the GNN using parameter values that (i) minimize a weighted loss function of the dispersion and dissipation error and (ii) are enforced to be numerically stable. As a result, no additional ad hoc dissipation is necessary for numerical stability or to damp spurious waves, which amortizes the additional cost of running the GNN.
Luciano Drozda
Tue 11:30 a.m. - 12:15 p.m.

NeurInt-Learning Interpolation by Neural ODEs
(Poster)
A range of applications require learning image generation models whose latent space effectively captures the high-level factors of variation in the data distribution, which can be judged by its ability to interpolate between images smoothly. However, most generative models mapping a fixed prior to the generated images lead to interpolation trajectories lacking smoothness and images of reduced quality. We propose a novel generative model that learns a flexible nonparametric prior over interpolation trajectories, conditioned on a pair of source and target images. Instead of relying on deterministic interpolation methods like linear or spherical interpolation in latent space, we devise a framework that learns a distribution of trajectories between two given images using Latent Second-Order Neural Ordinary Differential Equations. Through a hybrid combination of reconstruction and adversarial losses, the generator is trained to map the sampled points from these trajectories to sequences of realistic images of improved quality that smoothly transition from the source to the target image.
Avinandan Bose · Aniket Das · Yatin Dandi · Piyush Rai
Tue 11:30 a.m. - 12:15 p.m.

Learning Implicit PDE Integration with Linear Implicit Layers
(Poster)
Neural networks can learn local interactions to faithfully reproduce large-scale dynamics in important physical systems. Trained on PDE integrations or noisy observations, these emulators can assimilate data, tune parameters and learn subgrid process representations. However, implicit integration schemes cannot be expressed as local feedforward computations. We therefore introduce linear implicit layers (LILs), which learn and solve linear systems with locally computed coefficients. LILs use diagonal dominance to ensure parallel solver convergence and support efficient backward-mode differentiation. As a challenging test case, we train emulators on semi-implicit integration of 2D shallow-water equations with closed boundaries. LIL networks learned compact representations of the local interactions controlling the 30,000 degrees of freedom of this discretized system of PDEs. This enabled accurate and stable LIL-based emulation over many time steps where feedforward networks failed.
Marcel Nonnenmacher · David Greenberg
Tue 11:30 a.m. - 12:15 p.m.

Neural Solvers for Fast and Accurate Numerical Optimal Control
(Poster)
Synthesizing optimal controllers for dynamical systems in practice involves solving real-time optimization problems with hard time constraints. These constraints restrict the class of numerical methods that can be applied; indeed, computationally expensive but accurate numerical routines often have to be replaced with fast and inaccurate methods, gaining inference speed at the cost of weaker theoretical guarantees on solution accuracy. This paper proposes a novel methodology to accelerate numerical optimization of optimal control policies via hypersolvers, hybrids of a base solver and a neural network. In particular, we apply low-order explicit numerical methods for the ordinary differential equation (ODE) associated with the numerical optimal control problem, augmented with an additional parametric approximator trained to reduce local truncation errors introduced by the base solver. Given a target system to control, we first pretrain hypersolvers to approximate base solver residuals by sampling plausible control inputs. Then, we use the trained hypersolver to obtain fast and accurate solutions of the target system during optimization of the controller. The performance of our approach is evaluated in direct and model predictive optimal control settings, where we show consistent Pareto improvements in terms of solution accuracy and control performance.
Federico Berto · Stefano Massaroli · Michael Poli · Jinkyoo Park
Tue 11:30 a.m. - 12:15 p.m.

Fitting Regularized Population Dynamics with Neural Differential Equations
(Poster)
Neural differential equations (neural DEs) have yet to see success in their application as interpretable autoencoders/descriptors, where they directly model a population of signals with the learned vector field. In this manuscript, we show that there is a threshold up to which these models capture the dynamics of a population of signals produced under the same monitoring protocol. This threshold is computed by taking the derivative at each time point and analyzing the variance of its dynamics. In addition, we show that this can be tackled by projecting a highly variant population to a less dynamically variant space, where the model is able to capture the dynamics, and similarly projecting the modelled signal back to the original space.
David Calhas · Rui Henriques
Tue 11:30 a.m. - 12:15 p.m.

A neural multilevel method for high-dimensional parametric PDEs
(Poster)
In scientific machine learning, neural networks have recently become a popular tool for learning the solutions of differential equations. However, practical results often conflict with existing theoretical predictions in that observed convergence stagnates early. A substantial improvement can be achieved by the presented multilevel scheme, which decomposes the considered problem into easier-to-train subproblems, resulting in a sequence of neural networks. The efficacy of the approach is demonstrated for high-dimensional parametric elliptic PDEs that are common benchmark problems in uncertainty quantification. Moreover, a theoretical analysis of the expressivity of the developed neural networks is devised.
Cosmas Heiß · Ingo Gühring · Martin Eigel
Tue 11:30 a.m. - 12:15 p.m.

Sparse Gaussian Processes for Stochastic Differential Equations
(Poster)
We frame the problem of learning stochastic differential equations (SDEs) from noisy observations as an inference problem and aim to maximize the marginal likelihood of the observations in a joint model of the latent paths and the noisy observations. As this problem is intractable, we derive an approximate (variational) inference algorithm and propose a novel parameterization of the approximate distribution over paths using a sparse Markovian Gaussian process. The approximation is efficient in storage and computation, allowing the usage of well-established optimization algorithms such as natural gradient descent. We demonstrate the capability of the proposed method on the Ornstein-Uhlenbeck process.
Prakhar Verma · Vincent Adam · Arno Solin
Tue 11:30 a.m. - 12:15 p.m.

Expressive Power of Randomized Signature
(Poster)
We consider the question of whether the time evolution of controlled differential equations on general state spaces can be arbitrarily well approximated by (regularized) regressions on features generated themselves through randomly chosen dynamical systems of moderately high dimension. On the one hand, this is motivated by paradigms of reservoir computing; on the other hand, by ideas from rough path theory and compressed sensing. Appropriately interpreted, this yields provable approximation and generalization results for generic dynamical systems by regressions on states of random, otherwise untrained dynamical systems, which usually are approximated by recurrent or LSTM networks. The results have important implications for transfer learning and energy efficiency of training. We apply methods from rough path theory, convenient analysis, noncommutative algebra and the Johnson-Lindenstrauss Lemma to prove the approximation results.
Lukas Gonon · Josef Teichmann
Tue 12:15 p.m. - 1:00 p.m.

Anima Anandkumar - Neural operator: A new paradigm for learning PDEs
(Invited Talk)
Animashree Anandkumar
Tue 1:00 p.m. - 1:15 p.m.

HyperPINN: Learning parameterized differential equations with physics-informed hypernetworks
(Spotlight Talk)
Many types of physics-informed neural network models have been proposed in recent years as approaches for learning solutions to differential equations. When a particular task requires solving a differential equation at multiple parameterizations, this requires either retraining the model or expanding its representation capacity to include the parameterization; both solutions increase its computational cost. We propose the HyperPINN, which uses hypernetworks to learn to generate neural networks that can solve a differential equation from a given parameterization. We demonstrate with experiments on both a PDE and an ODE that this type of model can lead to neural network solutions to differential equations that maintain a small size, even when learning a family of solutions over a parameter space.
Tue 1:15 p.m. - 1:30 p.m.

Learning Implicit PDE Integration with Linear Implicit Layers
(Spotlight Talk)
Neural networks can learn local interactions to faithfully reproduce large-scale dynamics in important physical systems. Trained on PDE integrations or noisy observations, these emulators can assimilate data, tune parameters and learn subgrid process representations. However, implicit integration schemes cannot be expressed as local feedforward computations. We therefore introduce linear implicit layers (LILs), which learn and solve linear systems with locally computed coefficients. LILs use diagonal dominance to ensure parallel solver convergence and support efficient backward-mode differentiation. As a challenging test case, we train emulators on semi-implicit integration of 2D shallow-water equations with closed boundaries. LIL networks learned compact representations of the local interactions controlling the 30,000 degrees of freedom of this discretized system of PDEs. This enabled accurate and stable LIL-based emulation over many time steps where feedforward networks failed.
Tue 8:00 p.m. - 8:58 p.m.

Solving Differential Equations with Deep Learning: State of the Art and Future Directions
(Panel Discussion)

Tue 8:58 p.m. - 8:59 p.m.

Final Remarks
