Workshop
The Symbiosis of Deep Learning and Differential Equations II
Michael Poli · Winnie Xu · Estefany Kelly Buchanan · Maryam Hosseini · Luca Celotti · Martin Magill · Ermal Rrapaj · Qiyao Wei · Stefano Massaroli · Patrick Kidger · Archis Joglekar · Animesh Garg · David Duvenaud
Virtual
Fri 9 Dec, 4 a.m. PST
In recent years, there has been a rapid increase of machine learning applications in the computational sciences, with some of the most impressive results at the interface of deep learning (DL) and differential equations (DEs). DL techniques have been used in a variety of ways to dramatically enhance the effectiveness of DE solvers and computer simulations. These successes have widespread implications, as DEs are among the most well-understood tools for the mathematical analysis of scientific knowledge, and they are fundamental building blocks for mathematical models in engineering, finance, and the natural sciences. Conversely, DL algorithms based on DEs, such as neural differential equations and continuous-time diffusion models, have also been successfully employed as deep learning models. Moreover, theoretical tools from DE analysis have been used to glean insights into the expressivity and training dynamics of mainstream deep learning algorithms.
This workshop will aim to bring together researchers with backgrounds in computational science and deep learning to encourage intellectual exchanges, cultivate relationships and accelerate research in this area. The scope of the workshop spans topics at the intersection of DL and DEs, including theory of DL and DEs, neural differential equations, solving DEs with neural networks, and more.
Schedule
Fri 4:00 a.m. – 4:10 a.m.

Introduction and opening remarks
Fri 4:10 a.m. – 4:25 a.m.

Provable Active Learning of Neural Networks for Parametric PDEs (Spotlight)
Neural networks have proven effective in constructing surrogate models for parametric partial differential equations (PDEs) and for approximating high-dimensional quantity of interest (QoI) surfaces. A major cost in training such models is collecting training data, which requires solving the target PDE for a variety of different parameter settings. Active learning and experimental design methods have the potential to reduce this cost, but are not yet widely used for training neural networks, nor do there exist methods with strong theoretical foundations. In this work we provide evidence, both empirical and theoretical, that existing active sampling techniques can be used successfully for fitting neural network models for high-dimensional parametric PDEs. In particular, we show the effectiveness of "coherence-motivated" sampling methods (i.e., leverage score sampling), which are widely used to fit PDE surrogate models based on polynomials. We prove that leverage score sampling yields strong theoretical guarantees for fitting single neuron models, even under adversarial label noise. Our theoretical bounds apply to any single neuron model with a Lipschitz nonlinearity (ReLU, sigmoid, absolute value, low-degree polynomial, etc.).
Aarshvi Gajjar · Chinmay Hegde · Christopher Musco
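The leverage score sampling the abstract refers to can be sketched for a plain linear feature model (the paper's single-neuron setting is more involved; the matrix sizes and sample counts below are illustrative assumptions):

```python
import numpy as np

def leverage_scores(A):
    """Leverage score of row i = squared norm of the i-th row of U,
    where A = U S Vt is a thin SVD (equivalently, diag of the hat matrix)."""
    U, _, _ = np.linalg.svd(A, full_matrices=False)
    return np.sum(U ** 2, axis=1)

def leverage_sample(A, m, rng):
    """Draw m rows with probability proportional to their leverage scores,
    with the usual 1/sqrt(m*p_i) reweighting that keeps the sketch unbiased."""
    tau = leverage_scores(A)
    p = tau / tau.sum()
    idx = rng.choice(A.shape[0], size=m, replace=True, p=p)
    weights = 1.0 / np.sqrt(m * p[idx])
    return idx, weights

rng = np.random.default_rng(0)
A = rng.normal(size=(500, 5))          # 500 candidate parameter settings, 5 features
tau = leverage_scores(A)
idx, w = leverage_sample(A, 50, rng)   # solve the PDE only at these 50 settings
```

High-leverage rows are the parameter settings the model is most sensitive to, so sampling proportionally to them concentrates the expensive PDE solves where they matter most.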
Fri 4:25 a.m. – 4:40 a.m.

PIXEL: Physics-Informed Cell Representations for Fast and Accurate PDE Solvers (Spotlight)
Physics-informed neural networks (PINNs) have recently emerged and succeeded in various PDE problems thanks to their mesh-free properties, flexibility, and unsupervised training. However, their slower convergence speed and relatively inaccurate solutions often limit their broader applicability. This paper proposes a new kind of data-driven PDE solver, physics-informed cell representations (PIXEL), elegantly combining classical numerical methods and learning-based approaches. We adopt a grid structure from the numerical methods to improve accuracy and convergence speed and to overcome the spectral bias present in PINNs. Moreover, the proposed method enjoys the same benefits as PINNs, e.g., using the same optimization frameworks to solve both forward and inverse PDE problems and readily enforcing PDE constraints with modern automatic differentiation techniques. Experiments on various challenging PDEs, on which original PINNs struggle, show that PIXEL achieves fast convergence speed and high accuracy.
Namgyu Kang · Byeonghyeon Lee · Youngjoon Hong · Seok-Bae Yun · Eunbyung Park
Fri 4:40 a.m. – 4:55 a.m.

Bridging the Gap Between Coulomb GAN and Gradient-regularized WGAN (Spotlight)
Generative adversarial networks (GANs) are essentially a min-max game between a discriminator and a generator. Coulomb GANs have a closely related formulation where the generator minimizes the potential difference between real (negative) and fake (positive) charge densities, wherein the discriminator approximates a low-dimensional Plummer kernel centered around the samples. Motivated by links between electrostatic potential theory and the Poisson partial differential equation (PDE), we consider the underlying functional optimization in Coulomb GAN and show that the associated discriminator is the optimum of a first-order gradient-regularized Wasserstein GAN (WGAN) cost. Subsequently, we show that, within the regularized WGAN setting, the optimal discriminator is the Green's function to the Poisson PDE, which corresponds to the Coulomb potential. As an alternative to training a discriminator in either WGAN or Coulomb GAN, we demonstrate, by means of synthetic data experiments, that the closed-form implementation of the optimal discriminator leads to superior performance of the GAN generator.
Siddarth Asokan · Chandra Seelamantula
Fri 4:55 a.m. – 5:10 a.m.

How PINNs cheat: Predicting chaotic motion of a double pendulum (Spotlight)
Despite extensive research, physics-informed neural networks (PINNs) are still difficult to train, especially when the optimization relies heavily on the physics loss term. Convergence problems frequently occur when simulating dynamical systems with high-frequency components or chaotic or turbulent behavior. In this work, we discuss whether the traditional PINN framework is able to predict chaotic motion by conducting experiments on the undamped double pendulum. Our results demonstrate that PINNs do not exhibit any sensitivity to perturbations in the initial condition. Instead, the PINN optimization consistently converges to physically correct solutions that violate the initial condition only marginally, but diverge significantly from the desired solution due to the chaotic nature of the system. In fact, the PINN predictions primarily exhibit low-frequency components with a smaller magnitude of higher-order derivatives, which favors lower physics loss values compared to the desired solution. We thus hypothesize that the PINNs "cheat" by shifting the initial conditions to values that correspond to physically correct solutions that are easier to learn. Initial experiments suggest that domain decomposition combined with an appropriate loss weighting scheme mitigates this effect and allows convergence to the desired solution.
Sophie Steger · Franz M. Rohrhofer · Bernhard Geiger
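The "cheating" mechanism can be illustrated on a far simpler system than the double pendulum. A minimal sketch, assuming a harmonic oscillator u'' + u = 0 and finite differences in place of automatic differentiation (this is illustrative, not the paper's setup): a trivial low-frequency candidate drives the physics loss to zero while paying only a bounded initial-condition penalty.

```python
import numpy as np

def pinn_losses(u, t, u0, du0):
    """PINN-style loss terms for u'' + u = 0 on a uniform grid, with central
    finite differences standing in for automatic differentiation."""
    dt = t[1] - t[0]
    d2u = (u[2:] - 2.0 * u[1:-1] + u[:-2]) / dt ** 2
    physics = np.mean((d2u + u[1:-1]) ** 2)                   # ODE residual
    ic = (u[0] - u0) ** 2 + ((u[1] - u[0]) / dt - du0) ** 2   # initial-condition terms
    return physics, ic

t = np.linspace(0.0, 2.0 * np.pi, 200)
# the desired solution cos(t) satisfies both the ODE and the ICs u(0)=1, u'(0)=0:
phys_true, ic_true = pinn_losses(np.cos(t), t, u0=1.0, du0=0.0)
# the flat "cheating" candidate u=0 satisfies the ODE exactly but shifts the IC:
phys_flat, ic_flat = pinn_losses(np.zeros_like(t), t, u0=1.0, du0=0.0)
```

If the physics term dominates the composite loss, an optimizer can prefer such flat, low-frequency candidates, which mirrors the behavior the abstract describes.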
Fri 5:10 a.m. – 6:05 a.m.

Poster Session 1
Fri 6:05 a.m. – 6:50 a.m.

Keynote Talk 1
Yang Song
Fri 6:50 a.m. – 7:05 a.m.

Blind Drifting: Diffusion models with a linear SDE drift term for blind image restoration tasks (Spotlight)
In this work, we utilize the high-fidelity generation abilities of diffusion models to solve blind image restoration tasks, using JPEG artifact removal at high compression levels as an example. We propose a simple modification of the forward stochastic differential equation (SDE) of diffusion models to adapt them to such tasks. Comparing our approach against a regression baseline with the same network architecture, we show that our approach can escape the baseline's tendency to generate blurry images and recovers the distribution of clean images significantly more faithfully, while requiring only a dataset of clean/corrupted image pairs and no knowledge about the corruption operation. By utilizing the idea that the distributions of clean and corrupted images are much closer to each other than to a Gaussian prior, our approach requires only low levels of added noise, and thus needs comparatively few sampling steps even without further optimizations.
Simon Welker · Henry Chapman · Timo Gerkmann
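A forward SDE with a linear drift toward the corrupted data can be sketched as an Ornstein-Uhlenbeck-type process simulated with Euler-Maruyama. The scalar setting and all parameter values here are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def forward_sde(x0, y, alpha=2.0, sigma=0.05, T=1.0, n_steps=100, rng=None):
    """Euler-Maruyama simulation of dx = alpha*(y - x) dt + sigma dW:
    a linear drift pulling the clean sample x0 toward the corrupted sample y,
    with only a small amount of added noise."""
    rng = rng or np.random.default_rng(0)
    dt = T / n_steps
    x = x0.copy()
    for _ in range(n_steps):
        x += alpha * (y - x) * dt + sigma * np.sqrt(dt) * rng.normal(size=x.shape)
    return x

rng = np.random.default_rng(1)
x0 = np.zeros(10_000)   # stand-in for "clean" samples
y = np.ones(10_000)     # stand-in for their "corrupted" counterparts
xT = forward_sde(x0, y, rng=rng)
```

Because the endpoints are close and sigma is small, the reverse process needs far fewer denoising steps than a diffusion that must bridge all the way to a Gaussian prior, which is the efficiency argument the abstract makes.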
Fri 7:05 a.m. – 8:05 a.m.

Break
Fri 8:05 a.m. – 8:50 a.m.

Keynote Talk 2
Rose Yu
Fri 8:50 a.m. – 9:05 a.m.

A Universal Abstraction for Hierarchical Hopfield Networks (Spotlight)
Conceptualized as Associative Memory, Hopfield Networks (HNs) are powerful models which describe neural network dynamics converging to a local minimum of an energy function. HNs are conventionally described by a neural network with two layers connected by a matrix of synaptic weights. However, it is not well known that the Hopfield framework generalizes to systems in which many neuron layers and synapses work together as a unified Hierarchical Associative Memory (HAM) model: a single network described by memory retrieval dynamics (convergence to a fixed point) and governed by a global energy function. In this work we introduce a universal abstraction for HAMs using the building blocks of neuron layers (nodes) and synapses (edges) connected within a hypergraph. We implement this abstraction as a software framework, written in JAX, whose autograd feature removes the need to derive update rules for the complicated energy-based dynamics. Our framework, called HAMUX (HAM User eXperience), enables anyone to build and train hierarchical HNs using familiar operations like convolutions and attention alongside activation functions like Softmaxes, ReLUs, and LayerNorms. HAMUX is a powerful tool to study HNs at scale, something that has never been possible before. We believe that HAMUX lays the groundwork for a new type of AI framework built around dynamical systems and energy-based associative memories.
Benjamin Hoover · Duen Horng Chau · Hendrik Strobelt · Dmitry Krotov
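The two-layer special case the abstract starts from is the classical binary Hopfield network, where sign updates descend an energy and retrieve a stored pattern from a corrupted probe. A minimal sketch of that dynamics (pattern count and sizes are illustrative; HAMUX generalizes this to hierarchies of layers and synapses):

```python
import numpy as np

def hopfield_energy(W, s):
    # classical quadratic energy E(s) = -1/2 s^T W s
    return -0.5 * s @ W @ s

def hopfield_recall(W, s, n_iter=20):
    """Synchronous sign updates: the standard memory-retrieval dynamics
    for a symmetric weight matrix with zero diagonal."""
    for _ in range(n_iter):
        s = np.sign(W @ s)
        s[s == 0] = 1.0
    return s

rng = np.random.default_rng(0)
patterns = np.sign(rng.normal(size=(3, 64)))   # three stored memories
W = patterns.T @ patterns / 64                  # Hebbian weights
np.fill_diagonal(W, 0.0)

probe = patterns[0].copy()
flip = rng.choice(64, size=6, replace=False)
probe[flip] *= -1.0                             # corrupt 6 of 64 bits
recalled = hopfield_recall(W, probe)            # converges to a fixed point
```

Retrieval lowers the energy and restores the stored pattern, which is exactly the "convergence to a fixed point of a global energy" behavior the HAM framework extends.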
Fri 9:05 a.m. – 10:00 a.m.

Poster Session 2
Fri 10:00 a.m. – 10:45 a.m.

Keynote Talk 3
Christopher Rackauckas
Fri 10:45 a.m. – 10:55 a.m.

Closing remarks


Posters

On the impact of larger batch size in the training of Physics Informed Neural Networks (Poster)
Physics Informed Neural Networks (PINNs) have demonstrated remarkable success in learning complex physical processes such as shocks and turbulence, but their applicability has been limited due to long training times. In this work, we explore the potential of large batch size training to save training time and improve final accuracy in PINNs. We show that conclusions about the generalization gap drawn from large batch size training on image classification tasks may not carry over to PINNs. We conclude that larger batch sizes are always beneficial to training PINNs.
Shyam Sankaran · Hanwen Wang · Leonardo Ferreira Guilhoto · Paris Perdikaris


PDE-GCN: Novel Architectures for Graph Neural Networks Motivated by Partial Differential Equations (Poster)
Graph neural networks have shown their efficacy in fields such as computer vision, computational biology and chemistry, where data are naturally explained by graphs. However, unlike convolutional neural networks, deep graph networks do not necessarily yield better performance than shallow networks. This behaviour usually stems from the over-smoothing phenomenon. In this work, we propose a family of architectures to control this behaviour by design. Our networks are motivated by numerical methods for solving Partial Differential Equations (PDEs) on manifolds, and as such, their behaviour can be explained by similar analysis.
Moshe Eliasof · Eldad Haber · Eran Treister


A Neural ODE Interpretation of Transformer Layers (Poster)
Transformer layers, which use an alternating pattern of multi-head attention and multi-layer perceptron (MLP) layers, provide an effective tool for a variety of machine learning problems. As the transformer layers use residual connections to avoid the problem of vanishing gradients, they can be viewed as the numerical integration of a differential equation. In this extended abstract, we build upon this connection and propose a modification of the internal architecture of a transformer layer. The proposed model places the multi-head attention sublayer and the MLP sublayer parallel to each other. Our experiments show that this simple modification improves the performance of transformer networks in multiple tasks. Moreover, for the image classification task, we show that using neural ODE solvers with a sophisticated integration scheme further improves performance.
Yaofeng Zhong · Tongtao Zhang · Amit Chakraborty · Biswadip Dey
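The parallel arrangement can be read as an explicit Euler step x_{t+1} = x_t + f(x_t), with f the sum of the attention and MLP sublayers applied to the same input. A minimal single-head numpy sketch (random placeholder weights, no layer norm, not the paper's trained model):

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(x, Wq, Wk, Wv):
    # single-head scaled dot-product attention over the token axis
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    return softmax(q @ k.T / np.sqrt(k.shape[-1])) @ v

def mlp(x, W1, W2):
    return np.maximum(x @ W1, 0.0) @ W2   # two-layer ReLU MLP

def parallel_block(x, params):
    """'Parallel' transformer layer as an Euler step: both sublayers see the
    same state x_t, and their outputs are added to the residual stream."""
    Wq, Wk, Wv, W1, W2 = params
    return x + attention(x, Wq, Wk, Wv) + mlp(x, W1, W2)

rng = np.random.default_rng(0)
d = 16
x = rng.normal(size=(8, d))   # 8 tokens, width 16
params = [0.1 * rng.normal(size=s)
          for s in [(d, d), (d, d), (d, d), (d, 4 * d), (4 * d, d)]]
y = parallel_block(x, params)
```

In the standard sequential layer the MLP instead sees the post-attention state; the parallel form corresponds to integrating one combined vector field, which is what makes swapping in higher-order ODE solvers natural.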


Provable Active Learning of Neural Networks for Parametric PDEs (Poster)
Aarshvi Gajjar · Chinmay Hegde · Christopher Musco


PIXEL: Physics-Informed Cell Representations for Fast and Accurate PDE Solvers (Poster)
Namgyu Kang · Byeonghyeon Lee · Youngjoon Hong · Seok-Bae Yun · Eunbyung Park


Separable PINN: Mitigating the Curse of Dimensionality in Physics-Informed Neural Networks (Poster)
Physics-informed neural networks (PINNs) have emerged as new data-driven PDE solvers for both forward and inverse problems. While promising, the expensive computational costs to obtain solutions often restrict their broader applicability. We demonstrate that the computations in automatic differentiation (AD) can be significantly reduced by leveraging forward-mode AD when training PINNs. However, a naive application of forward-mode AD to conventional PINNs results in higher computation, losing its practical benefit. Therefore, we propose a network architecture, called separable PINN (SPINN), which facilitates forward-mode AD for more efficient computation. SPINN operates on a per-axis basis instead of the point-wise processing in conventional PINNs, decreasing the number of network forward passes. Besides, while the computation and memory costs of standard PINNs grow exponentially with the grid resolution, those of our model are remarkably less susceptible, mitigating the curse of dimensionality. We demonstrate the effectiveness of our model in various high-dimensional PDE systems. Given the same number of training points, we reduced the computational cost by $1,195\times$ in FLOPs and achieved a $57\times$ speedup in wall-clock training time on commodity GPUs, while achieving higher accuracy.
Junwoo Cho · Seungtae Nam · Hyunmo Yang · Seok-Bae Yun · Youngjoon Hong · Eunbyung Park


LiFe-net: Data-driven Modelling of Time-dependent Temperatures and Charging Statistics of Tesla's LiFePO4 EV Battery (Poster)
Modelling the temperature of Electric Vehicle (EV) batteries is a fundamental task of EV manufacturing. Extreme temperatures in the battery packs can affect their longevity and power output. Although theoretical models exist for describing heat transfer in battery packs, they are computationally expensive to simulate. Furthermore, it is difficult to acquire data measurements from within the battery cell. In this work, we propose a data-driven surrogate model (LiFe-net) that uses readily accessible driving diagnostics for battery temperature estimation to overcome these limitations. This model incorporates Neural Operators with a traditional numerical integration scheme to estimate the temperature evolution. Moreover, we propose two further variations of the baseline model: LiFe-net trained with a regulariser and LiFe-net trained with time stability loss. We compared these models in terms of generalization error on test data. The results show that LiFe-net trained with time stability loss outperforms the other two models and can estimate the temperature evolution on unseen data with a relative error of 2.77% on average.
Jeyhun Rustamov · Luisa Fennert · Nico Hoffmann


Neural Latent Dynamics Models (Poster)
We introduce Neural Latent Dynamics Models (NLDMs), a neural ordinary differential equation (ODE)-based architecture to perform black-box nonlinear latent dynamics discovery, without the need to include any inductive bias related to either the underlying physical model or the latent coordinate space. The effectiveness of this strategy is experimentally tested in the framework of reduced order modeling, considering a set of problems involving high-dimensional data generated from nonlinear time-dependent parameterized partial differential equation (PDE) simulations, where we aim at performing extrapolation in time, to forecast the PDE solution out of the time interval and/or the parameter range where training data were acquired. Results highlight NLDMs' capability to perform low-dimensional latent dynamics learning in three different scenarios.
Nicola Farenga · Stefania Fresca · Andrea Manzoni


Optimal Control of PDEs Using Physics-Informed Neural Networks (Poster)
Physics-informed neural networks (PINNs) have recently become a popular method for solving forward and inverse problems governed by partial differential equations (PDEs). By incorporating the residual of the PDE into the loss function of a neural network-based surrogate model for the unknown state, PINNs can seamlessly blend measurement data with physical constraints. Here, we extend this framework to PDE-constrained optimal control problems, for which the governing PDE is fully known and the goal is to find a control variable that minimizes a desired cost objective. Importantly, we validate the performance of the PINN framework by comparing it to state-of-the-art adjoint-based optimization, which performs gradient descent on the discretized control variable while satisfying the discretized PDE. This comparison, carried out on challenging problems based on the nonlinear Kuramoto-Sivashinsky and Navier-Stokes equations, sheds light on the pros and cons of the PINN and adjoint-based approaches for solving PDE-constrained optimal control problems.
Saviz Mowlavi · Saleh Nabi


Physics Informed Symbolic Networks (Poster)
We introduce Physics Informed Symbolic Networks (PISN), which utilize a physics-informed loss to obtain a symbolic solution for a system of Partial Differential Equations (PDEs). Given a context-free grammar to describe the language of symbolic expressions, we propose to use a weighted sum as a continuous approximation for the selection of a production rule. We use this approximation to define multi-layer symbolic networks. We consider Kovasznay flow (Navier-Stokes) and the two-dimensional viscous Burgers' equation to illustrate that PISN are able to provide performance comparable to PINNs across various state-of-the-art advances: multiple outputs and governing equations, domain decomposition, and hypernetworks. Furthermore, we propose Physics-informed Neurosymbolic Networks (PINSN), which employ a multi-layer perceptron (MLP) operator to model the residue of symbolic networks. PINSNs are observed to give 2-3 orders of magnitude of performance gain over standard PINNs.
Ritam Majumdar · Vishal Jadhav · Anirudh Deodhar · Shirish Karande · Lovekesh Vig · Venkataramana Runkana


Evaluating Error Bound for Physics-Informed Neural Networks on Linear Dynamical Systems (Poster)
There have been extensive studies on solving differential equations using physics-informed neural networks. While this method has proven advantageous in many cases, a major criticism lies in its lack of analytical error bounds. Therefore, it is less credible than its traditional counterparts, such as the finite difference method. This paper shows that one can mathematically derive explicit error bounds for physics-informed neural networks trained on a class of linear dynamical systems using only the network's residuals (pointwise loss) over the domain. Our work shows a link between network residuals and the absolute error of the solution. Our approach is semi-phenomenological and independent of knowledge of the actual solution or the complexity or architecture of the network. Using the method of manufactured solutions on linear ODEs and systems of linear ODEs, we empirically verify the error evaluation algorithm and demonstrate that the actual error strictly lies within our derived bound.
Shuheng Liu · Xiyue Huang · Pavlos Protopapas
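The residual-to-error link can be illustrated with the textbook Grönwall-type bound for a scalar linear ODE (this is a standard bound used for illustration, not necessarily the paper's exact algorithm): if x' = a*x and the surrogate has residual r = x̃' - a*x̃ with exact initial value, then |x̃(t) - x(t)| ≤ ∫₀ᵗ e^{|a|(t-s)} |r(s)| ds.

```python
import numpy as np

def residual_error_bound(t, x_tilde, dx_tilde, a):
    """Gronwall-style bound for x' = a*x: the error e = x_tilde - x obeys
    e' = a*e + r with e(0) = 0, so |e(t)| <= int_0^t exp(|a|(t-s)) |r(s)| ds."""
    r = np.abs(dx_tilde - a * x_tilde)       # pointwise residual of the surrogate
    bound = np.zeros_like(t)
    for i in range(1, len(t)):
        s = t[:i + 1]
        f = np.exp(np.abs(a) * (t[i] - s)) * r[:i + 1]
        bound[i] = np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(s))  # trapezoid rule
    return bound

a = -1.0
t = np.linspace(0.0, 1.0, 200)
x_tilde = 1 - t + t ** 2 / 2 - t ** 3 / 6    # degree-3 Taylor surrogate for exp(-t)
dx_tilde = -1 + t - t ** 2 / 2               # its exact derivative
bound = residual_error_bound(t, x_tilde, dx_tilde, a)
err = np.abs(x_tilde - np.exp(-t))           # true error, known here by construction
```

The bound uses only the residual, so it can certify a trained PINN without knowing the true solution, which is the practical point of the paper.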


Learning flows of control systems (Poster)
A recurrent neural network architecture is presented to learn the flow of a causal and time-invariant control system from data. For piecewise constant control inputs, we show that the proposed architecture is able to approximate the flow function by exploiting the system's causality and time-invariance. The output of the learned flow function can be queried at any time instant. We demonstrate the generalisation capabilities of the trained model with respect to the simulation time horizon and the class of input signals.
Miguel Aguiar · Amritam Das · Karl H. Johansson


Bridging the Gap Between Coulomb GAN and Gradient-regularized WGAN (Poster)
Siddarth Asokan · Chandra Seelamantula


Efficient Robustness Verification of Neural Ordinary Differential Equations (Poster)
Neural Ordinary Differential Equations (NODEs) are a novel neural architecture, built around initial value problems with learned dynamics. Thought to be inherently more robust against adversarial perturbations, they were recently shown to be vulnerable to strong adversarial attacks, highlighting the need for formal guarantees. In this work, we tackle this challenge and propose GAINS, an analysis framework for NODEs based on three key ideas: (i) a novel class of ODE solvers, based on variable but discrete time steps, (ii) an efficient graph representation of solver trajectories, and (iii) a bound propagation algorithm operating on this graph representation. Together, these advances enable the efficient analysis and certified training of high-dimensional NODEs, which we demonstrate in an extensive evaluation on computer vision and time-series forecasting problems.
Mustafa Zeqiri · Mark Müller · Marc Fischer · Martin Vechev


Solving Singular Liouville Equations Using Deep Learning (Poster)
Deep learning has been applied to solving high-dimensional PDEs and has successfully broken the curse of dimensionality. However, it has barely been applied to finding singular solutions to certain PDEs, whose boundary conditions are absent and whose singular behavior is not a priori known. In this paper, we use deep learning to treat one example of such equations, the singular Liouville equations, which naturally arise when studying the celebrated Einstein equation in general relativity. We introduce a method of jointly training multiple deep neural networks to dynamically learn the singular behaviors of the solution and successfully capture both the smooth and singular parts of such equations.
Yuxiang Ji


How PINNs cheat: Predicting chaotic motion of a double pendulum (Poster)
Sophie Steger · Franz M. Rohrhofer · Bernhard Geiger


Structure preserving neural networks based on ODEs (Poster)
Neural networks have gained much interest because of their effectiveness in many applications. However, their mathematical properties are generally not well understood. In the presence of some underlying geometric structure in the data or in the function to approximate, it is often desirable to account for this in the design of the neural network. In this work, we start with a non-autonomous ODE and build neural networks using a suitable, structure-preserving numerical time-discretisation. The structure of the neural network is then inferred from the properties of the ODE vector field. To support the flexibility of the approach, we go through the derivation of volume-preserving, mass-preserving and Lipschitz-constrained neural networks. Finally, a mass-preserving network is applied to the problem of approximating the dynamics of a conservative dynamical system. On the other hand, a Lipschitz-constrained network is demonstrated to provide improved adversarial robustness for a CIFAR-10 classifier.
Davide Murari · Elena Celledoni · Brynjulf Owren · Carola-Bibiane Schönlieb · Ferdia Sherry
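One standard way to obtain a volume-preserving layer from an ODE splitting is an additive coupling step: update half of the state by a function of the other half, giving a unit-triangular Jacobian with determinant exactly 1. A minimal sketch (this is a generic construction, not necessarily the authors' discretisation), with the property checked numerically:

```python
import numpy as np

def coupling_layer(x, W, b):
    """Additive coupling step x2 <- x2 + g(x1): a volume-preserving map,
    since its Jacobian is unit-triangular (determinant exactly 1)."""
    d = x.shape[0] // 2
    x1, x2 = x[:d], x[d:]
    shift = np.tanh(W @ x1 + b)
    return np.concatenate([x1, x2 + shift])

def numerical_jacobian(f, x, eps=1e-6):
    # central finite-difference Jacobian, column by column
    n = x.size
    J = np.zeros((n, n))
    for j in range(n):
        e = np.zeros(n)
        e[j] = eps
        J[:, j] = (f(x + e) - f(x - e)) / (2 * eps)
    return J

rng = np.random.default_rng(0)
d = 3
W, b = rng.normal(size=(d, d)), rng.normal(size=d)
x = rng.normal(size=2 * d)
J = numerical_jacobian(lambda z: coupling_layer(z, W, b), x)
det = np.linalg.det(J)
```

Composing such layers (alternating which half is updated) keeps the overall map volume-preserving while remaining expressive, mirroring how the paper infers network structure from properties of the ODE vector field.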


Blind Drifting: Diffusion models with a linear SDE drift term for blind image restoration tasks (Poster)
Simon Welker · Henry Chapman · Timo Gerkmann


Learned 1D advection solver to accelerate air quality modeling (Poster)
Accelerating the numerical integration of partial differential equations with learned surrogate models is a promising area of inquiry in the field of air pollution modeling. Most previous efforts in this field have targeted learned chemical operators, even though machine-learned fluid dynamics has been a more active area in the machine learning community. Here we present the first trial at accelerating the advection operator in the domain of air quality modeling using a realistic wind velocity dataset. We designed a convolutional neural network-based solver that outputs coefficients to integrate the advection equation. We generated a training dataset using a 2nd-order Van Leer-type scheme with 10 days of east-west wind components on 39$^{\circ}$N within North America. The trained model with coarse-graining showed good accuracy overall, but instability occurred in a few cases. Our approach achieved up to 12.5$\times$ acceleration. The learned schemes also showed fair results in generalization tests.
Manho Park · Zhonghua Zheng · Nicole Riemer · Christopher Tessum
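The classical baseline being accelerated can be sketched with a flux-form finite-volume advection step; a learned solver replaces the hand-derived interface fluxes with network-predicted coefficients. A minimal sketch using first-order upwind in place of the 2nd-order Van Leer scheme (grid size, CFL number, and the initial tracer blob are illustrative):

```python
import numpy as np

def upwind_step(c, u, dx, dt):
    """One explicit first-order upwind step of c_t + u c_x = 0 on a periodic
    grid, in conservative flux form (so total tracer mass is preserved)."""
    if u >= 0:
        flux = u * c                  # flux through each cell's right face
    else:
        flux = u * np.roll(c, -1)
    return c - dt / dx * (flux - np.roll(flux, 1))

n = 200
dx, dt, u = 1.0 / n, 0.002, 1.0       # CFL = u*dt/dx = 0.4, stable
x = np.arange(n) * dx
c = np.exp(-((x - 0.5) ** 2) / 0.01)  # initial tracer blob
mass0 = c.sum() * dx
for _ in range(100):
    c = upwind_step(c, u, dx, dt)
```

Mass conservation and positivity are exactly the physical properties a learned advection scheme must be checked against, which is why instability in a few cases is worth reporting.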


Learning Ordinary Differential Equations with the Line Integral Loss Function (Poster)
A new training method for learning representations of dynamical systems with neural networks is derived using a loss function based on line integrals from vector calculus. The new training method is shown to learn the direction part of an ODE vector field with more accuracy and faster convergence compared to traditional methods. The learned direction can then be combined with another model that learns the magnitude explicitly to decouple the learning process of an ODE into two separate easier problems. It can also be used as a feature generator for timeseries classification problems, performing well on motion classification of dynamical systems. The new method does however have multiple limitations that overall make the method less generalizable and only suited for some specific type of problems. 
Albert Johannessen 🔗 
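The direction/magnitude decoupling can be made concrete with a toy sketch (hypothetical names; the paper's actual loss is built from line integrals, while this simplified version penalizes only the cosine misalignment between a model's vector field and observed trajectory tangents):

```python
import math

def cosine_direction_loss(predicted, observed):
    """Penalize misalignment between predicted vector-field directions
    and observed trajectory tangents, ignoring magnitudes entirely.
    Returns the mean of (1 - cos angle) over all vector pairs."""
    total = 0.0
    for p, o in zip(predicted, observed):
        dot = sum(pi * oi for pi, oi in zip(p, o))
        norm_p = math.sqrt(sum(pi * pi for pi in p))
        norm_o = math.sqrt(sum(oi * oi for oi in o))
        total += 1.0 - dot / (norm_p * norm_o)
    return total / len(predicted)

# Perfectly aligned directions with different magnitudes give zero loss,
# so a separate model is free to fit the magnitude on its own.
aligned = cosine_direction_loss([(2.0, 0.0)], [(5.0, 0.0)])  # -> 0.0
opposed = cosine_direction_loss([(1.0, 0.0)], [(-1.0, 0.0)])  # -> 2.0
```

Because this loss is invariant to the speed along a trajectory, a second model trained on magnitudes alone completes the vector field.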


A PINN Approach to Symbolic Differential Operator Discovery with Sparse Data
(
Poster
)
>
link
Given ample experimental data from a system governed by differential equations, it is possible to use deep learning techniques to construct the underlying differential operators. In this work we perform symbolic discovery of differential operators in a situation where experimental data is sparse. This small-data regime in machine learning can be made tractable by providing our algorithms with prior information about the underlying dynamics. Physics-Informed Neural Networks (PINNs) have been very successful in this regime (reconstructing entire ODE solutions using only a single point, or entire PDE solutions with very few measurements of the initial condition). We modify the PINN approach by adding a neural network that learns a representation of unknown hidden terms in the differential equation. The algorithm yields both a surrogate solution to the differential equation and a black-box representation of the hidden terms. These hidden-term neural networks can then be converted into symbolic equations using symbolic regression techniques such as AI Feynman. To achieve convergence of these neural networks, we provide our algorithms with (noisy) measurements of both the initial condition and (synthetic) experimental data obtained at later times. We demonstrate strong performance of this approach even when provided with very few measurements of noisy data, in both the ODE and PDE regimes. 
Brydon Eastman · Lena Podina · Mohammad Kohandel 🔗 
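The core modification can be sketched in a few lines (all names hypothetical; real PINN implementations differentiate the surrogate network with automatic differentiation rather than the finite differences and toy closures below). The surrogate u(t) is trained against the residual of du/dt = f_known(u) + h(u), where h is a second network standing in for the unknown hidden term:

```python
import math

def ode_residual(u, f_known, h_hidden, t, dt=1e-4):
    """Residual of du/dt = f_known(u) + h_hidden(u) at time t, with the
    derivative of the surrogate u approximated by a central finite
    difference (a PINN would use autodiff through the network instead)."""
    du_dt = (u(t + dt) - u(t - dt)) / (2 * dt)
    return du_dt - f_known(u(t)) - h_hidden(u(t))

# Toy check: u(t) = e^{2t} solves du/dt = u + h(u) with hidden term
# h(u) = u, so the residual vanishes once the hidden term is recovered.
u = lambda t: math.exp(2 * t)
f_known = lambda x: x       # known part of the dynamics
h_recovered = lambda x: x   # what the hidden-term network should learn
r = ode_residual(u, f_known, h_recovered, t=0.3)
```

Once trained, `h_recovered` would be handed to a symbolic regressor to produce a closed-form expression for the hidden term.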


A Universal Abstraction for Hierarchical Hopfield Networks
(
Poster
)
>
link
Conceptualized as Associative Memory, Hopfield Networks (HNs) are powerful models which describe neural network dynamics converging to a local minimum of an energy function. HNs are conventionally described by a neural network with two layers connected by a matrix of synaptic weights. However, it is not well known that the Hopfield framework generalizes to systems in which many neuron layers and synapses work together as a unified Hierarchical Associative Memory (HAM) model: a single network described by memory retrieval dynamics (convergence to a fixed point) and governed by a global energy function. In this work we introduce a universal abstraction for HAMs using the building blocks of neuron layers (nodes) and synapses (edges) connected within a hypergraph. We implement this abstraction as a software framework, written in JAX, whose autograd feature removes the need to derive update rules for the complicated energy-based dynamics. Our framework, called HAMUX (HAM User eXperience), enables anyone to build and train hierarchical HNs using familiar operations like convolutions and attention alongside activation functions like Softmaxes, ReLUs, and LayerNorms. HAMUX is a powerful tool to study HNs at scale, something that has never been possible before. We believe that HAMUX lays the groundwork for a new type of AI framework built around dynamical systems and energy-based associative memories. 
Benjamin Hoover · Duen Horng Chau · Hendrik Strobelt · Dmitry Krotov 🔗 


Modular Flows: Differential Molecular Generation
(
Poster
)
>
link
Generating new molecules is fundamental to advancing critical applications such as drug discovery and material synthesis. Flows can generate molecules effectively by inverting the encoding process; however, existing flow models either require artifactual dequantization or specific node/edge orderings, lack desiderata such as permutation invariance, or induce a discrepancy between encoding and decoding steps that necessitates post hoc validity correction. We circumvent these issues with novel continuous normalizing E(3)-equivariant flows, based on a system of node ODEs coupled as a graph PDE, that repeatedly reconcile locally toward globally aligned densities. Our models can be cast as message-passing temporal networks, and achieve superlative performance on the tasks of density estimation and molecular generation. In particular, our generated samples achieve state-of-the-art results on both the standard QM9 and ZINC250K benchmarks. 
Yogesh Verma · Samuel Kaski · Markus Heinonen · Vikas Garg 🔗 


Turning Normalizing Flows into Monge Maps with Geodesic Gaussian Preserving Flows
(
Poster
)
>
link
Normalizing Flows (NF) are powerful likelihood-based generative models that are able to trade off between expressivity and tractability to model complex densities. A now well-established research avenue leverages optimal transport (OT) and looks for Monge maps, i.e., models with minimal effort between the source and target distributions. This paper introduces a method based on Brenier's polar factorization theorem to transform any trained NF into a more OT-efficient version without changing the final density. We do so by learning a rearrangement of the source (Gaussian) distribution that minimizes the OT cost between the source and the final density. We further constrain the path leading to the estimated Monge map to lie on a geodesic in the space of volume-preserving diffeomorphisms thanks to Euler's equations. The proposed method leads to smooth flows with reduced OT cost for several existing models without affecting the model performance. 
Guillaume Morel · Lucas Drumetz · Nicolas Courty · François Rousseau · Simon Benaïchouche 🔗 


Numerical integrators for learning dynamical systems from noisy data
(
Poster
)
>
link
Decades of research have been spent on classifying the properties of numerical integrators when solving ordinary differential equations (ODEs). Here, a first step is taken to examine the properties of numerical integrators when used to learn dynamical systems from noisy data with neural networks. Mono-implicit Runge-Kutta (MIRK) methods are a class of integrators that can be considered explicit for inverse problems. The symplectic property is useful when learning the dynamics of Hamiltonian systems. Unfortunately, we prove that symplectic MIRK methods have a maximum order of $p=2$. By taking advantage of the inverse-explicit property, we introduce a novel integration method, the mean inverse integrator, tailored for solving inverse problems with noisy data. As verified in numerical experiments on different dynamical systems, this method is less sensitive to noise in the data.

Håkon Noren · Sølve Eidnes · Elena Celledoni 🔗 
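The inverse-explicit property can be made concrete with the implicit midpoint rule, a second-order symplectic MIRK method. Solving forward in time requires a nonlinear solve for the next state, but in the inverse (learning) problem both endpoints come from data, so the training residual for a vector field f is an explicit expression (a minimal sketch with hypothetical names, not the authors' code):

```python
import math

def midpoint_residual(f, y0, y1, h):
    """Implicit midpoint rule: y1 = y0 + h * f((y0 + y1) / 2).
    Forward integration needs a nonlinear solve for y1, but in the
    inverse problem y0 and y1 are both observed, so this residual is
    explicit in f -- no nonlinear solve is needed during training."""
    mid = [(a + b) / 2 for a, b in zip(y0, y1)]
    fy = f(mid)
    return [b - a - h * fi for a, b, fi in zip(y0, y1, fy)]

# Harmonic oscillator dy/dt = (y[1], -y[0]): for a consistent data pair
# the residual is tiny, and minimizing it over f drives learning.
f = lambda y: (y[1], -y[0])
h = 0.01
y0 = (1.0, 0.0)
y1 = (math.cos(h), -math.sin(h))
res = midpoint_residual(f, y0, y1, h)
```

Averaging such residuals over pairs of noisy observations is the starting point for noise-robust variants like the mean inverse integrator described above.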


Experimental study of Neural ODE training with adaptive solver for dynamical systems modeling
(
Poster
)
>
link
Neural Ordinary Differential Equations (ODEs) were recently introduced as a new family of neural network models that relies on black-box ODE solvers for inference and training. Some ODE solvers, called adaptive, can adapt their evaluation strategy to the complexity of the problem at hand, opening great perspectives in machine learning. However, this paper describes a simple set of experiments showing why adaptive solvers cannot be seamlessly leveraged as a black box for dynamical systems modelling. Taking the Lorenz '63 system as a showcase, we show that a naive application of Fehlberg's method does not yield the expected results. We then propose a simple workaround that assumes a tighter interaction between the solver and the training strategy. 
Alexandre Allauzen · Thiago Petrilli Maffei Dardis · Hannah De Oliveira Plath 🔗 


Hamiltonian Neural Koopman Operator
(
Poster
)
>
link
Recently, physics-informed learning, a class of deep learning frameworks that incorporate physics priors and noise-perturbed observational data into neural network models, has shown outstanding performance in learning physical principles with higher accuracy, faster training, and better generalization. Here, for Hamiltonian mechanics and using Koopman operator theory, we propose a physics-informed learning framework, named the \textbf{H}amiltonian \textbf{N}eural \textbf{K}oopman \textbf{O}perator (HNKO), that learns a Koopman operator automatically satisfying the conservation laws. We analytically investigate the dimension of the manifold induced by the orthogonal transformation, and use a modified autoencoder to identify the nonlinear coordinate transformation required to approximate the Koopman operator. Taking the Kepler problem as an example, we demonstrate that the proposed HNKO outperforms representative methods from the literature in robustly learning Hamiltonian dynamics. Our results suggest that appropriately feeding prior knowledge of the underlying system and mathematical theory into the learning framework can reinforce the capability of deep learning. 
Jingdong Zhang · Qunxi Zhu · Wei LIN 🔗 


torchode: A Parallel ODE Solver for PyTorch
(
Poster
)
>
link
We introduce an ODE solver for the PyTorch ecosystem that can solve multiple ODEs in parallel, independently of each other, while achieving significant performance gains. Our implementation tracks each ODE’s progress separately and is carefully optimized for GPUs and compatibility with PyTorch’s JIT compiler. Its design lets researchers easily augment any aspect of the solver and collect and analyze internal solver statistics. In our experiments, our implementation is up to 4.4 times faster per step than other ODE solvers, and it is robust against within-batch interactions that lead other solvers to take up to 4 times as many steps. 
Marten Lienen · Stephan Günnemann 🔗 
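The key design point, tracking each ODE's progress separately so that one difficult instance cannot force extra steps on the whole batch, can be illustrated with a small standalone sketch (plain Python, not torchode's actual API): each instance keeps its own time and step size, adapted from the gap between an Euler step and a Heun step.

```python
def solve_batch(f, y0s, t_end, h0=0.1, tol=1e-6):
    """Integrate dy/dt = f(y) from t=0 to t_end for each scalar initial
    value independently, adapting each instance's step size from the
    difference between an Euler step and a Heun (trapezoidal) step."""
    results, steps = [], []
    for y0 in y0s:                      # each instance has its own t, h
        t, y, h, n = 0.0, y0, h0, 0
        while t < t_end:
            h = min(h, t_end - t)       # do not step past the endpoint
            k1 = f(y)
            euler = y + h * k1
            heun = y + h * (k1 + f(euler)) / 2
            err = abs(heun - euler)     # embedded error estimate
            if err < tol:               # accept the step
                t, y = t + h, heun
                n += 1
            # standard step-size controller for a first-order estimate
            h *= 0.9 * (tol / max(err, 1e-15)) ** 0.5
        results.append(y)
        steps.append(n)
    return results, steps
```

In this sketch the instances run sequentially; the point of a batched GPU solver is to vectorize exactly this per-instance bookkeeping so that accepted/rejected steps and step sizes differ across the batch without any cross-instance interference.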