Differentiable Programming Workshop
Ludger Paehler · William Moses · Maria I Gorinova · Assefaw H. Gebremedhin · Jan Hueckelheim · Sri Hari Krishna Narayanan

Mon Dec 13 06:00 AM -- 03:00 PM (PST)
Event URL: https://diffprogramming.mit.edu

Differentiable programming allows for automatically computing derivatives of functions within a high-level language. It has become increasingly popular within the machine learning (ML) community: differentiable programming has been used within backpropagation of neural networks, probabilistic programming, and Bayesian inference. Fundamentally, differentiable programming frameworks empower machine learning and its applications: the availability of efficient and composable automatic differentiation (AD) tools has led to advances in optimization, differentiable simulators, engineering, and science.

While AD tools have greatly increased the productivity of ML scientists and practitioners, many problems remain unsolved. Crucially, there is little communication between the broad group of AD users, programming languages researchers, and differentiable programming developers, so these groups work in isolation. We propose a Differentiable Programming workshop as a forum to narrow the gaps between the design of differentiable and probabilistic languages, efficient automatic differentiation engines, and higher-level applications of differentiable programming. We hope this workshop will foster closer collaboration between language designers and domain scientists by bringing together a diverse part of the differentiable programming community, including people working on core automatic differentiation tools, higher-level frameworks that rely upon AD (such as probabilistic programming and differentiable simulators), and applications that use differentiable programs to solve scientific problems.

The explicit goals of the workshop are to:
1. Foster closer collaboration and synergies between the individual communities;
2. Evaluate the merits of differentiable design constructs and the impact they have on the algorithm design space and usability of the language;
3. Highlight differentiable techniques of individual domains, and the potential they hold for other fields.

Mon 6:00 a.m. - 6:05 a.m.
Welcome (Short Introduction & Welcome to the Workshop)
Mon 6:05 a.m. - 6:35 a.m.
Parallel-Friendly Automatic Differentiation in Dex and JAX (Invited Talk)
Adam Paszke
Mon 6:35 a.m. - 7:05 a.m.
SYMPAIS: SYMbolic Parallel Adaptive Importance Sampling for Probabilistic Program Analysis (Invited Talk)
Yuan Zhou
Mon 7:05 a.m. - 7:20 a.m.
(Oral)

In Computational Science, Engineering and Finance (CSEF), scripts typically serve as the "glue" between potentially highly complex and computationally expensive external subprograms. Differentiability of the resulting programs turns out to be essential in the context of derivative-based methods for error analysis, uncertainty quantification, optimization or training of surrogates. We argue that it should be enforced by the scripting language itself through exclusive support of differentiable (smoothed) external subprograms and differentiable intrinsics combined with prohibition of nondifferentiable branches in the data flow. Illustration is provided by a prototype adjoint code compiler for a simple Python-like scripting language.

Uwe Naumann
Mon 7:20 a.m. - 7:35 a.m.
(Oral)

Differentiable simulators are an emerging concept with applications in several fields, from reinforcement learning to optimal control. Their distinguishing feature is the ability to calculate analytic gradients with respect to the input parameters. Like neural networks, which are constructed by composing several building blocks called layers, a simulation often requires computing the output of an operator that can itself be decomposed into elementary units chained together. While each layer of a neural network represents a specific discrete operation, the same operator can have multiple representations, depending on the discretization employed and the research question that needs to be addressed. Here, we propose a simple design pattern to construct a library of differentiable operators and discretizations, by representing operators as mappings between families of continuous functions, parametrized by finite vectors. We demonstrate the approach on an acoustic optimization problem, where the Helmholtz equation is discretized using Fourier spectral methods, and differentiability is demonstrated using gradient descent to optimize the speed of sound of an acoustic lens.

Antonio Stanziola · Simon Arridge
Mon 7:35 a.m. - 7:50 a.m.
Mon 7:50 a.m. - 8:20 a.m.
Differentiable Programming in Molecular Physics (Invited Talk)
Frank Noe
Mon 8:20 a.m. - 8:50 a.m.
Diffractor.jl: High Level, High Performance AD for Julia (Invited Talk)
Keno Fischer
Mon 8:50 a.m. - 9:05 a.m.
(Oral)

JAX and PyTorch are two popular Python autodifferentiation frameworks. JAX is based around pure functions and functional programming. PyTorch has popularised the use of an object-oriented (OO) class-based syntax for defining parameterised functions, such as neural networks. Because this seems like a fundamental difference, current libraries for building parameterised functions in JAX have either rejected the OO approach entirely (Stax) or have introduced OO-to-functional transformations and multiple new abstractions, and have been limited in the extent to which they integrate with JAX (Flax, Haiku, Objax). Either way this OO/functional difference has been a source of tension. Here, we introduce Equinox, a small neural network library showing how a PyTorch-like class-based approach may be admitted without sacrificing JAX-like functional programming. We provide two main ideas. One: parameterised functions are themselves represented as PyTrees, which means that the parameterisation of a function is transparent to the JAX framework. Two: we filter a PyTree to isolate just those components that should be treated when transforming (jit, grad or vmap-ing) a higher-order function of a parameterised function -- such as a loss function applied to a model. Overall Equinox resolves the above tension without introducing any new programmatic abstractions: only PyTrees and transformations, just as with regular JAX. Equinox is available at [REDACTED].

Patrick Kidger
Mon 9:05 a.m. - 9:20 a.m.
(Oral)

Fluid flows are omnipresent in nature and engineering disciplines. The reliable computation of fluids has been a long-lasting challenge due to nonlinear interactions over multiple spatio-temporal scales. The compressible Navier-Stokes equations govern compressible flows and allow for complex phenomena like turbulence and shocks. Despite tremendous progress in hardware and software, capturing the smallest length-scales in fluid flows still introduces prohibitive computational cost for real-life applications. We are currently witnessing a paradigm shift towards machine-learning-supported design of numerical schemes as a means to tackle the aforementioned problem. While prior work has explored differentiable algorithms for one- or two-dimensional incompressible fluid flows, we present a fully-differentiable framework for the computation of compressible fluid flows using high-order state-of-the-art numerical methods. Firstly, we demonstrate the efficiency of our solver by computing classical two- and three-dimensional test cases, including strong shocks and transition to turbulence. Secondly, and more importantly, our framework allows for end-to-end optimization to improve existing numerical schemes inside computational fluid dynamics algorithms. In particular, we are using neural networks to substitute a conventional numerical flux function.

Deniz A Bezgin
Mon 9:20 a.m. - 9:25 a.m.
Short Break (Break)
Mon 9:25 a.m. - 10:40 a.m.
Poster Session
Mon 9:25 a.m. - 10:40 a.m.
Extended Abstract – Enzyme.jl: Low level auto-differentiation meets high-level language (Poster)
Valentin Churavy
Mon 9:25 a.m. - 10:40 a.m.
(Poster)

Automatic Differentiation (AD) is a fundamental method that empowers computational algorithms across a range of fields, including Machine Learning, Robotics and High Energy Physics. We present methods enabling well-behaved C++ functions to be automatically differentiated on a GPU without need of code modification. This work brings forth the potential of a new layer of optimisation and a proportional speed up when gradients. The aim of this effort is to provide a tool for AD that can be easily integrated into existing frameworks as a compiler plugin extending the Clang compiler. It can be used interactively, as a Jupyter kernel extension, or as a plugin extending an interactive environment. It will provide researchers with the means to reuse pre-existing models and have their workloads scheduled on parallel processors without the need to optimise their computational kernels.

Vassil Vassilev · David Lange
Mon 9:25 a.m. - 10:40 a.m.
(Poster)

It is well-known that the reparametrisation gradient estimator for non-differentiable models is biased. To formalise the problem, we consider a variant of the simply-typed lambda calculus which supports the reparametrisation of arguments. We endow this language with a denotational semantics based on the cartesian closed category of Frölicher spaces (parameterised by a smoothing accuracy), which generalise smooth manifolds. Finally, we apply the standard reparametrisation gradient to the smoothed model and show that by enhancing the accuracy of the smoothing in a diagonalisation fashion we converge to a critical point of the original optimisation problem.

Dominik Wagner · Luke Ong
Mon 9:25 a.m. - 10:40 a.m.
(Poster)

Our best estimates of the age, contents, and geometry of the Universe come from comparing predictions of the Einstein-Boltzmann (E-B) equations with observations of galaxies and the afterglow of the Big Bang. Existing E-B solvers are not differentiable, and Bayesian parameter estimation of these differential equation models is thus restricted to employing gradient-free inference algorithms. This becomes intractable in the high-dimensional settings increasingly relevant for modern observations. Propagating derivatives through the numerical solution of these ordinary differential equations is tractable through automatic differentiation (AD). We are actively developing the first AD-enabled E-B solver, Bolt.jl, making use of the rich Julia ecosystem of AD tools. Beyond mitigating the cost of high-dimensional inference, Bolt.jl opens the door to testing new cosmological physics against data at the level of terms in the Einstein-Boltzmann equations, using neural ODEs and physics-informed neural networks (PINNs).

James Sullivan
Mon 9:25 a.m. - 10:40 a.m.
(Poster)

In this work, we propose a differentiable programming approach to data-driven modeling of distribution systems for electromechanical transient stability analysis. Our approach combines the traditional ZIP load model with a deep neural network formulated as a constrained nonlinear least-squares problem. We will discuss the formulation, setup, and training of the proposed model as a differentiable program. Finally, we will compare and investigate the performance of this new load model and present the results on a medium-scale 350-bus transmission-distribution network.

Jan Drgona · Andrew August · Elliott Skomski
Mon 9:25 a.m. - 10:40 a.m.
(Poster)

To target challenges in differentiable optimization we analyze and propose strategies for derivatives of the Matérn kernel with respect to the smoothness parameter. This problem poses a challenge in Gaussian process modelling due to the lack of robust derivatives of the modified Bessel function of the second kind. In the current work we scrutinize the mathematical and numerical hurdles posed by the differentiation of special functions and provide a set of options. Special focus is given to a newly derived series expansion for the modified Bessel function of the second kind, which yields highly accurate results using the complex step method and is promising for classical AD implementations.

Oana Marin · Paul Hovland
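
The complex step method mentioned in the abstract above can be illustrated with a minimal sketch. It does not reproduce the authors' series expansion for the Bessel function; the test function `g` and the helper name `complex_step_derivative` are hypothetical and chosen only for illustration. The idea is that for a real-analytic function f, the derivative f'(x) is approximately Im(f(x + ih)) / h, and because no subtraction of nearby values occurs, h can be made extremely small.

```python
import cmath
import math


def complex_step_derivative(f, x, h=1e-20):
    """Approximate f'(x) as Im(f(x + i*h)) / h for a real-analytic f."""
    return f(complex(x, h)).imag / h


def g(z):
    # Hypothetical test function built only from cmath primitives.
    return cmath.exp(z) * cmath.sin(z)


x0 = 0.7
approx = complex_step_derivative(g, x0)
# Analytic derivative of exp(x) * sin(x) for comparison.
exact = math.exp(x0) * (math.sin(x0) + math.cos(x0))
print(abs(approx - exact))
```

Unlike a finite difference, the step size here is not limited by cancellation error, which is why the method requires the function to accept complex arguments throughout.
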
Mon 9:25 a.m. - 10:40 a.m.
(Poster)

We present a neural differentiable predictive control (DPC) method for learning constrained neural control policies for uncertain linear systems. DPC is formulated as a differentiable problem whose computational graph architecture is inspired by the structure of classical model predictive control (MPC). In particular, the optimization of the neural control policy is based on automatic differentiation of the MPC loss function through a differentiable closed-loop system dynamics model. We show that DPC can learn constrained neural control policies to stabilize systems with unstable dynamics, track time-varying references, and satisfy state and input constraints without the prior need of a supervisory MPC controller.

Jan Drgona · Aaron Tuor · Draguna L Vrabie
Mon 9:25 a.m. - 10:40 a.m.
(Poster)

No single Automatic Differentiation (AD) system is the optimal choice for all problems. This means that the informed selection of an AD system, or of a combination of AD systems, is a problem-specific choice that can greatly impact performance. In the Julia programming language, the major AD systems target the same input and thus in theory can compose. Hitherto, switching between AD packages in the Julia language required end-users to familiarize themselves with the user-facing API of the respective packages. Furthermore, implementing a new, usable AD package required AD package developers to write boilerplate code to define convenience API functions for end-users. As a response to these issues, we present AbstractDifferentiation.jl for the automated generation of an extensive, unified, user-facing API for any AD package. By splitting the complexity between AD users and AD developers, AD package developers only need to implement one or two primitive definitions to support various utilities for AD users like Jacobians, Hessians and lazy product operators from native primitives such as pullbacks or pushforwards, thus removing tedious -- but so far inevitable -- boilerplate code, and enabling the easy switching and composing between AD implementations for end-users.

Frank Schäfer · Mohamed Tarek · Lyndon White · Chris Rackauckas
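
The idea of deriving utilities such as Jacobians from a single primitive can be sketched as follows. This is not the AbstractDifferentiation.jl API (which is a Julia package); it is a hypothetical Python illustration in which `pushforward` is a hand-written Jacobian-vector product for an assumed example function, and `jacobian_from_pushforward` is a generic utility that needs nothing else from the "backend".

```python
def pushforward(x, v):
    """Hand-written JVP of the hypothetical f(x0, x1) = (x0 * x1, x0 + x1**2)."""
    x0, x1 = x
    v0, v1 = v
    return [x1 * v0 + x0 * v1, v0 + 2.0 * x1 * v1]


def jacobian_from_pushforward(jvp, x):
    """Build the dense Jacobian by pushing each basis vector through jvp."""
    n = len(x)
    columns = []
    for j in range(n):
        e_j = [1.0 if i == j else 0.0 for i in range(n)]
        columns.append(jvp(x, e_j))  # column j of the Jacobian
    m = len(columns[0])
    # columns[j][i] = d f_i / d x_j; transpose into row-major form.
    return [[columns[j][i] for j in range(n)] for i in range(m)]


J = jacobian_from_pushforward(pushforward, [2.0, 3.0])
print(J)
```

Any backend that supplies a function with the same shape as `pushforward` could reuse `jacobian_from_pushforward` unchanged, which is the kind of boilerplate reduction the abstract describes.
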
Mon 9:25 a.m. - 10:40 a.m.
(Poster)

The development of AD tools focuses mostly on handling floating point types in the target language. Taping optimizations in these tools mostly focus on specific operations like matrix vector products. Aggregated types like std::complex are usually handled by specifying the AD type as a template argument. This approach provides exact results, but prevents the use of expression templates. If AD tools are extended and specialized such that aggregated types can be added to the expression framework, then this will result in reduced memory utilization and improve the timing for applications where aggregated types such as complex numbers, matrix vector operations or layer operations in neural networks are used. Such an integration requires a reformulation of the stored data per expression and a rework of the tape evaluation process. In this paper we demonstrate the overhead of unhandled aggregated types in expression templates and provide basic ingredients for a tape implementation that supports arbitrary aggregated types for which the user has implemented some type traits. Finally, we demonstrate the advantages of aggregated type handling on a synthetic benchmark case.

Max Sagebaum
Mon 9:25 a.m. - 10:40 a.m.
(Poster)

We present a linear algebra formulation of backpropagation that serves as an alternative to the traditional approach. Using matrices allows the calculation of gradients given the availability of a generically written Gaussian elimination, which is represented by the "backslash" symbol. Backpropagation is often connected to the chain rule for multivariate calculus, but we propose that this may be seen as a distraction from the underlying algebraic structure. The implementation shows how generic linear algebra can allow operators as elements of matrices, and without rewriting of any code, the software carries through to completion giving the correct answer. We demonstrate that in a suitable programming language with generic linear algebra operators, such as Julia (Bezanson et al., 2017), it is possible to realize this abstraction in code.

Ekin Akyürek · Alan Edelman · Bernie Wang
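
The connection between backpropagation and a triangular solve described in the abstract above can be sketched for a scalar computation. This is a simplified illustration under stated assumptions, not the authors' Julia implementation: the chain of variables, the matrix `L` of local partial derivatives, and the helper `solve_upper_unit` are all hypothetical. If L[i][j] holds the local derivative of variable i with respect to variable j, then the gradient of the last variable with respect to every earlier one solves (I - L)^T g = e_last, and back substitution on that system visits the variables in reverse order, as backpropagation does.

```python
import math


def solve_upper_unit(U, b):
    """Back substitution for U g = b, with U upper triangular and unit diagonal."""
    n = len(b)
    g = [0.0] * n
    for i in reversed(range(n)):
        g[i] = b[i] - sum(U[i][j] * g[j] for j in range(i + 1, n))
    return g


# Hypothetical scalar chain: x1 = x, x2 = sin(x1), x3 = x1 * x2, x4 = exp(x3)
x1 = 0.5
x2 = math.sin(x1)
x3 = x1 * x2
x4 = math.exp(x3)

# L[i][j] is the local partial derivative of variable i w.r.t. variable j.
L = [[0.0] * 4 for _ in range(4)]
L[1][0] = math.cos(x1)  # d x2 / d x1
L[2][0] = x2            # d x3 / d x1
L[2][1] = x1            # d x3 / d x2
L[3][2] = x4            # d x4 / d x3

# (I - L)^T is upper triangular with unit diagonal.
U = [[(1.0 if i == j else 0.0) - L[j][i] for j in range(4)] for i in range(4)]
e_last = [0.0, 0.0, 0.0, 1.0]
grad = solve_upper_unit(U, e_last)  # grad[j] = d x4 / d x_(j+1)

# Analytic derivative of exp(x * sin(x)) for comparison.
expected = x4 * (math.sin(x1) + x1 * math.cos(x1))
print(grad[0], expected)
```
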
Mon 10:40 a.m. - 10:45 a.m.
Short Break (Break)
Mon 10:45 a.m. - 11:15 a.m.
Learning from Data through the Lens of Ocean Models, Surrogates, and their Derivatives (Invited Talk)
Patrick Heimbach
Mon 11:15 a.m. - 11:45 a.m.
Learnable Physics Models (Invited Talk)
Karen Liu
Mon 11:45 a.m. - 12:00 p.m.
(Oral)

High-level domain-specific languages for the finite element method underpin high-productivity programming environments for simulations based on partial differential equations (PDE) while employing automatic code generation to achieve high performance. However, a limitation of this approach is that it does not support operators that are not directly expressible in the vector calculus. This is critical in applications where PDEs are not enough to accurately describe the physical problem of interest. The use of deep learning techniques has become increasingly popular in filling this knowledge gap, for example to include features not represented in the differential equations, or closures for unresolved spatiotemporal scales. We introduce an interface within the Firedrake finite element system that enables seamless coupling with deep learning models. This new feature composes with the automatic differentiation capabilities of Firedrake, enabling the automated solution of inverse problems. Our implementation interfaces with PyTorch and can be extended to other machine learning libraries. The resulting framework supports complex models coupling PDEs and deep learning whilst maintaining separation of concerns between application scientists and software experts.

Nacime Bouziani
Mon 12:00 p.m. - 12:15 p.m.
(Oral)

Automatic differentiation (AD) aims to compute derivatives of user-defined functions, but in Turing-complete languages, this simple specification does not fully capture AD’s behavior: AD sometimes disagrees with the true derivative of a differentiable program, and when AD is applied to non-differentiable or effectful programs, it is unclear what guarantees (if any) hold of the resulting code. We study an expressive differentiable programming language, with piecewise-analytic primitives, higher-order functions, and general recursion. Our main result is that even in this general setting, a version of Lee et al. [2020]’s correctness theorem (originally proven for a first-order language without partiality or recursion) holds: all programs denote so-called ωPAP functions, and AD computes correct intensional derivatives of them. Mazza and Pagani [2021]’s recent theorem, that AD disagrees with the true derivative of a differentiable recursive program at a measure-zero set of inputs, can be derived as a straight-forward corollary of this fact. We also apply the framework to study probabilistic programs, and recover a recent result from Mak et al. [2021] via a novel denotational argument.

Alexander Lew · Mathieu Huot · Vikash Mansinghka
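
The claim in the abstract above that AD can disagree with the true derivative of a differentiable program can be seen in a small, well-known example. The sketch below uses a hypothetical minimal forward-mode `Dual` class and a hypothetical function `identity_with_branch`; it is not taken from the paper. The function equals x everywhere, so its true derivative is 1 at every point, yet AD differentiates only the branch that was executed and reports 0 at the single input x = 0.

```python
class Dual:
    """Minimal forward-mode dual number: a value and its tangent."""

    def __init__(self, val, tan=0.0):
        self.val = val
        self.tan = tan

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule for the tangent component.
        return Dual(self.val * other.val,
                    self.val * other.tan + self.tan * other.val)

    __rmul__ = __mul__


def identity_with_branch(x):
    # Mathematically f(x) = x for every x, so f'(x) = 1 everywhere.
    if x.val == 0.0:
        return Dual(0.0)  # constant branch: AD sees a zero tangent here
    return x * 1.0


def derivative(f, x):
    """Forward-mode derivative of f at x, seeded with tangent 1."""
    return f(Dual(x, 1.0)).tan


print(derivative(identity_with_branch, 2.0))
print(derivative(identity_with_branch, 0.0))
```

The disagreement occurs only at x = 0, a set of measure zero, which is consistent with the Mazza and Pagani result cited in the abstract.
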
Mon 12:15 p.m. - 12:30 p.m.
Mon 12:30 p.m. - 1:00 p.m.
Differentiable Programming for Protein Sequences and Structure (Invited Talk)
Sergey Ovchinnikov
Mon 1:00 p.m. - 1:30 p.m.
Approximate High Performance Computing Guided by Automatic Differentiation (Invited Talk)
Harshitha Menon
Mon 1:30 p.m. - 1:45 p.m.
(Oral)

We give a complete decidable second-order equational axiomatisation of the forward differentiation of smooth multivariate functions. Differentiation is expressed using the binding structures available in second-order equational logic. The main mathematical theorem used is Severi’s multivariate Hermite interpolation theorem.

Gordon Plotkin
Mon 1:45 p.m. - 2:00 p.m.
(Oral)

Kohn-Sham regularizer (KSR) is a machine learning approach that optimizes a physics-informed exchange-correlation functional within a differentiable Kohn-Sham density functional theory framework. We evaluate the generalizability of KSR by training on atomic systems and testing on molecules at equilibrium. We propose a spin-polarized version of KSR with local, semilocal, and nonlocal approximations for the exchange-correlation functional. The generalization error from our semilocal approximation is comparable to other differentiable approaches. Our nonlocal functional outperforms any existing machine learning functionals by predicting the ground-state energies of the test systems with a mean absolute error of 2.7 milli-Hartrees.

Bhupalee Kalita · Ryan Pederson · Li Li · Kieron Burke
Mon 2:00 p.m. - 3:00 p.m.

Author Information

Ludger Paehler (Technical University of Munich)
William Moses (MIT)
Maria I Gorinova (Twitter Cortex)
Assefaw H. Gebremedhin (Washington State University)

Assefaw Gebremedhin is an associate professor in the School of Electrical Engineering and Computer Science at Washington State University, where he leads the Scalable Algorithms for Data Science (SCADS) Lab. His current research interests include: data science and AI, network science, high-performance computing, and applications in cyber security, energy systems, and bioinformatics. He is a recipient of the 2021 George Polya Prize in Applied Combinatorics, along with co-authors Fredrik Manne and Alex Pothen for work on "efficient graph coloring algorithms and codes with applications to Jacobian and Hessian matrix computations". In 2016, Assefaw received the National Science Foundation CAREER Award for work on fast and scalable combinatorial algorithms for data analytics. He holds the PhD and MS degrees in Computer Science and a bachelor's degree in Electrical Engineering.

Jan Hueckelheim (Argonne National Laboratory)
Sri Hari Krishna Narayanan (Argonne National Laboratory)
