Neural information processing systems have benefited tremendously from the availability of programming languages and frameworks for automatic differentiation (AD). Not only do neural information processing systems benefit from programming languages for automatic inference, but they can also be considered languages in their own right, consisting of differentiable and stochastic primitives. Combined with neural language models, these systems are increasingly capable of generating symbolic programs that a human programmer might write in a high-level language. Developing neurosymbolic systems for automatic program synthesis requires insights from both statistical learning and programming languages.
AIPLANS invites all researchers working towards the same purpose in these two communities to build on common ground. Our workshop is designed to be as inclusive as possible towards researchers engaged in building programming languages and neurosymbolic systems.
Tue 3:45 a.m. - 4:00 a.m.
Introductory Remarks
Breandan Considine
Tue 4:00 a.m. - 4:45 a.m.
Thinking like Transformers - Gail Weiss - Technion - Israel Institute of Technology (Invited Talk)
Transformers - the purely attention based NN architecture - have emerged as a powerful tool in sequence processing. But how does a transformer think? When we discuss the computational power of RNNs, or consider a problem that they have solved, it is easy for us to think in terms of automata and their variants (such as counter machines and pushdown automata). But when it comes to transformers, no such intuitive model is available. In this talk I will present a programming language, RASP (Restricted Access Sequence Processing), which we hope will serve the same purpose for transformers as finite state machines do for RNNs. In particular, we will identify the base computations of a transformer and abstract them into a small number of primitives, which are composed into a small programming language. We will go through some example programs in the language, and discuss how a given RASP program relates to the transformer architecture.
Gail Weiss
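As a rough illustration of the primitives the abstract mentions, the sketch below is a toy Python rendering of RASP's select and aggregate operations (my own illustration, not the official RASP implementation); the list encoding, predicate form, and the sequence-reversal example are assumptions chosen for brevity.

```python
# Toy select/aggregate primitives: select builds a boolean attention pattern,
# aggregate averages the values each query position attends to, mirroring one
# (hard-)attention head of a transformer.

def select(keys, queries, predicate):
    # selection[q][k] is True when query position q attends to key position k
    return [[predicate(k, q) for k in keys] for q in queries]

def aggregate(selection, values):
    # per-query mean of the selected values (0.0 when nothing is selected)
    out = []
    for row in selection:
        chosen = [v for v, keep in zip(values, row) if keep]
        out.append(sum(chosen) / len(chosen) if chosen else 0.0)
    return out

# Example "program": reverse a sequence by attending from position q to n - 1 - q.
tokens = [1.0, 2.0, 3.0, 4.0]
indices = list(range(len(tokens)))
flip = select(indices, indices, lambda k, q: k == len(tokens) - 1 - q)
print(aggregate(flip, tokens))  # [4.0, 3.0, 2.0, 1.0]
```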
Tue 4:45 a.m. - 4:55 a.m.
Q&A - Gail Weiss (Post-Talk Q&A)
Tue 5:00 a.m. - 6:00 a.m.
When Gödel discovered Automatic Differentiation - Marie Kerjean - Centre national de la recherche scientifique (Invited Talk)
I will explore the boundaries between differentiable programming and logic, through the prism of the Curry-Howard correspondence. I will recall the latter and explain how automatic differentiation fits into each of its three facets: functions, proofs and programs. In particular, I will explain how backpropagation is identified with Gödel's Dialectica translation, a transformation of logical formulas historically used to prove consistency theorems and widely used in proof theory since then.
Marie Kerjean
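As a loose illustration of viewing backpropagation as a program transformation, the following Python sketch (a rough rendering of reverse-mode AD, not Kerjean's formal Dialectica construction) translates each primitive into a value paired with a pullback; composing programs composes the pullbacks in reverse order.

```python
import math

# Each transformed primitive returns its value together with a "pullback" that
# maps an output sensitivity back to input sensitivities; composing programs
# composes these pullbacks in reverse, which is exactly backpropagation.

def lift_mul(x, y):
    def pullback(dz):                  # z = x * y
        return (dz * y, dz * x)
    return x * y, pullback

def lift_sin(x):
    def pullback(dz):                  # z = sin(x)
        return (dz * math.cos(x),)
    return math.sin(x), pullback

def f(x, y):
    # f(x, y) = sin(x * y), written against the transformed primitives
    z, pb_mul = lift_mul(x, y)
    w, pb_sin = lift_sin(z)
    def pullback(dw):
        (dz,) = pb_sin(dw)
        return pb_mul(dz)
    return w, pullback

value, pullback = f(1.0, 2.0)
print(value)           # sin(2.0)
print(pullback(1.0))   # (2*cos(2.0), 1*cos(2.0)) = (df/dx, df/dy)
```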
Tue 6:00 a.m. - 7:00 a.m.
Building machines that learn and think like people by learning to write programs: progress, open problems, and next steps - Josh Tenenbaum - Massachusetts Institute of Technology (Invited Talk)
If we want to build machines that think and learn like humans do, and that can learn and think with people, our best bet is to build machines that can learn to write programs expressing their thoughts in human-understandable code. These machines should also be able to learn from the kinds of data that humans naturally consume and produce: one or a few examples of program execution, and natural language descriptions of program goals or high-level structure. We are far from achieving this goal, but the last few years have seen intriguing first steps and opened up a new set of hard problems for future work. I will talk about some lessons learned: how we might best combine neural and symbolic approaches under the broad rubric of probabilistic inference in hierarchical generative models for code, and the synergies to be gained from looking at both execution examples and natural language as sources of data. I will also discuss promising near-term challenge domains that capture foundational human capacities for learning concepts, systems of concepts (or domain theories) and causal models, and where the next generation of program learning approaches could make important progress.
Josh Tenenbaum
Tue 7:00 a.m. - 7:15 a.m.
Short break
Tue 7:15 a.m. - 8:15 a.m.
Panel Discussion
Tue 8:15 a.m. - 8:55 a.m.
Tutorial - Daniel Selsam - Microsoft Research
Daniel Selsam
Tue 8:55 a.m. - 9:05 a.m.
Q&A - Daniel Selsam (Post-Talk Q&A)
Tue 9:00 a.m. - 10:15 a.m.
Lunch / Poster Session
Tue 10:15 a.m. - 10:20 a.m.
Remarks from Organisers (Introduction)
Tue 10:20 a.m. - 10:46 a.m.
Randomized Automatic Differentiation - Ryan Adams - Princeton University (Invited Talk)
Optimization is at the heart of machine learning, and gradient computation is central to many optimization techniques. Stochastic optimization, in particular, has taken center stage as the principal method of fitting many models, from deep neural networks to variational Bayesian posterior approximations. Generally, one uses data subsampling to efficiently construct unbiased gradient estimators for stochastic optimization, but this is only one possibility. In this talk, I will discuss an alternative approach to constructing unbiased gradient estimates in machine learning problems. We will revisit the Jacobian accumulation problem at the heart of automatic differentiation, observing that it is possible to collapse the linearized computational graph of, e.g., deep neural networks, in a randomized way such that less memory is used but little performance is lost. This is joint work with students Alex Beatson, Deniz Oktay, Joshua Aduol, and Nick McGreivy.
Ryan Adams
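The NumPy toy below is my own illustration of the general idea (not the algorithm from the talk): randomly sparsifying an intermediate Jacobian, with rescaling, keeps the gradient estimate unbiased while fewer entries need to be stored.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 5))

def true_grad():
    # gradient of x -> sum(W @ x) with respect to x is W^T @ ones
    return W.T @ np.ones(3)

def randomized_grad(keep_prob=0.5):
    # keep each Jacobian entry with probability keep_prob, rescale by 1/keep_prob;
    # the sparsified Jacobian equals the true one in expectation
    mask = rng.random(W.shape) < keep_prob
    W_sparse = np.where(mask, W / keep_prob, 0.0)
    return W_sparse.T @ np.ones(3)

avg = np.mean([randomized_grad() for _ in range(20000)], axis=0)
print(np.allclose(avg, true_grad(), atol=0.1))  # True: unbiased on average
```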
Tue 10:46 a.m. - 10:56 a.m.
Q&A - Ryan Adams (Post-Talk Q&A)
Tue 11:00 a.m. - 11:45 a.m.
Dependent Types for Machine Learning in Dex - David Duvenaud - University of Toronto (Invited Talk)
This talk will give a gentle introduction to Dex, an experimental programming language. Dex is designed to combine the clarity and safety of high-level functional languages with the efficiency of low-level numerical languages. For example, Dex allows one to move much of the informal type and shape information normally contained in comments into compile-time checked types, while also omitting unambiguous details, to keep things terse. It also allows in-place updates and stateful, loopy code that can automatically take advantage of parallelism in a fine-grained way. We'll demonstrate these features on standard deep architectures like attention and graph neural nets.
David Duvenaud
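Dex enforces this with dependent, index-typed arrays at compile time; as a rough Python approximation of the idea the abstract describes (the decorator and its shape specs below are hypothetical, invented purely for illustration), the shape comments become declarations that are actually checked.

```python
import numpy as np

def check_shapes(**specs):
    # hypothetical helper: each named argument's shape is matched against a tuple
    # of dimension names, and names must be bound consistently across arguments
    def decorator(fn):
        def wrapped(**kwargs):
            dims = {}
            for name, spec in specs.items():
                for dim_name, actual in zip(spec, kwargs[name].shape):
                    expected = dims.setdefault(dim_name, actual)
                    assert actual == expected, (
                        f"{name}: dimension {dim_name} is {actual}, expected {expected}")
            return fn(**kwargs)
        return wrapped
    return decorator

@check_shapes(x=("n", "m"), y=("m", "k"))
def matmul(x, y):
    return x @ y

print(matmul(x=np.ones((2, 3)), y=np.ones((3, 4))).shape)  # (2, 4)
# matmul(x=np.ones((2, 3)), y=np.ones((5, 4)))  # fails: m is 3 in x but 5 in y
```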
Tue 11:45 a.m. - 11:55 a.m.
Q&A - David Duvenaud (Post-Talk Q&A)
Tue 12:00 p.m. - 12:45 p.m.
Differential Inference: A Criminally Underused Tool - Alexander Rush - Cornell University (Invited Talk)
Differential Inference is the use of differentiation to perform probabilistic inference. The technique itself is relatively straightforward and plays nicely with autodiff: it roughly just automates Bayes' rule the way autodiff automates the chain rule. However, there is still a tendency for students to get tied up in the knots of even elementary probabilistic inference. Inspired by polemics that shone a light on autodifferentiation, this talk will be half a tutorial on the use of differential inference and half a demonstration of all the fun math that it can remove from your life.
Alexander Rush
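One concrete instance of this trick (a minimal sketch of the general idea, not material from the talk): for a discrete latent variable, differentiating the log-evidence computed with logsumexp recovers the posterior, so autodiff performs the bookkeeping Bayes' rule would otherwise require.

```python
import torch

# log p(x, z) for three values of a discrete latent z (made-up numbers)
log_joint = torch.tensor([1.0, 2.0, 0.5], requires_grad=True)

log_evidence = torch.logsumexp(log_joint, dim=0)   # log p(x) = log sum_z p(x, z)
log_evidence.backward()

print(log_joint.grad)                              # posterior p(z | x): grad of logsumexp is softmax
print(torch.softmax(log_joint.detach(), dim=0))    # the same values, via Bayes' rule directly
```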
Tue 12:45 p.m. - 12:55 p.m.
Q&A - Alexander Rush (Post-Talk Q&A)
Tue 1:00 p.m. - 1:05 p.m.
Introduction to Spotlight Speakers (Organiser Remarks)
Tue 1:05 p.m. - 1:15 p.m.
Meta-Learning an Inference Algorithm for Probabilistic Programs - Gwonsoo Che (Spotlight Talk)
Tue 1:15 p.m. - 1:22 p.m.
LazyPPL: laziness and types in non-parametric probabilistic programs - Hugo Paquet (Spotlight Talk)
Tue 1:22 p.m. - 1:32 p.m.
Learning Rules with Stratified Negation in Differentiable ILP - Giri Krishnan (Spotlight Talk)
Tue 1:32 p.m. - 1:41 p.m.
Learning Adaptive Control Flow in Transformers for Improved Systematic Generalization - Róbert Csordás (Spotlight Talk)
Tue 1:41 p.m. - 1:51 p.m.
Type Inference as Optimization - Eirini V. Pandi (Spotlight Talk)
Tue 1:51 p.m. - 2:00 p.m.
Q&A for Spotlight Authors
Tue 2:00 p.m. - 2:15 p.m.
Closing Remarks
Breandan Considine
Tue 2:15 p.m. - 3:00 p.m.
Poster Session
Type Inference as Optimization (Poster)
Optionally typed dynamic languages can permit multiple valid type assignments. When this happens, developers can prefer one valid type assignment over another because it better reflects how they think about the program and the problem it solves. Natural type inference (NTI) uses natural language text within source code, such as identifiers, to help choose valid programming language types. A growing body of techniques has been proposed for NTI. These techniques predict types; they seek to return natural type assignments (assignments that reflect developer preferences) while striving for correctness. They are empirically effective, but they are not sound by construction: they do not leverage programming language theory to formalize their algorithms and show correctness and termination. Filling this foundational gap is the purpose of this paper. We are the first to present a detailed algorithm for NTI that is validated with theorems and proofs. Valid type assignments obey logical constraints arising from type rules; natural type assignments obey natural constraints arising from the natural language text associated with a variable and its uses. The core intuition of this work is that logical and natural constraints can interact to speed finding a type valuation that (1) type checks (satisfies the logical constraints) and (2) is most natural. We formulate NTI as a joint optimization problem. To do this, we define a numerical relaxation over boolean logical constraints that gives us a condition we treat as a hard constraint, while simultaneously minimizing distance from natural constraints, which we treat as soft constraints for our optimization problem. Our main result, the first formal proof of soundness for natural type inference, is that our algorithm always terminates, either with an error or with a tuple that is guaranteed to be a type signature for its input.
Eirini V. Pandi · Earl Barr · Andrew Gordon · Charles Sutton
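A toy rendering of that formulation (my own sketch, not the paper's algorithm; the example program, candidate types, and naturalness scores are all hypothetical): enumerate type valuations, keep those satisfying the logical constraints, and among those pick the one closest to the types suggested by identifier names.

```python
import itertools

TYPES = ["int", "str", "float"]

def logical_ok(assignment):
    # hard constraint from a hypothetical program `count = size + 1`:
    # `size` and `count` must share a numeric type
    return assignment["size"] == assignment["count"] and assignment["size"] in ("int", "float")

def natural_distance(assignment):
    # soft constraint: identifier text suggests these types (hypothetical scores)
    preferred = {"size": "int", "count": "int"}
    return sum(assignment[v] != preferred[v] for v in assignment)

candidates = [dict(zip(["size", "count"], combo))
              for combo in itertools.product(TYPES, repeat=2)]
valid = [a for a in candidates if logical_ok(a)]   # logical constraints are hard
best = min(valid, key=natural_distance)            # naturalness breaks the tie
print(best)  # {'size': 'int', 'count': 'int'}
```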
Are Transformers All That Karel Needs? (Poster)
Recent works have shown the incredible promise of using neural networks for the task of program synthesis from input-output examples. In this paper, we propose using Transformer-based architectures as a baseline for the program synthesis task on the Karel dataset. Specifically, we leverage DistillGPT as our decoder model to perform program synthesis for the Karel DSL. We show that by changing the model architecture from an LSTM to a transformer-based architecture, we are able to significantly improve on supervised learning approaches, obtaining a top-1 generalization accuracy of 82.4%. Further, applying execution-guided search on the output beams of the model increases the accuracy of our approach to 89.64%.
Abhay Garg · Anand Sriraman · Shirish Karande
Towards Neural Functional Program Evaluation (Poster)
This paper explores the capabilities of current transformer-based language models for program evaluation of simple functional programming languages. We introduce a new program generation mechanism that allows control over syntactic sugar for semantically equivalent programs. T5 experiments reveal that neural functional program evaluation performs surprisingly well, achieving exact program match scores in the high 90% range for most in-distribution and out-of-distribution tests. We present and evaluate on three datasets designed to study generalization abilities specific to functional programs, based on type, function composition, and reduction steps.
Torsten Scholak · Jonathan Pilault
Staged compilation of tensor expressions (Poster)
We present our current progress towards a metaprogramming framework for tensor expressions embedded in Haskell; the system offers a high-level syntax for dimension-annotated linear algebra, and generates specialized source code corresponding to the input expression.
Marco Zocca
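The poster's framework is a Haskell EDSL; as a language-agnostic illustration of the staging idea (a toy Python sketch with a hypothetical dot-product generator, not the actual system), the dimension annotation is consumed at generation time to emit specialized source code.

```python
def specialize_dot(n):
    # stage 1: the dimension n is known, so the loop can be unrolled away
    terms = " + ".join(f"x[{i}] * y[{i}]" for i in range(n))
    # stage 2: emit specialized source code for the annotated expression
    return f"double dot(const double *x, const double *y) {{ return {terms}; }}"

print(specialize_dot(3))
# double dot(const double *x, const double *y) { return x[0] * y[0] + x[1] * y[1] + x[2] * y[2]; }
```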
Safe Neurosymbolic Learning with Differentiable Symbolic Execution (Poster)
We study the problem of learning verifiably safe parameters for programs that use neural networks as well as symbolic, human-written code. Such neurosymbolic programs arise in many safety-critical domains. However, because they need not be differentiable, they cannot be learned using existing approaches to integrating learning and verification. Our method, Differentiable Symbolic Execution (DSE), learns such programs by sampling code paths using symbolic execution, constructing gradients of a worst-case "safety loss" along these paths, and then backpropagating these gradients through program operations using a generalization of the REINFORCE estimator. We evaluate the method on a mix of synthetic tasks and real-world control and navigation benchmarks. Our experiments show that DSE significantly outperforms the state-of-the-art DiffAI method on these tasks.
Chenxi Yang · Swarat Chaudhuri
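A minimal sketch of the score-function ("REINFORCE") estimator the abstract generalizes (my own toy example with a single Bernoulli branch choice, not DSE itself): the gradient of an expected loss over a discrete branch is estimated without differentiating through the branch.

```python
import random
import statistics

def reinforce_grad(theta, loss, n=50000):
    # E[loss] = theta * loss(1) + (1 - theta) * loss(0); its gradient is estimated
    # as loss(c) * d/dtheta log p(c), averaged over sampled branch choices c
    samples = []
    for _ in range(n):
        c = 1 if random.random() < theta else 0
        score = 1.0 / theta if c == 1 else -1.0 / (1.0 - theta)
        samples.append(loss(c) * score)
    return statistics.fmean(samples)

loss = lambda c: 3.0 if c == 1 else 1.0    # hypothetical per-path worst-case safety loss
print(reinforce_grad(0.4, loss))           # approx. 2.0, the true d/dtheta E[loss]
```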
AutoCoder: Leveraging Transformers for Automatic Code Synthesis (Poster)
Program synthesis from natural language descriptions is a challenging task. This paper explores two variants of transformer models for the task of program synthesis and showcases higher performance than existing SOTA models. Towards the end, we also discuss the differences in learned representations between these two variants. We demonstrate that the vanilla transformer model has a higher capacity to memorize the training data than the other variant.
Mrinal Anand · Mayank Singh
Learning Rules with Stratified Negation in Differentiable ILP (Poster)
Differentiable methods to learn rules (logic programs) have the potential to integrate the interpretability, transferability and low data requirements of inductive logic programming with the noise tolerance of non-symbolic learning. While negation is an essential component of reasoning, incorporating it into a logic programming framework poses several problems (hence its central place in the logic programming and nonmonotonic reasoning communities). Current implementations of differentiable rule learners either exclude negation entirely or else treat it only in passing. In this work, we introduce stratified negation into a differentiable inductive logic programming framework, and we demonstrate that the resulting system can learn recursive programs in which negation plays a central role. We include examples from multiple domains, e.g., arithmetic, graphs, sets and lists.
Giri Krishnan · Ramyaa Ramyaa
AutumnSynth: Synthesis of Reactive Programs with Structured Latent State (Poster)
The human ability to efficiently discover causal theories of the environment from observations is a feat of nature that remains elusive in machines. In this work, we attempt to make progress on this frontier by formulating the challenge of causal mechanism discovery from observed data as one of program synthesis. We focus on the domain of time-varying, Atari-like 2D grid worlds, and represent causal models in this domain using a programming language called Autumn. Discovering the causal structure underlying a sequence of observations is equivalent to identifying the program in the Autumn language that generates the observations. We introduce a novel program synthesis algorithm, called AutumnSynth, that approaches this synthesis challenge by integrating standard methods of synthesizing functions with an automata synthesis approach, used to discover the model's latent state. We evaluate our method on a suite of Autumn programs designed to express the richness of the domain, and our results signal the potential of our formulation.
Ria Das · Zenna Tavares · Josh Tenenbaum · Armando Solar-Lezama
PAC Synthesis of Machine Learning Programs (Poster)
We study the problem of synthesizing programs that include machine learning components such as deep neural networks (DNNs). We focus on statistical properties, which are properties expected to hold with high probability (e.g., that an image classification model correctly identifies people in images with high probability). We propose novel algorithms for sketching and synthesizing such programs by leveraging ideas from statistical learning theory to provide statistical soundness guarantees. We evaluate our approach on synthesizing list processing programs that include DNN components used to process image inputs, as well as case studies on image classification and on precision medicine. Our results demonstrate that our approach can be used to synthesize programs with probabilistic guarantees.
Osbert Bastani
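A minimal sketch of the kind of statistical soundness check the abstract refers to (my own illustration of a one-sided Hoeffding test, not the paper's algorithm; the candidate program and property are hypothetical): accept a synthesized component only if its empirical failure rate plus a confidence margin stays below the tolerance.

```python
import math
import random

def pac_check(program, property_holds, sample_input, n=2000, epsilon=0.05, delta=0.01):
    # with probability at least 1 - delta, the true failure rate is below the
    # empirical rate plus the Hoeffding margin; require that bound to be <= epsilon
    failures = sum(not property_holds(program, sample_input()) for _ in range(n))
    margin = math.sqrt(math.log(1.0 / delta) / (2.0 * n))
    return failures / n + margin <= epsilon

candidate = lambda x: abs(x)                    # hypothetical synthesized component
prop = lambda prog, x: prog(x) >= 0             # statistical property to guarantee
print(pac_check(candidate, prop, lambda: random.uniform(-1.0, 1.0)))  # True
```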
Learning compositional programs with arguments and sampling (Poster)
One of the most challenging goals in designing intelligent systems is empowering them with the ability to synthesize programs from data. Namely, given specific requirements in the form of input/output pairs, the goal is to train a machine learning model to discover a program that satisfies those requirements. A recent class of methods exploits combinatorial search procedures and deep learning to learn compositional programs. However, they usually generate only toy programs using a domain-specific language that does not provide any high-level features, such as function arguments, which reduces their applicability in real-world settings. We extend a state-of-the-art model, AlphaNPI, by learning to generate functions that can accept arguments. This improvement brings us closer to real computer programs. We showcase the potential of our approach by learning the Quicksort algorithm, showing how the ability to deal with arguments is crucial for learning and generalization.
Giovanni De Toni · Andrea Passerini
Learning Adaptive Control Flow in Transformers for Improved Systematic Generalization (Poster)
Despite successes across a broad range of applications, Transformers have limited capability in systematic generalization. The situation is especially frustrating for algorithmic tasks, where they often fail to find intuitive solutions that can be simply expressed in terms of attention patterns. Here we propose two modifications to the Transformer architecture, copy gate and geometric attention, which facilitate learning such intuitive and interpretable solutions to algorithmic problems. Our novel Transformer, called Transformer Control Flow (TCF), achieves 100% length generalization accuracy on the classic compositional table lookup task. The resulting attention and gating patterns are interpretable, demonstrating that the model implements adaptive control flow.
Róbert Csordás · Kazuki Irie · Jürgen Schmidhuber
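A rough sketch of the copy-gate idea in isolation (my own illustration with random weights, not the paper's full architecture, which also includes geometric attention): each position interpolates between the sublayer's update and an unchanged copy of its input.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def copy_gate_layer(x, W_update, W_gate):
    update = np.tanh(x @ W_update)           # candidate new representation
    gate = sigmoid(x @ W_gate)               # per-position, per-feature gate in (0, 1)
    return gate * update + (1.0 - gate) * x  # gate near 0 means "copy the input forward"

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))                  # 5 positions, 8 features
out = copy_gate_layer(x, rng.normal(size=(8, 8)), rng.normal(size=(8, 8)))
print(out.shape)                             # (5, 8)
```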
Adversarial Robustness of Program Synthesis Models (Poster)
Automatic program synthesis has seen a resurgence with the rise of deep learning. In this paper, we study the behaviour of program synthesis models under adversarial settings. Our experiments suggest that these program synthesis models are prone to adversarial attacks. The proposed transformer model has higher adversarial performance than the current state-of-the-art program synthesis model. We specifically experiment with AlgoLisp DSL-based generative models and showcase the existence of significant dataset bias through different classes of adversarial examples.
Mrinal Anand · Mayank Singh
Learning C to x86 Translation: An Experiment in Neural Compilation (Poster)
Deep learning has had a significant impact on many fields. Recently, code-to-code neural models have been used in code translation, code refinement and decompilation. However, the question of whether these models can automate compilation has yet to be investigated. In this work, we explore neural compilation, building and evaluating Transformer models that learn how to produce x86 assembler from C code. Although preliminary results are relatively weak, we make our data, models and code publicly available to encourage further research in this area.
Jordi Armengol-Estapé · Michael O'Boyle
Synthesizing Video Trajectory Queries (Poster)
We propose a novel framework called Quivr for synthesizing queries to identify events of interest in video data. For instance, Quivr can be used to identify instances of human driving behaviors such as lane changes or left turns, which are important for designing planning algorithms for autonomous cars. Our queries operate over object trajectories predicted by a deep object tracking model. Then, a query consists of regular expression operators used to compose underlying predicates (e.g., whether a car is in a lane), and selects a subset of trajectories. A key challenge is that queries are difficult for end users to develop: queries must reason about complex spatial and temporal patterns in object trajectories in order to select trajectories of interest, and predicates often include real-valued parameters (e.g., whether two cars are within a certain distance) that can be tedious to manually tune. Thus, Quivr automatically synthesizes queries given examples of trajectories that the query should match. To make the synthesis procedure efficient, we use overapproximations to prune invalid branches of the query search space, including using a quantitative variant of our query semantics to efficiently prune the search space over parameter values. We also propose two optimizations for speeding up the execution of our queries. Finally, we leverage an active learning strategy to disambiguate between multiple consistent candidate queries by collecting additional labels from the user. We evaluate Quivr on a benchmark of 11 tasks, and demonstrate that it can synthesize accurate queries for each task given just a few examples, and that our pruning strategy and optimizations substantially reduce synthesis time.
Stephen Mell · Favyen Bastani · Stephan Zdancewic · Osbert Bastani
Augmenting Classic Algorithms with Neural Components for Strong Generalisation on Ambiguous and High-Dimensional Data (Poster)
We augment classic algorithms with learned components to adapt them to domains currently dominated by deep learning models. Two traditional sorting algorithms with learnable neural building blocks are applied to visual data with a priori unknown symbols and rules. The models are quickly and reliably trained end-to-end in a supervised setting. Our models learn symbol representations and generalise better than generic neural network models to longer input sequences.
Imanol Schlag · Jürgen Schmidhuber
Meta-Learning an Inference Algorithm for Probabilistic Programs (Poster)
We present a meta-algorithm for learning a posterior-inference algorithm for restricted probabilistic programs. Our meta-algorithm takes a training set of probabilistic programs that describe models with observations, and attempts to learn an efficient method for inferring the posterior of a similar program. A key feature of our approach is the use of what we call a white-box inference algorithm that analyses the given program sequentially using multiple neural networks to compute an approximate posterior. The parameters of these networks are learnt from a training set by our meta-algorithm. We empirically demonstrate that the learnt inference algorithm generalises well to programs that are new in terms of both parameters and model structures, and report cases where our approach achieves greater test-time efficiency than alternatives such as HMC.
Gwonsoo Che · Hongseok Yang
Scallop: From Probabilistic Deductive Databases to Scalable Differentiable Reasoning (Poster)
Deep learning and symbolic reasoning are complementary techniques for an intelligent system. However, principled combinations of these techniques are typically limited in scalability, rendering them ill-suited for real-world applications. We propose Scallop, a system that builds upon probabilistic deductive databases, to bridge this gap. On synthetic tasks involving mathematical and logical reasoning, Scallop scales significantly better without sacrificing accuracy compared to DeepProbLog, a principled neural logic programming approach. Scallop also scales to a real-world Visual Question Answering (VQA) benchmark that requires multi-hop reasoning, achieving 84.22% accuracy and outperforming two VQA-tailored models based on Neural Module Networks and transformers by 12.42% and 21.66% respectively.
Jiani Huang · Ziyang Li · Binghong Chen · Karan Samel · Mayur Naik · Le Song · Xujie Si
LazyPPL: laziness and types in non-parametric probabilistic programs (Poster)
We introduce LazyPPL, a prototype probabilistic programming library for Haskell. The library emphasises the clarifying power of types, and the connection between non-parametric, stochastic processes and lazy (call by need) evaluation. We illustrate the power of the language with natural specifications of infinite structures including Poisson point processes, Gaussian processes, and Dirichlet Process clustering.
Hugo Paquet · Sam Staton
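LazyPPL itself is a Haskell library; as a rough Python analogue of the role laziness plays (a toy stick-breaking sketch, not LazyPPL's API), the non-parametric prior below is an infinite stream of Dirichlet-process weights that is only sampled as far as the program inspects it.

```python
import itertools
import random

def stick_breaking(alpha):
    # lazily break a unit-length stick: each Beta(1, alpha) draw takes a fraction
    # of what remains, yielding mixture weights one at a time, on demand
    remaining = 1.0
    while True:
        b = random.betavariate(1.0, alpha)
        yield remaining * b
        remaining *= 1.0 - b

weights = stick_breaking(alpha=2.0)
first_five = list(itertools.islice(weights, 5))   # only these five draws ever happen
print(first_five, sum(first_five) < 1.0)          # partial sums stay below 1
```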
Proof Extraction for Logical Neural Networks (Poster)
Automated Theorem Provers (ATPs) are widely used for the verification of logical statements. Explainability is one of the key advantages of ATPs: providing an expert-readable proof path which shows the inference steps taken to conclude correctness. Conversely, Neuro-Symbolic Networks (NSNs) that perform theorem proving do not have this capability. We propose a proof-tracing and filtering algorithm to provide explainable reasoning in the case of Logical Neural Networks (LNNs), a special type of Neural Theorem Prover (NTP).
Thabang Lebese · Ndivhuwo Makondo · Cristina Cornelio · Naweed A Khan
A Genetic Programming Approach To Zero-Shot Neural Architecture Ranking (Poster)
Neural networks are becoming increasingly ubiquitous in a wide range of use cases. A primary hurdle in deploying neural networks in many scenarios is the tedious and difficult architectural design process, which has traditionally relied on expert knowledge and iterative design. Neural Architecture Search (NAS) reduces the human effort required for design, but still has considerable resource requirements and is extremely slow. To address the inefficiencies of conventional NAS, Zero-Shot NAS is a new paradigm which introduces zero-shot neural architecture scoring metrics (NASMs) to identify good neural network designs without training them. While applying Zero-Shot NASMs is cheap and requires no training resources, we identify a lack of NASMs that generalize well across neural architecture design spaces. In this paper, we present a program representation for NASMs and automate its search with genetic programming. We discover effective NASMs for image classification as well as automatic speech recognition. We believe that our work indicates a new direction for NASM design and can greatly benefit from recent advances in program synthesis.
Yash Akhauri · Juan Munoz · Ravishankar Iyer · Nilesh Jain