
Workshop
AI for Science: Progress and Promises
Yi Ding · Yuanqi Du · Tianfan Fu · Hanchen Wang · Anima Anandkumar · Yoshua Bengio · Anthony Gitter · Carla Gomes · Aviv Regev · Max Welling · Marinka Zitnik

Fri Dec 02 06:00 AM -- 04:00 PM (PST) @ Room 388 - 390

Fri 6:00 a.m. - 6:15 a.m.   Opening Remarks
Fri 6:15 a.m. - 7:10 a.m.   Invited Talk: Prof. Weinan E
Fri 7:10 a.m. - 8:05 a.m.   Invited Talk: Prof. Shuiwang Ji
Fri 8:05 a.m. - 8:15 a.m.   Break
Fri 8:15 a.m. - 9:10 a.m.   Invited Talk: Dr. Maria Schuld
Fri 9:10 a.m. - 10:05 a.m.  Invited Talk: Prof. David Baker
Fri 10:05 a.m. - 11:00 a.m. Poster Session I
Fri 11:00 a.m. - 11:10 a.m. Oral Presentation 1
Fri 11:10 a.m. - 11:20 a.m. Oral Presentation 2
Fri 11:20 a.m. - 11:30 a.m. Oral Presentation 3
Fri 11:30 a.m. - 11:40 a.m. Oral Presentation 4
Fri 11:40 a.m. - 11:50 a.m. Oral Presentation 5
Fri 11:55 a.m. - 12:55 p.m. Panel
Fri 12:55 p.m. - 1:10 p.m.  Break
Fri 1:10 p.m. - 2:05 p.m.   Invited Talk: Prof. Jimeng Sun
Fri 2:05 p.m. - 3:00 p.m.   Invited Talk: Prof. Tess Smidt
Fri 3:00 p.m. - 3:10 p.m.   Closing Remarks
Fri 3:10 p.m. - 4:00 p.m.   Poster Session II

- Readability of Scientific Papers for English Learners in Various Fields of Science (Poster)
Most scientific papers are written in English, so non-native English speakers must learn English alongside their area of scientific expertise. This is an obstacle for non-native speakers studying science. The difficulty of the English used in scientific papers varies from field to field; the vocabulary of the medical sciences, for example, appears especially difficult for English learners. Which scientific fields use English that is difficult for learners, and to what extent? Few existing studies address this question. In this study, we compare the readability of English papers across scientific fields by constructing and applying a state-of-the-art AI-based automatic readability assessor, trained on textual data obtained by testing language learners and on judgments collected from language teachers.
In experiments, our automatic assessors confirmed the intuition that medical science papers tend to be difficult for English learners, and quantified how difficult each field is for learners.
Yo Ehara

- Discovering ordinary differential equations that govern time-series (Poster)
Natural laws are often described through differential equations, yet finding the differential equation that describes the governing law underlying observed data remains a challenging and still largely manual task. In this paper we take a step towards automating this process: we propose a transformer-based sequence-to-sequence model that recovers scalar autonomous ordinary differential equations (ODEs) in symbolic form from time-series data of a single observed solution of the ODE. Our method scales efficiently: after one-time pretraining on a large set of ODEs, we can infer the governing law of a new observed solution in a few forward passes of the model. We show that our model performs better than or on par with existing methods on various test cases in terms of accurate symbolic recovery of the ODE, especially for more complex expressions.
Sören Becker · Michal Klein · Alexander Neitz · Giambattista Parascandolo · Niki Kilbertus

- Deep Surrogate Docking: Accelerating Automated Drug Discovery with Graph Neural Networks (Poster)
The process of screening molecules for desirable properties is a key step in applications ranging from drug discovery to material design. In drug discovery specifically, protein-ligand docking (chemical docking) is a standard in-silico scoring technique that estimates the binding affinity of molecules to a specific protein target. Recently, however, as the number of virtual molecules available to test has grown rapidly, these classical docking algorithms have become a significant computational bottleneck.
We address this problem by introducing Deep Surrogate Docking (DSD), a framework that applies deep learning-based surrogate modeling to substantially accelerate the docking process. DSD can be interpreted as a formalization of several earlier surrogate prefiltering techniques. Specifically, we show that graph neural networks (GNNs) can serve as fast and accurate estimators of classical docking algorithms. Additionally, we introduce FiLMv2, a novel GNN architecture that outperforms existing state-of-the-art GNN architectures, attaining more accurate and stable performance by allowing the model to filter out irrelevant information more efficiently. Through extensive experimentation and analysis, we show that the DSD workflow combined with the FiLMv2 architecture provides a 9.496x speedup in molecule screening with a <3% recall error rate on an example docking task. Our open-source code is available at [hidden for anonymous review].
Ryien Hosseini · Filippo Simini · Austin Clyde · Arvind Ramanathan

- Structural Causal Model for Molecular Dynamics Simulation (Oral)
Molecular dynamics (MD) simulations describe the mechanical behavior of molecular systems through empirical approximations of interatomic potentials. Machine learning-based approaches can improve such potentials with better transferability and generalization. Among them, graph neural networks have prevailed, as they incorporate a graph-structure prior while learning interatomic interactions. Nevertheless, the simple design choices and heuristics used in devising graph neural networks leave them without an explicitly interpretable component for identifying the true physical interactions within the underlying system. At the other extreme, physical models can give a rather comprehensive description of a system but are hard to specify. Causal modeling lies between these two extremes and provides more modeling flexibility.
In this paper, we propose a structural causal molecular dynamics model (SCMD), the first causality-based framework to model interatomic and dynamical interactions in molecular systems by inferring causal relationships among atoms from observational data. Specifically, we leverage the structural causal model (SCM) to model the interaction system of MD. To infer the SCM, we represent its graph as a dynamic Bayesian network (DBN), which is learned by a sequential generative model named SC-VAE, whose encoder and decoder infer the causal structure and temporal dynamics. All components are learned end-to-end, and the DBN is learned in an unsupervised way. Furthermore, by accounting for the underlying data generation process and inducing the causal structure and temporal dynamics of the system, one obtains a robust and flexible MD simulation model that explicitly captures long-range and time-dependent movement dynamics. We demonstrate the efficacy of SCMD through empirical validation on a complex molecular system (single-chain coarse-grained polymers in implicit solvent) for long-duration simulation and dynamical property prediction.
Qi Liu · Yuanqi Du · Fan Feng · Qiwei Ye · Jie Fu

- Pre-training via Denoising for Molecular Property Prediction (Poster)
Many important problems involving molecular property prediction from 3D structures have limited data, posing a generalization challenge for neural networks. In this paper, we describe a pre-training technique based on denoising that achieves a new state-of-the-art in molecular property prediction by utilizing large datasets of 3D molecular structures at equilibrium to learn meaningful representations for downstream tasks.
Relying on the well-known link between denoising autoencoders and score matching, we show that the denoising objective corresponds to learning a molecular force field -- arising from approximating the Boltzmann distribution with a mixture of Gaussians -- directly from equilibrium structures. Our experiments demonstrate that this pre-training objective significantly improves performance on multiple benchmarks, achieving a new state-of-the-art on the majority of targets in the widely used QM9 dataset. Our analysis then provides practical insights into the effects of different factors -- dataset size, model size and architecture, and the choice of upstream/downstream datasets -- on pre-training.
Sheheryar Zaidi · Michael Schaarschmidt · James Martens · Hyunjik Kim · Yee Whye Teh · Alvaro Sanchez Gonzalez · Peter Battaglia · Razvan Pascanu · Jonathan Godwin

- Automated Protein Function Description for Novel Class Discovery (Poster)
Knowledge of protein function is necessary for understanding biological systems, but the discovery of new sequences by high-throughput sequencing technologies far outpaces their functional characterization. Beyond assigning newly sequenced proteins to known functions, a more challenging issue is discovering novel protein functions; the space of possible functions becomes unlimited when considering designed proteins. Protein function prediction, as framed in Gene Ontology term prediction, is a multilabel classification problem with a hierarchical label space. However, this framing provides no guiding principles for discovering completely novel functions. Here we propose a neural machine translation model that generates descriptions of protein functions in natural language. Instead of making predictions in a limited label space, our model generates descriptions in the language space and is thus capable of generating novel functional descriptions.
Given the novelty of our approach, we design metrics to evaluate the performance of our model: correctness, specificity, and robustness. We provide results in the zero-shot classification setting, scoring functional descriptions that the model has not seen before for proteins with limited homology to those in the training set. Finally, we compare generated function descriptions to ground-truth descriptions for qualitative evaluation.
Meet Barot · Vladimir Gligorijevic · Richard Bonneau · Kyunghyun Cho

- Structure-Inducing Pre-training (Poster)
Language-model pre-training and derived methods are incredibly impactful in machine learning. However, considerable uncertainty remains about exactly why pre-training helps improve performance on fine-tuning tasks, especially when adapting language-model pre-training to domains outside natural language. Here, we analyze this problem by exploring how existing pre-training methods impose relational structure in their induced per-sample latent spaces -- i.e., what constraints do pre-training methods impose on the distance or geometry between the pre-trained embeddings of two samples $\boldsymbol{x}_i$ and $\boldsymbol{x}_j$? Through a comprehensive review of existing pre-training methods, we find that this question remains open, despite theoretical analyses demonstrating the importance of understanding this form of induced structure. Based on this review, we introduce a descriptive framework for pre-training that allows for a granular, comprehensive understanding of how relational structure can be induced. We present a theoretical analysis of this framework from first principles and establish a connection between the relational inductive bias of pre-training and fine-tuning performance.
We also show how to use the framework to define new pre-training methods. We build upon these findings with empirical studies on benchmarks spanning 3 data modalities and 10 fine-tuning tasks. These experiments validate our theoretical analyses, inform the design of novel pre-training methods, and establish consistent improvements over a compelling suite of baseline methods.
TestMatt TestMcDermott · Brendan Yap · Peter Szolovits · Marinka Zitnik

- Physics-Embedded Neural Networks: Graph Neural PDE Solvers with Mixed Boundary Conditions (Poster)
Graph neural networks (GNNs) are a promising approach to learning and predicting physical phenomena described by boundary value problems, such as partial differential equations (PDEs) with boundary conditions. However, existing models inadequately treat the boundary conditions essential for reliable prediction of such problems. In addition, because of the locally connected nature of GNNs, it is difficult to accurately predict the state after a long time, when interactions between vertices tend to be global. We present an approach termed physics-embedded neural networks that respects boundary conditions and predicts the state after a long time using an implicit method. It is built on an $\mathrm{E}(n)$-equivariant GNN, resulting in high generalization performance across various shapes. We demonstrate that our model learns flow phenomena in complex shapes and outperforms a well-optimized classical solver and a state-of-the-art machine learning model in the speed-accuracy trade-off. Our model can therefore serve as a useful standard for realizing reliable, fast, and accurate GNN-based PDE solvers.
Masanobu Horie · NAOTO MITSUME

- Equivariant 3D-Conditional Diffusion Models for Molecular Linker Design (Poster)
Fragment-based drug discovery has been an effective paradigm in early-stage drug development.
An open challenge in this area is designing linkers between disconnected molecular fragments of interest to obtain chemically relevant candidate drug molecules. In this work, we propose DiffLinker, an E(3)-equivariant 3D-conditional diffusion model for molecular linker design. Given a set of disconnected fragments, our model places missing atoms in between and designs a molecule incorporating all the initial fragments. Unlike previous approaches, which can only connect pairs of molecular fragments, our method can link an arbitrary number of fragments. Additionally, the model automatically determines the number of atoms in the linker and its attachment points to the input fragments. We demonstrate that DiffLinker outperforms other methods on standard datasets, generating more diverse and synthetically accessible molecules. We also test our method in real-world applications, showing that it can successfully generate valid linkers conditioned on target protein pockets.
Ilia Igashov · Hannes Stärk · Clément Vignac · Victor Garcia Satorras · Pascal Frossard · Max Welling · Michael Bronstein · Bruno Correia

- Publicly Available Privacy-preserving Benchmarks for Polygenic Prediction (Poster)
Recently, several new approaches for creating polygenic scores (PGS) have been developed, and this trend shows no sign of abating. However, it has thus far been challenging to determine which approaches are superior, as different studies report seemingly conflicting benchmark results. This heterogeneity is due in part to different outcomes being used, but also to differences in the genetic variants used, data preprocessing, and other quality-control steps. As a solution, we present a publicly available benchmark for polygenic prediction that allows researchers to both train and test polygenic prediction methods using only summary-level information, thus preserving privacy.
Using simulations and real data, we show that model performance can be estimated accurately using only linkage disequilibrium (LD) information and genome-wide association summary statistics for target outcomes. Finally, we make this PGS benchmark -- consisting of 8 outcomes, including somatic and psychiatric disorders -- publicly available for researchers to download from our PGS benchmark platform (http://www.pgsbenchmark.org). We believe this benchmark can help establish a clear and unbiased standard for future polygenic score methods to compare against.
Menno Witteveen

- Bi-channel Masked Graph Autoencoders for Spatially Resolved Single-cell Transcriptomics Data Imputation (Poster)
Spatially resolved transcriptomics brings exciting breakthroughs to single-cell analysis by providing physical locations along with gene expression. However, as a cost of the extremely high resolution, the technology also yields many more missing values in the data, i.e., dropouts. While a common solution is to impute the missing values, existing imputation methods focus mainly on transcriptomics data and tend to yield sub-optimal performance on spatial transcriptomics data. To advance spatial transcriptomics imputation, we propose a new technique that adaptively exploits the spatial information of cells and the heterogeneity among different cell types. Furthermore, we adopt a mask-then-predict paradigm to explicitly model the recovery of dropouts and enhance the denoising effect. Compared to previous studies, our work focuses on new large-scale cell-level data instead of spots or beads. Preliminary results demonstrate that our method outperforms previous methods in removing dropouts from high-resolution spatially resolved transcriptomics data.
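The mask-then-predict paradigm in the abstract above can be sketched in a few lines. This is only a toy illustration under assumed names (`mask_entries`, `masked_mse`) and a simple random-masking scheme; the authors' bi-channel graph autoencoder itself is not shown.

```python
# Minimal sketch of a mask-then-predict imputation objective (illustrative only).
import random

MASK = 0.0  # placeholder value fed to the model for masked entries

def mask_entries(matrix, mask_rate, rng):
    """Randomly hide a fraction of observed (non-dropout) entries.

    Returns the corrupted matrix and the masked positions whose original
    values the model must reconstruct.
    """
    corrupted = [row[:] for row in matrix]
    masked = {}
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value is not None and rng.random() < mask_rate:
                masked[(i, j)] = value
                corrupted[i][j] = MASK
    return corrupted, masked

def masked_mse(predictions, masked):
    """Reconstruction loss computed only on the artificially masked entries,
    so the model learns to recover dropout-like missing values."""
    errors = [(predictions[i][j] - v) ** 2 for (i, j), v in masked.items()]
    return sum(errors) / len(errors)

rng = random.Random(0)
expression = [[1.0, 2.0, None], [0.5, None, 3.0]]  # None marks true dropouts
corrupted, masked = mask_entries(expression, mask_rate=0.5, rng=rng)
# A real model would predict from `corrupted`; here we fake perfect recovery.
predictions = [[masked.get((i, j), v if v is not None else 0.0)
                for j, v in enumerate(row)] for i, row in enumerate(expression)]
print(masked_mse(predictions, masked))  # -> 0.0
```

Because the loss is evaluated only at artificially masked positions, the model is explicitly trained on the recovery task rather than on trivially copying observed values.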
Hongzhi Wen · Wei Jin · Jiayuan Ding · Christopher Xu · Yuying Xie · Jiliang Tang

- Graph Neural Networks for Multimodal Single-Cell Data Integration (Poster)
Recent advances in multimodal single-cell technologies have enabled simultaneous acquisition of multi-omics data from the same cell, providing deeper insights into cellular states and dynamics. However, it is challenging to learn joint representations from multimodal data, model the relationships between modalities, and, more importantly, incorporate the vast amount of single-modality datasets into downstream analyses. To address these challenges and facilitate multimodal single-cell data analysis, three key tasks have been introduced: modality prediction, modality matching, and joint embedding. In this work, we present a general graph neural network framework, scMoGNN, to tackle these three tasks, and show that scMoGNN achieves superior results on all three compared with state-of-the-art and conventional approaches. Our method is the official winner in the overall ranking of the modality prediction task of a NeurIPS 2021 competition.
Hongzhi Wen · Jiayuan Ding · Wei Jin · Yiqi Wang · Yuying Xie · Jiliang Tang

- A Pareto-optimal compositional energy-based model for sampling and optimization of protein sequences (Poster)
Deep generative models have emerged as a popular machine learning-based approach to inverse design problems in the life sciences. However, these problems often require sampling new designs that satisfy multiple properties of interest in addition to learning the data distribution. This multi-objective optimization becomes more challenging when the properties are independent of or orthogonal to each other. In this work, we propose a Pareto-compositional energy-based model (pcEBM), a framework that uses multiple gradient descent to sample new designs that adhere to various constraints while optimizing distinct properties.
We demonstrate its ability to learn non-convex Pareto fronts and generate sequences that simultaneously satisfy multiple desired properties across a series of real-world antibody design tasks.
Nataša Tagasovska · Nathan Frey · Andreas Loukas · Isidro Hotzel · Julien Lafrance-Vanasse · Ryan Kelly · Yan Wu · Arvind Rajpal · Richard Bonneau · Kyunghyun Cho · Stephen Ra · Vladimir Gligorijevic

- Retrieval-based Controllable Molecule Generation (Poster)
Generating new molecules with specified chemical and biological properties via generative models has emerged as a promising direction for drug discovery. However, existing methods require extensive training or fine-tuning on large datasets, which are often unavailable in real-world generation tasks. In this work, we propose a new retrieval-based framework for controllable molecule generation. We use a small set of exemplar molecules, i.e., those that (partially) satisfy the design criteria, to steer the pre-trained generative model towards synthesizing molecules that satisfy the given criteria. We design a retrieval mechanism that retrieves and fuses the exemplar molecules with the input molecule, trained with a new self-supervised objective that predicts the nearest neighbor of the input molecule. We also propose an iterative refinement process to dynamically update the generated molecules and the retrieval database for better generalization. Our approach is agnostic to the choice of generative model and requires no task-specific fine-tuning. On various tasks, ranging from simple design criteria to a challenging real-world scenario of designing lead compounds that bind to the SARS-CoV-2 main protease, we demonstrate that our approach extrapolates well beyond the retrieval database and achieves better performance and wider applicability than previous methods.
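The retrieval step described in the abstract above can be sketched as a nearest-neighbor search over molecular fingerprints. Everything here is hypothetical (the toy fingerprints, the `retrieve` helper, the exemplar names); the fusion network and the self-supervised nearest-neighbor objective are omitted.

```python
# Illustrative sketch of the retrieval step only: pick the exemplar molecules
# most similar to the input molecule before fusing them with the generator.

def tanimoto(fp_a, fp_b):
    """Tanimoto similarity between two binary fingerprints stored as sets."""
    if not fp_a and not fp_b:
        return 0.0
    return len(fp_a & fp_b) / len(fp_a | fp_b)

def retrieve(input_fp, exemplar_db, k):
    """Return the names of the k exemplars most similar to the input."""
    ranked = sorted(exemplar_db.items(),
                    key=lambda item: tanimoto(input_fp, item[1]),
                    reverse=True)
    return [name for name, _ in ranked[:k]]

# Hypothetical exemplar database: molecules that (partially) satisfy the
# design criteria, keyed by name, with toy substructure-key fingerprints.
exemplar_db = {
    "mol_A": {1, 2, 3, 4},
    "mol_B": {2, 3, 9},
    "mol_C": {7, 8, 9},
}
print(retrieve({1, 2, 3}, exemplar_db, k=2))  # -> ['mol_A', 'mol_B']
```

In practice such fingerprints would come from a cheminformatics toolkit rather than hand-written sets, and the retrieved exemplars would be fused with the input molecule's representation inside the generative model.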
Jack Wang · Weili Nie · Zhuoran Qiao · Chaowei Xiao · Richard Baraniuk · Anima Anandkumar

- Towards Neural Variational Monte Carlo That Scales Linearly with System Size (Poster)
Quantum many-body problems are some of the most challenging problems in science and are central to demystifying exotic quantum phenomena such as high-temperature superconductors. The combination of neural networks (NNs) for representing quantum states with the Variational Monte Carlo (VMC) algorithm has been shown to be a promising method for solving such problems. However, the run-time of this approach scales quadratically with the number of simulated particles, constraining the practically usable NNs to -- in machine learning terms -- minuscule sizes (<10M parameters). Considering the many breakthroughs brought by extremely large NNs (1B+ parameters) in other domains, lifting this constraint could significantly expand the set of quantum systems we can accurately simulate on classical computers, both in size and complexity. We propose an NN architecture called Vector-Quantized Neural Quantum States (VQ-NQS) that utilizes vector-quantization techniques to exploit redundancies in the local-energy calculations of the VMC algorithm -- the source of the quadratic scaling. In preliminary experiments, we demonstrate the ability of VQ-NQS to reproduce the ground state of the 2D Heisenberg model across various system sizes, while reporting a significant reduction of up to 5x in the number of FLOPs in the local-energy calculation.
Or Sharir · Garnet Chan · Anima Anandkumar

- Predicting Immune Escape with Pretrained Protein Language Model Embeddings (Poster)
Assessing the severity of new pathogenic variants requires an understanding of which mutations will escape the human immune response. Even a single point mutation to an antigen can cause immune escape and infection via abrogation of antibody binding.
Recent work has modeled the effect of single point mutations on proteins by leveraging the information contained in large-scale pretrained protein language models. These models are often applied in a zero-shot setting, where the effect of each mutation is predicted from the output of the language model with no additional training. However, this approach cannot appropriately model immune escape, which involves the interaction of two proteins -- antibody and antigen -- rather than one, and requires making different predictions for the same antigenic mutation in response to different antibodies. Here, we explore several methods for predicting immune escape by building models on top of embeddings from pretrained protein language models. We evaluate our methods on a SARS-CoV-2 deep mutational scanning dataset and show that our embedding-based methods significantly outperform zero-shot methods, which have almost no predictive power. We also highlight insights into how best to use embeddings from pretrained protein language models to predict escape. Despite these promising results, simple statistical and machine learning baselines that do not use pretraining perform comparably, suggesting that computationally expensive pretraining approaches may not be beneficial for escape prediction. Furthermore, all models perform relatively poorly, indicating that future work is needed to improve escape prediction with or without pretrained embeddings.
Kyle Swanson · Howard Chang · James Zou

- Minimax Optimal Kernel Operator Learning via Multilevel Training (Poster)
Learning mappings between infinite-dimensional function spaces has achieved empirical success in many disciplines of machine learning, including generative modeling, functional data analysis, causal inference, and multi-agent reinforcement learning.
In this paper, we study the statistical limits of learning a Hilbert-Schmidt operator between two infinite-dimensional Sobolev reproducing kernel Hilbert spaces. We establish an information-theoretic lower bound in terms of the Sobolev Hilbert-Schmidt norm and show that a regularization scheme that learns the spectral components below the bias contour and ignores those above the variance contour achieves the optimal learning rate. The spectral components between the bias and variance contours, meanwhile, give us flexibility in designing computationally feasible machine learning algorithms. Based on this observation, we develop a multilevel kernel operator learning algorithm that is optimal for learning linear operators between infinite-dimensional function spaces.
Jikai Jin · Yiping Lu · Jose Blanchet · Lexing Ying

- Identifying Witnesses to Noise Transients in Ground-based Gravitational-wave Observations using Auxiliary Channels with Matrix and Tensor Factorization Techniques (Poster)
Ground-based gravitational-wave (GW) detectors are a frontier large-scale experiment in experimental astrophysics. Given the elusive nature of GWs, ground-based detectors are complex interacting systems of exquisitely sensitive instruments, which makes them susceptible to terrestrial noise sources. When these noise transients -- termed glitches -- appear in a detector's main data channel, they can mask or mimic real GW signals, resulting in false alarms in the detection pipelines. Given their high rate of occurrence compared to astrophysical signals, it is vital to examine these glitches and probe their origin in the detector's environment and instruments in order to possibly eliminate them from the science data.
In this paper we present a tensor factorization-based data mining approach for finding witness events to these glitches in the network of heterogeneous sensors that monitor the detectors, and we build a catalog that can aid human operators in diagnosing the sources of these noise transients.
Rutuja Gurav · Vagelis Papalexakis · Gabriele Vajente · Jonathan Richardson · Barry Barish

- Data-driven Acceleration of Quantum Optimization and Machine Learning via Koopman Operator Learning (Poster)
Efficient optimization methods play a crucial role in quantum optimization and machine learning on near-term quantum computers. Unlike on classical computers, obtaining gradients on quantum computers is costly, with sample complexity scaling with the number of parameters and measurements. In this paper, we connect the natural gradient method in quantum optimization with Koopman operator theory, which provides a powerful framework for predicting nonlinear dynamics. We propose a data-driven approach to accelerating quantum optimization and machine learning via Koopman operator learning. To predict parameter updates on quantum computers, we develop new methods, including sliding-window dynamic mode decomposition (DMD) and neural-network-based DMD. We apply our methods to both simulations and real quantum hardware, demonstrating efficient prediction and acceleration of gradient optimization for the variational quantum eigensolver and quantum machine learning.
Di Luo · Jiayu Shen · Rumen Dangovski · Marin Soljacic

- Knowledge-Guided Transfer Learning for Modeling Subsurface Phenomena Under Data Paucity (Poster)
Knowledge transfer from machine learning (ML) models pre-trained on large corpora has been leveraged effectively in domains such as natural language processing and computer vision to improve model generalization.
The knowledge transfer prowess of ML, and especially deep learning (DL), models has proven especially effective under data paucity in the target task. Many scientific phenomena require costly simulations to estimate the process of interest. Predicting the molecular configuration of fluids confined in porous media is one such problem, of great relevance to many scientific applications, whose study requires expensive Molecular Dynamics (MD) simulations. However, due to the cost of MD, large-scale simulations become intractable. Hence, in this work, we develop a novel science-guided deep learning framework, NanoNet-SG, to emulate MD simulations. Our NanoNet-SG model leverages scientific domain knowledge in conjunction with pre-trained knowledge bases to estimate the molecular configuration of fluid mixtures. Through rigorous experimentation, we demonstrate that NanoNet-SG yields good generalization performance (a minimum improvement of 16.26% over baselines) and predictions consistent with known scientific domain rules, despite being trained on a low volume of MD simulation data.
Nikhil Muralidhar · NIcholas Lubbers · Mohamed Mehana · Naren Ramakrishnan · Anuj Karpatne

- Tabular deep learning when $d \gg n$ by using an auxiliary knowledge graph (Poster)
Machine learning models exhibit strong performance on datasets with abundant labeled samples. However, on tabular datasets with extremely high-dimensional features but limited samples (i.e., $d \gg n$), machine learning models struggle. Our key insight is that even in tabular datasets with limited labeled data, the input features often represent real-world entities about which there is abundant prior information, which can be structured as an auxiliary knowledge graph (KG).
For example, in a tabular medical dataset where every input feature is the amount of a gene in a patient's tumor and the label is the patient's survival, an auxiliary knowledge graph connects gene names with drug, disease, and human anatomy nodes. We therefore propose PLATO, a machine learning model for tabular data with $d \gg n$ and an auxiliary KG whose nodes are the input features. PLATO uses a modified multilayer perceptron (MLP) to predict the output labels from the tabular data and the auxiliary KG, with two components. First, PLATO predicts the parameters of the first layer of the MLP from the auxiliary KG, thereby reducing the number of trainable parameters and integrating auxiliary information about the input features. Second, PLATO predicts different first-layer parameters for every input sample, increasing the MLP's representational capacity by allowing it to use different prior information for each sample. Across 10 state-of-the-art baselines and 6 $d \gg n$ datasets, PLATO exceeds or matches the prior state-of-the-art, achieving performance improvements of up to 10.19%. Overall, PLATO uses an auxiliary KG about input features to enable tabular deep learning prediction when $d \gg n$.
Camilo Ruiz · Hongyu Ren · Kexin Huang · Jure Leskovec

- Gauge Equivariant Neural Networks for 2+1D U(1) Gauge Theory Simulations in Hamiltonian Formulation (Poster)
Gauge theory plays a crucial role in many areas of science, including high-energy physics, condensed matter physics, and quantum information science. In quantum simulations of lattice gauge theory, an important step is to construct a wave function that obeys gauge symmetry. In this paper, we develop gauge equivariant neural network wave function techniques for simulating continuous-variable quantum lattice gauge theories in the Hamiltonian formulation.
We have applied the gauge equivariant neural network approach to find the ground state of 2+1-dimensional lattice gauge theory with the U(1) gauge group using variational Monte Carlo. We have benchmarked our approach against state-of-the-art complex Gaussian wave functions, demonstrating improved performance in the strong coupling regime and comparable results in the weak coupling regime. Link » Di Luo · Shunyue Yuan · James Stokes · Bryan Clark 🔗 - PropertyDAG: Multi-objective Bayesian optimization of partially ordered, mixed-variable properties for biological sequence design (Poster)  link » Bayesian optimization offers a sample-efficient framework for navigating the exploration-exploitation trade-off in the vast design space of biological sequences. Whereas it is possible to optimize the various properties of interest jointly using a multi-objective acquisition function, such as the expected hypervolume improvement (EHVI), this approach does not account for objectives with hierarchical structure. We consider a common use case where some regions of the Pareto frontier are prioritized over others according to a specified $\textit{partial ordering}$ in the objectives. For instance, when designing antibodies, we would like to maximize the binding affinity to a target antigen only if it can be expressed in live cell culture---modeling the experimental dependency in which affinity can only be measured for antibodies that can be expressed and thus produced in viable quantities. In general, we may want to confer a partial ordering to the properties such that each property is optimized conditioned on its parent properties satisfying some feasibility condition. To this end, we present PropertyDAG, a framework that operates on top of traditional multi-objective BO to impose a desired partial ordering on the objectives, e.g. expression $\rightarrow$ affinity.
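As a rough illustration of the partial-ordering idea in the PropertyDAG abstract above, the following sketch gates each objective on the feasibility of its parent properties, so the optimizer only "sees" a child objective (e.g. affinity) for candidates whose parents (e.g. expression) are feasible. All names and the single-function design are hypothetical; this is not the authors' implementation.

```python
def gated_objectives(values, parents, feasible):
    """Mask each objective unless all of its ancestor properties are feasible.

    values   -- dict: property -> measured value
    parents  -- dict: property -> list of parent properties (the partial order)
    feasible -- dict: property -> bool, whether the property meets its
                feasibility condition

    Hypothetical illustration only; a property's own feasibility gates its
    children, not itself.
    """
    def ancestors_ok(prop):
        # True when every parent is feasible, recursively up the DAG.
        return all(feasible[p] and ancestors_ok(p) for p in parents.get(prop, []))

    return {p: (v if ancestors_ok(p) else 0.0) for p, v in values.items()}

# Antibody example from the abstract: affinity only counts if the
# antibody can be expressed (expression -> affinity).
vals = {"expression": 0.9, "affinity": 0.8}
order = {"affinity": ["expression"]}
print(gated_objectives(vals, order, {"expression": True, "affinity": True}))
print(gated_objectives(vals, order, {"expression": False, "affinity": True}))
```

In the second call the infeasible expression measurement zeroes out the affinity objective, which is the qualitative behavior the expression $\rightarrow$ affinity ordering encodes.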
We demonstrate its performance over multiple simulated active learning iterations on a penicillin production task, a toy numerical problem, and a real-world antibody design task. Link » Ji Won Park · Samuel Stanton · Saeed Saremi · Andrew Watkins · Stephen Ra · Vladimir Gligorijevic · Kyunghyun Cho · Richard Bonneau 🔗 - Resolving Computational Challenges in Accelerating Electronic Structure Calculations using Machine Learning (Poster)  link » Recent advances in the use of machine-learned surrogates to accelerate electronic structure calculations provide exciting opportunities for materials modeling. While the new models are extremely effective, training such models requires millions of samples to predict the material properties for a configuration of atoms, or snapshot, at a single temperature-density pair. This results in excessively high training costs when material properties for multiple snapshots at multiple temperatures and densities are needed. We present a novel atom-centered decomposition of the local density of states for supervision, which reduces the number of samples for training and evaluation by orders of magnitude compared to past approaches. Combined with a new model for learning atomic environment descriptions end-to-end, our approach allows resolving downstream quantities such as the band energy of aluminum at the melting point at a fraction of the cost of the previous state of the art, with matching or greater accuracy. We further demonstrate that the new models generalize across multiple temperatures of aluminum, reducing computational costs even further. Finally, to extend the approach further, we devise an uncertainty metric to choose the next snapshot for training. We demonstrate the efficacy of this metric using liquid and solid aluminum snapshots. Link » James S Fox · J.
Adam Stephens · Normand Modine · Laura Swiler · Sivasankaran Rajamanickam 🔗 - Conditioned Spatial Downscaling of Climate Variables (Poster)  link » Global Climate Models (GCMs) play a vital role in assessing the large-scale impacts of climate change. Downscaling methods can translate coarse-resolution climate information from GCMs into high-resolution predictions to forecast regional effects. Unfortunately, current downscaling methods struggle to fully take into account spatial relationships among variables, especially at long distances. In this work, we propose an instance-conditional pixel synthesis generative adversarial network (ICPS-GAN), which explicitly conditions the GAN on spatial information from previous high-resolution and current low-resolution data, enhancing overall performance. Experimental results on precipitation forecasting for the US region show that our method outperforms both traditional and other learning-based methods when extrapolating in space. Link » Alex Hung · Evan Becker · Ted Zadouri · Aditya Grover 🔗 - Incorporating Higher Order Constraints for Training Surrogate Models to Solve Inverse Problems (Poster)  link » Inverse problems describe the task of recovering some underlying signal given some observables. Typically, the observables are related via some non-linear forward model applied to the underlying signal. Inverting the non-linear forward model can be computationally expensive, as it involves calculating the adjoint when computing a descent direction. Rather than inverting the non-linear model, we instead train a surrogate forward model and leverage modern auto-grad libraries to solve for sound speed profiles (SSPs) within a classical optimization framework. Surrogate models are currently trained in a black-box supervised machine learning fashion and do not take advantage of existing knowledge of the forward model.
In this article, we propose a simple regularization method that enforces constraints on the gradients of the surrogate model, in addition to its output, to improve overall accuracy. We demonstrate the efficacy on an ocean acoustic tomography (OAT) example that aims to recover ocean sound speed profile (SSP) variations from acoustic observations (e.g. eigenray arrival times) within a simulation of ocean dynamics in the Gulf of Mexico. Link » Jihui Jin · Nick Durofchalk · Richard Touret · Karim Sabra · Justin Romberg 🔗 - Incremental Fourier Neural Operator (Poster)  link » Recently, neural networks have proven their impressive ability to solve partial differential equations (PDEs). Among them, the Fourier neural operator (FNO) has shown success in learning solution operators for highly non-linear problems such as turbulent flow. FNO is discretization-invariant: it can be trained on low-resolution data and generalize to high-resolution problems. This property is related to the low-pass filters in FNO, where only a limited number of frequency modes are selected to propagate information. However, it is still a challenge to select an appropriate number of frequency modes and training resolution for different PDEs. Too few frequency modes and low-resolution data hurt generalization, while too many frequency modes and high-resolution data are computationally expensive and lead to over-fitting. To this end, we propose the Incremental Fourier Neural Operator (IFNO), which augments both the frequency modes and the data resolution incrementally during training. We show that IFNO achieves better generalization (around 15% reduction in testing L2 loss) while reducing the computational cost by 35%, compared to the standard FNO. In addition, we observe that IFNO follows the behavior of implicit regularization in FNO, which explains its excellent generalization ability.
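The incremental augmentation described in the IFNO abstract above can be caricatured as a schedule that unlocks more frequency modes once the training loss plateaus. The class name, plateau threshold, and step size below are invented for illustration; the actual IFNO training procedure may differ.

```python
class IncrementalModeSchedule:
    """Grow the number of retained Fourier modes when training loss plateaus.

    Hypothetical sketch of the incremental idea, not the authors' code;
    rel_tol and step are made-up parameters.
    """

    def __init__(self, start_modes=4, max_modes=32, step=4, rel_tol=0.01):
        self.modes = start_modes
        self.max_modes = max_modes
        self.step = step
        self.rel_tol = rel_tol
        self.prev_loss = None

    def update(self, loss):
        # If the relative improvement over the previous epoch falls below
        # rel_tol, unlock additional frequency modes for the spectral layer.
        if self.prev_loss is not None:
            improvement = (self.prev_loss - loss) / max(self.prev_loss, 1e-12)
            if improvement < self.rel_tol:
                self.modes = min(self.modes + self.step, self.max_modes)
        self.prev_loss = loss
        return self.modes

sched = IncrementalModeSchedule()
for loss in [1.0, 0.5, 0.49, 0.489]:  # training plateaus after epoch 2
    modes = sched.update(loss)
print(modes)  # 8: one increment beyond the initial 4 modes
```

In a full training loop, `modes` would cap how many low-frequency modes the spectral convolution keeps at each epoch, starting cheap and growing only as needed.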
Link » Jiawei Zhao · Robert Joseph George · Yifei Zhang · Zongyi Li · Anima Anandkumar 🔗 - Neural Unbalanced Optimal Transport via Cycle-Consistent Semi-Couplings (Poster)  link » Comparing unpaired samples of a distribution or population taken at different points in time is a fundamental task in many application domains where measuring populations is destructive and cannot be done repeatedly on the same sample, such as in single-cell biology. Optimal transport (OT) can solve this challenge by learning an optimal coupling of samples across distributions from unpaired data. However, the usual formulation of OT assumes conservation of mass, which is violated in unbalanced scenarios in which the population size changes (e.g., cell proliferation or death) between measurements. In this work, we introduce NubOT, a neural unbalanced OT formulation that relies on the formalism of semi-couplings to account for creation and destruction of mass. To estimate such semi-couplings and generalize out-of-sample, we derive an efficient parameterization based on neural OT maps and propose a novel algorithmic scheme through a cycle-consistent training procedure. We apply our method to the challenging task of forecasting heterogeneous responses of multiple cancer cell lines to various drugs, where we observe that by accurately modeling cell proliferation and death our method yields notable improvements over previous neural optimal transport methods. Link » Frederike Lübeck · Charlotte Bunne · Gabriele Gut · Jacobo Sarabia del Castillo · Lucas Pelkmans · David Alvarez-Melis 🔗 - Improving Classification and Data Imputation for Single-Cell Transcriptomics with Graph Neural Networks (Poster)  link » Single-cell RNA sequencing (scRNA-seq) provides vast amounts of gene expression data. In this paper, we benchmark several graph neural network (GNN) approaches for cell-type classification and imputation of missing values on single-cell gene expression. 
For cell classification, we use a cell-cell graph representation and find the best performance using a graph convolutional network (GCN) model with a differentiable group normalisation (DGN) layer, which alleviates oversmoothing, in conjunction with an adjacency matrix predetermined by spectral clustering. This method marginally outperforms an SVM benchmark model, 59.4\% compared to 58.6\%, on the Paul15 dataset, which describes the development of myeloid progenitors. Performance scales well with the number of gene expressions, and on the PBMC3K dataset, describing peripheral blood mononuclear cells with a higher number of gene expressions, this method outperforms an SVM benchmark, 95.6\% vs 94.2\%. For data imputation, we model the data as a bipartite graph consisting of cell and gene nodes, with edge values signifying gene expression. We train a 3-layer GraphSAGE GNN to impute data by training it to reconstruct the dataset based on the downstream task. When applied with this imputation model, GNN classification performance is similar at 58\%, but exhibits better learning and generalisation characteristics. Our findings catalyse the development of new tools to analyse complex single-cell datasets. Link » Han-Bo Li · Ramon Viñas Torné · Pietro Lió 🔗 - FALCON: Fourier Adaptive Learning and Control for Disturbance Rejection Under Extreme Turbulence (Poster)  link » We study the design of stabilizing policies for an airfoil under extreme turbulent flow dynamics. In practice, the standard approach for this task is to reactively correct deviations from the desired trajectory, e.g. with PID control, since learning the model dynamics is usually challenging. Recent model-free reinforcement learning (RL) methods, which also do not require model dynamics, have been shown to be promising alternatives to industry-standard controllers.
However, these methods typically require a vast number of samples and lack generalizability to new scenarios, which severely limits their applicability. In this work, by leveraging the domain knowledge that the underlying turbulent flow dynamics are well-modeled in the frequency domain, we propose an efficient model-based RL framework, Fourier Adaptive Learning and CONtrol (FALCON). FALCON cleverly chooses a Fourier basis for learning the underlying system dynamics and deploys a model predictive control (MPC) approach for safe and efficient control design. We show that FALCON quickly learns the fluid dynamics, adapts to changing flow conditions, and outperforms the state-of-the-art methods while using an order of magnitude fewer samples than the model-free methods. This makes FALCON the first model-based RL method deployed in real-world extreme turbulent environments. Moreover, we derive theoretical learning and performance guarantees for FALCON for a wide range of partially observable nonlinear dynamical systems. Link » Sahin Lale · Peter Renn · Kamyar Azizzadenesheli · Babak Hassibi · Morteza Gharib · Anima Anandkumar 🔗 - Critical Temperature Prediction of Superconductors Based on Machine Learning: A Short Review (Poster)  link » Superconductors have many promising applications in power transmission and power magnet development because of their special characteristics. However, discovering new superconductors requires extensive trial-and-error experimentation, which is time-consuming and expensive. The development of machine learning techniques makes it possible to identify superconductors and predict their critical temperature from material properties. This paper gives a short review of the application of machine learning to critical temperature prediction for superconductors. Related datasets and proposed methods are included, and we also discuss future research directions and opportunities in this field.
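As a toy example of the feature-based critical temperature regression such reviews survey, the sketch below averages the $T_c$ of the nearest known materials in a hypothetical feature space. The features, values, and the k-nearest-neighbor choice are made up for illustration and are not from any specific reviewed method.

```python
import math

def knn_predict_tc(query, train, k=2):
    """Predict a critical temperature by averaging the k nearest
    materials in feature space.

    Toy illustration only: features and Tc values below are hypothetical.
    """
    # Sort training materials by Euclidean distance to the query features.
    dists = sorted((math.dist(query, feats), tc) for feats, tc in train)
    neighbors = dists[:k]
    return sum(tc for _, tc in neighbors) / k

# Hypothetical (mean atomic mass, valence electron count) -> Tc in kelvin.
train = [
    ((63.5, 11.0), 93.0),
    ((40.0, 2.0), 39.0),
    ((92.9, 5.0), 9.3),
]
print(knn_predict_tc((60.0, 10.0), train, k=1))  # 93.0, the nearest material's Tc
```

Real models in the literature use far richer composition-derived features and stronger regressors, but the pipeline shape (featurize material, regress $T_c$) is the same.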
Link » Juntao Jiang · Renjun Xu 🔗 - Toward Human-AI Co-creation to Accelerate Material Discovery (Poster)  link » There is an increasing need in our society to achieve faster advances in science to tackle urgent problems such as climate change, environmental hazards, sustainable water management, sustainable energy systems, and pandemics, among others. The urgency of scientific discovery in chemistry carries the extra burden of assessing the risks of proposed novel solutions before moving to the experimental stage. Despite several recent advances in machine learning and AI to address some of these challenges, there is still a gap in technologies to support end-to-end discovery applications, integrating the myriad of available technologies into a coherent, orchestrated, yet flexible discovery process. Such applications need to handle complex knowledge management at scale, enabling knowledge consumption and production in a timely and productive way for subject matter experts (SMEs). Furthermore, the discovery of novel functional materials strongly relies on the development of exploration strategies in the chemical space. For instance, generative models have gained attention within the scientific community due to their ability to generate enormous volumes of novel molecules across material domains. These models exhibit extreme creativity that often translates into low viability of the generated candidates. In the context of materials discovery, viability is a complex metric evaluated by SMEs from complementary domains, such as synthetic organic chemistry, process scale-up, intellectual property development, and regulatory compliance. In this scenario, we observe an excellent opportunity to incorporate AI techniques to support SMEs, as well as the need for a platform to exploit human-AI interaction, focusing on reducing the time to first discovery and the opportunity costs involved.
In this work, we propose a workbench framework for human-AI co-creation to accelerate material discovery, which has four main components: generative models, dataset triage, molecule adjudication, and risk assessment. Link » Dmitry Zubarev · Carlos Raoni Mendes · Emilio Vital Brazil · Renato Cerqueira · Kristin Schmidt · Vinicius Segura · Juliana Ferreira · Daniel Sanders 🔗 - Robust task-specific adaption of models for drug-target interaction prediction (Poster)  link » HyperNetworks have been established as an effective technique for fast adaptation of neural network parameters. Recently, HyperNetworks conditioned on descriptors of tasks have improved multi-task generalization in various domains, such as personalized federated learning and neural architecture search. Especially powerful results were achieved in few- and zero-shot settings, attributed to the increased information sharing by the HyperNetwork. With the rise of new diseases, fast drug discovery is needed, which requires proteo-chemometric models that are able to generalize drug-target interaction predictions in low-data scenarios. State-of-the-art methods apply a few fully-connected layers to concatenated learned embeddings of the protein target and drug compound. In this work, we develop a task-conditioned HyperNetwork approach for the problem of predicting drug-target interactions in drug discovery. We show that when the model parameters for the fully-connected layers processing the drug compound embedding are predicted from the protein target embedding, predictive performance can be improved over previous methods. Two additional components of our architecture, a) switching to an L1 loss, and b) integrating a context module for proteins, further boost performance and robustness. On an established benchmark for proteo-chemometric models, our architecture outperforms previous methods in all settings, including few- and zero-shot settings.
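To make the hypernetwork idea in the abstract above concrete, the sketch below generates the weights of a drug-processing linear layer from a protein embedding, so each protein target induces its own drug scorer. The single-layer setup, the names, and all numbers are hypothetical, not the paper's architecture.

```python
def hypernetwork_predict(protein_emb, drug_emb, H, b):
    """Score a drug-target pair with a linear layer whose weights are
    generated from the protein embedding.

    H, b are the (hypothetical) hypernetwork parameters: a linear map
    from the protein embedding to the weights of the drug layer.
    """
    d = len(drug_emb)
    # Hypernetwork step: produce one weight per drug-embedding dimension.
    w = [
        sum(H[i][j] * protein_emb[j] for j in range(len(protein_emb))) + b[i]
        for i in range(d)
    ]
    # Main-network step: apply the generated weights to the drug embedding.
    return sum(w_i * x_i for w_i, x_i in zip(w, drug_emb))

protein = [1.0, -1.0]
drug = [0.5, 2.0]
H = [[1.0, 0.0], [0.0, 1.0]]  # hypernetwork weights (made up)
b = [0.0, 0.0]
print(hypernetwork_predict(protein, drug, H, b))  # -1.5
```

The point of the construction is that `H` and `b` are shared across all targets, so information learned on data-rich proteins transfers to proteins with few or no labeled interactions.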
In an ablation study, we analyze the importance of each of the components of our HyperNetwork approach. Link » Emma Svensson · Pieter-Jan Hoedt · Sepp Hochreiter · Günter Klambauer 🔗 - Surrogate modeling of stress fields in periodic polycrystalline microstructures using U-Net and Fourier neural operators (Poster)  link » In this work, we implement and compare two artificial neural networks (ANNs), U-Net and Fourier neural operators (FNO), for surrogate modeling of stress fields in periodic polycrystalline microstructures. Both ANNs were trained on results from the numerical solution of the boundary-value problem for quasi-static mechanical equilibrium in grain microstructures under uniaxial tensile loading. More specifically, they learned mappings from the spatial fields of material properties to the equilibrium stress fields. To generate multiple output fields, one for every stress component, the networks were branched internally into parallel sub-networks at different stages, which were then trained together. We compare various such adaptations to find the best one. For the U-Net-based approach, we show that convolution with periodic padding instead of zero padding gives better accuracy along the system boundaries. We further compare the predictions from the two approaches: the FNO-based approach is more accurate than its U-Net-based counterpart; the normalized mean absolute error incurred on the predicted stress field with respect to the numerical solution is $3.5-7.5$ times lower for the former than the latter. In comparison to the U-Net-based approach, the errors in the FNO-based approach are restricted to grain boundaries, leading to a narrower error distribution.
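The periodic-versus-zero-padding distinction in the abstract above can be illustrated in one dimension: periodic (wrap-around) padding continues the microstructure across the boundary, whereas zero padding inserts artificial vacuum at the edges. This is a minimal 1D stand-in for the 2D convolution padding discussed there, with made-up values.

```python
def pad_1d(signal, width, mode):
    """Pad a 1D signal either with zeros or periodically (wrap-around).

    For a periodic microstructure, wrap-around padding is consistent
    with the physics at the boundary; zero padding is not.
    """
    if mode == "zero":
        return [0.0] * width + signal + [0.0] * width
    if mode == "periodic":
        # Prepend the tail and append the head of the signal.
        return signal[-width:] + signal + signal[:width]
    raise ValueError(f"unknown padding mode: {mode}")

sig = [1.0, 2.0, 3.0, 4.0]
print(pad_1d(sig, 1, "zero"))      # [0.0, 1.0, 2.0, 3.0, 4.0, 0.0]
print(pad_1d(sig, 1, "periodic"))  # [4.0, 1.0, 2.0, 3.0, 4.0, 1.0]
```

A convolution over the periodically padded signal sees the true neighboring values at the boundary, which is why the abstract reports better accuracy along the system boundaries with periodic padding.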
Link » Sarthak Kapoor · Jaber Mianroodi · Bob Svendsen · Mohammad Khorrami · Nima Siboni 🔗 - Deep Learning for Reference-Free Geolocation of Poplar Trees (Poster)  link » A core task in precision agriculture is the identification of climatic and ecological conditions that are advantageous for a given crop. The most succinct approach is geolocation, which is concerned with locating the native region of a given sample based on its genetic makeup. Here, we investigate genomic geolocation of Populus trichocarpa, or poplar, which has been identified by the US Department of Energy as a fast-rotation biofuel crop to be harvested nationwide. In particular, we approach geolocation from a reference-free perspective, circumventing the need for compute-intensive processes such as variant calling and alignment. Our model, MashNet, predicts latitude and longitude for poplar trees from randomly sampled, unaligned sequence fragments. We show that our model performs comparably to Locator, a state-of-the-art method based on aligned whole-genome sequence data: MashNet achieves an error of 34.0 km$^2$ compared to Locator's 22.1 km$^2$. MashNet allows growers to quickly and efficiently identify natural varieties that will be most productive in their growth environment based on genotype. This paper explores geolocation for precision agriculture while providing a framework and data source for further development by the machine learning community. Link » Cai John · Owen Queen · Scott Emrich · Wellington Muchero 🔗 - Solar Flare Forecasting with Data-driven Interpretable Model (Poster)  link » Solar flares are the most violent activities in the solar system and are caused by the evolution of the magnetic field in solar active regions. However, the mechanism that triggers solar flares is still an active research area, and many algorithms based on different models have been proposed to forecast solar flares.
In this paper, we propose a novel data-driven method to forecast solar flares, built with a convolutional neural network and a long short-term memory network. Our method processes 6-hour-long continuous magnetic field observations and predicts the probability of flares of different classes in the next 24 hours with a Bayesian neural network. Compared with traditional methods, our method not only forecasts solar flares with a high precision rate and a low false-alarm rate, but also highlights the regions that may trigger solar flares via class activation mapping (CAM). The insight obtained from the CAM could help scientists dig deeper into the physical mechanisms that trigger solar flares. We use our method to process real observation data. Results show that our model mainly focuses on regions with a strong magnetic field, the polarity reversal line, and the magnetic field conversion area, which is consistent with theoretical predictions. Link » JiaMeng Lv · Peng Jia · 陈风 · 杨过 · Tie Liu 🔗 - So ManyFolds, So Little Time: Efficient Protein Structure Prediction with pLMs and MSAs (Poster)  link » In recent years, machine learning approaches for de novo protein structure prediction have made significant progress, culminating in AlphaFold, which approaches experimental accuracy in certain settings and heralds the possibility of rapid in silico protein modelling and design. However, such applications can be challenging in practice due to the significant compute required for training and inference of such models, and their strong reliance on the evolutionary information contained in multiple sequence alignments (MSAs), which may not be available for certain targets of interest. Here, we first present a streamlined AlphaFold architecture and training pipeline that still provides good performance with significantly reduced computational burden.
Aligned with recent approaches such as OmegaFold and ESMFold, our model is initially trained to predict structure from sequences alone by leveraging embeddings from the pretrained ESM-2 protein language model (pLM). We then compare this approach to an equivalent model trained on MSA-profile information only, and find that the latter still provides a performance boost, suggesting that even state-of-the-art pLMs cannot yet easily replace the evolutionary information of homologous sequences. Finally, we train a model that can make predictions from either the combination, or only one, of pLM and MSA inputs. Ultimately, we obtain accuracies in any of these three input modes similar to models trained solely in that setting, whilst also demonstrating that these modalities are complementary, each regularly outperforming the other. Link » Thomas D Barrett · Amelia Villegas-Morcillo · Louis Robinson · Benoit Gaujac · David Admète · Elia Saquand · Karim Beguir · Arthur Flajolet 🔗 - An "interpretable-by-design" neural network to decipher RNA splicing regulatory logic (Poster)  link » Artificial intelligence algorithms, in particular neural networks, capture complex quantitative relationships between input and output. However, as neural networks are typically black boxes, it is difficult to extract post-hoc insights on how they achieve their predictive success. Furthermore, they easily capture artifacts or biases in the training data, often fail to generalize beyond the datasets used for training and testing, and do not lead to new insights on the underlying processes. To enable scientific progress, machine learning models should not only accurately predict outcomes, but also describe how they arrived at their predictions. In recent years, neural networks have been applied to understanding biological processes, and specifically to deciphering RNA splicing, a fundamental process in the transfer of genomic information into functional biochemical products.
Despite recent success using neural networks to predict splicing outcomes, understanding how specific RNA features dictate splicing outcomes remains an open challenge. The challenge is further underscored by the sensitivity of splicing logic, where almost any single-nucleotide change along an exon can lead to dramatic changes in splicing outcomes. Here we demonstrate that an "interpretable-by-design" model achieves predictive accuracy without sacrificing interpretability and captures a unifying decision-making logic. Although we designed our model to emphasize interpretability, its predictive accuracy is on par with state-of-the-art models. Importantly, the model revealed novel components of splicing logic, which we experimentally validated. To demonstrate the model's interpretability, we introduce a visualization that, for any given exon, allows us to trace and quantify the entire decision process from input sequence to output splicing prediction. The network's ability to quantify contributions of specific features to splicing outcomes for individual exons has considerable potential for a range of medical and biotechnology applications, including genome- or RNA-editing of target exons to correct splicing behavior or guiding rational design of RNA-based therapeutics such as antisense oligonucleotides. Link » Susan Liao · Mukund Sudarshan · Oded Regev 🔗 - Physics-informed inference of animal movements from weather radar data (Poster)  link » Studying animal movements is essential for effective wildlife conservation and conflict mitigation. For aerial movements, operational weather radars have become an indispensable data source in this respect. However, partial measurements, incomplete spatial coverage, and poor understanding of animal behaviours make it difficult to reconstruct complete spatio-temporal movement patterns from available radar data.
We tackle this inverse problem by learning a mapping from high-dimensional radar measurements to low-dimensional latent representations using a convolutional encoder. Under the assumption that the latent system dynamics are well approximated by a locally linear Gaussian transition model, we perform efficient posterior estimation using the classical Kalman smoother. A convolutional decoder maps the inferred latent system states back to the physical space in which the known radar observation model can be applied, enabling fully unsupervised training. To encourage scientific consistency, we additionally introduce a physics-informed loss term that leverages known mass conservation constraints. Our experiments on synthetic radar data show promising results in terms of reconstruction quality and data efficiency. Link » Fiona Lippert · Patrick Forré 🔗 - SRSD: Rethinking Datasets of Symbolic Regression for Scientific Discovery (Poster)  link » Symbolic Regression (SR) is the task of recovering mathematical expressions from given data and has been attracting attention from the research community for its potential for scientific discovery. However, the community lacks datasets of symbolic regression for scientific discovery (SRSD) with which to assess this potential. To address this issue, we revisit SRSD datasets. Focusing on a set of formulas used in existing datasets based on the Feynman Lectures on Physics, we recreate 120 datasets to assess the performance of SRSD. For each of the 120 SRSD datasets, we carefully review the properties of the formula and its variables to design reasonably realistic sampling ranges of values, so that our new SRSD datasets can be used to evaluate the potential of SRSD, such as whether or not an SR method can (re)discover physical laws from such datasets.
We conduct experiments on our new SRSD datasets using five state-of-the-art SR methods in SRBench, and the results show that the new SRSD datasets are more challenging than the original ones. We will share our datasets and code repository upon acceptance. Link » Yoshitomo Matsubara · Naoya Chiba · Ryo Igarashi · Yoshitaka Ushiku 🔗 - Diffusion-based Molecule Generation with Informative Prior Bridges (Poster)  link » AI-based molecule generation provides a promising approach to a large area of biomedical sciences and engineering, such as antibody design, hydrolase engineering, or vaccine development. Because molecules are governed by physical laws, a key challenge is to incorporate prior information into the training procedure to generate high-quality and realistic molecules. We propose a simple and novel approach to steer the training of diffusion-based generative models with physical and statistical prior information. This is achieved by constructing physically informed diffusion bridges, stochastic processes that are guaranteed to yield a given observation at the fixed terminal time. We develop a Lyapunov-function-based method to construct and determine bridges, and propose several informative prior bridges for high-quality molecule generation. With comprehensive experiments, we show that our method provides a powerful approach to the 3D generation task, yielding molecule structures with better quality and stability scores. Link » Chengyue Gong · Lemeng Wu · Xingchao Liu · Mao Ye · Qiang Liu 🔗 - Multiresolution Mesh Networks For Learning Dynamical Fluid Simulations (Poster)  link » In this paper, we introduce Multiresolution Mesh Networks-enhanced MeshGraphNets (MGN-MeshGraphNet) for learning mesh-based dynamical fluid simulations. The novelty of our proposal comes from the ability to capture multiscale structures of fluid dynamics via a learnable coarse-graining mechanism on meshes (i.e.
mesh multiresolution), along with long-range dependencies between multiple timesteps and resolutions for robust prediction. Our proposed method shows competitive numerical results in comparison with other machine learning approaches based on graph neural networks. Given the flexibility of our data-driven approach to building mesh multiresolution, our method generalizes better to new fluid dynamical simulations outside of the training data, while attaining high accuracy on multiple resolutions and computational speedup compared to existing PDE numerical solvers of the Navier--Stokes equations. Link » Bach Nguyen · Truong Son Hy · Long Tran-Thanh · Risi Kondor 🔗 - Li-ion Battery Material phase prediction through Hierarchical Curriculum Learning (Poster)  link » Li-ion Batteries (LIB), one of the most efficient energy storage devices, are widely adopted in many industrial applications. Imaging data of these battery electrodes obtained from X-ray tomography can explain the distribution of material constituents and allow reconstructions to study electron transport pathways. Therefore, it can eventually help quantify various associated properties of electrodes (e.g., volume-specific surface area, porosity) which determine the performance of batteries. However, these images often suffer from low image contrast between the multiple material constituents, making it difficult for humans to distinguish and characterize these constituents through visualization. A minor error in detecting distributions among the material constituents can lead to a high error in the calculated parameters of material properties. We present a novel hierarchical curriculum learning framework to address the complex task of estimating material constituent distribution in battery electrodes.
To provide spatially smooth predictions, our framework comprises three modules: (i) an uncertainty-aware model trained to yield inferences conditioned upon global knowledge of material distribution, (ii) a technique to capture relatively more fine-grained (local) distributional signals, and (iii) an aggregator that appropriately fuses the local and global effects to obtain the final distribution. Link » Anika Tabassum · Nikhil Muralidhar · Ramakrishnan Kannan · Srikanth Allu 🔗 - Interpretable Geometric Deep Learning via Learnable Randomness Injection (Poster)  link » Point cloud data is ubiquitous in scientific fields. Recently, geometric deep learning (GDL) has been widely applied to solve prediction tasks with such data. However, GDL models are often complicated and hardly interpretable, which raises concerns among scientists when deploying these models in scientific analysis and experiments. This work proposes a general mechanism named learnable randomness injection (LRI), which allows building inherently interpretable models based on general GDL backbones. Once trained, LRI-induced models can detect the points in the point cloud data that carry information indicative of the prediction label. Such indicative information may be reflected either by the existence of these points in the data or by their geometric locations. We also propose four datasets from real scientific applications in the domains of high-energy physics and biochemistry to evaluate LRI. Compared with previous post-hoc interpretation methods, the points detected by LRI align much better and more stably with the ground-truth patterns that have actual scientific meaning. LRI-induced models are also more robust to distribution shifts between training and test scenarios.
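The learnable randomness injection described above flags informative points by gating them stochastically. As a hedged sketch (the paper's exact parameterization is not given here), a common way to realize such per-point learnable gates is a Gumbel-sigmoid relaxation; all names and values below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def gumbel_sigmoid_gates(logits, temperature=0.5):
    """Sample relaxed Bernoulli gates, one per point in the cloud.

    Points with a high learned logit survive with high probability; the
    sampled gate is differentiable w.r.t. the logits (reparameterization),
    so the gating can be trained jointly with a GDL backbone.
    """
    u = rng.uniform(1e-6, 1 - 1e-6, size=logits.shape)
    noise = np.log(u) - np.log(1 - u)  # Logistic(0, 1) noise
    return 1.0 / (1.0 + np.exp(-(logits + noise) / temperature))

# Toy point cloud: 5 points with hypothetical learned per-point logits.
logits = np.array([4.0, -4.0, 0.0, 3.0, -3.0])
gates = gumbel_sigmoid_gates(logits)
print(gates.round(2))  # gates tend toward 1 for high logits, 0 for low
```

Averaged over many samples, points with larger logits receive systematically larger gates, which is what makes the retained points interpretable as the label-indicative ones.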
Link » Siqi Miao · Yunan Luo · Mia Liu · Pan Li 🔗 - Fourier Continuation for Exact Derivative Computation in Physics-Informed Neural Operators (Poster)  link » The physics-informed neural operator (PINO) is a machine learning architecture that has shown promising empirical results for learning partial differential equations. PINO uses the Fourier neural operator (FNO) architecture to overcome the optimization challenges often faced by physics-informed neural networks. Since the convolution operator in PINO uses the Fourier series representation, its gradient can be computed exactly in Fourier space. While Fourier series cannot represent nonperiodic functions, PINO and FNO still have the expressivity to learn nonperiodic problems with Fourier extension via padding. However, computing the Fourier extension in the physics-informed optimization requires solving an ill-conditioned system, resulting in inaccurate derivatives that prevent effective optimization. In this work, we present an architecture that leverages Fourier continuation (FC) to apply the exact gradient method to PINO for nonperiodic problems. This paper investigates three different ways that FC can be incorporated into PINO by testing their performance on a 1D blowup problem. Experiments show that FC-PINO outperforms padded PINO, improving equation loss by several orders of magnitude, and it can accurately capture the third-order derivatives of nonsmooth solution functions. Link » Haydn Maust · Zongyi Li · Yixuan Wang · Anima Anandkumar 🔗 - HotProtein: A Novel Framework for Protein Thermostability Prediction and Editing (Poster)  link » The molecular basis of protein thermal stability is only partially understood and has major significance for drug and vaccine discovery. The lack of datasets and standardized benchmarks considerably limits learning-based discovery methods.
We present HotProtein, a large-scale protein dataset with growth temperature annotations of thermostability, containing 182K amino acid sequences and 3K folded structures from 230 different species, covering a wide temperature range of −20 °C to 120 °C. Due to functional domain differences and data scarcity within each species, existing methods fail to generalize well on our dataset. We address this problem through a novel learning framework, consisting of (1) protein structure-aware pre-training (SAP), which leverages 3D information to enhance sequence-based pre-training, and (2) factorized sparse tuning (FST), which utilizes low-rank and sparse priors as implicit regularization, together with feature augmentations. Extensive empirical studies demonstrate that our framework improves thermostability prediction compared to other deep learning models. Finally, we propose a novel editing algorithm to efficiently generate positive amino acid mutations that improve thermostability. Link » Tianlong Chen · Chengyue Gong · Daniel Diaz · Xuxi Chen · Jordan Wells · Qiang Liu · Zhangyang Wang · Andrew Ellington · Alex Dimakis · Adam Klivans 🔗 - Generating counterfactual explanations of tumor spatial proteomes to discover effective, combinatorial therapies that enhance cancer immunotherapy (Poster)  link » Recent advances in tissue imaging technologies have led to the generation of massive datasets of spatial profiles of human tissues, taken at micron-scale resolution, spanning hundreds of patients, and covering tens to thousands of molecular biomarkers. These high-dimensional imaging data necessitate the development of new, AI-based tools to uncover new biology and to support therapeutic development against diseases. Currently, a major need in the treatment of solid tumor cancers is strategies that can drive the infiltration of T cells into the tumor.
In this study, we developed an optimization strategy combining supervised ML and counterfactual explanations to discover clinically feasible tumor perturbations that drive T cell infiltration, and applied our framework to spatial proteomes of breast cancer and melanoma tissue. Our model predicts that altering the levels of four molecules (CCL4, CXCL12, CXCL13, CCL8) in immune-excluded melanoma tissues can increase T cell infiltration 10-fold across an entire cohort of 69 patients. Our work provides a paradigm for machine-learning-based prediction and design of cancer therapeutics based on classification of immune system activity in spatial proteomics data. Link » Jerry Wang · Matt Thomson 🔗 - Proposal of a topology-aware method to segment 3D plant tissues images. (Poster)  link » The study of genetic and molecular mechanisms underlying tissue morphogenesis has received a lot of attention in biology. In particular, accurate segmentation of tissues into individual cells plays an important role in quantitatively analyzing the development of growing organs. However, instance cell segmentation is still a challenging task due to image quality and fine-scale structure. Any small leakage in the boundary prediction can merge different cells together, thereby damaging the global structure of the image. In this paper, we propose an end-to-end topology-aware 3D segmentation method for plant tissues. The strength of the method is that it takes care of the 3D topology of segmented structures. Our method relies on a common deep neural network. The keystone is a training phase and a new topology-aware loss, the CavityLoss, that help the network focus on topological errors and fix them during the learning phase. The evaluation of our method on both fixed and live plant organ datasets shows that our method outperforms state-of-the-art methods (and, contrary to state-of-the-art methods, does not require any post-processing stage).
The code of CavityLoss is freely available at https://xxxxxxxxxxxxxxxxxxxxxxxxx. Link » Minh On · Nicolas Boutry · Jonathan Fabrizio 🔗 - Structure-Aware Antibiotic Resistance Classification using Graph Neural Networks (Poster)  link » Antibiotics are traditionally used to treat bacterial infections. However, bacteria can develop resistance to drugs, making them ineffective and thus posing a serious threat to global health. Identifying and classifying the genes responsible for this resistance is critical for the prevention, diagnosis, and treatment of infections, as well as for understanding its mechanisms. Previous methods developed for this purpose have mostly been sequence-based, relying on comparisons to existing databases or machine learning models trained on sequence features. However, genes with comparable functions may not always have similar sequences. Therefore, in this paper, we develop a deep learning model that uses the protein structure as a complement to the sequence to classify novel ARGs (antibiotic resistance genes), which we expect to provide more useful information than the sequence alone. The proposed approach consists of two steps. First, we capitalize on the celebrated AlphaFold model to predict the 3D structure of a protein from its amino acid sequence. Then, we process the sequence using a transformer-based language model while also applying a graph neural network to the graph extracted from the structure. We evaluate the proposed architecture on a standard benchmark dataset, where it outperforms state-of-the-art methods. Link » Aymen Qabel · Sofiane ENNADIR · Giannis Nikolentzos · Johannes Lutzeyer · Michail Chatzianastasis · Henrik Boström · Michalis Vazirgiannis 🔗 - MoleculeCLIP: Learning Transferable Molecule Multi-Modality Models via Natural Language (Poster)  link » Recently, artificial intelligence for drug discovery has attracted increasing interest in the community.
One of the key challenges is to learn a powerful molecule representation. To achieve this goal, existing works focus on learning molecule representations from chemical structures (i.e., 1D descriptions, 2D topology, or 3D geometry). However, such representations generalize poorly to unseen tasks. Meanwhile, humans can learn hierarchical and multi-modality information, including molecule chemical structure and natural language (e.g., biomedical text), simultaneously, and can generalize to new concepts. Motivated by this observation, in this paper we explore the utility of text for drug discovery. We design a multi-modality model, MoleculeCLIP, by leveraging natural language and molecule structure. MoleculeCLIP consists of two branches: a chemical structure branch that encodes chemical structures and a textual description branch that encodes the corresponding natural-language descriptions. To train it, we first collect a large-scale dataset with more than 280K text-molecule pairs, called PubChemCLIP. It is about 28× larger than the existing dataset. We then train our model on this dataset using a contrastive learning strategy to bridge the representations from the two branches. We carefully design two categories of zero-shot downstream tasks, retrieval and language-guided editing, through which we highlight three key features of introducing language in MoleculeCLIP: open vocabulary, compositionality, and domain knowledge exploration. Quantitatively, through extensive experiments, MoleculeCLIP outperforms existing methods on 6 zero-shot retrieval tasks and 24 zero-shot language-guided molecule editing tasks. Qualitatively, we show that MoleculeCLIP understands domain information by successfully detecting the key structures referred to in the text prompts.
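The contrastive bridging of the two branches described above is typically realized with a symmetric InfoNCE objective, as in CLIP-style training. The following is a minimal NumPy sketch of that general recipe, not MoleculeCLIP's exact loss; embedding sizes and values are illustrative:

```python
import numpy as np

def info_nce(struct_emb, text_emb, temperature=0.1):
    """Symmetric contrastive (InfoNCE) loss between paired embeddings.

    Row i of each matrix is one molecule: its structure embedding and its
    text embedding form the positive pair; all other rows are negatives.
    """
    # L2-normalize so similarity is cosine similarity.
    s = struct_emb / np.linalg.norm(struct_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = s @ t.T / temperature          # (n, n) similarity matrix
    labels = np.arange(len(s))              # positives sit on the diagonal

    def xent(lg):
        lg = lg - lg.max(axis=1, keepdims=True)  # numerical stability
        logp = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()

    # Average the structure->text and text->structure directions.
    return 0.5 * (xent(logits) + xent(logits.T))

rng = np.random.default_rng(1)
emb = rng.normal(size=(8, 16))
aligned = info_nce(emb, emb)                        # perfectly paired branches
shuffled = info_nce(emb, rng.normal(size=(8, 16)))  # unrelated branches
print(f"aligned loss: {aligned:.3f}, unrelated loss: {shuffled:.3f}")
```

Paired branches yield a much lower loss than unrelated ones, which is exactly the signal that pulls the two representation spaces together during training.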
Furthermore, the representation learned by MoleculeCLIP can be used to further boost performance on an existing task, molecular property prediction. Link » Shengchao Liu · Weili Nie · Chengpeng Wang · Jiarui Lu · Zhuoran Qiao · Ling Liu · Jian Tang · Anima Anandkumar · Chaowei Xiao 🔗 - A 3D-Shape Similarity-based Contrastive Approach to Molecular Representation Learning (Poster)  link » Molecular shape and geometry dictate key biophysical recognition processes, yet many modern graph neural networks disregard 3D information for molecular property prediction. Here, we propose a new contrastive-learning procedure for graph neural networks, Molecular Contrastive Learning from Shape Similarity (MolCLaSS), that implicitly learns a three-dimensional representation. Rather than directly encoding or targeting three-dimensional poses, MolCLaSS matches a similarity objective based on Gaussian overlays to learn a meaningful representation of molecular shape. We demonstrate how this framework naturally captures key aspects of three-dimensionality that two-dimensional representations cannot, and provides an inductive framework for scaffold hopping. Link » Austin Atsango · Nathaniel Diamant · Ziqing Lu · Tommaso Biancalani · Gabriele Scalia · Kangway Chuang 🔗 - Forces are not Enough: Benchmark and Critical Evaluation for Machine Learning Force Fields with Molecular Simulations (Poster)  link » Molecular dynamics (MD) simulation techniques are widely used in many natural science applications. Increasingly, machine learning (ML) force field (FF) models have begun to replace ab-initio simulations by predicting forces directly from atomic structures. Despite significant progress in this area, such techniques are primarily benchmarked by their force/energy prediction errors, even though the practical use case would be to produce realistic MD trajectories. We aim to fill this gap by introducing a novel benchmark suite for ML MD simulation.
We curate representative MD systems, including water, organic molecules, peptides, and materials, and design evaluation metrics corresponding to the scientific objectives of the respective systems. We benchmark a collection of state-of-the-art (SOTA) ML FF models and illustrate, in particular, how the commonly benchmarked force accuracy is not well aligned with relevant simulation metrics. We demonstrate when and how selected SOTA methods fail, and offer directions for further improvement. Specifically, we identify stability as a key metric for ML models to improve. Our benchmark suite comes with a comprehensive open-source codebase for training and simulation with ML FFs to facilitate further work. Link » Xiang Fu · Zhenghao Wu · Wujie Wang · Tian Xie · Sinan Keten · Rafael Gomez-Bombarelli · Tommi Jaakkola 🔗 - Predicting Drug-Drug Interactions using Deep Generative Models on Graphs (Poster)  link » Latent representations of drugs and their targets produced by contemporary graph autoencoder-based models have proved useful in predicting many types of node-pair interactions on large networks, including drug-drug, drug-target, and target-target interactions. However, most existing approaches model the nodes' latent spaces in which node distributions are rigid and disjoint; these limitations hinder the methods from generating new links among pairs of nodes. In this paper, we demonstrate the effectiveness of variational graph autoencoders (VGAE) in modeling latent node representations on multimodal networks. Our approach can produce flexible latent spaces for each node type of the multimodal graph; the embeddings are later used for predicting links among node pairs under different edge types. To further enhance the models' performance, we propose a new method that concatenates Morgan fingerprints, which capture the molecular structure of each drug, with the drugs' latent embeddings before passing them to the decoding stage for link prediction.
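The decoding step just described, concatenating Morgan fingerprints with VGAE latents before scoring a link, can be sketched as follows. Dimensions, the projection matrix, and the random stand-in fingerprint bits are all illustrative (real Morgan bits would come from a cheminformatics toolkit such as RDKit), and the paper's actual decoder may differ:

```python
import numpy as np

rng = np.random.default_rng(0)

def link_probability(z_u, z_v, fp_u, fp_v, w):
    """Score a drug-drug edge from latent embeddings plus fingerprints.

    Each node's VGAE latent embedding is concatenated with its Morgan
    fingerprint, projected into a shared decoder space, and the pair is
    scored with a sigmoid of the inner product.
    """
    h_u = np.concatenate([z_u, fp_u]) @ w    # project concat -> decoder space
    h_v = np.concatenate([z_v, fp_v]) @ w
    return 1.0 / (1.0 + np.exp(-h_u @ h_v))  # sigmoid inner-product decoder

latent_dim, fp_bits, dec_dim = 16, 64, 8
w = rng.normal(size=(latent_dim + fp_bits, dec_dim)) * 0.1  # illustrative weights
z1, z2 = rng.normal(size=latent_dim), rng.normal(size=latent_dim)
fp1 = (rng.random(fp_bits) < 0.1).astype(float)  # stand-in for Morgan bits
fp2 = (rng.random(fp_bits) < 0.1).astype(float)
p = link_probability(z1, z2, fp1, fp2, w)
print(f"predicted link probability: {p:.3f}")
```

In a trained model, `w` would be learned jointly with the encoder so that structurally similar drugs receive similar decoder-space representations.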
Our proposed model shows competitive results on two multimodal networks: (1) a multi-graph consisting of drug and protein nodes, and (2) a multi-graph consisting of drug and cell line nodes. Link » Khang Ngo · Truong Son Hy · Risi Kondor 🔗 - Spatio-Temporal Weathering Predictions in the Sparse Data Regime with Gaussian Processes (Poster)  link » We investigate the problem of predicting the expected lifetime of a material in different climatic conditions from a few observations at sparse testing facilities. We propose a spatio-temporal adaptation of Gaussian process regression that takes full advantage of high-quality satellite data by performing an interpolation directly in the space of climatological time series. We illustrate our approach by predicting gloss retention of industrial paint formulations. Furthermore, our model provides uncertainty estimates that can guide decision-making and is applicable to a wide range of problems. Link » Giovanni De Felice · Vladimir Gusev · John Goulermas · Michael Gaultois · Matthew Rosseinsky · Catherine Gauvin 🔗 - An AI-Assisted Labeling Tool for Cataloging High-Resolution Images of Galaxies (Poster)  link » The Hubble Space Telescope (HST), the recently launched James Webb Space Telescope (JWST), and many Earth-based observatories collect data allowing astronomers to answer fundamental questions about the Universe. In this work we focus on an ecosystem of AI tools for cataloging bright sources within galaxies, and use them to analyze young star clusters, groups of stars held together by their gravitational fields. Their age and mass, among other properties, provide insights into the process of star formation and the birth and evolution of galaxies. Significant domain expertise and resources are required to discriminate star clusters among the tens of thousands of sources that may be extracted for each galaxy.
To accelerate this step we propose: 1) a web-based annotation tool to label and visualize high-resolution astronomy data, encouraging efficient labeling and consensus building; and 2) techniques to reduce the annotation cost by leveraging recent advances in unsupervised representation learning on images. We present case studies where we work with astronomy researchers to validate the annotation tool, and find that the proposed tools can reduce the annotation effort by 3× on existing HST catalogs, while facilitating accelerated analysis of new data. Link » Gustavo Perez · Sean Linden · Timothy McQuaid · Matteo Messa · Daniela Calzetti · Subhransu Maji 🔗 - Mind the Retrosynthesis Gap: Bridging the divide between Single-step and Multi-step Retrosynthesis Prediction (Poster)  link » Retrosynthesis is the task of recursively breaking down a chemical compound, step by step, into molecular precursors until a set of commercially available molecules is found. The goal is thus to provide a valid synthesis route for a molecule. As more single-step models are developed, we see increasing accuracy in the prediction of molecular disconnections, potentially improving the creation of synthetic paths. Multi-step approaches repeatedly apply the chemical information stored in single-step retrosynthesis models. However, this connection is not reflected in contemporary research, which fixes either the single-step model or the multi-step algorithm in the process. In this work, we establish a bridge between both tasks by benchmarking the performance and transfer of different single-step retrosynthesis models to the multi-step domain, leveraging two common search algorithms, Monte Carlo Tree Search and Retro*. We show that models designed for single-step retrosynthesis, when extended to multi-step, can have an impressive impact on the route-finding capabilities of current multi-step methods, improving performance by up to +30% compared to the most widely used model.
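The interplay the abstract describes, a multi-step search repeatedly querying a single-step model, can be illustrated with a toy best-first search. This is loosely Retro*-flavored and greatly simplified; the one-step "model" here is a hypothetical lookup table of scored disconnections, not a real reaction predictor:

```python
import heapq

# Hypothetical single-step model: molecule -> scored precursor sets.
ONE_STEP = {
    "target": [(0.9, ["A", "B"]), (0.4, ["C"])],
    "C": [(0.8, ["A", "D"])],
}
PURCHASABLE = {"A", "B", "D"}  # commercially available building blocks

def best_first_route(target):
    """Best-first multi-step search driven by single-step scores.

    Frontier entries are (negative cumulative score, unsolved molecules,
    reactions applied so far); popping the smallest negative score always
    expands the currently most promising partial route.
    """
    frontier = [(-1.0, [target], [])]
    while frontier:
        neg, open_mols, route = heapq.heappop(frontier)
        if not open_mols:
            return route  # every leaf is purchasable: route solved
        mol, rest = open_mols[0], open_mols[1:]
        if mol in PURCHASABLE:
            heapq.heappush(frontier, (neg, rest, route))
            continue
        for score, precursors in ONE_STEP.get(mol, []):
            heapq.heappush(
                frontier,
                (neg * score, precursors + rest, route + [(mol, precursors)]),
            )
    return None  # no route within the single-step model's knowledge

print(best_first_route("target"))
```

The point of the benchmark above is that swapping `ONE_STEP` for a better single-step model changes which routes this kind of search finds, so the two components cannot be evaluated in isolation.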
Furthermore, we observe no clear link between contemporary single-step and multi-step evaluation metrics, showing that single-step models need to be developed and tested for the multi-step domain, and not as an isolated task, to find synthesis routes for molecules of interest. Link » Alan Kai Hassen · Paula Torren-Peraire · Samuel Genheden · Jonas Verhoeven · Mike Preuss · Igor Tetko 🔗 - Zero or Infinite Data? Knowledge Synchronized Machine Learning Emulation (Poster)  link » Even when the mathematical model is known, uncertainties are unavoidable in many applications in computational science and engineering. They are caused by initial conditions, boundary conditions, and so on. As a result, repeated evaluations of a costly model governed by partial differential equations (PDEs) are required, making the computation prohibitively expensive. Recently, neural networks have been used as fast alternatives for propagating and quantifying uncertainties. Notably, a large amount of high-quality training data is required to train a reliable neural-network-based emulator. Such ground-truth data is frequently gathered in advance by running the numerical solvers that these neural emulators are intended to replace. But if the form of the underlying PDEs is available, do we really need training data? In this paper, we present a principled training framework derived from rigorous and trustworthy scientific simulation schemes. Unlike traditional neural emulator approaches, the proposed emulator does not require a classical numerical solver to collect training data. Rather than emulating dynamics directly, it emulates how a specific numerical solver solves PDEs. A numerical case study demonstrates that the proposed emulator performs well in a variety of testing scenarios.
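The idea of emulating how a solver solves a PDE, rather than fitting pre-collected solver outputs, can be sketched by letting a known numerical scheme generate supervision on the fly. Below, a linear toy "emulator" is fitted to one explicit finite-difference step of the 1D heat equation; the scheme, sizes, and learning rate are illustrative and much simpler than the paper's framework:

```python
import numpy as np

def ftcs_step(u, alpha=0.1):
    """One explicit finite-difference (FTCS) step of the 1D heat equation
    with periodic boundaries. This known scheme, not stored simulation
    data, supplies the training signal."""
    return u + alpha * (np.roll(u, 1) - 2 * u + np.roll(u, -1))

rng = np.random.default_rng(0)
n = 32
W = rng.normal(size=(n, n)) * 0.01  # linear "emulator" to be trained

# Train the emulator to reproduce the solver's update, sampling fresh
# random states instead of reading precomputed trajectories.
for _ in range(2000):
    u = rng.normal(size=n)
    target = ftcs_step(u)
    grad = np.outer(W @ u - target, u)  # gradient of 0.5*||Wu - target||^2
    W -= 0.02 * grad

u_test = np.sin(np.linspace(0, 2 * np.pi, n, endpoint=False))
err = np.max(np.abs(W @ u_test - ftcs_step(u_test)))
print(f"max emulation error on unseen state: {err:.2e}")
```

Because the FTCS update is linear, the emulator converges to the solver's update matrix; in the paper's setting the emulator is a neural network and the scheme is nonlinear, but the data-free supervision principle is the same.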
Link » Xihaier Luo · Wei Xu · Yihui Ren · Shinjae Yoo · Balu Nadiga · Ahsan Kareem 🔗 - Representation Learning to Effectively Integrate and Interpret Omics Data (Poster)  link » The last decade has seen an increase in the amount of high-throughput data available to researchers. While this has allowed scientists to explore various hypotheses and research questions, it has also highlighted the importance of data integration to facilitate knowledge extraction and discovery. Although many strategies have been developed over the last few years, integrating data whilst generating an interpretable embedding remains challenging due to the difficulty of regularisation, especially when using deep generative models. As using one data type only provides a partial view of the condition of interest, we suggest a synergistic approach between different omics data types to infer knowledge and better stratify patients. We introduce a framework called Regularised Multi-View Variational Autoencoder (RMV-VAE) to integrate different omics data types whilst allowing researchers to obtain more biologically meaningful embeddings. Link » Sara Masarone 🔗 - Loop Unrolled Shallow Equilibrium Regularizer (LUSER) - A Memory-Efficient Inverse Problem Solver (Poster)  link » In inverse problems, we aim to reconstruct some underlying signal of interest from potentially corrupted and often ill-posed measurements. Classical optimization-based techniques proceed by optimizing a data-consistency metric together with a regularizer. Current state-of-the-art machine learning approaches draw inspiration from such techniques by unrolling the iterative updates of an optimization-based solver and then learning a regularizer from data. This loop unrolling (LU) method has shown tremendous success, but often requires a deep model for the best performance, leading to high memory costs during training.
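The loop-unrolling pattern described above, alternating data-consistency gradient steps with a learned regularizer, can be sketched as follows. The soft-threshold stand-in for the learned regularizer and all problem sizes are illustrative; in LU methods each unrolled iteration corresponds to one trainable layer:

```python
import numpy as np

def unrolled_solver(A, y, prox, num_iters=10, step=0.1):
    """Loop-unrolled reconstruction: each 'layer' takes one gradient step
    on the data-consistency term ||Ax - y||^2 and then applies a
    regularizer. Here `prox` stands in for the learned network that LU
    methods would train end-to-end."""
    x = A.T @ y  # simple initialization from the measurements
    for _ in range(num_iters):  # unrolled iterations = network depth
        x = x - step * A.T @ (A @ x - y)  # data-consistency gradient step
        x = prox(x)                       # regularizer (placeholder)
    return x

rng = np.random.default_rng(0)
A = rng.normal(size=(40, 10)) / np.sqrt(40)  # forward (measurement) operator
x_true = rng.normal(size=10)
y = A @ x_true

# Soft-thresholding: a classical sparsity prox standing in for a network.
shrink = lambda x: np.sign(x) * np.maximum(np.abs(x) - 1e-4, 0.0)

x_hat = unrolled_solver(A, y, shrink, num_iters=500)
rel_err = np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true)
print(f"relative reconstruction error: {rel_err:.4f}")
```

With a fixed prox this is just ISTA; the memory cost LUSER targets comes from backpropagating through every unrolled layer when `prox` is a deep trainable network.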
Thus, to balance computational cost and network expressiveness, we propose an LU algorithm with shallow equilibrium regularizers (LUSER). These implicit models are as expressive as deeper convolutional networks, but far more memory-efficient during training. The proposed method is evaluated on image deblurring, computed tomography (CT), and single-coil magnetic resonance imaging (MRI) tasks, and shows similar or even better performance while requiring up to 8× fewer computational resources during training compared to a more typical LU architecture with feedforward convolutional regularizers. Link » Peimeng Guan · Jihui Jin · Justin Romberg · Mark Davenport 🔗 - Re-Evaluating Chemical Synthesis Planning Algorithms (Poster)  link » Computer-Aided Chemical Synthesis Planning (CASP) algorithms have the potential to help chemists predict how to make molecules and decide which molecules to prioritize for synthesis and testing. Recently, several algorithms have been proposed to tackle this problem, reporting large performance improvements. In this work, we re-examine current and prior state-of-the-art synthesis planning algorithms under controlled and identical conditions, providing a holistic view using several previously unreported evaluation metrics that cover the common use cases of these algorithms. In contrast to prior studies, we find that under strict control, differences between algorithms are smaller than previously assumed. Our findings can guide users in choosing the appropriate algorithms for specific tasks, as well as stimulate new research into improved algorithms. Link » Austin Tripp · Krzysztof Maziarz · Sarah Lewis · Guoqing Liu · Marwin Segler 🔗 - Thoughts on the Applicability of Machine Learning to Scientific Discovery and Possible Future Research Directions (Perspective) (Poster)  link » This is a short perspective paper discussing the potential use of machine learning for scientific discovery.
Optimizing and streamlining the development of scientific knowledge is a critical issue for the future of humanity. In recent years, machine learning has begun to accelerate scientific progress. However, much of this work has focused on automation specific to individual scientific domains. In this paper, we emphasize the importance of also discussing how to apply machine learning to more general scientific processes. We then briefly discuss some possible future research directions for automating scientific discovery with machine learning. Link » Shiro Takagi 🔗 - Substructure-Atom Cross Attention for Molecular Representation Learning (Poster)  link » Designing a neural network architecture for molecular representation is crucial for AI-driven drug discovery and molecule design. In this work, we propose a new framework for molecular representation learning. Our contribution is threefold: (a) demonstrating the usefulness of incorporating substructures into node-wise features from molecules, (b) designing two branch networks, a transformer and a graph neural network, fused with asymmetric attention, and (c) not requiring heuristic features or computationally expensive information from molecules. Using 1.8 million molecules collected from the ChEMBL and PubChem databases, we pretrain our network to learn a general representation of molecules with minimal supervision. The experimental results show that our pretrained network achieves competitive performance on 11 downstream tasks for molecular property prediction. Link » Jiye Kim · Seungbeom Lee · Dongwoo Kim · Sungsoo Ahn · Jaesik Park 🔗 - Interdisciplinary Discovery of Nanomaterials Based on Convolutional Neural Networks (Poster)  link » The material science literature contains up-to-date and comprehensive scientific knowledge of materials. However, its content is unstructured and diverse, resulting in a significant gap in providing sufficient information for material design and synthesis.
To this end, we used natural language processing (NLP) and computer vision (CV) techniques based on convolutional neural networks (CNN) to discover valuable experimental information about nanomaterials and synthesis methods in energy-material-related publications. Our first system, TextMaster, extracts opinions from texts and classifies them into challenges and opportunities, achieving 94% and 92% accuracy, respectively. Our second system, GraphMaster, extracts data from tables and figures in publications with 98.3% classification accuracy and 4.3% data-extraction mean square error on average. Our results show that these systems can assess the suitability of materials for a certain application by evaluating real synthesis insights and case analyses with detailed references. This work offers a fresh perspective on mining and unifying knowledge from the scientific literature, providing a broad path to accelerating nanomaterial research through CNNs. Link » Tong Xie · Yuwei Wan · Weijian Li · Qingyuan Linghu · Shaozhou Wang · Yalun Cai · Chunyu Kit · Han Liu · Clara Grazian · Bram Hoex 🔗 - Using Sum-Product Networks to estimate neural population structure in the brain (Poster)  link » We present a computationally efficient framework to model a wide range of population structures with high-order correlations and a large number of neurons. Our method is based on a special type of Bayesian network that has linear inference time and is founded upon the concept of contextual independence. Moreover, we use an efficient architecture learning method for network selection to model large neural populations even with a small amount of data. Our framework is both fast and accurate in approximating neural population structures. Furthermore, our approach enables us to reliably quantify higher-order neural correlations. We test our method on publicly available large-scale neural recordings from the Allen Brain Observatory.
Our approach significantly outperforms other models both in terms of statistical measures and in alignment with experimental evidence. Link » Koosha Khalvati · Samantha Johnson · Stefan Mihalas · Michael Buice 🔗 - Chemistry Insights for Large Pretrained GNNs (Poster)  link » There have been many recent advances in leveraging machine learning for chemistry applications. One particular task of interest is using graph neural networks (GNNs) on the Open Catalyst 2020 (OC20) dataset to predict the forces and energies of atoms and systems. While large GNNs have shown good progress in this area, we have little understanding of how or why these models work. In an attempt to gain a better understanding and increase our confidence that the models learn meaningful concepts that align with chemical intuition, we present perturbation analyses of GNN predictions on OC20, in which we performed small changes on individual atoms and compared the model predictions before and after the changes. We provide visualizations of individual systems as well as analyses of general trends. We observed evidence that aligns with chemical intuition, including the importance of adsorbate atoms to the overall system, that changing atomic numbers to those of neighbors in the same row of the periodic table causes less difference than other elemental changes, and a positive correlation between force magnitudes and energy changes. Link » Janice Lan · Katherine Xu 🔗 - Deconvolution of Astronomical Images with Deep Neural Networks (Poster)  link » Optical astronomical images are strongly affected by the point spread function (PSF) of the optical system and the atmosphere (seeing), which blurs the observed image. The amount of blurring depends on both the observed band and, more crucially, on the atmospheric conditions during observation. A typical astronomical image will therefore have a unique PSF that is non-circular and differs across bands. Observations of known stars give us an estimate of this PSF.
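For context on PSF-aware deconvolution, the classical baseline that the Deep Wiener Deconvolution Network builds on is the Wiener filter, which uses the known PSF explicitly in Fourier space. A minimal sketch with a synthetic Gaussian PSF (sizes and the noise level are illustrative):

```python
import numpy as np

def wiener_deconvolve(image, psf, noise_power=1e-3):
    """Classical Wiener deconvolution with a known PSF, via the FFT.

    The filter conj(H) / (|H|^2 + noise_power) inverts the blur at
    frequencies where the PSF passes signal and suppresses frequencies
    dominated by noise."""
    H = np.fft.fft2(np.fft.ifftshift(psf), s=image.shape)
    G = np.conj(H) / (np.abs(H) ** 2 + noise_power)
    return np.real(np.fft.ifft2(np.fft.fft2(image) * G))

# Toy example: blur a point source with a Gaussian PSF, then deconvolve.
n = 64
y, x = np.mgrid[-n // 2:n // 2, -n // 2:n // 2]
psf = np.exp(-(x ** 2 + y ** 2) / (2 * 2.0 ** 2))
psf /= psf.sum()

truth = np.zeros((n, n))
truth[20, 30] = 1.0  # a single point source
blurred = np.real(np.fft.ifft2(
    np.fft.fft2(truth) * np.fft.fft2(np.fft.ifftshift(psf))))
restored = wiener_deconvolve(blurred, psf)
print("restored peak at:", np.unravel_index(restored.argmax(), restored.shape))
```

DWDN-style networks refine this idea by learning the regularization in feature space, but the key property motivating the abstract, conditioning the deconvolution on the per-image PSF rather than assuming a fixed one, is already visible here.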
Any serious candidate for production analysis of astronomical images must take the known PSF into account during image analysis. So far, the majority of applications of neural networks (NN) to astronomical image analysis have ignored this problem by assuming a fixed PSF in training and validation. We present a neural network architecture based on the Deep Wiener Deconvolution Network (DWDN) that takes the PSF shape into account when performing deconvolution, a possible approach to leveraging PSF information in neural networks. We study the performance of this algorithm under realistic observational conditions. We employ two regularization schemes and study custom loss functions that are optimized for quantities of interest to astronomers. We show that our algorithm can successfully recover unbiased image properties such as colors, ellipticities, and orientations at sufficiently high signal-to-noise ratios. This study represents a comprehensive application of AI in astronomy, where the experimental design, model construction, optimization criteria, error estimation, and benchmark metrics are all meticulously tailored to the domain problem. Link » Hong Wang · Sreevarsha Sreejith · Yuewei Lin · Nesar Ramachandra · Anže Slosar · Shinjae Yoo 🔗 - Adaptive Bias Correction for Improved Subseasonal Forecast (Poster)  link » Subseasonal forecasting—predicting temperature and precipitation 2 to 6 weeks ahead—is critical for effective water allocation, wildfire management, and drought and flood mitigation. Recent international research efforts have advanced the subseasonal capabilities of operational dynamical models, yet temperature and precipitation prediction skill remains poor, partly due to stubborn errors in representing atmospheric dynamics and physics inside dynamical models. To counter these errors, we introduce an adaptive bias correction (ABC) method that combines state-of-the-art dynamical forecasts with observations using machine learning.
When applied to the leading subseasonal model from the European Centre for Medium-Range Weather Forecasts (ECMWF), ABC improves temperature forecasting skill by 60-90% and precipitation forecasting skill by 40-69% in the contiguous U.S. We couple these performance improvements with a practical workflow, based on Cohort Shapley, for explaining ABC skill gains and identifying higher-skill windows of opportunity based on specific climate conditions. Link » Soukayna Mouatadid · Paulo Orenstein · Genevieve Flaspohler · Judah Cohen · Miruna Oprescu · Ernest Fraenkel · Lester Mackey 🔗 - Learning Controllable Adaptive Simulation for Multi-scale Physics (Poster)  link » Simulating the time evolution of physical systems is pivotal in many scientific and engineering problems. An open challenge in simulating such systems is their multi-scale dynamics: a small fraction of the system is extremely dynamic and requires very fine-grained resolution, while a majority of the system is changing slowly and can be modeled by coarser spatial scales. Typical learning-based surrogate models use a uniform spatial scale, which needs to resolve to the finest required scale and can waste huge amounts of compute to achieve the required accuracy. In this work, we introduce Learning controllable Adaptive simulation for Multi-scale Physics (LAMP) as the first fully deep-learning-based surrogate model that jointly learns the evolution model and optimizes appropriate spatial resolutions that devote more compute to the highly dynamic regions. LAMP consists of a Graph Neural Network (GNN) for learning the forward evolution, and a GNN-based actor-critic for learning the policy of spatial refinement and coarsening. We introduce learning techniques that optimize LAMP with a weighted sum of error and computational cost as the objective, which allows LAMP to adapt to the varying relative importance of the error vs. computation trade-off at inference time.
We test our method on a 1D benchmark of nonlinear PDEs and a challenging 2D mesh-based simulation. We demonstrate that LAMP outperforms state-of-the-art deep learning surrogate models with up to 39.3% error reduction, and is able to adaptively trade off computation to reduce long-term prediction error. Link » Tailin Wu · Takashi Maruyama · Qingqing Zhao · Gordon Wetzstein · Jure Leskovec 🔗 - Learning Efficient Hybrid Particle-continuum Representations of Non-equilibrium N-body Systems (Poster)  link » An important class of multi-scale, non-equilibrium, N-body physical systems deals with an interplay between particle and continuum phenomena. These include hypersonic flow and plasma dynamics, materials science, and astrophysics. Hybrid solvers that combine particle and continuum representations could provide an efficient framework to model these systems. However, the coupling between these two representations has been a key challenge, and is often limited to inaccurate or incomplete prescriptions. In this work, we introduce a method for Learning Hybrid Particle-Continuum (LHPC) models from the data of first-principles particle simulations. LHPC analyzes the local velocity-space particle distribution function and separates it into near-equilibrium (thermal) and far-from-equilibrium (non-thermal) components. The computationally intensive particle solver is used to advance only the non-thermal particles, whereas a neural network solver is used to efficiently advance the thermal component using a continuum representation. Most importantly, an additional neural network learns the particle-continuum coupling: the dynamical exchange of mass, momentum, and energy between the particle and continuum representations. Training of the different neural network components is done in an integrated manner to ensure global consistency and stability of the LHPC model.
We demonstrate our method on an intense laser-plasma interaction problem involving highly nonlinear, far-from-equilibrium dynamics associated with the coupling between electromagnetic fields and multiple particle species. More efficient modeling of these interactions is critical for the design and optimization of compact accelerators for materials science and medical applications. Our method achieves an important balance between accuracy and speed: LHPC is 8 times faster than a classical particle solver and achieves an up to 6.8-fold reduction in long-term prediction error for key quantities of interest compared to deep-learning baselines using uniform representations. Link » Tailin Wu · Michael Sun · Hsuan-Gu Chou · Pranay Reddy Samala · Sithipont Cholsaipant · Sophia Kivelson · Jacqueline Yau · Rex Ying · E. Paulo Alves · Jure Leskovec · Frederico Fiuza 🔗 - Toward Neural Network Simulation of Variational Quantum Algorithms (Poster)  link » Variational quantum algorithms (VQAs) utilize a hybrid quantum-classical architecture to recast problems of high-dimensional linear algebra as ones of stochastic optimization. Despite the promise of leveraging near- to intermediate-term quantum resources to accelerate this task, the computational advantage of VQAs over wholly classical algorithms has not been firmly established. For instance, while the variational quantum eigensolver (VQE) has been developed to approximate low-lying eigenmodes of high-dimensional sparse linear operators, analogous classical optimization algorithms exist in the variational Monte Carlo (VMC) literature, utilizing neural networks in place of quantum circuits to represent quantum states. In this paper we ask whether classical stochastic optimization algorithms can be constructed paralleling other VQAs, focusing on the example of the variational quantum linear solver (VQLS).
We find that such a construction can be applied to the VQLS, yielding a paradigm that could theoretically extend to other VQAs of similar form. Link » Oliver Knitter · James Stokes · Shravan Veerapaneni 🔗 - Privileged Deep Symbolic Regression (Poster)  link » Symbolic regression is the process of finding an analytical expression that fits experimental data with the fewest operator, variable, and constant symbols. Given the huge combinatorial space of possible expressions, evolutionary algorithms struggle to find expressions that meet these criteria in a reasonable amount of time. To efficiently reduce the search space, neural symbolic regression algorithms have recently been proposed for their ability to identify patterns in the data and output analytical expressions in a single forward pass. However, these new approaches to symbolic regression do not allow for the direct encoding of user-defined prior knowledge, a common scenario in the natural sciences and engineering. In this work, we propose the first neural symbolic regression method that allows users to explicitly bias prediction towards expressions that satisfy a set of assumptions on the expected structure of the ground-truth expression. Our experiments show that our conditioned deep learning model outperforms its unconditioned counterparts in terms of accuracy while achieving control over the predicted expression structure. Link » Luca Biggio · Tommaso Bendinelli · Pierre-alexandre Kamienny 🔗 - Simulation-Based Parallel Training (Poster)  link » Numerical simulations are ubiquitous in science and engineering. Machine learning for science investigates how artificial neural architectures can learn from these simulations to speed up scientific discovery and engineering processes. Most of these architectures are trained in a supervised manner. They require tremendous amounts of data from simulations that are slow to generate and memory-intensive.
In this article, we present our ongoing work to design a training framework that alleviates those bottlenecks. It generates data in parallel with the training process. Such simultaneity induces a bias in the data available during the training. We present a strategy to mitigate this bias with a memory buffer. We test our framework on the multi-parametric Lorenz attractor. We show the benefit of our framework compared to offline training and the success of our data bias mitigation strategy in capturing the complex chaotic dynamics of the system. Link » Lucas Meyer · Alejandro Ribes · Bruno Raffin 🔗 - Fourier Neural Operator for Plasma Modelling (Poster)  link » Predicting plasma evolution within a tokamak is crucial to building a sustainable fusion reactor. Whether in the simulation space or within the experimental domain, the capability to forecast the spatio-temporal evolution of plasma field variables rapidly and accurately could improve active control methods on current tokamak devices and future fusion reactors. In this work, we demonstrate the utility of using the Fourier Neural Operator (FNO) to model the plasma evolution in simulations and experiments. Our work shows that the FNO is capable of emulating the magnetohydrodynamic models governing the plasma dynamics six orders of magnitude faster than the traditional numerical solver, while maintaining considerable accuracy (NMSE $\sim 10^{-5}$). Our work also benchmarks the performance of the FNO against other standard surrogate models such as Conv-LSTM and U-Net and demonstrates that the FNO takes significantly less time to train, requires fewer parameters, and outperforms the other models. We extend the FNO approach to model the plasma evolution observed by the cameras positioned within the MAST spherical tokamak. We illustrate its capability in forecasting the formation of filaments within the plasma as well as the heat deposits.
The FNO deployed to model the camera data can forecast the full length of a plasma shot within half the shot duration. Link » Vignesh Gopakumar · Stanislas Pamela · Lorenzo Zanisi · Zongyi Li · Anima Anandkumar 🔗 - Standardization of chemical compounds using language modeling (Poster)  link » With the growing amount of chemical data stored digitally, it has become crucial to represent chemical compounds consistently. Harmonized representations facilitate the extraction of insightful information from datasets, and are advantageous for machine learning applications. Compound standardization is typically accomplished using rule-based algorithms that modify undesirable descriptions of functional groups, resulting in a consistent representation throughout the dataset. Here, we present the first deep-learning model for molecular standardization. We enable custom standardization schemes based solely on data, as well as standardization options that are difficult to encode in rules. Our model achieves $>98\%$ accuracy in learning two popular rule-based protocols. When fine-tuned on a relatively small dataset of catalysts (for which there is currently no automated standardization practice), the model predicts the expected standardized molecular format with a test accuracy of $62\%$ on average. We show that our model learns not only the grammar and syntax of molecular representations, but also the details of atom ordering, types of bonds, and representations of charged species. In addition, we demonstrate the model's ability to reproduce a canonicalization algorithm with a $95.6\%$ success rate. Link » Miruna Cretu · Alessandra Toniato · Alain C.
Vaucher · Amol Thakkar · Amin Debabeche · Teodoro Laino 🔗 - Learning Spatially-Aware Representations of Transcriptomic Data via Transfer Learning (Poster)  link » Computationally integrating spatial transcriptomics (ST) and single-cell transcriptomics (SC) greatly benefits biomedical research on topics such as cellular organization, embryogenesis, and tumorigenesis, and could further facilitate therapeutic developments. We proposed a transfer learning model, STEM, to learn spatially-aware embeddings from gene expression for both ST and SC data. The embeddings both preserve spatial information and eliminate the domain gap between SC and ST data. We used these embeddings to infer the SC-ST mapping and the pseudo SC spatial adjacency, and adopted the attribution function to indicate which genes dominate the spatial information. We designed a comprehensive evaluation pipeline and conducted two simulation experiments, and STEM achieved the best performance compared with previous methods. We applied STEM to human squamous cell carcinoma data and successfully uncovered the spatial localization of rare cell types. STEM is a powerful tool for building single-cell level spatial landscapes and could provide mechanistic insights into heterogeneity and microenvironments in tissues. Link » Minsheng Hao · Lei Wei · Xuegong Zhang 🔗 - Predicting electrolyte solution properties by combining neural network accelerated molecular dynamics and continuum solvent theory. (Poster)  link » Electrolyte solutions play a fundamental role in a vast range of important industrial and biological applications. Yet their thermodynamic and kinetic properties still cannot be predicted from first principles. There are three central challenges that need to be overcome to achieve this.
Firstly, the dynamic nature of these solutions requires long-time-scale simulations; secondly, the long-range Coulomb interactions require large spatial scales; thirdly, the short-range quantum mechanical interactions require an expensive level of theory. Here, we demonstrate a methodology to address these challenges. Short ab initio molecular dynamics (AIMD) simulations corrected with MP2 level calculations of aqueous sodium chloride are used to train an equivariant graph neural network interatomic potential (NNP) that can reproduce the short-range forces and energies at moderate computational cost while maintaining a high level of accuracy. This is combined with a continuum solvent description of the long-range electrostatic interactions to enable stable simulations over long times and large spatial scales. From these simulations, ion-water and ion-ion radial distribution functions (RDFs), as well as ionic diffusivities, can be determined. The ion-ion RDFs are then used with a new implementation of a continuum solvent model to compute the osmotic and activity coefficients. Good experimental agreement is demonstrated up to the solubility limit. This approach should be applicable for determining the thermodynamic and kinetic properties of many important electrolyte solutions where there is insufficient experimental data. Link » Timothy T Duignan · Junji Zhang · Joshua Pagotto 🔗 - An Empirical Evaluation of Zeroth-Order Optimization Methods on AI-driven Molecule Optimization (Poster)  link » Molecule optimization is an important problem in chemical discovery and has been approached using many techniques, including generative modeling, reinforcement learning, genetic algorithms, and more. Recent work has also applied zeroth-order (ZO) optimization, a subset of gradient-free optimization that solves problems similarly to gradient-based methods, to optimize latent vector representations from an autoencoder.
In this paper, we study the effectiveness of various ZO optimization methods for optimizing molecular objectives, which are characterized by variable smoothness, infrequent optima, and other challenges. We provide insights into the robustness of various ZO optimizers in this setting, show the advantages of ZO sign-based gradient descent (ZO-signGD), discuss how ZO optimization can be used practically in realistic discovery tasks, and demonstrate the potential effectiveness of ZO optimization methods on widely used benchmark tasks from the Guacamol suite. Link » Elvin Lo · Pin-Yu Chen 🔗 - Data-Driven Computational Imaging for Scientific Discovery (Poster)  link » In computational imaging, hardware for signal sampling and software for object reconstruction are designed in tandem for improved capability. Examples of such systems include computed tomography (CT), magnetic resonance imaging (MRI), and superresolution microscopy. In contrast to more traditional cameras, these devices take indirect measurements and use computational algorithms for reconstruction. This allows for advanced capabilities such as super-resolution or 3-dimensional imaging, pushing forward the frontier of scientific discovery. However, these techniques generally require a large number of measurements, causing low throughput, motion artifacts, and/or radiation damage, limiting applications. Data-driven approaches to reducing the number of measurements needed have been proposed, but they predominantly require a ground truth or reference dataset, which may be impossible to collect. This work outlines a self-supervised approach and explores the future work that is necessary to make such a technique usable for real applications. Light-emitting diode (LED) array microscopy, a modality that allows visualization of transparent objects in two and three dimensions with high resolution and a large field of view, is used as an illustrative example.
We release our code at https://anonymous.4open.science/r/LED_PVAE-1290 and our experimental data at https://figshare.com/s/635499acfdcdf0893750. Link » Andrew Olsen · Yolanda Hu · Vidya Ganapati 🔗 - De novo PROTAC design using graph-based deep generative models (Poster)  link » PROteolysis TArgeting Chimeras (PROTACs) are an emerging therapeutic modality that degrade a protein of interest (POI) by marking it for degradation by the proteasome. They often take on a three-component barbell-like structure consisting of two binding domains and a linker. While a promising modality, it can be challenging to predict whether a new PROTAC will lead to protein degradation, as that depends on the cooperation of all subunits to form a successful ternary structure. As such, PROTACs remain a laborious and unpredictable modality to design because the functionalities of each component are highly interdependent. Recent developments in artificial intelligence (AI) suggest that deep generative models can assist with the de novo design of molecules displaying desired properties, yet their application to PROTAC design remains largely unexplored. Additionally, while previous AI-based approaches have optimized the linker component given two active domains, generative models have not yet been applied to the optimization of the other two components: the warhead and the E3 ligand. Here, we show that a graph-based deep generative model (DGM) can be used to propose novel PROTAC structures. The DGM follows the approach of GraphINVENT, a gated graph neural network that iteratively samples an action space and formulates a sequence of steps to build a new molecular graph. We also demonstrate that this model can be guided towards the generation of PROTACs that are predicted to effectively degrade a POI through policy gradient reinforcement learning (RL).
Rewards during RL are applied based on a boosted tree surrogate model that predicts a PROTAC's degradation potential for a specific POI, showing that a nonlinear scoring function can fine-tune a deep molecular generative model towards desired properties. Using this approach, we achieve a model where activity against IRAK3 (a pseudokinase implicated in oncologic signaling) is predicted for >80% of sampled PROTACs after RL, compared to 50% predicted activity before any fine-tuning. Link » Divya Nori · Connor Coley · Rocío Mercado 🔗 - Conditional Invariances for Conformer Invariant Protein Representations (Poster)  link » Representation learning for proteins is an emerging area in geometric deep learning. Recent works have factored in both the relational (atomic bonds) and the geometric aspects (atomic positions) of the task, notably bringing together graph neural networks (GNNs) with neural networks for point clouds. The equivariances and invariances to geometric transformations (group actions such as rotations and translations) considered so far treat large molecules as rigid structures. However, in many important settings, proteins can co-exist as an ensemble of multiple stable conformations. The conformations of a protein, however, cannot be described as input-independent transformations of the protein: two proteins may require different sets of transformations in order to describe their set of viable conformations. To address this limitation, we introduce the concept of conditional transformations (CT). CT can capture protein structure, while respecting the restrictions posed by constraints on dihedral (torsion) angles and steric repulsions between atoms. We then introduce a Markov chain Monte Carlo framework to learn representations that are invariant to these conditional transformations. Our results show that endowing existing baseline models with these conditional transformations helps improve their performance without sacrificing computational efficiency.
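A concrete instance of such a conditional transformation is a rotation about a single dihedral (torsion) angle, which moves one part of a molecule while leaving every bond length intact. The following self-contained sketch (synthetic coordinates, not the paper's data or code) illustrates this with a Rodrigues rotation:

```python
import numpy as np

def rotate_about_axis(points, origin, axis, theta):
    """Rodrigues rotation of `points` about the line through `origin` along `axis`."""
    k = axis / np.linalg.norm(axis)
    p = points - origin
    return (p * np.cos(theta)
            + np.cross(k, p) * np.sin(theta)
            + k * (p @ k)[:, None] * (1.0 - np.cos(theta))) + origin

# Synthetic 5-atom chain; rotate atoms 3 and 4 about the bond axis through atom 2.
coords = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [2.3, 1.2, 0.0],
                   [3.8, 1.3, 0.0], [4.5, 2.5, 0.5]])
axis = coords[2] - coords[1]  # direction of the bond between atoms 1 and 2
new = coords.copy()
new[3:] = rotate_about_axis(coords[3:], coords[2], axis, np.pi / 3)

# The torsion rotation preserves every consecutive bond length in the chain.
old_d = np.linalg.norm(np.diff(coords, axis=0), axis=1)
new_d = np.linalg.norm(np.diff(new, axis=0), axis=1)
assert np.allclose(old_d, new_d)
```

Unlike a global rotation or translation, which transformation (bond, fragment, angle range) is admissible depends on the particular molecule, which is what makes these transformations conditional on the input.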
Link » Balasubramaniam Srinivasan · Vassilis Ioannidis · Soji Adeshina · Mayank Kakodkar · George Karypis · Bruno Ribeiro 🔗 - RLCG: When Reinforcement Learning Meets Coarse Graining (Poster)  link » Coarse graining (CG) algorithms have been widely used to speed up molecular dynamics (MD) simulations. Recent data-driven CG algorithms have demonstrated performance competitive with empirical CG methods. However, these data-driven algorithms often rely heavily on labeled information (e.g., force), which is sometimes unavailable, and may not scale to large and complex molecular systems. In this paper, we propose Reinforcement Learning for Coarse Graining (RLCG), a reinforcement-learning-based framework for learning CG mappings. In particular, RLCG makes CG assignments based on local information about each atom and is trained using a novel reward function. This "atom-centric" approach may substantially improve computational scalability. We showcase the power of RLCG by demonstrating its competitive performance against the state of the art on small (Alanine Dipeptide and Paracetamol) and medium-sized (Chignolin) molecules. More broadly, RLCG has great potential in accelerating the scientific discovery cycle, especially on large-scale problems. Link » Shenghao Wu · Tianyi Liu · Zhirui Wang · Wen Yan · Yingxiang Yang 🔗 - Neurosymbolic Programming for Science (Poster)  link » Neurosymbolic Programming (NP) techniques have the potential to accelerate scientific discovery across fields. These models combine neural and symbolic components to learn complex patterns and representations from data, using high-level concepts or known constraints. As a result, NP techniques can interface with symbolic domain knowledge from scientists, such as prior knowledge and experimental context, to produce interpretable outputs. Here, we identify opportunities and challenges in connecting current NP models with scientific workflows, drawing on real-world examples from behavior analysis in science.
We define concrete next steps to move the field of NP for science forward and enable its broad use in workflows across the natural and social sciences. Link » Jennifer J Sun · Megan Tjandrasuwita · Atharva Sehgal · Armando Solar-Lezama · Swarat Chaudhuri · Yisong Yue · Omar Costilla Reyes 🔗 - DiffDock: Diffusion Steps, Twists, and Turns for Molecular Docking (Poster)  link » Predicting the binding structure of a small molecule ligand to a protein---a task known as molecular docking---is critical to drug design. Recent deep learning methods that treat docking as a regression problem have decreased runtime compared to traditional search-based methods but have yet to offer substantial improvements in accuracy. We instead frame molecular docking as a generative modeling problem and develop DiffDock, a diffusion generative model over the non-Euclidean manifold of ligand poses. To do so, we map this manifold to the product space of the degrees of freedom (translational, rotational, and torsional) involved in docking and develop an efficient diffusion process on this space. Empirically, DiffDock obtains a 38% top-1 success rate (RMSD<2Å) on PDBBind, significantly outperforming the previous state-of-the-art of traditional docking (23%) and deep learning (20%) methods. Moreover, DiffDock has fast inference times and provides confidence estimates with high selective accuracy. Link » Gabriele Corso · Hannes Stärk · Bowen Jing · Regina Barzilay · Tommi Jaakkola 🔗 - D-CIPHER: Discovery of Closed-form Partial Differential Equations (Poster)  link » Closed-form differential equations, including partial differential equations and higher-order ordinary differential equations, are one of the most important tools used by scientists to model and better understand natural phenomena.
Discovering these equations directly from data is challenging because it requires modeling relationships between various derivatives that are not observed in the data (equation-data mismatch) and it involves searching across a huge space of possible equations. Current approaches make strong assumptions about the form of the equation and thus fail to discover many well-known systems. Moreover, many of them resolve the equation-data mismatch by estimating the derivatives, which makes them inadequate for noisy and infrequently sampled systems. To this end, we propose D-CIPHER, which is robust to measurement artifacts and can uncover a new and very general class of differential equations. We further design a novel optimization procedure, CoLLie, to help D-CIPHER search through this class efficiently. Finally, we demonstrate empirically that it can discover many well-known equations that are beyond the capabilities of current methods. Link » Krzysztof Kacprzyk · Zhaozhi Qian · Mihaela van der Schaar 🔗 - Xtal2DoS: Attention-based Crystal to Sequence Learning for Density of States Prediction (Poster)  link » Modern machine learning techniques have been extensively applied to materials science, especially for property prediction tasks. A majority of these methods address scalar property prediction, while more challenging spectral properties remain less emphasized. We formulate a crystal-to-sequence learning task and propose a novel attention-based learning method, Xtal2DoS, which decodes the sequential representation of material density of states (DoS) properties by incorporating the learned atomic embeddings through attention networks. Experiments show that Xtal2DoS is faster than existing models and consistently outperforms other state-of-the-art methods on four metrics for two fundamental spectral properties, phonon and electronic DoS.
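The central operation in this style of crystal-to-sequence decoding, where each position of the output spectrum attends over a set of atomic embeddings, can be sketched in a few lines of NumPy. The dimensions and random embeddings below are illustrative stand-ins, not the Xtal2DoS architecture:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_atoms, n_bins = 16, 5, 64  # embedding dim, atoms in the crystal, DoS bins

atom_emb = rng.normal(size=(n_atoms, d))  # stand-in for learned atomic embeddings
queries = rng.normal(size=(n_bins, d))    # one query per position of the DoS sequence

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

# Scaled dot-product cross-attention: every spectrum position aggregates
# information from all atoms with learned weights.
attn = softmax(queries @ atom_emb.T / np.sqrt(d))  # shape (n_bins, n_atoms)
dos_features = attn @ atom_emb                     # shape (n_bins, d)

assert np.allclose(attn.sum(axis=1), 1.0)  # each row is a distribution over atoms
```

A decoder head would then map each row of `dos_features` to a DoS value, so the output length is decoupled from the number of atoms in the crystal.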
Link » Junwen Bai · Yuanqi Du · Yingheng Wang · Shufeng Kong · John Gregoire · Carla Gomes 🔗 - Symbolic-Model-Based Reinforcement Learning (Poster)  link » We investigate using symbolic regression (SR) to model dynamics with mathematical expressions in model-based reinforcement learning (MBRL). While the primary promise of MBRL is to enable sample-efficient learning, most popular MBRL algorithms learn their approximate world model with black-box, over-parametrized neural networks, which are known to be data-hungry and prone to overfitting in the low-data regime. In this paper, we leverage the fact that a large collection of environments considered in RL is governed by physical laws that compose elementary operators, e.g. $\sin$, $\sqrt{\phantom{x}}$, $\exp$, $\frac{\text{d}}{\text{d}t}$, and we propose to search for a world model in the space of interpretable mathematical expressions with SR. We show empirically on simple domains that MBRL can benefit from the extrapolation capabilities and sample efficiency of SR compared to neural models. Link » Pierre-alexandre Kamienny · Sylvain Lamprier 🔗 - Towards Learned Simulators for Cell Migration (Poster)  link » Simulators driven by deep learning are gaining popularity as a tool for efficiently emulating accurate but expensive numerical simulators. Successful applications of such neural simulators can be found in the domains of physics, chemistry, and structural biology, amongst others. Likewise, a neural simulator for cellular dynamics can augment lab experiments and traditional computational methods to enhance our understanding of a cell's interaction with its physical environment. In this work, we propose an autoregressive probabilistic model that can reproduce spatiotemporal dynamics of single cell migration, traditionally simulated with the Cellular Potts model.
We observe that standard single-step training methods not only lead to inconsistent rollout stability, but also fail to accurately capture the stochastic aspects of the dynamics, and we propose training strategies to mitigate these issues. Our evaluation on two proof-of-concept experimental scenarios shows that neural methods have the potential to faithfully simulate stochastic cellular dynamics at least an order of magnitude faster than a state-of-the-art implementation of the Cellular Potts model. Link » Koen Minartz · Yoeri Poels · Vlado Menkovski 🔗 - Flow Annealed Importance Sampling Bootstrap (Poster)  link » Normalizing flows are tractable density models that can approximate complicated target distributions, e.g. Boltzmann distributions of physical systems. However, current methods for training flows either suffer from mode-seeking behavior, use samples from the target generated by expensive MCMC simulations, or use stochastic losses that have high variance. To avoid these problems, we augment flows with annealed importance sampling (AIS) and minimize the mass-covering $\alpha$-divergence with $\alpha=2$, which minimizes importance weight variance. Our method, Flow AIS Bootstrap (FAB), uses AIS to generate samples in regions where the flow is a poor approximation of the target, facilitating the discovery of new modes. We apply FAB to complex multimodal targets and show that we can approximate them accurately where previous methods fail. To the best of our knowledge, we are the first to learn the Boltzmann distribution of the alanine dipeptide molecule using only the unnormalized target density, without access to samples generated via Molecular Dynamics (MD) simulations: FAB produces better results than training via maximum likelihood on MD samples while using 100 times fewer target evaluations. After reweighting samples, we obtain unbiased histograms of dihedral angles that are almost identical to the ground truth.
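The mass-covering property of the $\alpha=2$ divergence used above can be illustrated with a toy NumPy sketch. The 1-D Gaussian "target" $p$ and "flow" $q$ below are hypothetical stand-ins for the paper's models; the quantity $\mathbb{E}_q[(p/q)^2]$ is minimized (value 1) exactly when the importance weights $w = p/q$ have zero variance, i.e. when $q$ covers the target's mass:

```python
import numpy as np

rng = np.random.default_rng(0)
MU_P, SIG = 1.0, 1.5  # toy target parameters (illustrative only)

def log_gauss(x, mu, sigma):
    """Log-density of a 1-D Gaussian."""
    return -0.5 * ((x - mu) / sigma) ** 2 - np.log(sigma * np.sqrt(2.0 * np.pi))

def alpha2_objective(mu_q, n=200_000):
    """Monte Carlo estimate of E_q[(p/q)^2]; minimizing it minimizes the
    variance of the importance weights w = p/q (minimum value 1 at q = p)."""
    x = rng.normal(mu_q, SIG, size=n)
    log_w = log_gauss(x, MU_P, SIG) - log_gauss(x, mu_q, SIG)
    return float(np.mean(np.exp(2.0 * log_w)))

matched = alpha2_objective(MU_P)        # q centered on the target: exactly 1
offset = alpha2_objective(MU_P - 1.0)   # q missing target mass: ~exp(1/SIG^2)
assert matched < offset
```

FAB additionally uses AIS to draw the samples from a distribution closer to the optimal proposal for this objective, which the plain Monte Carlo estimate above does not attempt.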
Link » Laurence Midgley · Vincent Stimper · Gregor Simm · Bernhard Schölkopf · José Miguel Hernández-Lobato 🔗 - Supervised Pretraining for Molecular Force Fields and Properties Prediction (Poster)  link » Machine learning approaches have become popular for molecular modeling tasks, including molecular force fields and properties prediction. Traditional supervised learning methods suffer from scarcity of labeled data for particular tasks, motivating the use of large-scale datasets for other relevant tasks. We propose to pretrain neural networks on a dataset of 86 million molecules with atom charges and 3D geometries as inputs and molecular energies as labels. Experiments show that, compared to training from scratch, fine-tuning the pretrained model can significantly improve the performance for seven molecular property prediction tasks and two force field tasks. We also demonstrate that the learned representations from the pretrained model contain adequate information about molecular structures, by showing that linear probing of the representations can predict much molecular information, including atom types, interatomic distances, classes of molecular scaffolds, and the existence of molecular fragments. Our results show that supervised pretraining is a promising research direction in molecular modeling. Link » Xiang Gao · Weihao Gao · Wenzhi Xiao · Zhirui Wang · Chong Wang · Liang Xiang 🔗 - Learning Regularized Positional Encoding for Molecular Prediction (Poster)  link » Machine learning has become a promising approach for molecular modeling. Positional quantities, such as interatomic distances and bond angles, play a crucial role in molecular physics. Existing works rely on careful manual design of their representation. To model the complex nonlinearity in predicting molecular properties in a more end-to-end manner, we propose to encode the positional quantities with a learnable embedding that is continuous and differentiable.
A regularization technique is employed to encourage embedding smoothness along the physical dimension. We experiment with a variety of molecular property and force field prediction tasks. Improved performance is observed for three different model architectures after plugging in the proposed positional encoding method. In addition, the learned positional encoding allows easier physics-based interpretation. We observe that tasks with similar physics have similar learned positional encodings. Link » Xiang Gao · Weihao Gao · Wenzhi Xiao · Zhirui Wang · Chong Wang · Liang Xiang 🔗 - Continuous PDE Dynamics Forecasting with Implicit Neural Representations (Poster)  link » Effective data-driven PDE forecasting methods often rely on fixed spatial and/or temporal discretizations. This imposes limitations in real-world applications like weather prediction where flexible extrapolation at arbitrary spatiotemporal locations is required. We address this problem by introducing a new data-driven approach, DINo, that models a PDE's flow with continuous-time dynamics of spatially continuous functions. This is achieved by embedding spatial observations independently of their discretization via Implicit Neural Representations in a small latent space temporally driven by a learned ODE. This separate and flexible treatment of time and space makes DINo the first data-driven model to combine the following advantages. It extrapolates at arbitrary spatial and temporal locations; it can learn from sparse irregular grids or manifolds; at test time, it generalizes to new grids or resolutions. DINo outperforms alternative neural PDE forecasters in a variety of challenging generalization scenarios on representative PDE systems. 
Link » Yuan Yin · Matthieu Kirchmeyer · Jean-Yves Franceschi · Alain Rakotomamonjy · Patrick Gallinari 🔗 - Meta-learning Adaptive Deep Kernel Gaussian Processes for Molecular Property Prediction (Poster)  link » We propose Adaptive Deep Kernel Fitting with Implicit Function Theorem (ADKF-IFT), a novel framework for learning deep kernels by interpolating between meta-learning and conventional deep kernel learning. Our approach employs a bilevel optimization objective where we meta-learn generally useful feature representations across tasks, in the sense that task-specific Gaussian process models estimated on top of such features achieve the lowest possible predictive loss on average. We solve the resulting nested optimization problem using the implicit function theorem (IFT). We show that our ADKF-IFT framework contains Deep Kernel Learning (DKL) and Deep Kernel Transfer (DKT) as special cases. Although ADKF-IFT is a completely general method, we argue that it is especially well-suited for drug discovery problems and demonstrate that it significantly outperforms previous state-of-the-art methods on a variety of real-world few-shot molecular property prediction tasks and out-of-domain molecular property prediction and optimization tasks. Link » Wenlin Chen · Austin Tripp · José Miguel Hernández-Lobato 🔗 - Physics-Guided Discovery of Highly Nonlinear Parametric Partial Differential Equations (Poster)  link » Partial differential equations (PDEs) fitting scientific data can represent physical laws with explainable mechanisms for various mathematically-oriented subjects. The data-driven discovery of PDEs from scientific data has emerged as a new way to model complex phenomena in nature, but the effectiveness of current practice is typically limited by the scarcity of data and the complexity of phenomena. In particular, the discovery of PDEs with highly nonlinear coefficients from low-quality data remains largely under-addressed. 
To deal with this challenge, we propose a novel physics-guided learning method, which can not only encode observation knowledge such as initial and boundary conditions but also incorporate the basic physical principles and laws to guide the model optimization. We empirically demonstrate that the proposed method is more robust against data noise and sparsity, and can reduce the estimation error by a large margin; moreover, for the first time we are able to discover PDEs with highly nonlinear coefficients. Link » Yingtao Luo · Qiang Liu · Yuntian Chen · Wenbo Hu · TIAN TIAN · Jun Zhu 🔗 - Chemistry Guided Molecular Graph Transformer (Poster)  link » Classic methods for calculating molecular properties are insufficient for large amounts of data. The Transformer architecture has achieved competitive performance on graph-level prediction by introducing general graph embeddings. However, the direct spatial encoding strategy ignores important inductive biases for molecular graphs, such as aromaticity and interatomic forces. In this paper, inspired by the intrinsic properties of chemical molecules, we propose a chemistry-guided molecular graph Transformer. Specifically, motif-based spatial embedding and distance-guided multi-scale self-attention for the graph Transformer are proposed to predict molecular properties effectively. To evaluate the proposed methods, we have conducted experiments on two large molecular property prediction datasets, ZINC and PCQM4M-LSC. The results show that our methods achieve superior performance compared to various state-of-the-art methods. Code is available at https://github.com/PSacfc/chemistry-graph-transformer . Link » Peisong Niu · Tian Zhou · Qingsong Wen · Liang Sun · Tao Yao 🔗 - Retrosynthesis Prediction Revisited (Poster)  link » Retrosynthesis is an important problem in chemistry and represents an interesting challenge for AI since it involves predictions over sets of complex molecular graph structures. 
Recently, a wealth of models ranging from language models to graph neural networks has been proposed. However, most studies evaluate on only a single dataset and split, focus on top-1 accuracy, and provide little insight into the actual capabilities of individual models. This prevents research from moving forward since issues to be addressed by future work are not identified. In this paper, we focus on the evaluation: we show that the currently used data is not suited to testing generalization, one of the main goals stated in the literature; propose new splits of the USPTO reactions modeling various scenarios; study representatives of the main types of models over this data; and finally present, to the best of our knowledge, the first evaluation and comparison of these models in the multi-step scenario. Altogether, we show that the picture is more diverse than the results on the commonly used USPTO-50k data suggest. Link » Hongyu Tu · Shantam Shorewala · Tengfei Ma · Veronika Thost 🔗 - Optimizing Intermediate Representations of Generative Models for Phase Retrieval (Poster)  link » Fourier phase retrieval is the problem of reconstructing images from magnitude-only measurements. It is relevant in many areas of science, e.g., in X-ray crystallography, astronomy, microscopy, array imaging and optics. When training data is available, generative models can be used to constrain the solution set. However, not all possible solutions are within the range of the generator. Instead, they are represented with some error. To reduce this representation error in the context of phase retrieval, we first leverage a novel variation of intermediate layer optimization (ILO) to extend the range of the generator while still producing images consistent with the training data. Second, we introduce new initialization schemes that further improve the quality of the reconstruction. With extensive experiments, we show the benefits of our modified ILO and the new initialization schemes. 
Link » Tobias Uelwer · Sebastian Konietzny · Stefan Harmeling 🔗 - Bayesian parameter inference of a vortically perturbed flame model for the prediction of thermoacoustic instability (Poster)  link » Thermoacoustic instabilities can be highly detrimental to the operation of aircraft gas turbine combustors within design conditions, and hence their prediction and suppression are crucial. This work uses a Bayesian machine learning method to infer the parameters of a bluff-body stabilised, physics-informed flame model in real time. The flame front is modelled using the $G$-equation, a level-set method which segments the flow into regions of reactants and products. The flow past the bluff body is modelled with a discrete vortex method (DVM) to account for vortical perturbations on the flame front. Using the physics-informed model with the learned parameters from both the $G$-equation and the DVM, a flame transfer function (FTF) is obtained, from which the growth rates of instability in the system can be calculated. A heteroscedastic Bayesian neural network ensemble (BayNNE) is trained on a library of flame front simulations with known target parameters in both models. The trained BayNNE is a surrogate model for a Bayesian posterior of the target parameters given the input flame front coordinates. The ensemble predicts some parameters of the DVM with more certainty than others, showing which are more influential in affecting the flame front. Using the learned posterior, the flame fronts are re-simulated to extrapolate the flame beyond the experimental window where it was observed. Flame results are also extrapolated in parameter space. These extrapolated flame shapes are then used to calculate thermoacoustic frequencies and growth rates of the system. We observe that the growth rates and frequencies do not show a strong dependency on the amplitude of forcing, which is one of the inferred parameters of the physics-informed model. 
This important result suggests that an FTF derived at high amplitude, when it is observable, is also valid at low amplitude, when it is not observable. Link » Max Croci · Joel Vasanth · Ushnish Sengupta · Ekrem Ekici · Matthew Juniper 🔗
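As background for the flame-model abstract above: the $G$-equation it references is a standard level-set description of a premixed flame front. The form below is a sketch from the general premixed-combustion literature (not taken from the paper itself, whose exact formulation and notation may differ):

```latex
% Kinematic G-equation: the flame front is the level set G(x, t) = 0,
% with G > 0 on one side (products) and G < 0 on the other (reactants).
\frac{\partial G}{\partial t} + \mathbf{u} \cdot \nabla G = s_L \,\lvert \nabla G \rvert
```

Here $\mathbf{u}$ is the local flow velocity (in the paper's setting, perturbed by the vortices produced by the DVM) and $s_L$ is the laminar flame speed at which the front propagates normal to itself into the reactants.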

#### Author Information

##### Yoshua Bengio (Mila / U. Montreal)

Yoshua Bengio is Full Professor in the computer science and operations research department at U. Montreal, scientific director and founder of Mila and of IVADO, Turing Award 2018 recipient, Canada Research Chair in Statistical Learning Algorithms, as well as a Canada AI CIFAR Chair. He pioneered deep learning and in 2018 received the most citations per day of any computer scientist worldwide. He is an officer of the Order of Canada and a member of the Royal Society of Canada; he was awarded the Killam Prize, the Marie-Victorin Prize and the Radio-Canada Scientist of the Year award in 2017; and he is a member of the NeurIPS advisory board and co-founder of the ICLR conference, as well as program director of the CIFAR program on Learning in Machines and Brains. His goal is to contribute to uncovering the principles that give rise to intelligence through learning, and to favour the development of AI for the benefit of all.