Much progress has been made on end-to-end learning for physical understanding and reasoning. Success in this area promises far-reaching applications in robotics, machine vision, and the physical sciences. Despite this recent progress, however, our best artificial systems pale in comparison to the flexibility and generalization of human physical reasoning.
Neural information processing systems have shown promising empirical results on synthetic datasets, yet do not transfer well when deployed in novel scenarios (including the physical world). If physical understanding and reasoning techniques are to play a broader role in the physical world, they must be able to function across a wide variety of scenarios, including ones that might lie outside the training distribution. How can we design systems that satisfy these criteria?
Our workshop aims to investigate this broad question by bringing together experts from machine learning, the physical sciences, cognitive and developmental psychology, and robotics to explore how these techniques may one day be employed in the real world. In particular, we aim to investigate the following questions:

1. What forms of inductive biases best enable the development of physical understanding techniques that are applicable to real-world problems?
2. How do we ensure that the outputs of a physical reasoning module are reasonable and physically plausible?
3. Is interpretability a necessity for physical understanding and reasoning techniques to be suitable for real-world problems?
Unlike end-to-end neural architectures that distribute bias across a large set of parameters, modern structured physical reasoning modules (differentiable physics, relational learning, probabilistic programming) maintain modularity and physical interpretability. We will discuss how these inductive biases might aid in generalization and interpretability, and how these techniques impact real-world problems.
Tue 8:00 a.m. - 8:15 a.m. | Introductory remarks (Live talk)
Tue 8:15 a.m. - 8:45 a.m. | Tomer Ullman (Live talk)
Tue 8:45 a.m. - 9:15 a.m. | Nils Thuerey (Live talk)
Tue 9:15 a.m. - 9:45 a.m. | Karen Liu (Live talk)
Tue 10:30 a.m. - 10:40 a.m. | Playful Interactions for Representation Learning (Oral)
Sarah Young · Pieter Abbeel · Lerrel Pinto

One of the key challenges in visual imitation learning is collecting large amounts of expert demonstrations for a given task. While collecting human demonstrations is becoming easier with teleoperation and low-cost assistive tools, we often still require 100-1000 demonstrations per task to learn a visual representation and policy. To address this, we turn to an alternate form of data that does not require task-specific demonstrations: play. Play is a fundamental way children acquire skills, behaviors, and visual representations in early learning. Importantly, play data is diverse, task-agnostic, and relatively cheap to obtain. In this work, we propose to use playful interactions in a self-supervised manner to learn visual representations for downstream tasks. We collect 2 hours of playful data in 19 diverse environments and use self-predictive learning to extract visual representations. Given these representations, we train policies using imitation learning for two downstream tasks: pushing and stacking. Our representations, which are trained from scratch, compare favorably against ImageNet-pretrained representations. Finally, we provide an experimental analysis of the effects of different pretraining modes on downstream task learning.
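The "self-predictive learning" step above lends itself to a compact sketch: train an encoder so that the next play frame's embedding is predictable from the current embedding and the logged action. Everything below (architecture, dimensions, names) is an illustrative assumption, not the authors' implementation.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Small conv encoder mapping RGB frames to embeddings."""
    def __init__(self, emb_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, emb_dim),
        )
    def forward(self, x):
        return self.net(x)

encoder = Encoder()
action_dim = 4  # hypothetical dimensionality of the logged play actions
forward_model = nn.Sequential(  # predicts the next frame's embedding
    nn.Linear(128 + action_dim, 256), nn.ReLU(), nn.Linear(256, 128)
)
opt = torch.optim.Adam(
    [*encoder.parameters(), *forward_model.parameters()], lr=1e-4
)

def self_predictive_step(obs_t, act_t, obs_tp1):
    """One gradient step on a (frame, action, next frame) play triple."""
    z_t, z_tp1 = encoder(obs_t), encoder(obs_tp1)
    z_pred = forward_model(torch.cat([z_t, act_t], dim=-1))
    # Stop-gradient on the target embedding, a common stabilization choice.
    loss = nn.functional.mse_loss(z_pred, z_tp1.detach())
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```

After pretraining, the encoder's embeddings feed the downstream imitation policies.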
Tue 10:40 a.m. - 10:50 a.m. | Efficient and Interpretable Robot Manipulation with Graph Neural Networks (Oral)
Yixin Lin · Austin Wang · Eric Undersander · Akshara Rai

Manipulation tasks like loading a dishwasher can be seen as a sequence of spatial constraints and relationships between different objects. We aim to discover these rules from demonstrations by posing manipulation as a classification problem over a graph whose nodes represent task-relevant entities such as objects and goals. In our experiments, a single GNN policy trained using imitation learning (IL) on 20 expert demonstrations can solve block-stacking and rearrangement tasks both in simulation and on hardware, generalizing over the number of objects and goal configurations. These experiments show that graphical IL can solve complex long-horizon manipulation problems without requiring detailed task descriptions.
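As a sketch of "manipulation as classification over a graph": score each node (object or goal) with a small message-passing network and train it with cross-entropy against the expert's choice. The fully connected topology, feature sizes, and readout below are assumptions for illustration, not the paper's architecture.

```python
import torch
import torch.nn as nn

class GNNPolicy(nn.Module):
    def __init__(self, node_dim=16, hidden=64):
        super().__init__()
        self.msg = nn.Sequential(nn.Linear(2 * node_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, node_dim))
        self.score = nn.Linear(node_dim, 1)  # per-node logit

    def forward(self, nodes):                    # nodes: (N, node_dim)
        n = nodes.shape[0]
        src = nodes.unsqueeze(1).expand(n, n, -1)   # sender features
        dst = nodes.unsqueeze(0).expand(n, n, -1)   # receiver features
        messages = self.msg(torch.cat([src, dst], dim=-1)).mean(dim=0)
        h = nodes + messages                     # one round of message passing
        return self.score(h).squeeze(-1)         # logits over entities

policy = GNNPolicy()
logits = policy(torch.randn(5, 16))              # 5 entities (objects + goals)
# Imitation learning: cross-entropy against the expert's chosen entity.
loss = nn.functional.cross_entropy(logits.unsqueeze(0), torch.tensor([2]))
```

Because the same weights apply to any number of nodes, one trained policy transfers across object counts and goal configurations, which is the source of the generalization claimed above.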
Tue 10:50 a.m. - 11:00 a.m. | Vision-based system identification and 3D keypoint discovery using dynamics constraints (Oral)
Miguel Jaques · Martin Asenov · Michael Burke · Timothy Hospedales

This paper introduces V-SysId, a novel method that enables simultaneous keypoint discovery, 3D system identification, and extrinsic camera calibration from an unlabeled video taken from a static camera, using only the family of equations of motion of the object of interest as weak supervision. V-SysId takes keypoint trajectory proposals and alternates between maximum likelihood parameter estimation and extrinsic camera calibration, before applying a suitable selection criterion to identify the track of interest. This is then used to train a keypoint tracking model using supervised learning. Results on a range of settings (robotics, physics, physiology) highlight the utility of this approach.
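The alternation at the heart of V-SysId can be summarized in a few lines. The sketch below is a generic stand-in: `neg_log_lik` (the negative log-likelihood of a 2D track under dynamics parameters and camera extrinsics) and both initializations are hypothetical placeholders, not the paper's code.

```python
from scipy.optimize import minimize

def fit_alternating(track_2d, neg_log_lik, theta0, extr0, iters=10):
    """Alternate ML dynamics-parameter estimation and extrinsic calibration."""
    theta, extr = theta0, extr0
    for _ in range(iters):
        # (1) maximum-likelihood dynamics parameters, extrinsics held fixed
        theta = minimize(lambda th: neg_log_lik(track_2d, th, extr), theta).x
        # (2) extrinsic camera calibration, dynamics parameters held fixed
        extr = minimize(lambda ex: neg_log_lik(track_2d, theta, ex), extr).x
    return theta, extr
```

Running this over each keypoint trajectory proposal and keeping the best-scoring track corresponds to the selection step described in the abstract.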
Tue 11:00 a.m. - 11:02 a.m. | 3D Neural Scene Representations for Visuomotor Control (Spotlight)
Yunzhu Li · Shuang Li · Vincent Sitzmann · Pulkit Agrawal · Antonio Torralba

Humans have a strong intuitive understanding of the 3D environment around us. The mental model of physics in our brain applies to objects of different materials and enables us to perform a wide range of manipulation tasks that are far beyond the reach of current robots. In this work, we aim to learn models for dynamic 3D scenes purely from 2D visual observations. Our model combines Neural Radiance Fields (NeRF) and time contrastive learning with an autoencoding framework, which learns viewpoint-invariant, 3D-aware scene representations. We show that a dynamics model, constructed over the learned representation space, enables visuomotor control for challenging manipulation tasks involving both rigid bodies and fluids, where the target is specified from a viewpoint different from the one the robot operates in. When coupled with an auto-decoding framework, it can even support goal specification from camera viewpoints that are outside the training distribution. We further demonstrate the richness of the learned 3D dynamics model by performing future prediction and novel view synthesis. Finally, we provide detailed ablation studies regarding different system designs and qualitative analysis of the learned representations.
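Of the ingredients above, time contrastive learning is the easiest to make concrete: embeddings of frames captured at the same instant from different viewpoints are pulled together, and frames from other instants pushed apart. The triplet-style loss below is a minimal sketch; the margin value and the use of Euclidean distance are assumptions, not necessarily the paper's choices.

```python
import torch.nn.functional as F

def time_contrastive_loss(anchor, positive, negative, margin=0.5):
    """anchor/positive: same time, different cameras; negative: another time."""
    d_pos = F.pairwise_distance(anchor, positive)
    d_neg = F.pairwise_distance(anchor, negative)
    return F.relu(d_pos - d_neg + margin).mean()
```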
Tue 11:02 a.m. - 11:04 a.m. | Learning Graph Search Heuristics (Spotlight)
Michal Pándy · Rex Ying · Gabriele Corso · Petar Veličković · Jure Leskovec · Pietro Liò

Searching for a path between two nodes in a graph is one of the most well-studied and fundamental problems in computer science. In numerous domains such as robotics, AI, or biology, practitioners develop search heuristics to accelerate their pathfinding algorithms. However, it is a laborious and complex process to hand-design heuristics based on the problem and the structure of a given use case. Here we present PHIL (Path Heuristic with Imitation Learning), a novel neural architecture and a training algorithm for discovering graph search and navigation heuristics from data by leveraging recent advances in imitation learning and graph representation learning. At training time, we aggregate datasets of search trajectories and ground-truth shortest path distances, which we use to train a specialized graph neural network-based heuristic function using backpropagation through steps of the pathfinding process. Our heuristic function learns graph embeddings useful for inferring node distances, runs in constant time independent of graph sizes, and can be easily incorporated in an algorithm such as A* at test time. Experiments show that PHIL reduces the number of explored nodes compared to state-of-the-art methods on benchmark datasets by 40.8% on average and allows for fast planning in time-critical robotics domains.
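Since the learned heuristic is just a node-to-goal distance estimate, it drops straight into standard A*. Below is a generic A* skeleton in which `h` stands in for PHIL's GNN heuristic; the graph interface is an assumption for illustration.

```python
import heapq

def a_star(neighbors, start, goal, h):
    """neighbors(n) -> iterable of (successor, edge_cost); nodes comparable.
    h(n) estimates the remaining distance from n to the goal."""
    frontier = [(h(start), 0.0, start)]    # entries are (f = g + h, g, node)
    best_g = {start: 0.0}
    while frontier:
        _, g, node = heapq.heappop(frontier)
        if node == goal:
            return g
        if g > best_g.get(node, float("inf")):
            continue                        # stale queue entry
        for nxt, cost in neighbors(node):
            g2 = g + cost
            if g2 < best_g.get(nxt, float("inf")):
                best_g[nxt] = g2
                heapq.heappush(frontier, (g2 + h(nxt), g2, nxt))
    return float("inf")
```

Note that a learned heuristic is generally not admissible, so the optimality guarantee of classical A* is traded for the reduction in explored nodes reported above.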
Tue 11:04 a.m. - 11:06 a.m. | Efficient Partial Simulation Quantitatively Explains Deviations from Optimal Physical Predictions (Spotlight)
Ilona Bass · Kevin Smith · Elizabeth Bonawitz · Tomer Ullman

Humans are adept at planning actions in real-time dynamic physical environments. Machine intelligence struggles with this task, and one cause is that running simulators of complex, real-world environments is computationally expensive. Yet recent accounts suggest that humans use mental simulation in order to make intuitive physical judgments. How is human physical reasoning so accurate, while maintaining computational tractability? We suggest that human behavior is well described by partial simulation, which moves forward in time only parts of the world deemed relevant. We take as a case study Ludwin-Peery, Bramley, Davis, and Gureckis (2020), in which a conjunction fallacy was found in the domain of intuitive physics. This phenomenon is difficult to explain with full simulation, but we show it can be quantitatively accounted for with partial simulation. We discuss how AI research could make use of efficient partial simulation in implementations of commonsense physical reasoning.
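A toy version of partial simulation makes the computational savings obvious: advance only the objects a relevance test flags, and freeze the rest. The relevance predicate and the physics below are deliberately simplistic assumptions, not the authors' model.

```python
def partial_simulate(objects, relevant, steps, dt=0.02, g=-9.81):
    """objects: list of dicts with keys x, y, vx, vy; relevant: obj -> bool."""
    active = [o for o in objects if relevant(o)]   # pay only for these
    for _ in range(steps):
        for obj in active:
            obj["vy"] += g * dt            # simple ballistic update
            obj["x"] += obj["vx"] * dt
            obj["y"] += obj["vy"] * dt
    return objects
```

Cost scales with the number of relevant objects rather than the size of the scene, which is what makes the account computationally tractable.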
Tue 11:06 a.m. - 11:08 a.m. | TorchDyn: Implicit Models and Neural Numerical Methods in PyTorch (Spotlight)
Michael Poli · Stefano Massaroli · Atsushi Yamashita · Hajime Asama · Jinkyoo Park · Stefano Ermon

Computation in traditional deep learning models is directly determined by the explicit linking of select primitives, e.g. layers or blocks, arranged in a computational graph. Implicit neural models instead follow a declarative approach: the desired behavior is encoded into constraints, and a numerical method is applied to solve the resulting optimization problem as part of the inference pass. Existing open-source frameworks focus on explicit models and do not offer implementations of the numerical routines required to study and benchmark implicit models. We introduce TorchDyn, a PyTorch library fully tailored to implicit learning. TorchDyn primitives are categorized into numerical methods, sensitivity methods, and model classes, with pre-existing implementations that can be combined and repurposed to obtain complex compositional implicit architectures. TorchDyn further offers a collection of step-by-step tutorials and benchmarks designed to accelerate research and improve the robustness of experimental evaluations for implicit models.
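A minimal usage sketch, based on TorchDyn's documented NeuralODE interface (argument names and return conventions may differ across library versions):

```python
import torch
import torch.nn as nn
from torchdyn.core import NeuralODE

# The vector field is an ordinary nn.Module; TorchDyn wraps it with a
# numerical solver and a sensitivity (gradient) method.
vector_field = nn.Sequential(nn.Linear(2, 64), nn.Tanh(), nn.Linear(64, 2))
model = NeuralODE(vector_field, sensitivity='adjoint', solver='dopri5')

x0 = torch.randn(32, 2)                 # batch of initial states
t_span = torch.linspace(0, 1, 20)
t_eval, trajectory = model(x0, t_span)  # integrate the learned dynamics
```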
Tue 11:08 a.m. - 11:10 a.m. | 3D-OES: Viewpoint-Invariant Object-Factorized Environment Simulators (Spotlight)
Hsiao-Yu Tung · Zhou Xian · Mihir Prabhudesai · Katerina Fragkiadaki

We propose an action-conditioned dynamics model that predicts scene changes caused by object and agent interactions in a viewpoint-invariant 3D neural scene representation space, inferred from RGB-D videos. In this 3D feature space, objects do not interfere with one another and their appearance persists over time and across viewpoints. This permits our model to predict scenes long into the future by simply "moving" 3D object features based on cumulative object motion predictions. Object motion predictions are computed by a graph neural network that operates over the object features extracted from the 3D neural scene representation. Our model generalizes well across varying numbers and appearances of interacting objects as well as across camera viewpoints, outperforming existing 2D and 3D dynamics models, and enables successful sim-to-real transfer.
Tue 11:10 a.m. - 11:12 a.m. | DLO@Scale: A Large-scale Meta Dataset for Learning Non-rigid Object Pushing Dynamics (Spotlight)
Robert Gieselmann · Danica Kragic · Florian T. Pokorny · Alberta Longhini

The ability to quickly understand our physical environment and make predictions about interacting objects is fundamental to us humans. To equip artificial agents with similar reasoning capabilities, machine learning can be used to approximate the underlying state dynamics of a system. In this regard, deep learning has gained much popularity, yet it relies on the availability of large enough datasets. In this work, we present DLO@Scale, a new dataset for studying future state prediction in the context of multi-body deformable linear object pushing. We provide a large collection of 100 million simulated physical interactions, enabling thorough statistical analysis and algorithmic benchmarks. Our data captures complex mechanical phenomena such as elasticity, plastic deformation, and friction. An important aspect is the large variation of the physical parameters, which also makes it suitable for testing meta-learning algorithms. We describe DLO@Scale in detail and present a first empirical evaluation using neural network baselines.
Tue 11:12 a.m. - 11:14 a.m. | AVoE: A Synthetic 3D Dataset on Understanding Violation of Expectation for Artificial Cognition (Spotlight)
Arijit Dasgupta · Jiafei Duan · Marcelo Ang Jr · Cheston Tan

Recent work in cognitive reasoning and computer vision has engendered increasing popularity for the Violation-of-Expectation (VoE) paradigm in synthetic datasets. Inspired by work in infant psychology, researchers have started evaluating a model's ability to discriminate between expected and surprising scenes as a sign of its reasoning ability. Existing VoE-based 3D datasets in physical reasoning provide only vision data. However, current cognitive models of physical reasoning by psychologists reveal that infants create high-level abstract representations of objects and interactions. Capitalizing on this knowledge, we propose AVoE: a synthetic 3D VoE-based dataset that presents stimuli from multiple novel sub-categories for five event categories of physical reasoning. Compared to existing work, AVoE is armed with ground-truth labels of abstract features and rules, augmented to vision data, paving the way for high-level symbolic predictions in physical reasoning tasks.
Tue 11:14 a.m. - 11:16 a.m. | Physics-guided Learning-based Adaptive Control on the SE(3) Manifold (Spotlight)
Thai Duong · Nikolay Atanasov

In real-world robotics applications, accurate models of robot dynamics are critical for safe and stable control in rapidly changing operational conditions. This motivates the use of machine learning techniques to approximate robot dynamics and their disturbances over a training set of state-control trajectories. This paper demonstrates that inductive biases arising from physics laws can be used to improve the data efficiency and accuracy of the approximated dynamics model. For example, the dynamics of many robots, including ground, aerial, and underwater vehicles, are described using their $SE(3)$ pose and satisfy conservation of energy principles. We design a physically plausible model of the robot dynamics by imposing the structure of Hamilton's equations of motion in the design of a neural ordinary differential equation (ODE) network. The Hamiltonian structure guarantees satisfaction of $SE(3)$ kinematic constraints and energy conservation by construction. It also allows us to derive an energy-based adaptive controller that achieves trajectory tracking while compensating for disturbances. Our learning-based adaptive controller is verified on an under-actuated quadrotor robot.
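For reference, the structure being imposed is that of Hamilton's equations; in canonical coordinates $(q, p)$ with Hamiltonian $H(q, p)$ they read as below. The additive input term is a common port-Hamiltonian-style convention and stands in for however the paper injects control on $SE(3)$.

```latex
\dot{q} = \frac{\partial H(q, p)}{\partial p}, \qquad
\dot{p} = -\frac{\partial H(q, p)}{\partial q} + g(q)\, u
% For u = 0, energy is conserved along trajectories:
%   dH/dt = (\partial H/\partial q)^T \dot{q} + (\partial H/\partial p)^T \dot{p} = 0.
```

Parameterizing $H$ with a network and integrating these equations inside a neural ODE yields dynamics that conserve energy by construction, which is the inductive bias the abstract describes.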
Tue 11:16 a.m. - 11:18 a.m. | Neural NID Rules (Spotlight)
Luca Viano · Johanni Brea

Abstract object properties and their relations are deeply rooted in human common sense, allowing people to predict the dynamics of the world even in situations that are novel but governed by familiar laws of physics. Standard machine learning models in model-based reinforcement learning fail to generalize in this way. Inspired by the classic framework of noisy indeterministic deictic (NID) rules, we introduce here Neural NID, a method that learns abstract object properties and relations between objects with a suitably regularized graph neural network. We validate the greater generalization capability of Neural NID on simple benchmarks specifically designed to assess the transition dynamics learned by the model.
Tue 11:30 a.m. - 12:00 p.m. | Kelsey Allen (Live talk)
Tue 12:00 p.m. - 12:30 p.m. | Kyle Cranmer (Live talk)
Tue 12:30 p.m. - 1:00 p.m. | Shuran Song (Live talk)
Tue 1:00 p.m. - 2:00 p.m. | Industry Panel: Kenneth Tran (Koidra), Hiro Ono (NASA JPL), Aleksandra Faust (Google Brain), Michael Roberts (COVID-19 AIX-COVNET, University of Cambridge) (Discussion Panel)
Tue 2:00 p.m. - 2:45 p.m. | Research Panel (Discussion Panel)
Tue 2:45 p.m. - 4:00 p.m. | Social - GatherTown (GatherTown Meeting)
[ protected link dropped ]