Physion: Evaluating Physical Prediction from Vision in Humans and Machines
Daniel Bear · Elias Wang · Damian Mrowca · Felix Binder · Hsiao-Yu Tung · Pramod RT · Cameron Holdaway · Sirui Tao · Kevin Smith · Fan-Yun Sun · Fei-Fei Li · Nancy Kanwisher · Josh Tenenbaum · Dan Yamins · Judith Fan

While machine learning algorithms excel at many challenging visual tasks, it is unclear that they can make predictions about commonplace real-world physical events. Here, we present a visual and physical prediction benchmark that precisely measures this capability. In realistically simulating a wide variety of physical phenomena – rigid and soft-body collisions, stable multi-object configurations, rolling and sliding, projectile motion – our dataset presents a more comprehensive challenge than existing benchmarks. Moreover, we have collected human responses for our stimuli so that model predictions can be directly compared to human judgements. We compare an array of algorithms – varying in their architecture, learning objective, input-output structure, and training data – on their ability to make diverse physical predictions. We find that graph neural networks with access to the physical state best capture human behavior, whereas among models that receive only visual input, those with object-centric representations or pretraining do best but fall far short of human accuracy. This suggests that extracting physically meaningful representations of scenes is the main bottleneck to achieving human-like visual prediction. We thus demonstrate how our benchmark can identify areas for improvement and measure progress on this key aspect of physical understanding.
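The human–model comparison the abstract describes can be sketched as follows. Everything here is a hypothetical stand-in, not the released Physion data format or evaluation code: the arrays of per-trial outcomes, human response rates, and model predictions are simulated purely for illustration.

```python
import numpy as np

# Hypothetical per-trial data (NOT the actual Physion release format):
# each trial asks a yes/no physical prediction question about a scene.
rng = np.random.default_rng(0)
ground_truth = rng.integers(0, 2, size=100)  # simulator outcome per trial (0/1)
# Mean human "yes" rate per trial, loosely tracking the true outcome:
human_rate = np.clip(ground_truth * 0.8 + rng.normal(0.1, 0.1, size=100), 0.0, 1.0)
model_pred = rng.integers(0, 2, size=100)    # a model's binary prediction per trial

# Accuracy of humans (majority response) and the model against ground truth
human_acc = float(np.mean((human_rate > 0.5) == ground_truth))
model_acc = float(np.mean(model_pred == ground_truth))

# Human-model correspondence: correlate model predictions with human response rates
agreement = float(np.corrcoef(model_pred, human_rate)[0, 1])

print(f"human acc={human_acc:.2f}  model acc={model_acc:.2f}  agreement r={agreement:.2f}")
```

The key point this sketch illustrates is that the benchmark supports two distinct measures: accuracy against the simulator's ground truth, and correspondence with graded human judgements on the same stimuli.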

Author Information

Daniel Bear (Stanford University)
Elias Wang (Stanford University)
Damian Mrowca (Stanford University)

Young children are excellent at play, an ability to explore and (re)structure their environment that allows them to develop remarkable visual and physical representations of their world, setting them apart from even the most advanced robots. Damian Mrowca is studying (1) representations and architectures that allow machines to efficiently develop an intuitive physical understanding of their world and (2) mechanisms that allow agents to learn such representations in a self-supervised way. Damian is a 3rd-year PhD student co-advised by Prof. Fei-Fei Li and Prof. Daniel Yamins. He received his BSc (2012) and MSc (2015) in Electrical Engineering and Information Technology, both from the Technical University of Munich. During 2014-2015 he was a visiting student with Prof. Trevor Darrell at UC Berkeley. After a year in start-up land, looking to apply his research to business problems, he joined the Stanford Vision Lab and NeuroAILab in September 2016.

Felix Binder (UCSD)

I’m a third-year PhD student in the Cognitive Science department at UC San Diego, working on visual and physical reasoning and mental simulation. Currently, I am working on agent-based reinforcement learning models for physical construction tasks (i.e., building structures using bricks), with a focus on how we might plan more efficiently by making use of the environment. My approach is best described as computational cognitive science: trying to discover the high-level algorithms of cognition.

Hsiao-Yu Tung (Carnegie Mellon University)
Pramod RT (MIT)
Cameron Holdaway
Sirui Tao (University of California, San Diego)
Kevin Smith (MIT)
Fan-Yun Sun (National Taiwan University)
Fei-Fei Li (Stanford University)
Nancy Kanwisher (MIT)
Josh Tenenbaum (MIT)

Josh Tenenbaum is an Associate Professor of Computational Cognitive Science at MIT in the Department of Brain and Cognitive Sciences and the Computer Science and Artificial Intelligence Laboratory (CSAIL). He received his PhD from MIT in 1999, and was an Assistant Professor at Stanford University from 1999 to 2002. He studies learning and inference in humans and machines, with the twin goals of understanding human intelligence in computational terms and bringing computers closer to human capacities. He focuses on problems of inductive generalization from limited data -- learning concepts and word meanings, inferring causal relations or goals -- and learning abstract knowledge that supports these inductive leaps in the form of probabilistic generative models or 'intuitive theories'. He has also developed several novel machine learning methods inspired by human learning and perception, most notably Isomap, an approach to unsupervised learning of nonlinear manifolds in high-dimensional data. He has been Associate Editor for the journal Cognitive Science, has been active on program committees for the CogSci and NIPS conferences, and has co-organized a number of workshops, tutorials and summer schools in human and machine learning. Several of his papers have received outstanding paper awards or best student paper awards at the IEEE Computer Vision and Pattern Recognition (CVPR), NIPS, and Cognitive Science conferences. He is the recipient of the New Investigator Award from the Society for Mathematical Psychology (2005), the Early Investigator Award from the Society of Experimental Psychologists (2007), and the Distinguished Scientific Award for Early Career Contribution to Psychology (in the area of cognition and human learning) from the American Psychological Association (2008).

Dan Yamins
Judith Fan (University of California, San Diego)
