Learning to Exploit Stability for 3D Scene Parsing
Yilun Du · Zhijian Liu · Hector Basevi · Ales Leonardis · Bill Freeman · Josh Tenenbaum · Jiajun Wu

Wed Dec 5th 10:45 AM -- 12:45 PM @ Room 517 AB #123

Human scene understanding uses a variety of visual and non-visual cues to perform inference on object types, poses, and relations. Physics is a rich and universal cue which we exploit to enhance scene understanding. We integrate the physical cue of stability into the learning process using a REINFORCE approach coupled to a physics engine, and apply this to the problem of producing the 3D bounding boxes and poses of objects in a scene. We first show that applying physics supervision to an existing scene understanding model increases performance, produces more stable predictions, and allows training to an equivalent performance level with fewer annotated training examples. We then present a novel architecture for 3D scene parsing named Prim R-CNN, learning to predict bounding boxes as well as their 3D size, translation, and rotation. With physics supervision, Prim R-CNN outperforms existing scene understanding approaches on this problem. Finally, we show that applying physics supervision on unlabeled real images improves real domain transfer of models trained on synthetic data.
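
The abstract describes coupling REINFORCE to a physics engine because a stability check is not differentiable. The following is a minimal sketch of that general idea, not the authors' implementation. It assumes a toy one-dimensional setting in which a box is placed at a horizontal offset sampled from a Gaussian policy with a learnable mean. The function `is_stable`, the constants `SUPPORT_HALF_WIDTH` and `SIGMA`, and all hyperparameters are hypothetical stand-ins. In particular, `is_stable` is a closed-form check used in place of a real physics simulation.

```python
import random

SUPPORT_HALF_WIDTH = 0.5  # hypothetical: half-width of the supporting surface
SIGMA = 0.5               # hypothetical: fixed standard deviation of the policy


def is_stable(offset):
    """Toy stand-in for a physics engine: the box is stable if its
    center of mass lies over the support."""
    return abs(offset) <= SUPPORT_HALF_WIDTH


def stable_fraction(mu, rng, n=2000):
    """Estimate how often offsets sampled around mu pass the stability check."""
    return sum(is_stable(rng.gauss(mu, SIGMA)) for _ in range(n)) / n


def train(mu=1.5, steps=200, batch=256, lr=0.2, seed=0):
    """Move the policy mean toward stable placements with REINFORCE."""
    rng = random.Random(seed)
    for _ in range(steps):
        # Sample placements from the current policy N(mu, SIGMA^2).
        offsets = [rng.gauss(mu, SIGMA) for _ in range(batch)]
        # Non-differentiable reward: 1 if the placement is stable, else 0.
        rewards = [1.0 if is_stable(x) else 0.0 for x in offsets]
        # Batch-mean reward used as a baseline to reduce variance.
        baseline = sum(rewards) / batch
        # Score-function estimate: (r - b) * d/dmu log N(x; mu, SIGMA^2),
        # where d/dmu log N(x; mu, SIGMA^2) = (x - mu) / SIGMA^2.
        grad = sum(
            (r - baseline) * (x - mu) / SIGMA ** 2
            for r, x in zip(rewards, offsets)
        ) / batch
        mu += lr * grad  # gradient ascent on expected reward
    return mu


if __name__ == "__main__":
    learned_mu = train()
    check_rng = random.Random(1)
    print("learned mean offset:", learned_mu)
    print("stable fraction before:", stable_fraction(1.5, check_rng))
    print("stable fraction after:", stable_fraction(learned_mu, check_rng))
```

The gradient never passes through `is_stable`. Only the log-probability of the sampled placement is differentiated, which is why a black-box simulator can serve as the reward source in this kind of setup.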

Author Information

Yilun Du (MIT)
Zhijian Liu (MIT)
Hector Basevi (University of Birmingham)
Ales Leonardis (University of Birmingham)
Bill Freeman (MIT/Google)
Josh Tenenbaum (MIT)

Josh Tenenbaum is an Associate Professor of Computational Cognitive Science at MIT in the Department of Brain and Cognitive Sciences and the Computer Science and Artificial Intelligence Laboratory (CSAIL). He received his PhD from MIT in 1999, and was an Assistant Professor at Stanford University from 1999 to 2002. He studies learning and inference in humans and machines, with the twin goals of understanding human intelligence in computational terms and bringing computers closer to human capacities. He focuses on problems of inductive generalization from limited data -- learning concepts and word meanings, inferring causal relations or goals -- and learning abstract knowledge that supports these inductive leaps in the form of probabilistic generative models or 'intuitive theories'. He has also developed several novel machine learning methods inspired by human learning and perception, most notably Isomap, an approach to unsupervised learning of nonlinear manifolds in high-dimensional data. He has been Associate Editor for the journal Cognitive Science, has been active on program committees for the CogSci and NIPS conferences, and has co-organized a number of workshops, tutorials and summer schools in human and machine learning. Several of his papers have received outstanding paper awards or best student paper awards at the IEEE Computer Vision and Pattern Recognition (CVPR), NIPS, and Cognitive Science conferences. He is the recipient of the New Investigator Award from the Society for Mathematical Psychology (2005), the Early Investigator Award from the Society of Experimental Psychologists (2007), and the Distinguished Scientific Award for Early Career Contribution to Psychology (in the area of cognition and human learning) from the American Psychological Association (2008).

Jiajun Wu (MIT)

Jiajun Wu is a fifth-year Ph.D. student at Massachusetts Institute of Technology, advised by Professor Bill Freeman and Professor Josh Tenenbaum. His research interests lie at the intersection of computer vision, machine learning, and computational cognitive science. Before coming to MIT, he received his B.Eng. from Tsinghua University, China, advised by Professor Zhuowen Tu. He has also spent time working at the research labs of Microsoft, Facebook, and Baidu.
