Videos often capture objects, their motion, and the interactions between different objects. Although real-world objects have physical properties, many of these properties (such as mass and coefficient of friction) are not captured directly by the imaging pipeline. However, these properties can be estimated using cues from relative object motion and the dynamics introduced by collisions. In this paper, we introduce a new video question answering task for reasoning about the implicit physical properties of objects in a scene. For this task, we introduce a dataset -- CRIPP-VQA (Counterfactual Reasoning about Implicit Physical Properties) -- which contains videos of objects in motion, annotated with hypothetical/counterfactual questions about the effect of actions (such as removing, adding, or replacing objects), questions about planning (choosing actions to perform in order to reach a particular goal), and descriptive questions about the visible properties of objects. We benchmark the performance of existing deep learning-based video question answering models on CRIPP-VQA. Our experiments reveal a surprising and significant performance gap between answering questions about implicit properties (the focus of this paper) and explicit properties (the focus of prior work) of objects (as shown in Table 1).
Author Information
Maitreya Patel (Arizona State University)
Tejas Gokhale (Arizona State University)

Website: https://www.tejasgokhale.com/
Final-year Ph.D. candidate at Arizona State University, co-advised by Yezhou Yang and Chitta Baral. M.S. ECE, Carnegie Mellon University, 2017. Research focuses on semantic vision, with special emphasis on developing robust and reliable systems by leveraging the complex interactions between vision and language.
Chitta Baral (Arizona State University)
'YZ' Yezhou Yang (Arizona State University)
More from the Same Authors
- 2022: LILA: A Unified Benchmark for Mathematical Reasoning
  Swaroop Mishra · Matthew Finlayson · Pan Lu · Leonard Tang · Sean Welleck · Chitta Baral · Tanmay Rajpurohit · Oyvind Tafjord · Ashish Sabharwal · Peter Clark · Ashwin Kalyan
- 2022: Covariate Shift Detection via Domain Interpolation Sensitivity
  Tejas Gokhale · Joshua Feinglass · 'YZ' Yezhou Yang
- 2020: VAIDA: An Educative Benchmark Creation Paradigm using Visual Analytics for Interactively Discouraging Artifacts (by Anjana Arunkumar, Swaroop Mishra, Bhavdeep Sachdeva, Chitta Baral and Chris Bryan)
  Anjana Arunkumar · Swaroop Mishra · Chitta Baral
- 2020 Poster: Language-Conditioned Imitation Learning for Robot Manipulation Tasks
  Simon Stepputtis · Joseph Campbell · Mariano Phielipp · Stefan Lee · Chitta Baral · Heni Ben Amor
- 2020 Spotlight: Language-Conditioned Imitation Learning for Robot Manipulation Tasks
  Simon Stepputtis · Joseph Campbell · Mariano Phielipp · Stefan Lee · Chitta Baral · Heni Ben Amor