
Affinity Workshop: WiML Workshop 1

Visual Question Answering (VQA) Models for Hypothetical Reasoning

Shailaja Keyur Sampat


In this work, we propose a novel vision-language question answering task for ‘what-if’ reasoning over images. We build a synthetic corpus based on the CLEVR dataset (Johnson et al., 2017a), carefully crafted to minimize biases and support explainable model development while remaining diverse. We evaluate several baselines based on existing architectures to gain insight into their ability to perform hypothetical reasoning. In future work, we would like to develop better vision-language models to tackle the hypothetical reasoning problem.
