A Multi-World Approach to Question Answering about Real-World Scenes based on Uncertain Input
Mateusz Malinowski · Mario Fritz

Tue Dec 09 04:00 PM -- 08:59 PM (PST) @ Level 2, room 210D

We propose a method for automatically answering questions about images by bringing together recent advances from natural language processing and computer vision. We combine discrete reasoning with uncertain predictions through a multi-world approach that represents uncertainty about the perceived world in a Bayesian framework. Our approach can handle human questions of high complexity about realistic scenes and replies with a range of answers, such as counts, object classes, instances, and lists of them. The system is trained directly from question-answer pairs. We establish a first benchmark for this task, which can be seen as a modern attempt at a visual Turing test.
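To make the multi-world idea concrete, here is a minimal sketch and not the authors' implementation. It assumes a hypothetical scene whose segments each carry an independent distribution over class labels, enumerates every possible world (one hard label per segment), runs a deterministic counting query in each world, and sums the world probabilities per answer. The names `segment_beliefs`, `enumerate_worlds`, and `answer_distribution`, the example labels, and the independence assumption are all illustrative.

```python
from collections import defaultdict
from itertools import product


def enumerate_worlds(segment_beliefs):
    """Yield (labels, probability) for every hard labelling of the segments.

    Assumption: segments are labelled independently, so the probability of a
    world is the product of its per-segment label probabilities.
    """
    per_segment = [list(belief.items()) for belief in segment_beliefs]
    for combo in product(*per_segment):
        labels = tuple(label for label, _ in combo)
        prob = 1.0
        for _, p in combo:
            prob *= p
        yield labels, prob


def count_objects(world, target):
    """Deterministic (discrete) reasoning step: count a class in one world."""
    return sum(1 for label in world if label == target)


def answer_distribution(segment_beliefs, target):
    """Marginalize over worlds to get a distribution over answers."""
    posterior = defaultdict(float)
    for world, prob in enumerate_worlds(segment_beliefs):
        posterior[count_objects(world, target)] += prob
    return dict(posterior)


# Hypothetical uncertain perception of three image segments.
segment_beliefs = [
    {"chair": 0.9, "table": 0.1},
    {"chair": 0.6, "sofa": 0.4},
    {"table": 1.0},
]

# "How many chairs are there?" answered under uncertainty.
posterior = answer_distribution(segment_beliefs, "chair")
best_answer = max(posterior, key=posterior.get)
print(posterior, best_answer)
```

Exhaustive enumeration is only feasible for tiny scenes; with many segments, one would typically sample a limited number of worlds instead.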

Author Information

Mateusz Malinowski (DeepMind)

Mateusz Malinowski is a research scientist at DeepMind, where he works at the intersection of computer vision, natural language understanding, and deep learning. He received his PhD (Dr.-Ing.) in computer vision, with the highest honor (summa cum laude), from the Max Planck Institute for Informatics in 2017 for his pioneering work on visual question answering, in which he proposed the task and developed methods that answer questions about the content of images. Prior to this, he graduated with honors from Saarland University in computer science. Before that, he studied computer science at Wroclaw University in Poland.

Mario Fritz (CISPA Helmholtz Center i.G.)
