MAQA: A Multimodal QA Benchmark for Negation
Yue Li · Aren Jansen · Qingqing Huang · Ravi Ganti · Joonseok Lee · Dima Kuzmin

Fri Dec 02 08:32 AM -- 08:34 AM (PST)
Event URL: https://openreview.net/forum?id=THVTj5ZwYu

Multimodal learning can benefit from the representation power of pretrained Large Language Models (LLMs). However, state-of-the-art transformer-based LLMs often ignore negations in natural language, and there is no existing benchmark to quantitatively evaluate whether multimodal transformers inherit this weakness. In this study, we present a new multimodal question answering (QA) benchmark adapted from labeled music videos in AudioSet (Gemmeke et al., 2017), with the goal of systematically evaluating whether multimodal transformers can perform complex reasoning to recognize new concepts as negations of previously learned concepts. We show that, with a standard fine-tuning approach, multimodal transformers are still incapable of correctly interpreting negation, irrespective of model size. However, our experiments demonstrate that augmenting the original training task distributions with negated QA examples allows the model to reliably reason with negation. To do this, we describe a novel data generation procedure that prompts the 540B-parameter PaLM model to automatically generate negated QA examples as compositions of easily accessible video tags. The generated examples contain more natural linguistic patterns, and the gains compared to a template-based task augmentation approach are significant.
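To make the idea of template-based task augmentation concrete, here is a minimal sketch of how negated QA examples could be composed from video tags. It illustrates the general baseline idea only and is not the authors' procedure: the question templates, the function name make_qa_examples, and the example tags are all assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical templates; the wording actually used in the paper is not given here.
POSITIVE_TEMPLATE = "Does the music video contain {tag}?"
NEGATED_TEMPLATE = "Does the music video contain no {tag}?"


def make_qa_examples(clip_tags: List[str], vocabulary: List[str]) -> List[Dict[str, str]]:
    """Build positive and negated yes/no QA pairs for one clip from its tags."""
    present = set(clip_tags)
    examples = []
    for tag in vocabulary:
        has_tag = tag in present
        # Original (non-negated) question about the tag.
        examples.append({
            "question": POSITIVE_TEMPLATE.format(tag=tag),
            "answer": "yes" if has_tag else "no",
        })
        # Negating the question flips the expected answer.
        examples.append({
            "question": NEGATED_TEMPLATE.format(tag=tag),
            "answer": "no" if has_tag else "yes",
        })
    return examples


if __name__ == "__main__":
    # Assumed example: a clip tagged "guitar", with a two-tag vocabulary.
    for example in make_qa_examples(["guitar"], ["guitar", "piano"]):
        print(example["question"], "->", example["answer"])
```

A fixed template like this yields rigid phrasing, which is the limitation the abstract contrasts with the more natural examples generated by prompting a large language model.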

Author Information

Yue Li (Google Research)

NLP researcher with interests in few-shot learning, domain adaptation, and joint music-language learning.

Aren Jansen (Google, Inc.)
Qingqing Huang (MIT)
Ravi Ganti (Google)
Joonseok Lee (Google Research)

Joonseok Lee is a research engineer at Google Research. He mainly works on content-based video recommendation and multi-modal video representation learning. He earned his Ph.D. in Computer Science from the Georgia Institute of Technology in August 2015, under the supervision of Dr. Guy Lebanon and Prof. Hongyuan Zha. His thesis is about local approaches for collaborative filtering, with recommendation systems as the main application. He completed three internships during his Ph.D., at Amazon (Summer 2014), Microsoft Research (Spring 2014), and Google (Summer 2013). Before coming to Georgia Tech, he worked at NHN Corp. in Korea (2007-2010). He received his B.S. degree in computer science and engineering from Seoul National University, Korea. His paper "Local Collaborative Ranking" received the best student paper award from the ACM WWW (2014) and IEEE ICDM (2016) conferences. He has co-organized the YouTube-8M Large-Scale Video Understanding Workshop as a program chair since 2017, and served as the publicity chair for the AISTATS 2015 conference. He has served as a program committee member for many conferences, including NIPS, ICML, ICLR, AAAI, CVPR, I/ECCV, WSDM, and CIKM, and as a reviewer for journals including JMLR, ACM TIST, and IEEE TKDE. More information is available on his website (http://www.joonseok.net).

Dima Kuzmin (Google)
