
Workshop: UniReps: Unifying Representations in Neural Models

Mixture of Multimodal Interaction Experts

Haofei Yu · Paul Pu Liang · Russ Salakhutdinov · Louis-Philippe Morency

presentation: UniReps: Unifying Representations in Neural Models
Fri 15 Dec 6:15 a.m. PST — 3:15 p.m. PST


Multimodal machine learning, which studies the information and interactions across various input modalities, has made significant advances in understanding the relationship between images and descriptive text. Yet this shared information is only one of the many multimodal interactions that occur in the real world; another is sarcasm expressed through conflicting utterances and gestures. Notably, current methods for capturing shared information often do not extend well to these more nuanced interactions. Current models, in fact, show particular weaknesses on disagreement and synergistic interactions, sometimes scoring as low as 50% accuracy in binary classification. In this paper, we address this problem via a new approach called mixture of multimodal interaction experts. This method automatically classifies datapoints from unlabeled multimodal datasets by their interaction types, then employs a specialized model for each specific interaction. In our experiments, this approach improves performance on these challenging interactions by more than 10%, leading to an overall increase of 2% on tasks like sarcasm prediction. As a result, interaction quantification not only provides new insights for dataset analysis, but also yields simple approaches that obtain state-of-the-art performance.
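
The two-stage idea in the abstract (assign each datapoint an interaction type, then hand it to a model specialized for that type) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not say how interaction types are estimated, so the thresholded comparison of two unimodal scores below is an assumed stand-in, and the names `interaction_type`, `route`, `agreement_expert`, and `disagreement_expert` are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# A sample is a pair of unimodal scores in [0, 1], e.g. one from text and one
# from gesture. This representation is an assumption made for illustration.
Sample = Tuple[float, float]


def interaction_type(sample: Sample, threshold: float = 0.5) -> str:
    """Assumed heuristic: the modalities 'agree' when both thresholded
    unimodal scores fall on the same side, otherwise they 'disagree'."""
    a, b = sample
    return "agreement" if (a >= threshold) == (b >= threshold) else "disagreement"


def agreement_expert(sample: Sample) -> int:
    """Toy expert: average the two scores when the modalities agree."""
    return int((sample[0] + sample[1]) / 2 >= 0.5)


def disagreement_expert(sample: Sample) -> int:
    """Toy expert: treat conflicting modalities as a positive cue.
    A real expert would be a trained model; this is a placeholder."""
    return 1


def route(samples: List[Sample],
          experts: Dict[str, Callable[[Sample], int]]) -> List[int]:
    """Group samples by interaction type, then let each group's expert predict."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for index, sample in enumerate(samples):
        groups[interaction_type(sample)].append(index)

    predictions = [0] * len(samples)
    for kind, indices in groups.items():
        for index in indices:
            predictions[index] = experts[kind](samples[index])
    return predictions


EXPERTS = {"agreement": agreement_expert, "disagreement": disagreement_expert}
samples = [(0.9, 0.8), (0.9, 0.1), (0.2, 0.3)]
print(route(samples, EXPERTS))
```

In this toy setup, the second sample is the only one whose modalities disagree, so it alone is sent to the disagreement expert. In practice the grouping step and the experts would both be learned, and the set of interaction types would follow the paper rather than this two-way split.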
