

Poster

VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text

Hassan Akbari ⋅ Liangzhe Yuan ⋅ Rui Qian ⋅ Wei-Hong Chuang ⋅ Shih-Fu Chang ⋅ Yin Cui ⋅ Boqing Gong
2021 Poster

