We learn rich natural sound representations by capitalizing on large amounts of unlabeled sound data collected in the wild. We leverage the natural synchronization between vision and sound to learn an acoustic representation using two million unlabeled videos. Unlabeled video has the advantage that it can be economically acquired at massive scales, yet it contains useful signals about natural sound. We propose a student-teacher training procedure that transfers discriminative visual knowledge from well-established visual recognition models into the sound modality, using unlabeled video as a bridge. Our sound representation yields significant performance improvements over state-of-the-art results on standard benchmarks for acoustic scene/object classification. Visualizations suggest that some high-level semantics automatically emerge in the sound network, even though it is trained without ground-truth labels.
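The student-teacher transfer can be illustrated with a minimal sketch. It assumes that the transfer is posed as making the sound network's category distribution match the one a pretrained visual model produces on the same clip's frames, measured with KL divergence; this is one common way to set up such a loss, not necessarily the exact objective used here. The function names are hypothetical, and both networks are replaced by plain lists of logits so the example stays self-contained.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: List[float], q: List[float], eps: float = 1e-12) -> float:
    """KL(p || q) for two discrete distributions over the same categories."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def transfer_loss(teacher_logits: List[float], student_logits: List[float]) -> float:
    """Hypothetical student-teacher loss for a single unlabeled video clip.

    teacher_logits: scores from a pretrained visual model on the clip's frames (assumed).
    student_logits: scores from the sound network on the clip's audio track (assumed).
    No ground-truth labels are used: the teacher's soft predictions act as targets.
    """
    teacher_probs = softmax(teacher_logits)
    student_probs = softmax(student_logits)
    return kl_divergence(teacher_probs, student_probs)


if __name__ == "__main__":
    teacher = [4.0, 1.0, 0.5]  # visual model is confident about category 0
    close_student = [3.5, 1.2, 0.4]
    far_student = [0.2, 0.3, 4.0]
    print(transfer_loss(teacher, close_student))
    print(transfer_loss(teacher, far_student))
```

A student whose audio predictions agree with the visual teacher gets a lower loss than one that disagrees; in training, this loss would be minimized with respect to the sound network's parameters while the visual model stays fixed.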
Author Information
Yusuf Aytar (MIT)
Carl Vondrick (MIT)
Antonio Torralba (MIT)
More from the Same Authors
2020 Poster: Debiased Contrastive Learning
Ching-Yao Chuang · Joshua Robinson · Yen-Chen Lin · Antonio Torralba · Stefanie Jegelka
2020 Spotlight: Debiased Contrastive Learning
Ching-Yao Chuang · Joshua Robinson · Yen-Chen Lin · Antonio Torralba · Stefanie Jegelka
2018 Poster: Neural-Symbolic VQA: Disentangling Reasoning from Vision and Language Understanding
Kexin Yi · Jiajun Wu · Chuang Gan · Antonio Torralba · Pushmeet Kohli · Josh Tenenbaum
2018 Poster: 3D-Aware Scene Manipulation via Inverse Graphics
Shunyu Yao · Tzu Ming Hsu · Jun-Yan Zhu · Jiajun Wu · Antonio Torralba · Bill Freeman · Josh Tenenbaum
2018 Spotlight: Neural-Symbolic VQA: Disentangling Reasoning from Vision and Language Understanding
Kexin Yi · Jiajun Wu · Chuang Gan · Antonio Torralba · Pushmeet Kohli · Josh Tenenbaum
2016 Poster: Generating Videos with Scene Dynamics
Carl Vondrick · Hamed Pirsiavash · Antonio Torralba
2015 Poster: Skip-Thought Vectors
Jamie Kiros · Yukun Zhu · Russ Salakhutdinov · Richard Zemel · Raquel Urtasun · Antonio Torralba · Sanja Fidler
2015 Poster: Where are they looking?
Adria Recasens · Aditya Khosla · Carl Vondrick · Antonio Torralba
2015 Spotlight: Where are they looking?
Adria Recasens · Aditya Khosla · Carl Vondrick · Antonio Torralba
2015 Poster: Learning visual biases from human imagination
Carl Vondrick · Hamed Pirsiavash · Aude Oliva · Antonio Torralba
2011 Poster: Video Annotation and Tracking with Active Learning
Carl Vondrick · Deva Ramanan