Timezone: »
The audio-visual video parsing task aims to temporally parse a video into audio or visual event categories. However, it is labor intensive to temporally annotate audio and visual events and thus hampers the learning of a parsing model. To this end, we propose to explore additional cross-video and cross-modality supervisory signals to facilitate weakly-supervised audio-visual video parsing. The proposed method exploits both the common and diverse event semantics across videos to identify audio or visual events. In addition, our method explores event co-occurrence across audio, visual, and audio-visual streams. We leverage the explored cross-modality co-occurrence to localize segments of target events while excluding irrelevant ones. The discovered supervisory signals across different videos and modalities can greatly facilitate the training with only video-level annotations. Quantitative and qualitative results demonstrate that the proposed method performs favorably against existing methods on weakly-supervised audio-visual video parsing.
Author Information
Yan-Bo Lin (University of North Carolina at Chapel Hill)
Hung-Yu Tseng (Facebook)
Hsin-Ying Lee (University of California, Merced)
Yen-Yu Lin (National Chiao Tung University)
Ming-Hsuan Yang (Google / UC Merced)
More from the Same Authors
-
2021 Spotlight: Intriguing Properties of Vision Transformers »
Muhammad Muzammal Naseer · Kanchana Ranasinghe · Salman H Khan · Munawar Hayat · Fahad Shahbaz Khan · Ming-Hsuan Yang -
2023 Poster: ARTIC3D: Learning Robust Articulated 3D Shapes from Noisy Web Image Collections »
Chun-Han Yao · Amit Raj · Wei-Chih Hung · Michael Rubinstein · Yuanzhen Li · Ming-Hsuan Yang · Varun Jampani -
2023 Poster: A Tale of Two Features: Stable Diffusion Complements DINO for Zero-Shot Semantic Correspondence »
Junyi Zhang · Charles Herrmann · Junhwa Hur · Luisa Polania Cabrera · Varun Jampani · Deqing Sun · Ming-Hsuan Yang -
2023 Poster: AIMS: All-Inclusive Multi-Level Segmentation »
Lu Qi · Jason Kuen · Weidong Guo · Jiuxiang Gu · Zhe Lin · Bo Du · Yu Xu · Ming-Hsuan Yang -
2023 Poster: Diffusion-SS3D: Diffusion Model for Semi-supervised 3D Object Detection »
Cheng-Ju Ho · Chen-Hsuan Tai · Yen-Yu Lin · Ming-Hsuan Yang · Yi-Hsuan Tsai -
2023 Poster: Module-wise Adaptive Distillation for Multimodality Foundation Models »
Chen Liang · Jiahui Yu · Ming-Hsuan Yang · Matthew Brown · Yin Cui · Tuo Zhao · Boqing Gong · Tianyi Zhou -
2023 Poster: SPAE: Semantic Pyramid AutoEncoder for Multimodal Generation with Frozen LLMs »
Lijun Yu · Yong Cheng · Zhiruo Wang · Vivek Kumar · Wolfgang Macherey · Yanping Huang · David Ross · Irfan Essa · Yonatan Bisk · Ming-Hsuan Yang · Kevin Murphy · Alexander Hauptmann · Lu Jiang -
2023 Poster: Video Timeline Modeling For News Story Understanding »
Meng Liu · Mingda Zhang · Jialu Liu · Hanjun Dai · Ming-Hsuan Yang · Shuiwang Ji · Zheyun Feng · Boqing Gong -
2022 Poster: LASSIE: Learning Articulated Shapes from Sparse Image Ensemble via 3D Part Discovery »
Chun-Han Yao · Wei-Chih Hung · Yuanzhen Li · Michael Rubinstein · Ming-Hsuan Yang · Varun Jampani -
2021 Poster: Intriguing Properties of Vision Transformers »
Muhammad Muzammal Naseer · Kanchana Ranasinghe · Salman H Khan · Munawar Hayat · Fahad Shahbaz Khan · Ming-Hsuan Yang -
2021 Poster: Learning 3D Dense Correspondence via Canonical Point Autoencoder »
An-Chieh Cheng · Xueting Li · Min Sun · Ming-Hsuan Yang · Sifei Liu -
2021 Poster: End-to-end Multi-modal Video Temporal Grounding »
Yi-Wen Chen · Yi-Hsuan Tsai · Ming-Hsuan Yang -
2020 Poster: Online Adaptation for Consistent Mesh Reconstruction in the Wild »
Xueting Li · Sifei Liu · Shalini De Mello · Kihwan Kim · Xiaolong Wang · Ming-Hsuan Yang · Jan Kautz -
2019 Poster: Quadratic Video Interpolation »
Xiangyu Xu · Li Siyao · Wenxiu Sun · Qian Yin · Ming-Hsuan Yang -
2019 Spotlight: Quadratic Video Interpolation »
Xiangyu Xu · Li Siyao · Wenxiu Sun · Qian Yin · Ming-Hsuan Yang -
2019 Poster: Joint-task Self-supervised Learning for Temporal Correspondence »
Xueting Li · Sifei Liu · Shalini De Mello · Xiaolong Wang · Jan Kautz · Ming-Hsuan Yang -
2019 Poster: Weakly Supervised Instance Segmentation using the Bounding Box Tightness Prior »
Cheng-Chun Hsu · Kuang-Jui Hsu · Chung-Chi Tsai · Yen-Yu Lin · Yung-Yu Chuang -
2019 Poster: Dancing to Music »
Hsin-Ying Lee · Xiaodong Yang · Ming-Yu Liu · Ting-Chun Wang · Yu-Ding Lu · Ming-Hsuan Yang · Jan Kautz -
2018 Poster: Deep Non-Blind Deconvolution via Generalized Low-Rank Approximation »
Wenqi Ren · Jiawei Zhang · Lin Ma · Jinshan Pan · Xiaochun Cao · Wangmeng Zuo · Wei Liu · Ming-Hsuan Yang -
2018 Poster: Context-aware Synthesis and Placement of Object Instances »
Donghoon Lee · Sifei Liu · Jinwei Gu · Ming-Yu Liu · Ming-Hsuan Yang · Jan Kautz -
2018 Poster: Deep Attentive Tracking via Reciprocative Learning »
Shi Pu · YIBING SONG · Chao Ma · Honggang Zhang · Ming-Hsuan Yang -
2017 Poster: Learning Affinity via Spatial Propagation Networks »
Sifei Liu · Shalini De Mello · Jinwei Gu · Guangyu Zhong · Ming-Hsuan Yang · Jan Kautz -
2017 Poster: Semi-Supervised Learning for Optical Flow with Generative Adversarial Networks »
Wei-Sheng Lai · Jia-Bin Huang · Ming-Hsuan Yang -
2017 Poster: Universal Style Transfer via Feature Transforms »
Yijun Li · Chen Fang · Jimei Yang · Zhaowen Wang · Xin Lu · Ming-Hsuan Yang -
2015 Poster: Weakly-supervised Disentangling with Recurrent Transformations for 3D View Synthesis »
Jimei Yang · Scott E Reed · Ming-Hsuan Yang · Honglak Lee -
2008 Poster: Dimensionality Reduction for Data in Multiple Feature Representations »
Yen-Yu Lin · Tyng-Luh Liu · Chiou-Shann Fuh