Poster
Text-Adaptive Multiple Visual Prototype Matching for Video-Text Retrieval
Chengzhi Lin · Ancong Wu · Junwei Liang · Jun Zhang · Wenhang Ge · Wei-Shi Zheng · Chunhua Shen

Cross-modal retrieval between videos and texts has gained increasing interest because of the rapid emergence of videos on the web. Generally, a video contains rich instance and event information, whereas a query text describes only part of it. A single video can therefore correspond to multiple different text descriptions and queries. We call this the Video-Text Correspondence Ambiguity problem. Current techniques mostly concentrate on mining local or multi-level alignment between the contents of video and text (e.g., object to entity and action to verb). Because these methods describe a video with only one feature, which must match multiple different text features at the same time, they struggle to alleviate video-text correspondence ambiguity. To address this problem, we propose a Text-Adaptive Multiple Visual Prototype Matching Model. It automatically captures multiple prototypes to describe a video by adaptive aggregation of video token features. Given a query text, the similarity is determined by the prototype most similar to the text, which locates the corresponding content in the video; we call this text-adaptive matching. To learn diverse prototypes that represent the rich information in videos, we propose a variance loss that encourages different prototypes to attend to different contents of the video. Our method outperforms state-of-the-art methods on four public video retrieval datasets.
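
The matching idea in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the projection matrix `w_attn`, the softmax-based aggregation, the cosine similarity, and the specific form of `variance_loss` are all assumptions made for illustration, since the abstract does not give these details. The sketch only shows the general pattern of aggregating video tokens into several prototypes and scoring a text against its best-matching prototype.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to (approximately) unit length."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def build_prototypes(tokens, w_attn):
    """Aggregate T video tokens (T, D) into K prototypes (K, D).

    w_attn (D, K) is a hypothetical learned projection: column k scores
    how strongly each token contributes to prototype k.
    """
    attn = softmax(tokens @ w_attn, axis=0)  # (T, K), each column sums to 1
    prototypes = attn.T @ tokens             # (K, D) weighted token averages
    return prototypes, attn


def text_adaptive_similarity(text_feat, prototypes):
    """Score a text by its most similar prototype (cosine similarity)."""
    sims = l2_normalize(prototypes) @ l2_normalize(text_feat)  # (K,)
    return float(sims.max()), int(sims.argmax())


def variance_loss(attn):
    """Assumed diversity term, not the paper's exact formula.

    Returns the negative mean variance of attention weights across
    prototypes, so minimizing it pushes prototypes toward different tokens.
    """
    return float(-attn.var(axis=1).mean())


rng = np.random.default_rng(0)
tokens = rng.normal(size=(6, 8))   # 6 video tokens, feature dimension 8
w_attn = rng.normal(size=(8, 3))   # 3 prototypes
prototypes, attn = build_prototypes(tokens, w_attn)
score, best = text_adaptive_similarity(rng.normal(size=8), prototypes)
```

Taking the maximum over prototypes means each query text only has to agree with one aspect of the video, which is how a single video can match several different descriptions in this sketch.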

Author Information

Chengzhi Lin (SUN YAT-SEN UNIVERSITY)
Ancong Wu (SUN YAT-SEN UNIVERSITY)
Junwei Liang (Hong Kong University of Science and Technology (Guangzhou))
Junwei Liang

I am an assistant professor at The Hong Kong University of Science and Technology (Guangzhou campus) in the AI Thrust. I am interested in building AI systems that can understand and predict human behaviors. I received my Ph.D. from CMU. Please see these [projects](https://junweiliang.me/projects.html#projects) for an overview. My mission: develop AI technologies for social good.

Jun Zhang (Tencent Youtu Lab)
Wenhang Ge (SUN YAT-SEN UNIVERSITY)
Wei-Shi Zheng (SUN YAT-SEN UNIVERSITY)
Chunhua Shen (University of Adelaide)
