With the emergence of varied visual navigation tasks (e.g., image-/object-/audio-goal and vision-language navigation) that specify the target in different ways, the community has made appealing advances in training specialized agents capable of handling individual navigation tasks well. Given plenty of embodied navigation tasks and task-specific solutions, we address a more fundamental question: can we learn a single powerful agent that masters not one but multiple navigation tasks concurrently? First, we propose VXN, a large-scale 3D dataset that instantiates four classic navigation tasks in standardized, continuous, and audiovisual-rich environments. Second, we propose Vienna, a versatile embodied navigation agent that simultaneously learns to perform the four navigation tasks with one model. Building upon a full-attentive architecture, Vienna formulates various navigation tasks as a unified, parse-and-query procedure: the target description, augmented with four task embeddings, is comprehensively interpreted into a set of diversified goal vectors, which are refined as the navigation progresses, and used as queries to retrieve supportive context from episodic history for decision making. This enables the reuse of knowledge across navigation tasks with varying input domains/modalities. We empirically demonstrate that, compared with learning each visual navigation task individually, our multitask agent achieves comparable or even better performance with reduced complexity.
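The parse-and-query procedure described in the abstract can be illustrated with a rough sketch. This is not the authors' implementation: the feature width, the number of goal slots, the helper names (`attend`, `parse_goal`, `query_history`), and the random vectors standing in for learned parameters are all assumptions made for illustration. Learned projections and the per-step refinement of the goal vectors are omitted.

```python
import math
import random

NUM_TASKS = 4  # image-goal, object-goal, audio-goal, vision-language navigation
DIM = 8        # hypothetical feature width (assumption)


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, memory):
    """Scaled dot-product attention: each query returns a weighted mix of memory rows."""
    scale = math.sqrt(len(queries[0]))
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, row)) / scale for row in memory]
        weights = softmax(scores)
        out.append([sum(w * row[i] for w, row in zip(weights, memory))
                    for i in range(len(q))])
    return out


def parse_goal(target_tokens, task_embedding, goal_slots):
    """Parse step: tag target tokens with the task embedding, then let goal slots attend to them."""
    tagged = [[t + e for t, e in zip(tok, task_embedding)] for tok in target_tokens]
    return attend(goal_slots, tagged)


def query_history(goal_vectors, history):
    """Query step: goal vectors retrieve supportive context from the episodic history."""
    return attend(goal_vectors, history)


rng = random.Random(0)


def rand_rows(n):
    """Random stand-ins for learned or encoded features (assumption)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(n)]


task_embeddings = rand_rows(NUM_TASKS)  # one embedding per navigation task
goal_slots = rand_rows(3)               # hypothetical number of goal vectors
target_tokens = rand_rows(5)            # encoded target description (any modality)
history = rand_rows(10)                 # encoded past observations

goals = parse_goal(target_tokens, task_embeddings[1], goal_slots)
context = query_history(goals, history)
print(len(context), len(context[0]))    # 3 8
```

Because the same two attention steps are used whichever task embedding is selected, the sketch hints at how a single model could share parameters across tasks whose targets arrive in different modalities.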
Author Information
Hanqing Wang (Beijing Institute of Technology)
Wei Liang (Beijing Institute of Technology)
Luc V Gool (Computer Vision Lab, ETH Zurich)
Wenguan Wang (University of Technology Sydney)
Currently I am a Lecturer and ARC DECRA Fellow at the ReLER lab, University of Technology Sydney. My research interests lie at the intersection of computer vision, artificial intelligence, and cognition. The ultimate goal of my research is to develop a machine that can perceive, reason, and plan in real-world scenes like humans.
More from the Same Authors
-
2019 Poster: Gated CRF Loss for Weakly Supervised Semantic Image Segmentation »
Anton Obukhov · Stamatios Georgoulis · Dengxin Dai · Luc V Gool -
2021 : Spatial-Temporal Gated Transformers for Efficient Video Processing »
Yawei Li · Babak Ehteshami Bejnordi · Bert Moons · Tijmen Blankevoort · Amirhossein Habibian · Radu Timofte · Luc V Gool -
2022 Poster: HUMANISE: Language-conditioned Human Motion Generation in 3D Scenes »
Zan Wang · Yixin Chen · Tengyu Liu · Yixin Zhu · Wei Liang · Siyuan Huang -
2022 Poster: Recurrent Video Restoration Transformer with Guided Deformable Attention »
Jingyun Liang · Yuchen Fan · Xiaoyu Xiang · Rakesh Ranjan · Eddy Ilg · Simon Green · Jiezhang Cao · Kai Zhang · Radu Timofte · Luc V Gool -
2023 Poster: LART: Neural Correspondence Learning with Latent Regularization Transformer for 3D Motion Transfer »
Haoyu Chen · Hao Tang · Radu Timofte · Luc V Gool · Guoying Zhao -
2023 Poster: Conan: Active Reasoning in an Open-World Environment »
Manjie Xu · Guangyuan Jiang · Wei Liang · Chi Zhang · Yixin Zhu -
2023 Poster: Autodecoding Latent 3D Diffusion Models »
Evangelos Ntavelis · Aliaksandr Siarohin · Kyle Olszewski · Chaoyang Wang · Luc V Gool · Sergey Tulyakov -
2023 Poster: Real-Time Motion Prediction via Heterogeneous Polyline Transformer with Relative Pose Encoding »
Zhejun Zhang · Alexander Liniger · Christos Sakaridis · Fisher Yu · Luc V Gool -
2023 Poster: ClusterFormer: Clustering As A Universal Visual Learner »
James Liang · Yiming Cui · Qifan Wang · Tong Geng · Wenguan Wang · Dongfang Liu -
2023 Poster: LoTR: Logic-Guided Transformer Reasoner for Human-Object Interaction Detection »
Liulei Li · Jianan Wei · Wenguan Wang · Yi Yang -
2023 Poster: IVRE: Interactive Visual Reasoning under Uncertainty »
Manjie Xu · Guangyuan Jiang · Wei Liang · Chi Zhang · Yixin Zhu -
2023 Poster: Revisiting Evaluation Metrics for Semantic Segmentation: Optimization and Evaluation of Fine-grained Intersection over Union »
Zifu Wang · Maxim Berman · Amal Rannen-Triki · Philip Torr · Devis Tuia · Tinne Tuytelaars · Luc V Gool · Jiaqian Yu · Matthew Blaschko -
2022 Spotlight: Lightning Talks 5A-4 »
Yangrui Chen · Zhiyang Chen · Liang Zhang · Hanqing Wang · Jiaqi Han · Shuchen Wu · shaohui peng · Ganqu Cui · Yoav Kolumbus · Noemi Elteto · Xing Hu · Anwen Hu · Wei Liang · Cong Xie · Lifan Yuan · Noam Nisan · Wenbing Huang · Yousong Zhu · Ishita Dasgupta · Luc V Gool · Tingyang Xu · Rui Zhang · Qin Jin · Zhaowen Li · Meng Ma · Bingxiang He · Yangyi Chen · Juncheng Gu · Wenguan Wang · Ke Tang · Yu Rong · Eric Schulz · Fan Yang · Wei Li · Zhiyuan Liu · Jiaming Guo · Yanghua Peng · Haibin Lin · Haixin Wang · Qi Yi · Maosong Sun · Ruizhi Chen · Chuan Wu · Chaoyang Zhao · Yibo Zhu · Liwei Wu · xishan zhang · Zidong Du · Rui Zhao · Jinqiao Wang · Ling Li · Qi Guo · Ming Tang · Yunji Chen -
2022 Spotlight: Towards Versatile Embodied Navigation »
Hanqing Wang · Wei Liang · Luc V Gool · Wenguan Wang -
2022 Spotlight: Lightning Talks 4B-4 »
Ziyue Jiang · Zeeshan Khan · Yuxiang Yang · Chenze Shao · Yichong Leng · Zehao Yu · Wenguan Wang · Xian Liu · Zehua Chen · Yang Feng · Qianyi Wu · James Liang · C.V. Jawahar · Junjie Yang · Zhe Su · Songyou Peng · Yufei Xu · Junliang Guo · Michael Niemeyer · Hang Zhou · Zhou Zhao · Makarand Tapaswi · Dongfang Liu · Qian Yang · Torsten Sattler · Yuanqi Du · Haohe Liu · Jing Zhang · Andreas Geiger · Yi Ren · Long Lan · Jiawei Chen · Wayne Wu · Dahua Lin · Dacheng Tao · Xu Tan · Jinglin Liu · Ziwei Liu · 振辉 叶 · Danilo Mandic · Lei He · Xiangyang Li · Tao Qin · sheng zhao · Tie-Yan Liu -
2022 Spotlight: Learning Equivariant Segmentation with Instance-Unique Querying »
Wenguan Wang · James Liang · Dongfang Liu -
2022 Spotlight: Recurrent Video Restoration Transformer with Guided Deformable Attention »
Jingyun Liang · Yuchen Fan · Xiaoyu Xiang · Rakesh Ranjan · Eddy Ilg · Simon Green · Jiezhang Cao · Kai Zhang · Radu Timofte · Luc V Gool -
2022 Spotlight: Lightning Talks 1A-4 »
Siwei Wang · Jing Liu · Nianqiao Ju · Shiqian Li · Eloïse Berthier · Muhammad Faaiz Taufiq · Arsene Fansi Tchango · Chen Liang · Chulin Xie · Jordan Awan · Jean-Francois Ton · Ziad Kobeissi · Wenguan Wang · Xinwang Liu · Kewen Wu · Rishab Goel · Jiaxu Miao · Suyuan Liu · Julien Martel · Ruobin Gong · Francis Bach · Chi Zhang · Rob Cornish · Sanmi Koyejo · Zhi Wen · Yee Whye Teh · Yi Yang · Jiaqi Jin · Bo Li · Yixin Zhu · Vinayak Rao · Wenxuan Tu · Gaetan Marceau Caron · Arnaud Doucet · Xinzhong Zhu · Joumana Ghosn · En Zhu -
2022 Spotlight: GMMSeg: Gaussian Mixture based Generative Semantic Segmentation Models »
Chen Liang · Wenguan Wang · Jiaxu Miao · Yi Yang -
2022 Poster: I2DFormer: Learning Image to Document Attention for Zero-Shot Image Classification »
Muhammad Ferjad Naeem · Yongqin Xian · Luc V Gool · Federico Tombari -
2022 Poster: Learning Equivariant Segmentation with Instance-Unique Querying »
Wenguan Wang · James Liang · Dongfang Liu -
2022 Poster: GMMSeg: Gaussian Mixture based Generative Semantic Segmentation Models »
Chen Liang · Wenguan Wang · Jiaxu Miao · Yi Yang -
2022 Poster: Degradation-Aware Unfolding Half-Shuffle Transformer for Spectral Compressive Imaging »
Yuanhao Cai · Jing Lin · Haoqian Wang · Xin Yuan · Henghui Ding · Yulun Zhang · Radu Timofte · Luc V Gool -
2021 Poster: Revisiting Contrastive Methods for Unsupervised Learning of Visual Representations »
Wouter Van Gansbeke · Simon Vandenhende · Stamatios Georgoulis · Luc V Gool -
2020 Poster: GOCor: Bringing Globally Optimized Correspondence Volumes into Your Neural Network »
Prune Truong · Martin Danelljan · Luc V Gool · Radu Timofte -
2020 Poster: Soft Contrastive Learning for Visual Localization »
Janine Thoma · Danda Pani Paudel · Luc V Gool -
2017 Poster: Soft-to-Hard Vector Quantization for End-to-End Learning Compressible Representations »
Eirikur Agustsson · Fabian Mentzer · Michael Tschannen · Lukas Cavigelli · Radu Timofte · Luca Benini · Luc V Gool -
2016 Poster: Dynamic Filter Networks »
Xu Jia · Bert De Brabandere · Tinne Tuytelaars · Luc V Gool -
2014 Poster: Quantized Kernel Learning for Feature Matching »
Danfeng Qin · Xuanli Chen · Matthieu Guillaumin · Luc V Gool -
2014 Poster: Self-Adaptable Templates for Feature Coding »
Xavier Boix · Gemma Roig · Salomon Diether · Luc V Gool -
2011 Poster: Learning Probabilistic Non-Linear Latent Variable Models for Tracking Complex Activities »
Angela Yao · Juergen Gall · Luc V Gool · Raquel Urtasun