
Towards Versatile Embodied Navigation
Hanqing Wang · Wei Liang · Luc V Gool · Wenguan Wang

Tue Nov 29 09:00 AM -- 11:00 AM (PST) @ Hall J #900

With the emergence of varied visual navigation tasks (e.g., image-/object-/audio-goal and vision-language navigation) that specify the target in different ways, the community has made appealing advances in training specialized agents capable of handling individual navigation tasks well. Given plenty of embodied navigation tasks and task-specific solutions, we address a more fundamental question: can we learn a single powerful agent that masters not one but multiple navigation tasks concurrently? First, we propose VXN, a large-scale 3D dataset that instantiates four classic navigation tasks in standardized, continuous, and audiovisual-rich environments. Second, we propose Vienna, a versatile embodied navigation agent that simultaneously learns to perform the four navigation tasks with one model. Building upon a full-attentive architecture, Vienna formulates various navigation tasks as a unified, parse-and-query procedure: the target description, augmented with four task embeddings, is comprehensively interpreted into a set of diversified goal vectors, which are refined as the navigation progresses, and used as queries to retrieve supportive context from episodic history for decision making. This enables the reuse of knowledge across navigation tasks with varying input domains/modalities. We empirically demonstrate that, compared with learning each visual navigation task individually, our multitask agent achieves comparable or even better performance with reduced complexity.
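The parse-and-query idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function names (`add_task_embedding`, `query_history`), the plain-list vectors, and the use of single-head scaled dot-product attention are all assumptions made for illustration. The sketch also skips the parsing and refinement steps and simply treats the task-augmented target tokens as goal vectors.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def add_task_embedding(target_tokens, task_embedding):
    # Hypothetical helper: add one task embedding to every token of the
    # target description so a shared model can tell the tasks apart.
    return [[t + e for t, e in zip(tok, task_embedding)] for tok in target_tokens]

def query_history(goal_vectors, history):
    # Hypothetical helper: each goal vector acts as a query over the
    # episodic history (used as both keys and values) and receives an
    # attention-weighted summary of that history as context.
    dim = len(history[0])
    scale = math.sqrt(dim)
    contexts = []
    for q in goal_vectors:
        weights = softmax([dot(q, h) / scale for h in history])
        contexts.append(
            [sum(w * h[i] for w, h in zip(weights, history)) for i in range(dim)]
        )
    return contexts

# Toy usage: two target tokens, one task embedding, three history steps.
goals = add_task_embedding([[1.0, 0.0], [0.0, 1.0]], [0.1, 0.1])
history = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
contexts = query_history(goals, history)
```

In a full agent, the retrieved contexts would feed an action predictor, and the goal vectors would be updated at each step; both are omitted here.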

Author Information

Hanqing Wang (Beijing Institute of Technology)
Wei Liang (Beijing Institute of Technology)
Luc V Gool (Computer Vision Lab, ETH Zurich)
Wenguan Wang (University of Technology Sydney)

I am currently a Lecturer and ARC DECRA Fellow at the ReLER Lab, University of Technology Sydney. My research interests lie at the intersection of computer vision, artificial intelligence, and cognition. The ultimate goal of my research is to develop a machine that can perceive, reason, and plan in real-world scenes as humans do.
