We introduce UViM, a unified approach capable of modeling a wide range of computer vision tasks. In contrast to previous models, UViM has the same functional form for all tasks; it requires no task-specific modifications, which otherwise demand extensive human expertise. The approach involves two components: (i) a base model (feed-forward) that is trained to directly predict raw vision outputs, guided by a learned discrete code, and (ii) a language model (autoregressive) that is trained to generate the guiding code. These components complement each other: the language model is well-suited to modeling structured interdependent data, while the base model is efficient at dealing with high-dimensional outputs. We demonstrate the effectiveness of UViM on three diverse and challenging vision tasks: panoptic segmentation, depth prediction and image colorization, where we achieve competitive and near state-of-the-art results. Our experimental results suggest that UViM is a promising candidate for a unified modeling approach in computer vision.
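The inference-time interplay of the two components can be sketched with a toy example. Everything below is hypothetical scaffolding (the function names, code length, vocabulary size, and the dummy model internals are illustrative stand-ins, not the paper's actual architecture); it only shows the control flow: the autoregressive model samples a short discrete guiding code, and a single feed-forward pass of the base model turns the image plus that code into a dense output.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the two UViM components (hypothetical names/shapes):
# - language_model: autoregressive, proposes the discrete guiding code
# - base_model: feed-forward, maps (image, guiding code) -> dense output
CODE_LEN, VOCAB = 4, 16  # length and vocabulary of the discrete code


def language_model(image, prefix):
    """Return a toy next-token distribution over the code vocabulary."""
    logits = rng.normal(size=VOCAB)  # a real model would condition on inputs
    e = np.exp(logits - logits.max())
    return e / e.sum()


def base_model(image, code):
    """Feed-forward pass: combine image features with the code summary."""
    per_pixel = image.mean(axis=-1, keepdims=True)  # (H, W, 1) toy features
    return np.tanh(per_pixel + code.mean() / VOCAB)


def predict(image):
    # Stage 1 at inference: sample the guiding code token by token.
    code = []
    for _ in range(CODE_LEN):
        probs = language_model(image, code)
        code.append(int(rng.choice(VOCAB, p=probs)))
    # Stage 2: one feed-forward pass produces the high-dimensional output.
    return base_model(image, np.array(code))


out = predict(rng.random((8, 8, 3)))  # dense per-pixel prediction, (8, 8, 1)
```

This division of labor mirrors the abstract's point: the autoregressive model only has to handle a short structured code, while the expensive high-dimensional output is produced in a single non-autoregressive pass.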
Author Information
Alexander Kolesnikov (Google Research, Brain team)
André Susano Pinto (Google)
Lucas Beyer (Google Brain Zürich)
Xiaohua Zhai (Google Brain)
Jeremiah Harmsen (Google Brain)
Neil Houlsby (Google)
More from the Same Authors
- 2021 : A Unified Few-Shot Classification Benchmark to Compare Transfer and Meta Learning Approaches »
  Vincent Dumoulin · Neil Houlsby · Utku Evci · Xiaohua Zhai · Ross Goroshin · Sylvain Gelly · Hugo Larochelle
- 2022 Panel: Panel 2C-4: UViM: A Unified… & K-LITE: Learning Transferable… »
  Chunyuan Li · André Susano Pinto
- 2022 : Panel »
  Erin Grant · Richard Turner · Neil Houlsby · Priyanka Agrawal · Abhijeet Awasthi · Salomey Osei
- 2022 Poster: Revisiting Neural Scaling Laws in Language and Vision »
  Ibrahim Alabdulmohsin · Behnam Neyshabur · Xiaohua Zhai
- 2022 Poster: Multimodal Contrastive Learning with LIMoE: the Language-Image Mixture of Experts »
  Basil Mustafa · Carlos Riquelme · Joan Puigcerver · Rodolphe Jenatton · Neil Houlsby
- 2021 : Live panel: Did we solve ImageNet? »
  Shibani Santurkar · Alexander Kolesnikov · Becca Roelofs
- 2021 : Are we done with ImageNet? »
  Alexander Kolesnikov
- 2021 Workshop: ImageNet: Past, Present, and Future »
  Zeynep Akata · Lucas Beyer · Sanghyuk Chun · A. Sophia Koepke · Diane Larlus · Seong Joon Oh · Rafael Rezende · Sangdoo Yun · Xiaohua Zhai
- 2021 Poster: MLP-Mixer: An all-MLP Architecture for Vision »
  Ilya Tolstikhin · Neil Houlsby · Alexander Kolesnikov · Lucas Beyer · Xiaohua Zhai · Thomas Unterthiner · Jessica Yung · Andreas Steiner · Daniel Keysers · Jakob Uszkoreit · Mario Lucic · Alexey Dosovitskiy
- 2021 Poster: Scaling Vision with Sparse Mixture of Experts »
  Carlos Riquelme · Joan Puigcerver · Basil Mustafa · Maxim Neumann · Rodolphe Jenatton · André Susano Pinto · Daniel Keysers · Neil Houlsby
- 2021 Poster: Revisiting the Calibration of Modern Neural Networks »
  Matthias Minderer · Josip Djolonga · Rob Romijnders · Frances Hubis · Xiaohua Zhai · Neil Houlsby · Dustin Tran · Mario Lucic