Workshop

Vision Transformers: Theory and applications

Fahad Shahbaz Khan ⋅ Gul Varol ⋅ Salman Khan ⋅ Ping Luo ⋅ Rao Anwer ⋅ Ashish Vaswani ⋅ Hisham Cholakkal ⋅ Niki Parmar ⋅ Joost van de Weijer ⋅ Mubarak Shah

Project Page [Contact: visiontransformer.neurips@gmail.com]

Abstract

Transformer models have demonstrated excellent performance on a diverse set of computer vision applications, ranging from classification to segmentation, across data modalities such as images, videos, and 3D data. The goal of this workshop is to bring together computer vision and machine learning researchers working to advance the theory, architecture, and algorithmic design of vision transformer models, as well as practitioners using transformer models for novel applications and use cases.

The workshop’s motivation is to narrow the gap between research advances in transformer design and the use of transformers in computer vision applications. The workshop also aims to broaden the adoption of transformer models in vision-related industrial applications. We are interested in papers reporting experimental results on the use of transformers for any computer vision application, the challenges encountered, and the strategies used to mitigate them, on topics including, but not limited to, image classification, object detection, segmentation, human-object interaction detection, and scene understanding based on 3D, video, and multimodal inputs.

Schedule

Timezone: America/Los_Angeles

11:00 PM

Opening Remarks

11:10 PM

[1st Invited Talk] Ming-Hsuan Yang

11:40 PM

[1st] Oral Presentation

11:40 PM

CLUDA: Contrastive Learning in Unsupervised Domain Adaptation for Semantic Segmentation

Midhun Vayyat ⋅ Kasi Jaswin ⋅ Anuraag Bhattacharya ⋅ Shuaib Ahmed ⋅ Rahul Tallamraju

11:55 PM

PatchBlender: A Motion Prior for Video Transformers

Gabriele Prato ⋅ Yale Song ⋅ Janarthanan Rajendran ⋅ R Devon Hjelm ⋅ Neel Joshi ⋅ Sarath Chandar

12:10 AM

Bi-Directional Self-Attention for Vision Transformers

George Stoica ⋅ Taylor Hearn ⋅ Bhavika Devnani ⋅ Judy Hoffman

12:25 AM

Video based Object 6D Pose Estimation using Transformers

Apoorva Beedu ⋅ Huda Alamri ⋅ Irfan Essa

12:40 AM

End-to-end Multimodal Representation Learning for Video Dialog

Huda Alamri ⋅ Apoorva Beedu ⋅ Irfan Essa ⋅ Anthony Bilic ⋅ Michael Hu

12:55 AM

Continual Transformers: Redundancy-Free Attention for Online Inference

Lukas Hedegaard ⋅ Arian Bakhtiarnia ⋅ Alexandros Iosifidis

1:10 AM

Break

1:40 AM

[1st] Poster session

1:40 AM

On the Surprising Effectiveness of Transformers in Low-Labeled Video Recognition

Farrukh Rahman ⋅ Ömer Mubarek ⋅ Zsolt Kira

1:40 AM

Fully-attentive and interpretable: vision and video vision transformers for pain detection

Giacomo Fiorentini ⋅ Itir Onal Ertugrul ⋅ Albert Ali Salah

1:40 AM

DynamicViT: Making Vision Transformer faster through layer skipping

Amanuel Mersha ⋅ Samuel Assefa

1:40 AM

FQDet: Fast-converging Query-based Detector

Cédric Picron ⋅ Punarjay Chakravarty ⋅ Tinne Tuytelaars

2:30 AM

[2nd Invited Talk] Cordelia Schmid

3:00 AM

[3rd Invited Talk] Rita Cucchiara

3:30 AM

[2nd] Oral Presentation

3:30 AM

Matryoshka Representations for Adaptive Deployment

Aniket Rege ⋅ Aditya Kusupati ⋅ Gantavya Bhatt ⋅ Matthew Wallingford ⋅ Aditya Sinha ⋅ Vivek Ramanujan ⋅ William Howard-Snyder ⋅ Kaifeng Chen ⋅ Sham Kakade ⋅ Prateek Jain ⋅ Ali Farhadi

3:45 AM

TPFNet: A Novel Text In-painting Transformer for Text Removal

Onkar Susladkar ⋅ Dhruv Makwana ⋅ Gayatri Deshmukh ⋅ Sparsh Mittal ⋅ Sai Chandra Teja R ⋅ Rekha Singhal

4:00 AM

[4th Invited Talk] Kristen Grauman

4:30 AM

[5th Invited Talk] Laura Leal-Taixé

5:00 AM

Coffee Break

5:10 AM