Vision Transformers: Theory and applications

Fahad Shahbaz Khan · Gul Varol · Salman Khan · Ping Luo · Rao Anwer · Ashish Vaswani · Hisham Cholakkal · Niki Parmar · Joost van de Weijer · Mubarak Shah



Transformer models have demonstrated excellent performance on a diverse set of computer vision applications ranging from classification to segmentation on various data modalities such as images, videos, and 3D data. The goal of this workshop is to bring together computer vision and machine learning researchers working towards advancing the theory, architecture, and algorithmic design for vision transformer models, as well as the practitioners utilizing transformer models for novel applications and use cases.

The workshop aims to narrow the gap between research advances in transformer design and their use in practical computer vision applications, and to widen the adoption of transformer models in vision-related industrial applications. We are interested in papers that report experimental results on the use of transformers for any computer vision application, the challenges the authors faced, and their mitigation strategies. Topics include, but are not limited to, image classification, object detection, segmentation, human-object interaction detection, and scene understanding based on 3D, video, and multimodal inputs.



