Poster
Keeping Your Eye on the Ball: Trajectory Attention in Video Transformers
Mandela Patrick · Dylan Campbell · Yuki Asano · Ishan Misra · Florian Metze · Christoph Feichtenhofer · Andrea Vedaldi · João Henriques
In video transformers, the time dimension is often treated in the same way as the two spatial dimensions. However, in a scene where objects or the camera may move, a physical point imaged at one location in frame $t$ may be entirely unrelated to what is found at that location in frame $t+k$. These temporal correspondences should be modeled to facilitate learning about dynamic scenes. To this end, we propose a new drop-in block for video transformers, trajectory attention, that aggregates information along implicitly determined motion paths. We additionally propose a new method to address the quadratic dependence of computation and memory on the input size, which is particularly important for high-resolution or long videos. While these ideas are useful in a range of settings, we apply them to the specific task of video action recognition with a transformer model and obtain state-of-the-art results on the Kinetics, Something-Something V2, and Epic-Kitchens datasets.
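To make the idea of aggregating along implicitly determined motion paths more concrete, the following is a minimal single-head NumPy sketch, not the authors' implementation. It assumes a two-stage structure: each query first attends over the spatial positions of every frame separately, giving one "trajectory token" per frame, and then attends over those tokens along time. The function name `trajectory_attention_sketch`, the random projection matrices, and the tensor layout `(T, S, D)` (frames, spatial tokens per frame, channels) are all illustrative assumptions.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along one axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def trajectory_attention_sketch(x, rng):
    """Illustrative two-stage attention over a (T, S, D) token grid.

    All weights are random placeholders (an assumption of this sketch);
    a real model would learn them.
    """
    T, S, D = x.shape
    Wq, Wk, Wv, Wq2, Wk2, Wv2 = (
        rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(6)
    )
    q, k, v = x @ Wq, x @ Wk, x @ Wv  # each (T, S, D)

    # Stage 1: for every query (t, s) and every frame u, normalise the
    # attention over spatial positions r of that frame only.
    logits = np.einsum("tsd,urd->tsur", q, k) / np.sqrt(D)  # (T, S, T, S)
    attn_space = softmax(logits, axis=3)
    # One pooled "trajectory token" per query and per frame u.
    traj = np.einsum("tsur,urd->tsud", attn_space, v)  # (T, S, T, D)

    # Stage 2: attend along time over the trajectory tokens. The query
    # comes from the trajectory token at the query's own frame (u == t).
    own = traj[np.arange(T), :, np.arange(T), :]  # (T, S, D)
    q2, k2, v2 = own @ Wq2, traj @ Wk2, traj @ Wv2
    logits_t = np.einsum("tsd,tsud->tsu", q2, k2) / np.sqrt(D)  # (T, S, T)
    attn_time = softmax(logits_t, axis=2)
    out = np.einsum("tsu,tsud->tsd", attn_time, v2)  # (T, S, D)
    return out, attn_space, attn_time


rng = np.random.default_rng(0)
tokens = rng.standard_normal((4, 9, 8))  # 4 frames, 3x3 patches, 8 channels
out, attn_space, attn_time = trajectory_attention_sketch(tokens, rng)
print(out.shape)
```

In this sketch the `logits` array has `(T * S) ** 2` entries, which illustrates the quadratic dependence on input size that the abstract refers to. The approximation the authors propose to reduce that cost is not reproduced here.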
Author Information
Mandela Patrick (University of Oxford)
Dylan Campbell (University of Oxford)
Yuki Asano (University of Amsterdam)
Ishan Misra (Facebook AI Research)
Florian Metze (Carnegie Mellon University)
Christoph Feichtenhofer (Facebook AI Research)
Andrea Vedaldi (University of Oxford / Facebook AI Research)
João Henriques (University of Oxford)
Related Events (a corresponding poster, oral, or spotlight)
- 2021 Oral: Keeping Your Eye on the Ball: Trajectory Attention in Video Transformers
  Wed. Dec 8th 04:20 -- 04:35 PM Room
More from the Same Authors
- 2021 : PASS: An ImageNet replacement for self-supervised pretraining without humans
  Yuki Asano · Christian Rupprecht · Andrew Zisserman · Andrea Vedaldi
- 2022 : Self-Guided Diffusion Model
  TAO HU · David Zhang · Yuki Asano · Gertjan Burghouts · Cees Snoek
- 2023 Poster: Contrastive Lift: 3D Object Instance Segmentation by Slow-Fast Contrastive Fusion
  Yash Bhalgat · Iro Laina · João Henriques · Andrea Vedaldi · Andrew Zisserman
- 2023 Poster: Extracting Reward Functions from Diffusion Models
  Felipe Nuti · Tim Franzmeyer · João Henriques
- 2023 Workshop: 4th Workshop on Self-Supervised Learning: Theory and Practice
  Tengda Han · Ishan Misra · Pengtao Xie · Mathilde Caron · Hilde Kuehne
- 2023 Workshop: Causal Representation Learning
  Sara Magliacane · Atalanti Mastakouri · Yuki Asano · Claudia Shi · Cian Eastwood · Sébastien Lachapelle · Bernhard Schölkopf · Caroline Uhler
- 2022 Workshop: Self-Supervised Learning: Theory and Practice
  Ishan Misra · Pengtao Xie · Gul Varol · Yale Song · Yuki Asano · Xiaolong Wang · Pauline Luc
- 2022 Poster: Learn what matters: cross-domain imitation learning with task-relevant embeddings
  Tim Franzmeyer · Philip Torr · João Henriques
- 2022 Poster: A Data-Augmentation Is Worth A Thousand Samples: Analytical Moments And Sampling-Free Training
  Randall Balestriero · Ishan Misra · Yann LeCun
- 2022 Poster: Masked Autoencoders that Listen
  Po-Yao Huang · Hu Xu · Juncheng Li · Alexei Baevski · Michael Auli · Wojciech Galuba · Florian Metze · Christoph Feichtenhofer
- 2021 Workshop: 2nd Workshop on Self-Supervised Learning: Theory and Practice
  Pengtao Xie · Ishan Misra · Pulkit Agrawal · Abdelrahman Mohamed · Shentong Mo · Youwei Liang · Jeannette Bohg · Kristina N Toutanova
- 2021 Workshop: The pre-registration workshop: an alternative publication model for machine learning research
  Samuel Albanie · João Henriques · Luca Bertinetto · Alex Hernandez-Garcia · Hazel Doughty · Gul Varol
- 2021 Poster: Unsupervised Part Discovery from Contrastive Reconstruction
  Subhabrata Choudhury · Iro Laina · Christian Rupprecht · Andrea Vedaldi
- 2021 Poster: Bias Out-of-the-Box: An Empirical Analysis of Intersectional Occupational Biases in Popular Generative Language Models
  Hannah Rose Kirk · Yennie Jun · Filippo Volpin · Haider Iqbal · Elias Benussi · Frederic Dreyer · Aleksandar Shtedritski · Yuki Asano
- 2020 Workshop: Self-Supervised Learning -- Theory and Practice
  Pengtao Xie · Shanghang Zhang · Pulkit Agrawal · Ishan Misra · Cynthia Rudin · Abdelrahman Mohamed · Wenzhen Yuan · Barret Zoph · Laurens van der Maaten · Xingyi Yang · Eric Xing
- 2020 Workshop: The pre-registration experiment: an alternative publication model for machine learning research
  Luca Bertinetto · João Henriques · Samuel Albanie · Michela Paganini · Gul Varol
- 2020 Poster: Continuous Surface Embeddings
  Natalia Neverova · David Novotny · Marc Szafraniec · Vasil Khalidov · Patrick Labatut · Andrea Vedaldi
- 2020 Poster: Labelling unlabelled videos from scratch with multi-modal self-supervision
  Yuki Asano · Mandela Patrick · Christian Rupprecht · Andrea Vedaldi
- 2020 Poster: Canonical 3D Deformer Maps: Unifying parametric and non-parametric methods for dense weakly-supervised category reconstruction
  David Novotny · Roman Shapovalov · Andrea Vedaldi
- 2020 Poster: 3D Multi-bodies: Fitting Sets of Plausible 3D Human Models to Ambiguous Image Data
  Benjamin Biggs · David Novotny · Sebastien Ehrhardt · Hanbyul Joo · Ben Graham · Andrea Vedaldi
- 2020 Spotlight: 3D Multi-bodies: Fitting Sets of Plausible 3D Human Models to Ambiguous Image Data
  Benjamin Biggs · David Novotny · Sebastien Ehrhardt · Hanbyul Joo · Ben Graham · Andrea Vedaldi
- 2019 Poster: Adversarial Music: Real world Audio Adversary against Wake-word Detection System
  Juncheng Li · Shuhui Qu · Xinjian Li · Joseph Szurley · J. Zico Kolter · Florian Metze
- 2019 Spotlight: Adversarial Music: Real world Audio Adversary against Wake-word Detection System
  Juncheng Li · Shuhui Qu · Xinjian Li · Joseph Szurley · J. Zico Kolter · Florian Metze
- 2019 Poster: Correlated Uncertainty for Learning Dense Correspondences from Noisy Labels
  Natalia Neverova · David Novotny · Andrea Vedaldi
- 2016 : Spotlights and Posters
  Florian Metze
- 2016 : Florian Metze: End-to-end learning for language universal speech recognition
  Florian Metze
- 2016 Poster: Spatiotemporal Residual Networks for Video Action Recognition
  Christoph Feichtenhofer · Axel Pinz · Richard Wildes
- 2016 Poster: Learning feed-forward one-shot learners
  Luca Bertinetto · João Henriques · Jack Valmadre · Philip Torr · Andrea Vedaldi
- 2014 Poster: Fast Training of Pose Detectors in the Fourier Domain
  João Henriques · Pedro Martins · Rui F Caseiro · Jorge Batista