We introduce a simple yet surprisingly powerful model to incorporate attention in action recognition and human-object interaction tasks. Our proposed attention module can be trained with or without extra supervision, and gives a sizable boost in accuracy while keeping the network size and computational cost nearly the same. It leads to significant improvements over state-of-the-art base architectures on three standard action recognition benchmarks across still images and videos, and establishes a new state of the art on the MPII (12.5% relative improvement) and HMDB (RGB) datasets. We also perform an extensive analysis of our attention module, both empirically and analytically. For the latter, we introduce a novel derivation of bottom-up and top-down attention as low-rank approximations of bilinear pooling methods (typically used for fine-grained classification). From this perspective, our attention formulation suggests a novel characterization of action recognition as a fine-grained recognition problem.
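The low-rank reading of bilinear pooling mentioned in the abstract can be illustrated with a small numerical check. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes a second-order pooling score of the form Tr(XᵀXWᵀ) for a feature matrix X (locations × channels), and a rank-1 weight W = abᵀ. The names `bilinear_score`, `attentional_score`, `a`, and `b` are hypothetical, and labelling Xa as the "top-down" map and Xb as the "bottom-up" map is likewise an assumption made for illustration.

```python
import math
import random

def bilinear_score(X, W):
    """Score Tr(X^T X W^T), computed as sum_{j,k} (X^T X)[j][k] * W[j][k]."""
    f = len(W)
    total = 0.0
    for j in range(f):
        for k in range(f):
            gram_jk = sum(row[j] * row[k] for row in X)  # entry (j, k) of X^T X
            total += gram_jk * W[j][k]
    return total

def attentional_score(X, a, b):
    """Rank-1 form: with W = a b^T the score reduces to (X a) . (X b)."""
    top_down = [sum(x * w for x, w in zip(row, a)) for row in X]   # one value per location
    bottom_up = [sum(x * w for x, w in zip(row, b)) for row in X]  # one value per location
    return sum(t * s for t, s in zip(top_down, bottom_up))

# Hypothetical sizes: 49 spatial locations, 8 feature channels.
random.seed(0)
n, f = 49, 8
X = [[random.gauss(0, 1) for _ in range(f)] for _ in range(n)]
a = [random.gauss(0, 1) for _ in range(f)]
b = [random.gauss(0, 1) for _ in range(f)]
W = [[a[j] * b[k] for k in range(f)] for j in range(f)]  # rank-1 weight matrix a b^T

full = bilinear_score(X, W)
low_rank = attentional_score(X, a, b)
print(math.isclose(full, low_rank, rel_tol=1e-9, abs_tol=1e-9))
```

Under these assumptions the two scores should agree up to floating-point rounding, while the rank-1 form needs only two weight vectors of length f instead of an f × f matrix. That is consistent with the abstract's claim that network size and computational cost stay nearly the same.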
Author Information
Rohit Girdhar (Carnegie Mellon University)
Deva Ramanan (Carnegie Mellon University)
More from the Same Authors
- 2021 Spotlight: ViSER: Video-Specific Surface Embeddings for Articulated 3D Shape Reconstruction
  Gengshan Yang · Deqing Sun · Varun Jampani · Daniel Vlasic · Forrester Cole · Ce Liu · Deva Ramanan
- 2021: Argoverse 2: Next Generation Datasets for Self-Driving Perception and Forecasting
  Benjamin Wilson · William Qi · Tanmay Agarwal · John Lambert · Jagjeet Singh · Siddhesh Khandelwal · Bowen Pan · Ratnesh Kumar · Andrew Hartnett · Jhony Kaesemodel Pontes · Deva Ramanan · Peter Carr · James Hays
- 2021: The CLEAR Benchmark: Continual LEArning on Real-World Imagery
  Zhiqiu Lin · Jia Shi · Deepak Pathak · Deva Ramanan
- 2022 Poster: Continual Learning with Evolving Class Ontologies
  Zhiqiu Lin · Deepak Pathak · Yu-Xiong Wang · Deva Ramanan · Shu Kong
- 2022 Poster: Learning to Discover and Detect Objects
  Vladimir Fomenko · Ismail Elezi · Deva Ramanan · Laura Leal-Taixé · Aljosa Osep
- 2021 Poster: ViSER: Video-Specific Surface Embeddings for Articulated 3D Shape Reconstruction
  Gengshan Yang · Deqing Sun · Varun Jampani · Daniel Vlasic · Forrester Cole · Ce Liu · Deva Ramanan
- 2021 Poster: NeRS: Neural Reflectance Surfaces for Sparse-view 3D Reconstruction in the Wild
  Jason Zhang · Gengshan Yang · Shubham Tulsiani · Deva Ramanan
- 2019 Poster: Volumetric Correspondence Networks for Optical Flow
  Gengshan Yang · Deva Ramanan
- 2017 Poster: Learning to Model the Tail
  Yu-Xiong Wang · Deva Ramanan · Martial Hebert