Convolution has arguably been the most important feature transform for modern neural networks, driving the advance of deep learning. The recent emergence of Transformer networks, which replace convolution layers with self-attention blocks, has revealed the limitations of stationary convolution kernels and opened the door to the era of dynamic feature transforms. Existing dynamic transforms, however, including self-attention, remain limited for video understanding, where correspondence relations in space and time, i.e., motion information, are crucial for effective representation. In this work, we introduce a relational feature transform, dubbed relational self-attention (RSA), that leverages the rich structure of spatio-temporal relations in videos by dynamically generating relational kernels and aggregating relational contexts. Our experiments and ablation studies show that the RSA network substantially outperforms convolution and self-attention counterparts, achieving the state of the art on standard motion-centric benchmarks for video action recognition, such as Something-Something-V1 & V2, Diving48, and FineGym.
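To make the mechanism concrete, below is a minimal PyTorch sketch of a relational self-attention-style block, written under simplifying assumptions: the spatio-temporal neighborhood features are taken as a precomputed tensor, the kernel generator is a single linear map over query-to-neighbor relations, and the relational context is a kernel-weighted aggregation of pairwise neighbor relations. The class name, tensor shapes, and projections here are illustrative choices, not the paper's exact RSA formulation; see the paper for the full layer.

```python
# Simplified sketch of a relational self-attention-style block.
# NOT the authors' exact RSA layer; shapes and projections are assumptions.

import torch
import torch.nn as nn


class SimplifiedRelationalSelfAttention(nn.Module):
    """Toy relational transform over a local spatio-temporal neighborhood.

    For each query position, a kernel is generated dynamically from the
    relations (correlations) between the query and its M neighbors, and the
    output aggregates both the neighbor features (basic context) and their
    pairwise relations (relational context).
    """

    def __init__(self, dim: int, num_neighbors: int):
        super().__init__()
        self.dim = dim
        self.m = num_neighbors
        # Kernel generator: maps the M query-to-neighbor relations to M weights.
        self.kernel_gen = nn.Linear(num_neighbors, num_neighbors)
        # Projects the aggregated relational context (an M-dim relation vector)
        # back to the feature dimension so both context types can be summed.
        self.rel_proj = nn.Linear(num_neighbors, dim)

    def forward(self, query: torch.Tensor, context: torch.Tensor) -> torch.Tensor:
        # query:   (B, N, C)    one feature per spatio-temporal position
        # context: (B, N, M, C) M neighboring features per position
        # Relations between the query and each neighbor: (B, N, M)
        relations = torch.einsum("bnc,bnmc->bnm", query, context) / self.dim ** 0.5
        # Dynamically generated kernel, one weight per neighbor: (B, N, M)
        kernel = self.kernel_gen(relations).softmax(dim=-1)
        # Basic context: kernel-weighted sum of neighbor features -> (B, N, C)
        basic = torch.einsum("bnm,bnmc->bnc", kernel, context)
        # Relational context: pairwise relations among neighbors (B, N, M, M),
        # aggregated by the same kernel and projected back to C channels.
        pairwise = torch.einsum("bnmc,bnkc->bnmk", context, context) / self.dim ** 0.5
        rel_ctx = torch.einsum("bnm,bnmk->bnk", kernel, pairwise)
        return basic + self.rel_proj(rel_ctx)


if __name__ == "__main__":
    b, n, m, c = 2, 16, 9, 64  # batch, positions, neighborhood size, channels
    layer = SimplifiedRelationalSelfAttention(dim=c, num_neighbors=m)
    q = torch.randn(b, n, c)
    ctx = torch.randn(b, n, m, c)
    print(layer(q, ctx).shape)  # torch.Size([2, 16, 64])
```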
Author Information
Manjin Kim (POSTECH)
Heeseung Kwon (POSTECH)
Chunyu Wang (Peking University)
Suha Kwak (POSTECH)
Minsu Cho (POSTECH)
More from the Same Authors
- 2020 : Combinatorial 3D Shape Generation via Sequential Assembly
  Jungtaek Kim · Hyunsoo Chung · Jinhwi Lee · Minsu Cho · Jaesik Park
- 2022 : SeLCA: Self-Supervised Learning of Canonical Axis
  Seungwook Kim · Yoonwoo Jeong · Chunghyun Park · Jaesik Park · Minsu Cho
- 2023 Poster: Active Learning for Semantic Segmentation with Multi-class Label Query
  Sehyun Hwang · Sohyun Lee · Hoyoung Kim · Minhyeon Oh · Jungseul Ok · Suha Kwak
- 2023 Poster: Activity Grammars for Temporal Action Segmentation
  Dayoung Kong · Joonseok Lee · Deunsol Jung · Suha Kwak · Minsu Cho
- 2023 Poster: Locality-Aware Generalizable Implicit Neural Representation
  Doyup Lee · Chiheon Kim · Minsu Cho · Wook Shin Han
- 2022 Poster: Learning Debiased Classifier with Biased Committee
  Nayeong Kim · Sehyun Hwang · Sungsoo Ahn · Jaesik Park · Suha Kwak
- 2022 Poster: PeRFception: Perception using Radiance Fields
  Yoonwoo Jeong · Seungjoo Shin · Junha Lee · Chris Choy · Anima Anandkumar · Minsu Cho · Jaesik Park
- 2022 Poster: Peripheral Vision Transformer
  Juhong Min · Yucheng Zhao · Chong Luo · Minsu Cho
- 2022 Poster: Draft-and-Revise: Effective Image Generation with Contextual RQ-Transformer
  Doyup Lee · Chiheon Kim · Saehoon Kim · Minsu Cho · Wook Shin Han
- 2021 Poster: Brick-by-Brick: Combinatorial Construction with Deep Reinforcement Learning
  Hyunsoo Chung · Jungtaek Kim · Boris Knyazev · Jinhwi Lee · Graham Taylor · Jaesik Park · Minsu Cho
- 2021 Poster: Rebooting ACGAN: Auxiliary Classifier GANs with Stable Training
  Minguk Kang · Woohyeon Shim · Minsu Cho · Jaesik Park
- 2020 Poster: CircleGAN: Generative Adversarial Learning across Spherical Circles
  Woohyeon Shim · Minsu Cho
- 2019 Poster: Mining GOLD Samples for Conditional GANs
  Sangwoo Mo · Chiheon Kim · Sungwoong Kim · Minsu Cho · Jinwoo Shin