Group Convolutional Self-Attention for Roto-Translation Equivariance in ViTs
Sheir A. Zaheer · Alexander Holston · Chan Youn Park
Abstract
We propose a discrete roto-translation group equivariant self-attention mechanism that requires no position encoding, built from convolutional patch embedding and convolutional self-attention. We examine the challenges of achieving equivariance in vision transformers (ViTs) and propose a simpler way to implement discretized roto-translation group equivariant ViTs. Experimental results demonstrate that our approach performs competitively with existing approaches to roto-translation equivariant ViTs.
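The key building block the abstract refers to, a group-convolutional (lifting) patch embedding over the discrete rotation group C4, can be illustrated with a minimal NumPy sketch. This is an illustrative toy, not the authors' implementation: the function name `lift_patch_embed`, the single-filter setup, and the loop-based correlation are all assumptions made for clarity. It shows the defining equivariance property: rotating the input image by 90° rotates the spatial grid of patch tokens and cyclically shifts the rotation channel.

```python
import numpy as np

def lift_patch_embed(img, filt, P):
    """Lifting convolution over the C4 rotation group, used as a
    convolutional patch embedding: output channel r correlates each
    non-overlapping P x P patch with the filter rotated by r * 90 deg."""
    H, W = img.shape
    out = np.zeros((4, H // P, W // P))
    for r in range(4):
        f_r = np.rot90(filt, r)  # filter rotated by r * 90 degrees
        for i in range(H // P):
            for j in range(W // P):
                patch = img[i * P:(i + 1) * P, j * P:(j + 1) * P]
                out[r, i, j] = np.sum(patch * f_r)
    return out

rng = np.random.default_rng(0)
img = rng.standard_normal((8, 8))
filt = rng.standard_normal((4, 4))

emb = lift_patch_embed(img, filt, 4)
emb_rot = lift_patch_embed(np.rot90(img), filt, 4)

# Rotating the input rotates the token grid and cyclically shifts the
# rotation channel: the defining property of a lifting layer.
for r in range(4):
    assert np.allclose(emb_rot[r], np.rot90(emb[(r - 1) % 4]))
```

Because the group structure is carried explicitly in the channel dimension, a downstream attention layer that treats tokens symmetrically across that dimension inherits the equivariance without needing a position encoding.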