Poster
Learnable Fourier Features for Multi-dimensional Spatial Positional Encoding
Yang Li · Si Si · Gang Li · Cho-Jui Hsieh · Samy Bengio

Thu Dec 09 04:30 PM -- 06:00 PM (PST)
Attentional mechanisms are order-invariant. Positional encoding is a crucial component to allow attention-based deep model architectures such as Transformer to address sequences or images where the position of information matters. In this paper, we propose a novel positional encoding method based on learnable Fourier features. Instead of hard-coding each position as a token or a vector, we represent each position, which can be multi-dimensional, as a trainable encoding based on learnable Fourier feature mapping, modulated with a multi-layer perceptron. The representation is particularly advantageous for a spatial multi-dimensional position, e.g., pixel positions on an image, where $L_2$ distances or more complex positional relationships need to be captured. Our experiments based on several public benchmark tasks show that our learnable Fourier feature representation for multi-dimensional positional encoding outperforms existing methods by both improving the accuracy and allowing faster convergence.
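Based on the description in the abstract, the encoding can be sketched as a Fourier feature mapping of each multi-dimensional position followed by a small MLP. The function and parameter names, the ReLU nonlinearity, and the dimensions below are illustrative assumptions rather than the authors' exact implementation; in practice all of these weights would be trained end-to-end in a framework such as PyTorch or JAX.

```python
import numpy as np

def learnable_fourier_pos_encoding(pos, W_r, mlp_w1, mlp_b1, mlp_w2, mlp_b2):
    """Sketch of a learnable-Fourier-feature positional encoding.

    pos:   (N, M) array of M-dimensional positions (e.g. pixel (row, col)).
    W_r:   (D/2, M) trainable Fourier feature matrix (randomly initialized).
    mlp_*: trainable weights of a small MLP that modulates the features.
    """
    proj = pos @ W_r.T                                   # (N, D/2)
    D = 2 * W_r.shape[0]
    # Fourier feature mapping: concatenated cos/sin, scaled by 1/sqrt(D)
    feats = np.concatenate([np.cos(proj), np.sin(proj)], axis=-1) / np.sqrt(D)
    hidden = np.maximum(feats @ mlp_w1 + mlp_b1, 0.0)    # ReLU stand-in for the MLP
    return hidden @ mlp_w2 + mlp_b2                      # (N, output_dim) codes

# Illustrative usage with 2-D pixel positions:
rng = np.random.default_rng(0)
pos = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])  # four pixel coordinates
W_r = rng.normal(size=(8, 2))                             # D = 16 Fourier features
w1, b1 = rng.normal(size=(16, 32)), np.zeros(32)
w2, b2 = rng.normal(size=(32, 64)), np.zeros(64)
codes = learnable_fourier_pos_encoding(pos, W_r, w1, b1, w2, b2)
print(codes.shape)  # (4, 64)
```

Because nearby positions project to similar angles before the cos/sin mapping, their encodings are close in $L_2$ distance, which is the property the abstract highlights for spatial positions.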

#### Author Information

Yang Li is a Senior Staff Research Scientist at Google and an affiliate faculty member at the University of Washington CSE, working at the intersection of AI and HCI. He pioneered on-device interactive ML on Android, developing impactful product features such as next-app prediction and Gesture Search. Yang has published extensively in top venues across both the HCI and ML fields, including CHI, UIST, ICML, ACL, EMNLP, CVPR, NeurIPS (NIPS), ICLR, and KDD, and has regularly served as an area chair or senior area (track) chair in both fields. He is also an editor of the upcoming Springer book "AI for HCI: A Modern Approach", the first thorough treatment of the topic.