

Poster

Unveiling the Hidden Structure of Self-Attention via Kernel Principal Component Analysis

Rachel S.Y. Teo · Tan Nguyen

East Exhibit Hall A-C #2003
Wed 11 Dec 4:30 p.m. PST — 7:30 p.m. PST

Abstract:

The remarkable success of transformers in sequence modeling tasks, spanning various applications in natural language processing and computer vision, is attributed to the critical role of self-attention. Similar to the development of most deep learning models, the construction of these attention mechanisms relies on heuristics and experience. In our work, we derive self-attention from kernel principal component analysis (kernel PCA) and show that self-attention projects its query vectors onto the principal component axes of its key matrix in a feature space. We then formulate the exact formula for the value matrix in self-attention, theoretically and empirically demonstrating that this value matrix captures the eigenvectors of the Gram matrix of the key vectors in self-attention. Leveraging our kernel PCA framework, we propose Attention with Robust Principal Components (RPC-Attention), a novel class of robust attention that is resilient to data contamination. We empirically demonstrate the advantages of RPC-Attention over softmax attention on the ImageNet-1K object classification, WikiText-103 language modeling, and ADE20K image segmentation tasks.
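The core operation described above can be illustrated with plain kernel PCA. The sketch below is not the authors' implementation: the RBF kernel choice, the centering scheme, and all function names are assumptions made for illustration. It computes the eigenvectors of the centered Gram matrix of a set of key vectors and projects query vectors onto the resulting principal component axes, which is the projection the abstract attributes to self-attention in feature space.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=0.5):
    # Pairwise RBF kernel values k(x, y) = exp(-gamma * ||x - y||^2).
    sq = (X**2).sum(1)[:, None] + (Y**2).sum(1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * sq)

def kernel_pca_projection(keys, queries, n_components=4, gamma=0.5):
    """Project queries onto the principal component axes of the keys in the
    RBF feature space (standard kernel PCA; the kernel is an illustrative choice)."""
    N = keys.shape[0]
    # Gram matrix of the key vectors and its centered version.
    G = rbf_kernel(keys, keys, gamma)
    one_n = np.full((N, N), 1.0 / N)
    G_centered = G - one_n @ G - G @ one_n + one_n @ G @ one_n
    # Eigenvectors of the centered Gram matrix give the principal axes in feature space.
    eigvals, eigvecs = np.linalg.eigh(G_centered)
    idx = np.argsort(eigvals)[::-1][:n_components]
    alphas = eigvecs[:, idx] / np.sqrt(np.maximum(eigvals[idx], 1e-12))
    # Cross-kernel between queries and keys, centered consistently with G.
    Kq = rbf_kernel(queries, keys, gamma)
    one_m = np.full((queries.shape[0], N), 1.0 / N)
    Kq_centered = Kq - one_m @ G - Kq @ one_n + one_m @ G @ one_n
    # Coordinates of the queries along the principal component axes of the keys.
    return Kq_centered @ alphas

rng = np.random.default_rng(0)
keys, queries = rng.normal(size=(64, 16)), rng.normal(size=(8, 16))
print(kernel_pca_projection(keys, queries).shape)  # (8, 4)
```

The paper's contribution is to show that softmax self-attention already performs a projection of this kind, with the value matrix playing the role of the Gram-matrix eigenvectors; the snippet only spells out the generic kernel PCA operation for reference.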
