

Poster

FUNGI: Features From Unsupervised Gradients

Walter Simoncini · Andrei Bursuc · Spyridon Gidaris · Yuki M Asano

East Exhibit Hall A-C #2203
Thu 12 Dec 11 a.m. PST — 2 p.m. PST

Abstract: This paper introduces FUNGI: Features from UNsupervised GradIents, a method to enhance the features of vision transformers by leveraging self-supervised gradients. Our method is simple: given a pretrained model, we first compute gradients from various self-supervised objectives for each input. These gradients are projected to a lower dimension and then concatenated with the model's embedding. We evaluate our method on k-nearest neighbor classification over 11 datasets from vision and 5 datasets from NLP. Across backbones spanning various sizes and pretraining strategies, FUNGI features provide consistent performance improvements over the embeddings. We also demonstrate that our method can be used to significantly improve the retrieval-based in-context scene understanding abilities of pretrained models. For example, for semantic segmentation using a memory bank of $1024 \times 10^2$ patches, we improve upon DINO ViT-B/16 by +17%, without any training.
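
The pipeline described in the abstract (per-input gradient, projection to a lower dimension, concatenation with the embedding, k-nearest neighbor classification) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and everything specific in it is an assumption made for illustration: the "pretrained model" is a single linear layer `W`, the self-supervised objective is a stand-in cross-entropy between the softmax of the output and a uniform target, the projection `P` is a random Gaussian matrix, and both parts are L2-normalized before concatenation. The names `fungi_like_features` and `knn_predict` are hypothetical.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

def l2_normalize(v, eps=1e-12):
    return v / (np.linalg.norm(v) + eps)

def fungi_like_features(x, W, P):
    """Embedding concatenated with a projected per-input gradient.

    Assumptions (not from the abstract): linear model z = W @ x, a
    cross-entropy-to-uniform stand-in loss, random projection P.
    """
    z = W @ x                                # embedding, shape (d_out,)
    p = softmax(z)
    u = np.full_like(p, 1.0 / p.size)        # uniform target
    grad_W = np.outer(p - u, x)              # dLoss/dW for this input, (d_out, d_in)
    g = P @ grad_W.ravel()                   # project to a lower dimension
    return np.concatenate([l2_normalize(z), l2_normalize(g)])

def knn_predict(query, bank, labels, k=3):
    """Majority vote over the k most similar rows of bank (dot-product similarity)."""
    sims = bank @ query
    top = np.argsort(-sims)[:k]
    return int(np.bincount(labels[top]).argmax())

# Toy usage with random data; all sizes are arbitrary.
rng = np.random.default_rng(0)
d_in, d_out, proj_dim, n = 16, 8, 4, 20
W = rng.normal(size=(d_out, d_in))
P = rng.normal(size=(proj_dim, d_out * d_in)) / np.sqrt(proj_dim)
X = rng.normal(size=(n, d_in))
labels = rng.integers(0, 3, size=n)

bank = np.stack([fungi_like_features(x, W, P) for x in X])
pred = knn_predict(bank[5], bank, labels, k=1)  # query is a bank item
```

With `k=1` and a query taken from the bank itself, the nearest neighbor is the query, so `pred` equals `labels[5]`. In a real setting the gradient would come from backpropagating a self-supervised loss through a pretrained backbone rather than from this closed-form toy expression.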
