
Workshop: Robustness of zero/few-shot learning in foundation models (R0-FoMo)

Flexible visual prompts for in-context learning in computer vision

Thomas Foster · Ioana Croitoru · Robert Dorfman · Christoffer Edlund · Thomas Varsavsky · Jon Almazan


In this work, we address in-context learning (ICL) for computer vision, introducing a novel approach that adapts a modern Video Object Segmentation (VOS) technique for visual ICL. This adaptation is motivated by the ability of VOS methods to learn objects efficiently and flexibly from only a few examples. In evaluations across a range of support set sizes and diverse segmentation datasets, our method consistently surpasses existing techniques. Notably, it excels on data containing classes not seen during training. Additionally, we propose a support set selection technique that improves the performance of all tested ICL methods. We plan to release all code for this study prior to publication.
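To make the visual ICL setting concrete: the model receives a support set of (image, mask) pairs and must segment the same concept in a query image, with no gradient updates. The abstract does not detail the authors' architecture, so the following is only a minimal toy sketch of the general memory-matching idea underlying many VOS-style approaches: pool all support pixels into a memory bank and let each query pixel inherit the label of its most similar support pixel in feature space. The feature dimensions, cluster centers, and the `icl_segment` helper are all illustrative assumptions, not the paper's method.

```python
import numpy as np

def icl_segment(support_feats, support_masks, query_feats):
    """Toy visual ICL by nearest-neighbour feature matching (illustrative only).

    support_feats: list of (P, D) per-pixel feature arrays, one per support image
    support_masks: list of (P,) binary label arrays aligned with the features
    query_feats:   (Q, D) per-pixel features for the query image
    Returns a predicted (Q,) binary mask for the query.
    """
    # Stack all support pixels into one memory bank, loosely in the spirit
    # of VOS memory matching (heavily simplified here).
    mem_feats = np.concatenate(support_feats, axis=0)   # (N, D)
    mem_labels = np.concatenate(support_masks, axis=0)  # (N,)

    # Cosine similarity between every query pixel and every memory pixel.
    q = query_feats / np.linalg.norm(query_feats, axis=1, keepdims=True)
    m = mem_feats / np.linalg.norm(mem_feats, axis=1, keepdims=True)
    sim = q @ m.T                                       # (Q, N)

    # Each query pixel inherits the label of its best-matching support pixel.
    return mem_labels[sim.argmax(axis=1)]

# Tiny synthetic example: 2-D "features" where foreground pixels cluster
# near (1, 0) and background pixels near (0, 1).
rng = np.random.default_rng(0)
fg = rng.normal([1.0, 0.0], 0.05, size=(8, 2))
bg = rng.normal([0.0, 1.0], 0.05, size=(8, 2))
support = [np.vstack([fg, bg])]
masks = [np.concatenate([np.ones(8), np.zeros(8)])]
query = np.vstack([rng.normal([1.0, 0.0], 0.05, size=(4, 2)),
                   rng.normal([0.0, 1.0], 0.05, size=(4, 2))])
pred = icl_segment(support, masks, query)
print(pred)  # first four pixels labelled foreground (1), last four background (0)
```

Note that the quality of `pred` depends entirely on which examples populate the memory bank, which is why a good support set selection strategy, as proposed in this work, can benefit any matching-based ICL method.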
