Workshop: I Can’t Believe It’s Not Better (ICBINB): Failure Modes in the Age of Foundation Models

Can Segment Anything Model Improve Semantic Segmentation?

Maryam Qamar · Chaoning Zhang · Donghoon Kim · Muhammad Salman Ali · Sung-Ho Bae


Recently, the Segment Anything Model (SAM) has gained considerable attention in computer vision, establishing itself as a pioneering foundation model for segmentation. SAM excels at generating high-quality segmentation masks, but it does not assign semantic labels to them. In contrast, conventional semantic segmentation models predict fairly accurate semantic labels but often produce suboptimal masks. Leveraging SAM's superior mask quality to improve conventional semantic segmentation models therefore seems intuitive. However, our preliminary experiments show that integrating SAM with these models yields no discernible improvement. Specifically, when we integrate SAM into two baseline semantic segmentation models, DeepLab and OneFormer, we find no significant gain in mean Intersection over Union (mIoU) on the Pascal VOC and ADE20K datasets. We therefore conclude that, in its current form, this highly acclaimed foundation model is not the preferred solution for semantic segmentation; a more cautious and thoughtful approach is needed to unlock any potential benefits in this setting.
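To make the evaluation concrete, here is a minimal sketch of the mIoU metric together with one simple way to combine class-agnostic masks with a baseline's per-pixel labels. The abstract does not say how SAM is integrated, so the fusion function `refine_with_masks` is a hypothetical illustration (majority vote of predicted labels inside each mask), not the authors' method. The masks are plain boolean NumPy arrays standing in for SAM output, and `ignore_index=255` follows the void-label convention of Pascal VOC annotations. Benchmarks usually accumulate intersections and unions over the whole dataset; this per-image version is a simplification.

```python
from typing import Sequence

import numpy as np


def mean_iou(pred: np.ndarray, target: np.ndarray, num_classes: int,
             ignore_index: int = 255) -> float:
    """Per-image mIoU over classes that appear in pred or target."""
    valid = target != ignore_index  # exclude void pixels from the metric
    ious = []
    for c in range(num_classes):
        p = (pred == c) & valid
        t = (target == c) & valid
        union = np.logical_or(p, t).sum()
        if union == 0:
            continue  # class absent from both maps: skip instead of scoring it
        inter = np.logical_and(p, t).sum()
        ious.append(inter / union)
    return float(np.mean(ious)) if ious else 0.0


def refine_with_masks(pred: np.ndarray, masks: Sequence[np.ndarray]) -> np.ndarray:
    """HYPOTHETICAL fusion: give every pixel in a mask the majority label
    that the baseline predicts inside that mask. Later masks overwrite
    earlier ones where they overlap."""
    refined = pred.copy()
    for m in masks:
        if m.any():
            labels, counts = np.unique(pred[m], return_counts=True)
            refined[m] = labels[np.argmax(counts)]
    return refined


# Toy 4x4 example: two classes split into left and right halves.
target = np.array([[0, 0, 1, 1]] * 4)
pred = np.array([[0, 0, 0, 1],
                 [0, 0, 1, 1],
                 [0, 1, 1, 1],
                 [0, 0, 1, 1]])  # baseline labels with two boundary errors
left = np.zeros((4, 4), dtype=bool)
left[:, :2] = True
masks = [left, ~left]  # stand-ins for class-agnostic SAM masks

print(round(mean_iou(pred, target, num_classes=2), 3))  # prints 0.778
print(mean_iou(refine_with_masks(pred, masks), target, num_classes=2))  # prints 1.0
```

The toy masks line up exactly with the class boundary, so voting repairs every error. With real masks that over- or under-segment objects, the same voting can just as easily spread a wrong label across a whole region, which is one plausible reason the gains reported above fail to materialize.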
