Benchmarking Parameter Efficient Adaptation of Vision Language Models on Pathology
Abstract
Generalist vision–language models (VLMs) struggle on histopathology tasks due to domain gaps and scarce labels. Pathology foundation models (PFMs) also fall short despite costly pretraining. Parameter-efficient fine-tuning (PEFT) offers a scalable, lightweight approach to quickly adapting large pretrained models to target histopathology tasks. We present the first benchmark of PEFT methods applied to VLMs/PFMs for histopathology tasks. We categorize existing PEFT methods by adaptation modality, strategy, and locus. We curate a novel neuropathology dataset for detecting neurofibrillary tangles (NFTs), a hallmark of Alzheimer's Disease, capturing annotator variability to evaluate reliability and alignment. Experiments across prostate cancer, colorectal cancer, and neuropathology tasks show that with full data, PEFT-adapted generalist VLMs rival adapted PFMs, but they fall short in few-shot settings due to label scarcity, terminology mismatch, and modality-specific biases. Visualization further reveals that models such as CONCH+MMRL focus on NFTs within annotated boxes, improving interpretability in single-NFT cases, but their performance diminishes in complex multi-NFT scenarios. Together, our benchmark and dataset highlight PEFT as a scalable strategy, while also indicating the need for richer interpretability metrics and improved multimodal reasoning to handle complex cases.