Mechanistic Interpretability of Semantic Abstraction in Biomedical Texts
Nikhil Gourisetty · Snata Mohanty · Vishnu Srinivas · Soumil Jain · Sunith Vallabhaneni · Kevin Zhu · Sunishchal Dev
Abstract
We investigate whether biomedical language models form register-invariant semantic representations of sentences, a capability that supports consistent and reliable clinical communication across language styles. Using aligned sentence pairs (technical and plain-language abstracts with the same meaning), we analyze how BioBERT, SciBERT, Clinical-T5, and BioGPT respond to register variation using similarity measures, trajectory visualization, and activation patching. Our results indicate that the models converge to shared semantic states in mid-to-late layers, through internal processes that preserve meaning across stylistic variation.
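As a rough illustration of what a layer-wise similarity measure over aligned pairs might look like, the sketch below compares a technical sentence and its plain-language counterpart at each layer. The abstract does not state the pooling strategy or the similarity metric, so mean pooling over tokens and cosine similarity are assumptions here, and the function names (`mean_pool`, `cosine`, `layerwise_similarity`) and the toy hidden states are hypothetical. In practice the per-layer hidden states would come from one of the models named above rather than being written by hand.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]
Layer = Sequence[Vector]  # one hidden-state vector per token


def mean_pool(token_vectors: Layer) -> List[float]:
    """Average token vectors into one sentence vector (assumed pooling)."""
    n = len(token_vectors)
    dim = len(token_vectors[0])
    return [sum(tok[d] for tok in token_vectors) / n for d in range(dim)]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def layerwise_similarity(
    technical: Sequence[Layer], plain: Sequence[Layer]
) -> List[float]:
    """Similarity of the pooled representations at each layer.

    The two sentences may have different token counts; pooling removes
    the token axis before the comparison.
    """
    return [cosine(mean_pool(t), mean_pool(p)) for t, p in zip(technical, plain)]


# Toy hidden states for two layers (hand-written, not model output).
technical = [
    [[1.0, 0.0], [1.0, 0.0]],  # layer 0, two tokens
    [[1.0, 1.0], [1.0, 1.0]],  # layer 1, two tokens
]
plain = [
    [[0.0, 1.0], [0.0, 1.0], [0.0, 1.0]],  # layer 0, three tokens
    [[2.0, 2.0]],                          # layer 1, one token
]

sims = layerwise_similarity(technical, plain)
for layer, sim in enumerate(sims):
    print(f"layer {layer}: cosine similarity = {sim:.3f}")
```

In this toy case the two representations are orthogonal at the first layer and aligned at the second, which mimics the kind of convergence across depth described in the abstract. Real model activations would give a smoother curve.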