Implicit Bias in LLMs for Transgender Populations
Abstract
Large language models (LLMs) have been shown to exhibit biases against LGBTQ+ populations, with heightened consequences in healthcare. While safety training may mitigate the generation of overtly offensive content, stereotype-driven associations can persist. In this work, we examine implicit bias toward transgender people with two evaluations. First, we adapt word association tests to measure whether LLMs disproportionately pair negative concepts with "transgender" and positive concepts with "cisgender" across eight categories. Second, we design a medical appointment allocation task in which models act as scheduling agents choosing between cisgender and transgender candidates for medical specialties typically associated with a specific sex assigned at birth. Across six closed-source models, we observe consistently positive bias scores in categories such as appearance, risk, and veracity, indicating stronger negative associations with transgender individuals. In the scheduling experiment, transgender candidates are favored for STI and mental health services, while cisgender candidates are preferred in gynecology and breast care. Our findings highlight the need to develop evaluation frameworks and mitigation strategies that address subtle, stereotype-driven biases in LLMs to ensure equitable treatment of transgender people, particularly in healthcare applications.