Recently, there has been an increase in efforts to understand how large language models (LLMs) propagate and amplify social biases. Several works have utilized templates for fairness evaluation, which allow researchers to quantify social biases in the absence of test sets with protected attribute labels. While template evaluation can be a convenient and helpful diagnostic tool to understand model deficiencies, it often uses a simplistic and limited set of templates. In this paper, we study whether bias measurements are sensitive to the choice of templates used for benchmarking. Specifically, we investigate the instability of bias measurements by manually modifying templates proposed in previous works in a semantically-preserving manner and measuring bias across these modifications. We find that bias values and resulting conclusions vary considerably across template modifications on four tasks, ranging from an 81% reduction (NLI) to a 162% increase (MLM) in (task-specific) bias measurements. Our results indicate that quantifying fairness in LLMs, as done in current practice, can be brittle and needs to be approached with more care and caution.
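To make the template-sensitivity idea concrete, below is a minimal sketch (not code from the paper) of the kind of measurement the abstract describes for the MLM setting: score a gendered-pronoun probability gap under a masked language model, then repeat the measurement across semantically-equivalent rewordings of the same template. The model name, template wordings, pronoun pair, and gap-based score are illustrative assumptions, not the paper's exact setup.

```python
# Illustrative sketch (assumptions, not the paper's protocol): probe how a
# simple MLM gender-bias score shifts across paraphrases of one template.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# Paraphrases of a single template; only the wording changes, not the meaning.
templates = [
    "[MASK] worked as a nurse.",
    "[MASK] was employed as a nurse.",
    "[MASK] had a job as a nurse.",
]

def pronoun_gap(template: str) -> float:
    """Difference in predicted probability between 'he' and 'she' at the mask."""
    results = fill_mask(template, targets=["he", "she"])
    scores = {r["token_str"]: r["score"] for r in results}
    return scores["he"] - scores["she"]

for t in templates:
    print(f"{t!r}: he-she gap = {pronoun_gap(t):+.4f}")
```

If the gap swings noticeably between paraphrases that a reader would consider equivalent, the bias estimate depends on the template wording rather than the model alone; this is the instability the paper quantifies across four tasks, including NLI and MLM.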
Author Information
Preethi Seshadri (University of California, Irvine)
Pouya Pezeshkpour (University of California, Irvine)
Sameer Singh (University of California, Irvine)
Sameer Singh is an Assistant Professor at UC Irvine working on the robustness and interpretability of machine learning. Sameer has presented tutorials and invited workshop talks at EMNLP, NeurIPS, NAACL, WSDM, ICLR, ACL, and AAAI, and received paper awards at KDD 2016, ACL 2018, EMNLP 2019, AKBC 2020, and ACL 2020. Website: http://sameersingh.org/
More from the Same Authors
- 2021 : Cutting Down on Prompts and Parameters: Simple Few-Shot Learning with Language Models
  Robert Logan · Ivana Balazevic · Eric Wallace · Fabio Petroni · Sameer Singh · Sebastian Riedel
- 2022 : TalkToModel: Explaining Machine Learning Models with Interactive Natural Language Conversations
  Dylan Slack · Satyapriya Krishna · Himabindu Lakkaraju · Sameer Singh
- 2023 Poster: Post Hoc Explanations of Language Models Can Improve Language Models
  Satyapriya Krishna · Jiaqi Ma · Dylan Slack · Asma Ghandeharioun · Sameer Singh · Himabindu Lakkaraju
- 2022 : Contributed Talk: TalkToModel: Explaining Machine Learning Models with Interactive Natural Language Conversations
  Dylan Slack · Satyapriya Krishna · Himabindu Lakkaraju · Sameer Singh
- 2021 : PYLON: A PyTorch Framework for Learning with Constraints
  Kareem Ahmed · Tao Li · Nu Mai Thy Ton · Quan Guo · Kai-Wei Chang · Parisa Kordjamshidi · Vivek Srikumar · Guy Van den Broeck · Sameer Singh
- 2020 Tutorial: (Track 2) Explaining Machine Learning Predictions: State-of-the-art, Challenges, and Opportunities Q&A
  Himabindu Lakkaraju · Julius Adebayo · Sameer Singh
- 2020 Tutorial: (Track 2) Explaining Machine Learning Predictions: State-of-the-art, Challenges, and Opportunities
  Himabindu Lakkaraju · Julius Adebayo · Sameer Singh
- 2019 Workshop: KR2ML - Knowledge Representation and Reasoning Meets Machine Learning
  Veronika Thost · Christian Muise · Kartik Talamadupula · Sameer Singh · Christopher Ré
- 2019 Demonstration: AllenNLP Interpret: Explaining Predictions of NLP Models
  Jens Tuyls · Eric Wallace · Matt Gardner · Junlin Wang · Sameer Singh · Sanjay Subramanian
- 2017 : Poster Session - Session 2
  Ambrish Rawat · Armand Joulin · Peter A Jansen · Jay Yoon Lee · Muhao Chen · Frank F. Xu · Patrick Verga · Brendan Juba · Anca Dumitrache · Sharmistha Jat · Robert Logan · Dhanya Sridhar · Fan Yang · Rajarshi Das · Pouya Pezeshkpour · Nicholas Monath