Workshop: Socially Responsible Language Modelling Research (SoLaR)

Interpretable Stereotype Identification through Reasoning

Jacob-Junqi Tian · Omkar Dige · David B. Emerson · Faiza Khattak


Because language models are trained on vast datasets that may contain inherent biases, they risk inadvertently perpetuating systemic discrimination. Consequently, it is essential to examine and address bias in language models, integrating fairness into their development. In this work, we demonstrate the importance of reasoning in zero-shot stereotype identification using Vicuna-13B and -33B and LLaMA-2-chat-13B and -70B. Although we observe improved accuracy when scaling from 13B to larger models, we show that the performance gain from reasoning significantly exceeds the gain from scaling up. Our findings suggest that reasoning is a key factor that enables LLMs to transcend the scaling law on out-of-domain tasks such as stereotype identification. Additionally, through a qualitative analysis of select reasoning traces, we highlight how reasoning improves not only accuracy but also the interpretability of the model's decisions.
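As a rough illustration of the setup the abstract describes, the sketch below builds a zero-shot prompt that asks a model to reason before labelling a sentence, and parses the final label from the resulting reasoning trace. The prompt wording, label format, and helper names here are assumptions for illustration, not the authors' actual prompts.

```python
# Hedged sketch of zero-shot stereotype identification with reasoning.
# The prompt template and "Label:" answer format are illustrative
# assumptions; the model call itself is left to the reader.

def build_prompt(sentence: str) -> str:
    """Build a zero-shot prompt that elicits reasoning before a label."""
    return (
        "Decide whether the following sentence expresses a stereotype.\n"
        f"Sentence: {sentence}\n"
        "First explain your reasoning step by step, then answer on a "
        "final line with 'Label: stereotype' or 'Label: not a stereotype'."
    )

def parse_label(completion: str) -> str:
    """Extract the final label from a model's reasoning trace."""
    for line in reversed(completion.strip().splitlines()):
        if line.lower().startswith("label:"):
            return line.split(":", 1)[1].strip().lower()
    return "unknown"
```

Keeping the reasoning trace alongside the parsed label is what makes each decision inspectable, which is the interpretability benefit the abstract highlights.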