Skip to yearly menu bar Skip to main content

Workshop: Socially Responsible Language Modelling Research (SoLaR)

FairSISA: Ensemble Post-Processing to Improve Fairness of Unlearning in LLMs

Swanand Kadhe · Anisa Halimi · Ambrish Rawat · Nathalie Baracaldo


Training large language models (LLMs) is a costly endeavour in terms of time and computational resources. The large amount of training data used during the unsupervised pre-training phase makes it difficult to verify all data and, unfortunately, undesirable data may be ingested during training. Re-training from scratch is impractical and has led to the creation of the \textit{unlearning} discipline where models are modified to ``unlearn" undesirable information without retraining. However, any modification can alter the behaviour of LLMs, especially on key dimensions such as \textit{fairness}. This is the first work that examines this interplay between unlearning and fairness for LLMs. In particular, we focus on a popular unlearning framework known as SISA [Bourtoule et al., 2021], which creates an ensemble of models trained on disjoint shards. We evaluate the performance-fairness trade-off for SISA, and empirically demsontrate that SISA can indeed reduce fairness in LLMs. To remedy this, we propose post-processing bias mitigation techniques for ensemble models produced by SISA. Through experimental results, we demonstrate the efficacy of our post-processing framework called \textit{FairSISA}.

Chat is not available.