Affinity Workshop: Women in Machine Learning

Quantifying Gender Bias in Hindi Language Models

Neeraja Kirtane · V MANUSHREE · Aditya Kane


Gender bias in the data on which language models are trained is reflected in the systems that use these models. It is therefore important to address and mitigate this bias. While extensive research on this problem exists for English, work on other languages, especially Indian languages, is relatively nascent. Because English is not a grammatically gendered language, methodologies developed for it cannot be directly transferred to other languages. Hindi is spoken by more than 600 million people and is the third most spoken language in the world, so it is essential to address bias in this language. In this paper, we measure gender bias associated with occupations in Hindi language models. Our main contribution is a corpus for evaluating gender bias in Hindi. Using this corpus, we evaluate the gender bias present in Hindi language models. Our results indicate the presence of bias in these systems.
