Poster
in
Workshop: Algorithmic Fairness through the Lens of Time

Algorithmic Fairness Reproducibility: A Close Look at Data Usage over the Years

Jan Simson ⋅ Alessandro Fabris ⋅ Christoph Kern

[ Poster]

Abstract

Algorithmic fairness has become a significant area of research in recent years, with a growing body of work aimed at addressing bias and discrimination in machine learning systems. Like many other fields, however, this expanding field is faced with the challenge of ensuring the reliability and reproducibility of research findings. This is particularly the case with respect to the datasets used, as it has been shown that a large body of research in the field is built upon a small number of datasets. Moreover, many of these popular datasets have been found to have significant flaws and recent large-scale studies have shown alarming signs of low reproducibility.In this work-in-progress, we examine the landscape of dataset usage in algorithmic fairness research, shedding light on the practices employed and the challenges encountered. We present an ongoing investigation into the use of tabular datasets within algorithmic fairness research.Our preliminary results point to a two-fold reproducility issue. On the one hand, data are not uniformly preprocessed. The same basic datasets are often used quite differently, with researchers performing different processing on columns (especially columns used as protected attributes), using different subsets of the data or different columns for prediction tasks. On the other hand, many papers do not provide enough information to reliably identify not only which preprocessing steps, but also which specific resource has been used, as popular datasets such as the widely used COMPAS dataset come with different subsets of data.We encourage the adoption of practices from the open science community, such as sharing of code to improve transparency, reproducibility and comparability of research in algorithmic fairness. To ease this process we plan to release a Python package for standardized loading and preprocessing of datasets for use in the fairness literature.

Chat is not available.