A Bilevel Optimization Approach for Computing Synthetic Data to Mitigate Unfairness in Collaborative Machine Learning
Abstract
In distributed computing, collaborative machine learning enables multiple clients to jointly train a global model. In this work, we present a framework that addresses fairness in collaborative machine learning through constrained optimization. Each client generates a synthetic dataset by solving a bilevel optimization problem whose outer problem incorporates fairness constraints, guiding dataset generation so that the resulting global model yields fair predictions. In our proposed strategy, clients pass their synthetic datasets (or closely related versions that additionally preserve differential privacy) to the server. The server then trains the global model on these datasets using conventional machine learning techniques, eliminating the need for fairness-specific aggregation. This approach requires only a single communication round, maintains data privacy, and promotes fairness. Empirical results demonstrate that our one-shot method effectively reduces unfairness.
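The bilevel structure described above can be illustrated with a deliberately simplified sketch. This is not the paper's algorithm: it assumes a one-parameter least-squares model (so the inner problem has a closed form), two demographic groups of toy client data, a squared-disparity fairness penalty in the outer objective, and finite-difference gradient descent over the synthetic labels. All function names and constants are illustrative.

```python
def inner_fit(synth):
    # Inner problem: closed-form least-squares fit w*(S) on the synthetic set S
    num = sum(x * y for x, y in synth)
    den = sum(x * x for x, y in synth)
    return num / den

def group_mse(w, data):
    # Per-group error of the induced model on the client's real data
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

def outer_loss(synth, group_a, group_b, lam=10.0):
    # Outer problem: accuracy on both groups plus a fairness penalty
    # (squared gap between group errors, weighted by lam)
    w = inner_fit(synth)
    mse_a, mse_b = group_mse(w, group_a), group_mse(w, group_b)
    return mse_a + mse_b + lam * (mse_a - mse_b) ** 2

def optimize_synthetic(group_a, group_b, steps=200, lr=0.005, eps=1e-5):
    # Synthetic points: fixed features, learnable labels, optimized by
    # forward finite-difference gradient descent on the outer loss
    synth = [(1.0, 0.0), (2.0, 0.0)]
    for _ in range(steps):
        base = outer_loss(synth, group_a, group_b)
        grads = []
        for i, (x, y) in enumerate(synth):
            bumped = list(synth)
            bumped[i] = (x, y + eps)
            grads.append((outer_loss(bumped, group_a, group_b) - base) / eps)
        synth = [(x, y - lr * g) for (x, y), g in zip(synth, grads)]
    return synth

# Toy client data: group A follows y = 2x, group B follows y = x,
# so a single model cannot fit both groups equally well
group_a = [(1.0, 2.0), (2.0, 4.0)]
group_b = [(1.0, 1.0), (2.0, 2.0)]
synth = optimize_synthetic(group_a, group_b)
w = inner_fit(synth)
```

In this toy setting the fairness penalty drives the synthetic labels toward data whose induced model sits between the two groups, equalizing their errors; only the resulting synthetic points (not the real data) would be shared with the server.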