Machine learning (ML) is increasingly used in high-stakes applications that impact society. Prior evidence suggests that ML models may learn to rely on “shortcut” biases or spurious correlations. It is therefore important to ensure that ML models do not propagate biases found in training data. Further, collecting accurately labeled data can be challenging and costly. In this work, we design algorithms for fair active learning that carefully select data points to be labeled by exploiting their underlying causal structure, so as to balance model accuracy and fairness. We consider a pool-based setup, where the learner has access to a small pool of labeled data and a large pool of unlabelled data, both of which follow the same biased distribution. We look at two cases of confounding bias: a) the bias is available, and b) the bias is unknown or unavailable. For each class, we try to sample from the interventional distribution to eliminate the effect of bias on the acquired data points. Exploiting the causal structure of the underlying data, the approach first expresses the interventional distribution as a simple weighted kernel density estimate (KDE) to generate sampling weights. In each iteration, we generate weights for all labeled data samples and then batch sample unlabelled points from kernels centered on labeled samples with probability w_n, ensuring diversity of the collected samples. We compare our method against popular active learning baselines based on a) uncertainty, b) density, and c) diversity. We also compare our method against models that implicitly regularise for fairness while acquiring samples randomly or based on the entropy of the sample. We show that, on synthetically generated biased datasets, our method outperforms the baselines by a large margin on unbiased test sets, implying that the model learned by actively acquiring data based on its causal structure is unbiased. We wish to further extend the results to large datasets and deep learning models.
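The acquisition step described in the abstract can be illustrated with a minimal Python sketch for the case where the bias variable is observed. The abstract does not state how the weights w_n are computed, so the sketch assumes the standard backdoor adjustment p(x | do(y)) = Σ_z p(x | y, z) p(z), which yields inverse-propensity weights w_n ∝ 1[y_n = y] / p(y | z_n). The function names `backdoor_weights` and `acquire_batch`, the Gaussian kernel, the fixed bandwidth, and the nearest-neighbour mapping from a kernel draw to a pool point are all hypothetical choices made for illustration, not the authors' implementation.

```python
import numpy as np


def backdoor_weights(y, z, target_class):
    """Assumed weighting: w_n proportional to 1[y_n == target_class] / p_hat(y_n | z_n).

    y: labels of the labeled pool; z: observed discrete bias variable.
    Returns weights normalised to sum to 1 (zero for other classes).
    """
    y = np.asarray(y)
    z = np.asarray(z)
    w = np.zeros(len(y), dtype=float)
    for z_value in np.unique(z):
        in_group = z == z_value
        # Empirical propensity of the target class inside this bias group.
        p_y_given_z = np.mean(y[in_group] == target_class)
        if p_y_given_z > 0:
            w[in_group & (y == target_class)] = 1.0 / p_y_given_z
    total = w.sum()
    if total == 0:
        raise ValueError("no labeled sample belongs to target_class")
    return w / total


def acquire_batch(x_lab, w, x_pool, batch_size, bandwidth, rng):
    """Sample from the weighted KDE and map each draw to a distinct pool point.

    A kernel center n is chosen with probability w[n], a Gaussian draw is made
    around x_lab[n], and the nearest not-yet-chosen unlabelled point is taken.
    Requires batch_size <= len(x_pool).
    """
    x_lab = np.asarray(x_lab, dtype=float)
    x_pool = np.asarray(x_pool, dtype=float)
    available = np.ones(len(x_pool), dtype=bool)
    chosen = []
    centers = rng.choice(len(x_lab), size=batch_size, p=w)
    for c in centers:
        query = x_lab[c] + bandwidth * rng.standard_normal(x_lab.shape[1])
        dist = np.linalg.norm(x_pool - query, axis=1)
        dist[~available] = np.inf  # never pick the same pool point twice
        idx = int(np.argmin(dist))
        chosen.append(idx)
        available[idx] = False
    return chosen
```

Under these assumptions, labeled samples from a class that is rare within its bias group receive larger weights, so queries concentrate on regions the biased pool under-represents. In the setting where the bias is unknown, z would first have to be estimated, which this sketch does not attempt.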
Author Information
Sindhu C M Gowda (University of Toronto)
Haoran Zhang (Massachusetts Institute of Technology)
Marzyeh Ghassemi (MIT)
More from the Same Authors
-
2021 : Improving the Fairness of Deep Chest X-ray Classifiers »
Haoran Zhang · Natalie Dullerud · Karsten Roth · Stephen Pfohl · Marzyeh Ghassemi -
2022 : Multimodal Checklists for Fair Clinical Decision Support »
Qixuan Jin · Marzyeh Ghassemi -
2022 : Evaluating and Improving Robustness of Self-Supervised Representations to Spurious Correlations »
Kimia Hamidieh · Haoran Zhang · Marzyeh Ghassemi -
2022 : Learning to Defer in Ranking Systems »
Aparna Balagopalan · Haoran Zhang · Elizabeth Bondi-Kelly · Thomas Hartvigsen · Marzyeh Ghassemi -
2022 : Evaluation of Active Learning and Domain Adaptation on Health Data »
Kristina Holsapple · Haoran Zhang · Marzyeh Ghassemi -
2022 : "Why did the Model Fail?": Attributing Model Performance Changes to Distribution Shifts »
Haoran Zhang · Harvineet Singh · Marzyeh Ghassemi · Shalmali Joshi -
2022 : Fair Multimodal Checklists for Interpretable Clinical Time Series Prediction »
Qixuan Jin · Haoran Zhang · Thomas Hartvigsen · Marzyeh Ghassemi -
2022 Workshop: Robustness in Sequence Modeling »
Nathan Ng · Haoran Zhang · Vinith Suriyakumar · Chantal Shaib · Kyunghyun Cho · Yixuan Li · Alice Oh · Marzyeh Ghassemi -
2021 : Data Opportunities: unsolved medical problems and where new data can help »
Bin Yu · Regina Barzilay · Marzyeh Ghassemi · Emma Pierson -
2021 Workshop: Machine learning from ground truth: New medical imaging datasets for unsolved medical problems. »
Katy Haynes · Ziad Obermeyer · Emma Pierson · Marzyeh Ghassemi · Matthew Lungren · Sendhil Mullainathan · Matthew McDermott -
2021 Poster: Learning Optimal Predictive Checklists »
Haoran Zhang · Quaid Morris · Berk Ustun · Marzyeh Ghassemi -
2020 Affinity Workshop: Muslims in ML »
Marzyeh Ghassemi · Mohammad Norouzi · Shakir Mohamed · Aya Salama · Tasmie Sarker -
2019 : Coffee Break and Poster Session »
Rameswar Panda · Prasanna Sattigeri · Kush Varshney · Karthikeyan Natesan Ramamurthy · Harvineet Singh · Vishwali Mhasawade · Shalmali Joshi · Laleh Seyyed-Kalantari · Matthew McDermott · Gal Yona · James Atwood · Hansa Srinivasan · Yonatan Halpern · D. Sculley · Behrouz Babaki · Margarida Carvalho · Josie Williams · Narges Razavian · Haoran Zhang · Amy Lu · Irene Y Chen · Xiaojie Mao · Angela Zhou · Nathan Kallus -
2019 : Poster Session I »
Shuangjia Zheng · Arnav Kapur · Umar Asif · Eyal Rozenberg · Cyprien Gilet · Oleksii Sidorov · Yogesh Kumar · Tom Van Steenkiste · William Boag · David Ouyang · Paul Jaeger · Sheng Liu · Aparna Balagopalan · Deepta Rajan · Marta Skreta · Nikhil Pattisapu · Jann Goschenhofer · Viraj Prabhu · Di Jin · Laura-Jayne Gardiner · Irene Li · sriram kumar · Qiyuan Hu · Mehul Motani · Justin Lovelace · Usman Roshan · Lucy Lu Wang · Ilya Valmianski · Hyeonwoo Lee · Sunil Mallya · Elias Chaibub Neto · Jonas Kemp · Marie Charpignon · Amber Nigam · Wei-Hung Weng · Sabri Boughorbel · Alexis Bellot · Lovedeep Gondara · Haoran Zhang · Taha Bahadori · John Zech · Rulin Shao · Edward Choi · Laleh Seyyed-Kalantari · Emily Aiken · Ioana Bica · Yiqiu Shen · Kieran Chin-Cheong · Subhrajit Roy · Ioana Baldini · So Yeon Min · Dirk Deschrijver · Pekka Marttinen · Damian Pascual Ortiz · Supriya Nagesh · Niklas Rindtorff · Andriy Mulyar · Katharina Hoebel · Martha Shaka · Pierre Machart · Leon Gatys · Nathan Ng · Matthias Hüser · Devin Taylor · Dennis Barbour · Natalia Martinez · Clara McCreery · Benjamin Eyre · Vivek Natarajan · Ren Yi · Ruibin Ma · Chirag Nagpal · Nan Du · Chufan Gao · Anup Tuladhar · Sam Shleifer · Jason Ren · Pouria Mashouri · Ming Yang Lu · Farideh Bagherzadeh-Khiabani · Olivia Choudhury · Maithra Raghu · Scott Fleming · Mika Jain · GUO YANG · Alena Harley · Stephen Pfohl · Elisabeth Rumetshofer · Alex Fedorov · Saloni Dash · Jacob Pfau · Sabina Tomkins · Colin Targonski · Michael Brudno · Xinyu Li · Yiyang Yu · Nisarg Patel -
2018 Workshop: Machine Learning for Health (ML4H): Moving beyond supervised learning in healthcare »
Andrew Beam · Tristan Naumann · Marzyeh Ghassemi · Matthew McDermott · Madalina Fiterau · Irene Y Chen · Brett Beaulieu-Jones · Michael Hughes · Farah Shamout · Corey Chivers · Jaz Kandola · Alexandre Yahi · Samuel Finlayson · Bruno Jedynak · Peter Schulam · Natalia Antropova · Jason Fries · Adrian Dalca · Irene Chen -
2017 Workshop: Machine Learning for Health (ML4H) - What Parts of Healthcare are Ripe for Disruption by Machine Learning Right Now? »
Jason Fries · Alex Wiltschko · Andrew Beam · Isaac S Kohane · Jasper Snoek · Peter Schulam · Madalina Fiterau · David Kale · Rajesh Ranganath · Bruno Jedynak · Michael Hughes · Tristan Naumann · Natalia Antropova · Adrian Dalca · SHUBHI ASTHANA · Prateek Tandon · Jaz Kandola · Uri Shalit · Marzyeh Ghassemi · Tim Althoff · Alexander Ratner · Jumana Dakka -
2016 Workshop: Machine Learning for Health »
Uri Shalit · Marzyeh Ghassemi · Jason Fries · Rajesh Ranganath · Theofanis Karaletsos · David Kale · Peter Schulam · Madalina Fiterau