
Workshop
Trustworthy and Socially Responsible Machine Learning
Huan Zhang · Linyi Li · Chaowei Xiao · J. Zico Kolter · Anima Anandkumar · Bo Li

Fri Dec 09 06:45 AM -- 04:15 PM (PST) @ Virtual

To address the negative societal impacts of ML, researchers have looked into different principles and constraints for ensuring trustworthy and socially responsible machine learning systems. This workshop makes a first attempt at bridging the gap between the security, privacy, fairness, ethics, game theory, and machine learning communities, and aims to discuss the principles of and experiences with developing trustworthy and socially responsible machine learning systems. The workshop also focuses on how future researchers and practitioners should prepare themselves to reduce the risks of unintended behaviors of sophisticated ML models.

This workshop aims to bring together researchers interested in the emerging and interdisciplinary field of trustworthy and socially responsible machine learning from a broad range of disciplines, with different perspectives on this problem. We attempt to highlight recent related work from different communities, clarify the foundations of trustworthy machine learning, and chart out important directions for future work and cross-community collaborations.

Fri 7:00 a.m. - 7:30 a.m.   Invited Talk: Aleksander Mądry
Fri 7:30 a.m. - 7:45 a.m.   (Tentative) Revisiting Robustness in Graph Machine Learning (Contributed Talk) - Lukas Gosch · Daniel Sturm · Simon Geisler · Stephan Günnemann
Fri 7:45 a.m. - 8:00 a.m.   (Tentative) TalkToModel: Explaining Machine Learning Models with Interactive Natural Language Conversations (Contributed Talk) - Dylan Slack · Satyapriya Krishna · Himabindu Lakkaraju · Sameer Singh
Fri 8:00 a.m. - 8:30 a.m.   Invited Talk: Milind Tambe
Fri 8:30 a.m. - 8:40 a.m.   Coffee Break I
Fri 8:40 a.m. - 9:30 a.m.   Morning Poster Session
Fri 9:30 a.m. - 10:00 a.m.  Invited Talk: Nika Haghtalab
Fri 10:00 a.m. - 10:30 a.m. Invited Talk: Kamalika Chaudhuri
Fri 10:30 a.m. - 11:00 a.m. Invited Talk: Been Kim
Fri 11:00 a.m. - 12:00 p.m. Lunch Break
Fri 12:00 p.m. - 12:30 p.m. Invited Talk: Yi Ma
Fri 12:30 p.m. - 1:00 p.m.  Invited Talk: Dorsa Sadigh
Fri 1:00 p.m. - 1:30 p.m.   Invited Talk: Marco Pavone
Fri 1:30 p.m. - 1:40 p.m.   (Tentative) DensePure: Understanding Diffusion Models towards Adversarial Robustness (Contributed Talk) - Zhongzhu Chen · Kun Jin · Jiongxiao Wang · Weili Nie · Mingyan Liu · Anima Anandkumar · Bo Li · Dawn Song
Fri 1:40 p.m. - 2:30 p.m.   Afternoon Poster Session
Fri 2:30 p.m. - 2:45 p.m.   (Tentative) Controllable Attack and Improved Adversarial Training in Multi-Agent Reinforcement Learning (Contributed Talk) - Xiangyu Liu · Souradip Chakraborty · Furong Huang
Fri 2:45 p.m. - 3:00 p.m.   (Tentative) Differentially Private Bias-Term only Fine-tuning of Foundation Models (Contributed Talk) - Zhiqi Bu · Yu-Xiang Wang · Sheng Zha · George Karypis
Fri 3:00 p.m. - 3:15 p.m.   (Tentative) zPROBE: Zero Peek Robustness Checks for Federated Learning (Contributed Talk) - Zahra Ghodsi · Mojan Javaheripi · Nojan Sheybani · Xinqiao Zhang · Ke Huang · Farinaz Koushanfar
Fri 3:15 p.m. - 4:00 p.m.   Panel Discussion - Kamalika Chaudhuri · Been Kim · Dorsa Sadigh · Huan Zhang · Linyi Li
Fri 4:00 p.m. - 4:15 p.m.   Closing Remarks - Huan Zhang · Linyi Li

Posters

- Improving Fairness in Image Classification via Sketching (Poster)
  Fairness is a fundamental requirement for trustworthy and human-centered Artificial Intelligence (AI) systems. However, deep neural networks (DNNs) tend to make unfair predictions when the training data are collected from sub-populations with different attributes (e.g., color, sex, age), leading to biased DNN predictions. We observe that this troubling phenomenon is often caused by the data itself: bias information is encoded into the DNN along with the useful information (e.g., class information, semantic information). We therefore propose to use sketching to handle this phenomenon. Without losing the utility of the data, we explore image-to-sketch methods that maintain the semantic information needed for the targeted classification while filtering out the useless bias information. In addition, we design a fair loss to further improve model fairness. We evaluate our method in extensive experiments on both a general scene dataset and a medical scene dataset. Our results show that the chosen image-to-sketch method improves model fairness and achieves satisfactory results among the state of the art (SOTA). Our code will be released upon acceptance.
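The abstract above combines an image-to-sketch preprocessing step with an unspecified "fair loss". As a hedged illustration only (not the paper's actual loss, whose definition is not given here), one generic way to build a fairness-aware loss is to penalize the gap between per-group average losses on top of the ordinary cross-entropy:

```python
import math

def fairness_penalized_loss(probs, labels, groups, lam=1.0):
    """Cross-entropy plus a penalty on the gap between per-group
    average losses. Generic sketch of a fairness-aware loss; the
    function name, signature, and penalty form are illustrative
    assumptions, not the paper's method."""
    # Per-sample cross-entropy for the true class.
    ce = [-math.log(p[y] + 1e-12) for p, y in zip(probs, labels)]
    # Average loss within each sensitive group.
    by_group = {}
    for loss, g in zip(ce, groups):
        by_group.setdefault(g, []).append(loss)
    means = [sum(v) / len(v) for v in by_group.values()]
    # Penalize the spread between the best- and worst-off groups.
    gap = max(means) - min(means)
    return sum(ce) / len(ce) + lam * gap
```

With `lam=0` this reduces to plain average cross-entropy; increasing `lam` trades accuracy for a smaller between-group loss gap.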
  Ruichen Yao · cui ziteng · Xiaoxiao Li · Lin Gu

- Inferring Class Label Distribution of Training Data from Classifiers: An Accuracy-Augmented Meta-Classifier Attack (Poster)
  Property inference attacks against machine learning (ML) models aim to infer properties of the training data that are unrelated to the primary task of the model, and have so far been formulated as binary decision problems, i.e., whether or not the training data have a certain property. However, in industrial and healthcare applications, the proportion of labels in the training data is often also considered sensitive information. In this paper we introduce a new type of property inference attack that, unlike the binary decision problems in the literature, aims at inferring the class label distribution of the training data from the parameters of ML classifier models. We propose a method based on shadow training and a meta-classifier trained on the parameters of the shadow classifiers, augmented with the accuracy of those classifiers on auxiliary data. We evaluate the proposed approach for ML classifiers with fully connected neural network architectures and find that the proposed meta-classifier attack provides a maximum relative improvement of $52\%$ over the state of the art.
  Raksha Ramakrishna · György Dán

- Case Study: Applying Decision Focused Learning in the Real World (Poster)
  Many real-world optimization problems with unknown parameters are solved using the predict-then-optimize framework, where a learnt model predicts the parameters of an optimization problem that is subsequently solved by an optimization algorithm. However, this approach maximizes predictive accuracy rather than the quality of the final solution. Decision Focused Learning (DFL) resolves this objective mismatch by integrating the optimization problem into the learning pipeline. Previous works have only shown the applicability of DFL in simulation settings.
  In our work, we consider the optimization problem of scheduling limited live service calls in Maternal and Child Health Awareness Programs and model it using Restless Multi-Armed Bandits (RMABs). We present results from a large-scale field study of 9000 beneficiaries and demonstrate that DFL cuts $\sim 200\%$ more call engagement drops compared to previous methods. Through detailed post-hoc analysis, we show that high predictive accuracy on the problem parameters is not sufficient to ensure a well-performing system. We also demonstrate that DFL makes optimal decision choices by learning a better decision boundary between the RMAB actions, and by correctly predicting the parameters that contribute most to the final decision outcome.
  Shresth Verma · Aditya Mate · Kai Wang · Aparna Taneja · Milind Tambe

- Assessing Performance and Fairness Metrics in Face Recognition - Bootstrap Methods (Poster)
  The ROC curve is the major tool for assessing not only the performance but also the fairness properties of a similarity scoring function in face recognition. In order to draw reliable conclusions from empirical ROC analysis, it is necessary to accurately evaluate the uncertainty of the statistical versions of the ROC curves of interest. For this purpose, we explain in this paper that, because the True/False Acceptance Rates take the form of U-statistics in the case of similarity scoring, the naive bootstrap approach is not valid here, and a dedicated recentering technique must be used instead. This is illustrated on real face image data, applied to several ROC-based metrics such as popular fairness metrics.
  Jean-Rémy Conti · Stéphan Clémençon

- Generating Intuitive Fairness Specifications for Natural Language Processing (Poster)
  Text classifiers have promising applications in high-stakes tasks such as resume screening and content moderation.
  These classifiers must be fair and avoid discriminatory decisions by being invariant to perturbations of sensitive attributes such as gender or ethnicity. However, there is a gap between human intuition about these perturbations and the formal similarity specifications capturing them. While existing research has started to address this gap, current methods are based on hardcoded word replacements, resulting in specifications with limited expressivity or ones that fail to fully align with human intuition (e.g., in cases of asymmetric counterfactuals). This work proposes novel methods for bridging this gap by discovering expressive and intuitive individual fairness specifications. We show how to leverage unsupervised style transfer and GPT-3's zero-shot capabilities to automatically generate expressive candidate pairs of semantically similar sentences that differ along sensitive attributes. We then validate the generated pairs via an extensive crowdsourcing study, which confirms that many of these pairs align with human intuition about fairness in toxicity classification. We also show how limited amounts of human feedback can be leveraged to learn a similarity specification.
  Florian E. Dorner · Momchil Peychev · Nikola Konstantinov · Naman Goel · Elliott Ash · Martin Vechev

- Learning from uncertain concepts via test time interventions (Poster)
  As neural networks are applied to safety-critical applications, it has become increasingly important to understand the defining features of their decision-making; hence the need to open these black boxes into a rational representational space. The concept bottleneck model (CBM) encourages interpretability by predicting human-understandable concepts: it predicts concepts from input images and then labels from those concepts. Test time intervention, a salient feature of CBMs, allows for human-model interaction.
  However, these interactions are prone to information leakage and can often be ineffective due to inappropriate communication with humans. We propose a novel uncertainty-based strategy, \emph{SIUL: Single Interventional Uncertainty Learning}, to select the interventions. Additionally, we empirically test the robustness of CBMs and the effect of SIUL interventions under adversarial attack and distributional shift. Using SIUL, we observe that the suggested interventions lead to meaningful corrections along with mitigation of concept leakage. Extensive experiments on three vision datasets along with a histopathology dataset validate the effectiveness of our interventional learning.
  Ivaxi Sheth · Aamer Abdul Rahman · Laya Rafiee Sevyeri · Mohammad Havaei · Samira Ebrahimi Kahou

- Towards Reasoning-Aware Explainable VQA (Poster)
  The domain of joint vision-language understanding, especially in the context of reasoning in Visual Question Answering (VQA) models, has garnered significant attention in the recent past. While most existing VQA models focus on improving accuracy, the way models arrive at an answer is oftentimes a black box. As a step towards making the VQA task more explainable and interpretable, our method builds upon a SOTA VQA framework by augmenting it with an end-to-end explanation generation module. In this paper, we investigate two network architectures, an LSTM and a Transformer decoder, as the explanation generator. Our method generates human-readable explanations while maintaining SOTA VQA accuracy on the GQA-REX (77.49%) and VQA-E (71.48%) datasets. Approximately 65.16% of the generated explanations are judged valid by humans, and roughly 60.5% of the generated explanations are valid and lead to the correct answers.
  Rakesh Vaideeswaran · Feng Gao · ABHINAV MATHUR · Govindarajan Thattai

- Beyond Protected Attributes: Disciplined Detection of Systematic Deviations in Data (Poster)
  Finding systematic deviations of an outcome of interest in data and models is an important goal of trustworthy and socially responsible AI. To understand systematic deviations at a subgroup level, it is important to look beyond \emph{predefined} groups and consider all possible subgroups for analysis. Of course, this exhaustive enumeration is not possible, and there needs to be a balance between exploratory and confirmatory analysis in socially responsible AI. In this paper we compare recently proposed methods for detecting systematic deviations in an outcome of interest at the subgroup level across three socially relevant data sets. Furthermore, we show the importance of looking through all possible subgroups for systematic deviations by comparing patterns detected using only protected attributes against patterns detected using the entire search space. One interesting pattern found in the OULAD dataset is that while having a high course load and not being from the highest socio-economic decile of UK regions makes students 2.3 times more likely to fail or withdraw from courses, being from Ireland or Wales mitigates this risk by 37%. This pattern may have been missed had we focused our analysis on the protected groups of gender and disability only. Python code for all methods, including the most recently proposed "AutoStrat", is available in open-source code repositories.
  Adebayo Oshingbesan · Winslow Omondi · Girmaw Abebe Tadesse · Celia Cintas · Skyler D. Speakman

- A Stochastic Optimization Framework for Fair Risk Minimization (Poster)
  Despite the success of large-scale empirical risk minimization (ERM) at achieving high accuracy across a variety of machine learning tasks, fair ERM is hindered by the incompatibility of fairness constraints with stochastic optimization. We consider the problem of fair classification with discrete sensitive attributes and potentially large models and data sets, requiring stochastic solvers. Existing in-processing fairness algorithms are either impractical in the large-scale setting, because they require large batches of data at each iteration, or are not guaranteed to converge. In this paper, we develop the first stochastic in-processing fairness algorithm with guaranteed convergence. For the demographic parity, equalized odds, and equal opportunity notions of fairness, we provide slight variations of our algorithm, called FERMI, and prove that each variation converges in stochastic optimization with any batch size. Empirically, we show that FERMI is amenable to stochastic solvers with multiple (non-binary) sensitive attributes and non-binary targets, performing well even with a minibatch size as small as one. Extensive experiments show that FERMI achieves the most favorable tradeoffs between fairness violation and test accuracy across all tested setups compared with state-of-the-art baselines for demographic parity, equalized odds, and equal opportunity. These benefits are especially significant with small batch sizes and for non-binary classification with a large number of sensitive attributes, making FERMI a practical, scalable fairness algorithm.
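The three fairness notions named in this abstract (demographic parity, equalized odds, equal opportunity) have standard empirical estimators. As an illustrative sketch only, not the FERMI algorithm, the demographic-parity and equalized-odds gaps of binary predictions can be computed as:

```python
def demographic_parity_gap(preds, groups):
    """Largest difference in positive-prediction rates across groups.
    Textbook estimator of the demographic-parity violation; the
    function names and signatures here are illustrative assumptions."""
    rates = {}
    for p, g in zip(preds, groups):
        rates.setdefault(g, []).append(p)
    means = [sum(v) / len(v) for v in rates.values()]
    return max(means) - min(means)

def equalized_odds_gap(preds, labels, groups):
    """Max over true label y of the cross-group gap in P(pred=1 | label=y).
    Equal opportunity corresponds to restricting this to y = 1."""
    gaps = []
    for y in (0, 1):
        idx = [i for i, l in enumerate(labels) if l == y]
        if idx:
            gaps.append(demographic_parity_gap(
                [preds[i] for i in idx], [groups[i] for i in idx]))
    return max(gaps)
```

For example, predictions `[1, 1, 0, 0]` over groups `[0, 0, 1, 1]` give a demographic-parity gap of 1.0, the worst case; identical positive rates across groups give 0.0.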
  Andrew Lowy · Sina Baharlouei · Rakesh Pavan · Meisam Razaviyayn · Ahmad Beirami

- On the Trade-Off between Actionable Explanations and the Right to be Forgotten (Poster)
  As machine learning (ML) models are increasingly deployed in high-stakes applications, policymakers have suggested tighter data protection regulations (e.g., GDPR, CCPA). One key principle is the "right to be forgotten", which gives users the right to have their data deleted. To date, it is unknown whether these two principles can be operationalized simultaneously. Therefore, we introduce and study the problem of recourse invalidation in the context of data deletion requests. More specifically, we theoretically and empirically analyze the behavior of popular state-of-the-art algorithms and demonstrate that the recourses generated by these algorithms are likely to be invalidated if a small number of data deletion requests (e.g., 1 or 2) warrant updates to the predictive model. For the setting of linear models and overparameterized neural networks, studied through the lens of neural tangent kernels (NTKs), we suggest a framework to identify a minimal subset of critical training points which, when removed, maximizes the fraction of invalidated recourses. Using our framework, we empirically show that removing as few as 2 data instances from the training set can invalidate up to 95 percent of all recourses output by popular state-of-the-art algorithms. Thus, our work raises fundamental questions about the compatibility of the "right to an actionable explanation" with the "right to be forgotten", while also providing constructive insights on the determining factors of recourse robustness.
  Martin Pawelczyk · Tobias Leemann · Asia Biega · Gjergji Kasneci

- But Are You Sure? Quantifying Uncertainty in Model Explanations (Poster)
  Even when a black-box model makes accurate predictions (e.g., whether it will rain tomorrow), it is difficult to extract principles from the model that improve human understanding (e.g., what set of atmospheric conditions best predicts rainfall). Model explanations via explainability methods (e.g., LIME, Shapley values) can help by highlighting interpretable aspects of the model, such as the data features to which the model is most sensitive. However, these methods can be unstable and inconsistent, which often ends up providing unreliable insights. Moreover, when many near-optimal models exist, there is no guarantee that explanations for a single model will agree with explanations from the true model that generated the data. In this work, instead of explaining a single best-fitting model, we develop principled methods to construct an uncertainty set for the "true explanation": the explanation from the (unknown) true model that generated the data. We give finite-sample guarantees that the uncertainty set we return includes the explanation of the true model with high probability. We show through synthetic experiments that our uncertainty sets have high fidelity to the explanations of the true model, and we then report our findings on real-world data.
  Charles Marx · Youngsuk Park · Hilaf Hasson · Yuyang (Bernie) Wang · Stefano Ermon · Chaitanya Baru

- Fairness-aware Missing Data Imputation (Poster)
  Missing values are ubiquitous in real-world datasets and are known to cause unfairness in a machine learning algorithm's decision-making process. However, there has been limited work aiming to mitigate the unfairness associated with missing data imputation. In this paper, we first derive a positive information-theoretic lower bound on the imputation fairness when the ground-truth conditional distribution is used for missing data imputation.
  Furthermore, we propose a novel missing data imputation model, fairness-aware imputation GAN (FIGAN), which provides accurate imputations while achieving imputation fairness. Through experiments, we illustrate that FIGAN can significantly improve imputation fairness compared to existing imputation methods, while also achieving competitive imputation accuracy.
  Yiliang Zhang · Qi Long

- Socially Responsible Reasoning with Large Language Models and The Impact of Proper Nouns (Poster)
  Language models with billions of parameters have shown remarkable emergent properties, including the ability to reason over unstructured data. We show that open-science multilingual large language models can perform the task of spatial reasoning over two or more entities with significant accuracy. A socially responsible large language model would perform this spatial reasoning task with the same accuracy regardless of the names chosen for the entities over which the spatial relationships are defined. However, we show that the accuracy of contemporary large language models is significantly impacted by the choice of proper nouns, even when the underlying task ought to be independent of that choice. In this context, we also observe that the conditional log probabilities, or beam scores, of language model predictions are not well calibrated and do not discriminate between correct and wrong responses.
  Sumit Jha · Rickard Ewetz · Alvaro Velasquez · Susmit Jha

- When Personalization Harms: Reconsidering the Use of Group Attributes in Prediction (Poster)
  Machine learning models often use group attributes to assign personalized predictions. In this work, we show that models that use group attributes can assign unnecessarily inaccurate predictions to specific groups, i.e., that training a model with group attributes can reduce performance for specific groups. We propose formal conditions to ensure the "fair use" of group attributes in prediction models, i.e., collective preference guarantees that can be checked by training one additional model. We characterize how machine learning models can exhibit fair use violations due to standard practices in specification, training, and deployment. We study the prevalence of fair use violations in clinical prediction models. Our results highlight the difficulty of resolving fair use violations, underscore the need to measure the gains of personalization for all groups who provide personal data, and illustrate actionable interventions to mitigate harm.
  Vinith Suriyakumar · Marzyeh Ghassemi · Berk Ustun

- An Analysis of Social Biases Present in BERT Variants Across Multiple Languages (Poster)
  Although large pre-trained language models have achieved great success in many NLP tasks, it has been shown that they reflect human biases from their pre-training corpora. This bias may lead to undesirable outcomes when these models are applied in real-world settings. In this paper, we investigate the bias present in monolingual BERT models across a diverse set of languages (English, Greek, and Persian). While recent research has mostly focused on gender-related biases, we analyze religious and ethnic biases as well, and propose a template-based method to measure any kind of bias, based on sentence pseudo-likelihood, that can handle morphologically complex languages with gender-based adjective declensions. We analyze each monolingual model via this method and visualize cultural similarities and differences across different dimensions of bias. Ultimately, we conclude that current methods of probing for bias are highly language-dependent, necessitating cultural insights regarding the unique ways bias is expressed in each language and culture (e.g., through coded language, synecdoche, and other similar linguistic concepts).
  We also hypothesize that the higher social biases measured in the non-English BERT models correlate with the user-generated content in their training data.
  Parishad BehnamGhader · Aristides Milios

- Denoised Smoothing with Sample Rejection for Robustifying Pretrained Classifiers (Poster)
  Denoised smoothing is the state-of-the-art approach to defending pretrained classifiers against $\ell_p$ adversarial attacks: a denoiser is prepended to the pretrained classifier, and the joint system is certified via randomized smoothing. Despite its state-of-the-art certified robustness against $\ell_2$-norm adversarial inputs, the pretrained base classifier is often quite uncertain when making predictions on the denoised examples, which leads to lower natural accuracy. In this work, we show that by augmenting the joint system with a "rejector" and exploiting adaptive sample rejection (i.e., intentionally abstaining from providing a prediction), we can achieve substantially improved accuracy (especially natural accuracy) over denoised smoothing alone. That is, we show how the joint classifier-rejector can be viewed as performing classification-with-rejection per sample, while the smoothed joint system can be turned into a robust \emph{smoothed classifier without rejection} against $\ell_2$-norm perturbations while retaining certifiability. Tests on the CIFAR10 dataset show considerable improvements in \emph{natural} accuracy without degrading adversarial performance, with affordably-trainable rejectors, especially for medium and large values of the noise parameter $\sigma$.
  Fatemeh Sheikholeslami · Wan-Yi Lin · Jan Hendrik Metzen · Huan Zhang · J. Zico Kolter

- On the Robustness of deep learning-based MRI Reconstruction to image transformations (Poster)
  Although deep learning (DL) has received much attention in accelerated magnetic resonance imaging (MRI), recent studies show that tiny input perturbations may lead to instabilities in DL-based MRI reconstruction models. However, approaches for robustifying these models are underdeveloped. Compared to image classification, achieving a robust MRI image reconstruction network is much more challenging given its regression-based learning objective, the limited amount of training data, and the lack of efficient robustness metrics. To circumvent these limitations, our work revisits the problem of DL-based image reconstruction through the lens of robust machine learning. We find a new instability source in MRI image reconstruction: the lack of reconstruction robustness against spatial transformations of an input, e.g., rotation and cutout. Inspired by this new robustness metric, we develop a robustness-aware image reconstruction method that can defend against both pixel-wise adversarial perturbations and spatial transformations. Extensive experiments demonstrate the effectiveness of our proposed approaches.
  jinghan jia · Mingyi Hong · Yimeng Zhang · Mehmet Akcakaya · Sijia Liu

- Real world relevance of generative counterfactual explanations (Poster)
  The interpretability of deep learning based algorithms is critical in settings where the algorithm must provide actionable information, such as clinical diagnoses or instructions in autonomous driving. Image-based explanations or feature attributions are an often-proposed solution for natural imaging datasets, but their utility for mission-critical settings is unclear.
  In this work, we provide image explanations that are semantically interpretable and assess their real-world relevance using imaging data extracted from clinical settings. We address the problem of pneumonia classification from chest X-ray images, where we show that (1) by perturbing specific latent dimensions of a GAN-based model, the classifier predictions can be flipped, and (2) the latent factors have clinical relevance. We demonstrate the latter through a case study with a board-certified radiologist, identifying some latent factors that are clinically informative and others that may capture spurious correlations.
  Swami Sankaranarayanan · Thomas Hartvigsen · Lauren Oakden-Rayner · Marzyeh Ghassemi · Phillip Isola

- Quantifying Social Biases Using Templates is Unreliable (Poster)
  Recently, there has been an increase in efforts to understand how large language models (LLMs) propagate and amplify social biases. Several works have utilized templates for fairness evaluation, which allow researchers to quantify social biases in the absence of test sets with protected attribute labels. While template evaluation can be a convenient and helpful diagnostic tool for understanding model deficiencies, it often uses a simplistic and limited set of templates. In this paper, we study whether bias measurements are sensitive to the choice of templates used for benchmarking. Specifically, we investigate the instability of bias measurements by manually modifying templates proposed in previous works in a semantically-preserving manner and measuring bias across these modifications. We find that bias values and the resulting conclusions vary considerably across template modifications on four tasks, ranging from an 81% reduction (NLI) to a 162% increase (MLM) in (task-specific) bias measurements. Our results indicate that quantifying fairness in LLMs, as done in current practice, can be brittle and needs to be approached with more care and caution.
  Preethi Seshadri · Pouya Pezeshkpour · Sameer Singh

- DensePure: Understanding Diffusion Models towards Adversarial Robustness (Poster)
  Diffusion models have recently been employed to improve certified robustness through the process of denoising. However, the theoretical understanding of why diffusion models improve certified robustness is still lacking, preventing further improvement. In this study, we close this gap by analyzing the fundamental properties of diffusion models and establishing the conditions under which they can enhance certified robustness. This deeper understanding allows us to propose a new method, DensePure, designed to improve the certified robustness of a pretrained model (i.e., classifier). Given an (adversarial) input, DensePure runs the reverse process of the diffusion model multiple times (with different random seeds) to obtain multiple reversed samples, passes each through the classifier, and takes a majority vote over the inferred labels to make the final prediction. This design of using multiple runs of denoising is informed by our theoretical analysis of the conditional distribution of the reversed sample. Specifically, when the data density of a clean sample is high, its conditional density under the reverse process of a diffusion model is also high; thus sampling from the latter conditional distribution can purify the adversarial example and return the corresponding clean sample with high probability. By using the highest-density point in the conditional distribution as the reversed sample, we identify the robust region of a given instance under the diffusion model's reverse process.
  We show that this robust region is a union of multiple convex sets and is potentially much larger than the robust regions identified in previous works. In practice, DensePure can approximate the label of the high-density region of the conditional distribution to enhance certified robustness. We conduct extensive experiments to demonstrate the effectiveness of DensePure by evaluating its certified robustness for a standard model via randomized smoothing, and show that DensePure is consistently better than existing methods on ImageNet, with a 7% improvement on average.
  Chaowei Xiao · Zhongzhu Chen · Kun Jin · Jiongxiao Wang · Weili Nie · Mingyan Liu · Anima Anandkumar · Bo Li · Dawn Song

- A Fair Loss Function for Network Pruning (Poster)
  Model pruning can enable the deployment of neural networks in environments with resource constraints. While pruning may have a small effect on the overall performance of the model, it can exacerbate existing biases in the model such that subsets of samples see significantly degraded performance. In this paper, we introduce the performance weighted loss function, a simple modified cross-entropy loss function that can be used to limit the introduction of biases during pruning. Experiments using biased classifiers for facial classification and skin-lesion classification tasks demonstrate that the proposed method is a simple and effective tool that can enable existing pruning methods to be used in fairness-sensitive contexts.
  Robbie Meyer · Alexander Wong

- Benchmarking the Effect of Poisoning Defenses on the Security and Bias of the Final Model (Poster)
  Machine learning models are susceptible to a class of attacks known as adversarial poisoning, where an adversary can maliciously manipulate training data to hinder model performance or, more concerningly, insert backdoors to exploit at inference time.
Many methods have been proposed to defend against adversarial poisoning, either by identifying the poisoned samples to facilitate removal or by developing poison-agnostic training algorithms. Although effective, these approaches can have unintended consequences on other aspects of model performance, such as worsening performance on certain data sub-populations, thus inducing a classification bias. In this work, we evaluate several adversarial poisoning defenses. In addition to traditional security metrics, i.e., robustness to poisoned samples, we propose a new metric to measure the potential undesirable discrimination against sub-populations resulting from using these defenses. Our investigation highlights that many of the evaluated defenses trade decision fairness for higher adversarial poisoning robustness. Given these results, we recommend that our proposed metric become part of standard evaluations of machine learning defenses. (Nathalie Baracaldo · Kevin Eykholt · Farhan Ahmed · Yi Zhou · Shriti Priya · Taesung Lee · Swanand Kadhe · Yusong Tan · Sridevi Polavaram · Sterling Suggs)
- Private Data Leakage via Exploiting Access Patterns of Sparse Features in Deep Learning-based Recommendation Systems (Poster): Deep learning-based recommendation models use sparse and dense features of a user to predict an item that the user may like. These features carry the users' private information, so service providers often protect them by memory encryption (e.g., with hardware such as Intel's SGX). However, even with such protection, an attacker may still learn which entries of the sparse feature are nonzero through the embedding table access pattern. In this work, we show that leaking only the positions of the sparse features' nonzero entries can be a serious threat to privacy. Using the embedding table access pattern, we show that it is possible to identify or re-identify a user, or extract sensitive attributes from a user.
We subsequently show that applying a hash function to anonymize the access pattern is not a solution, as it can be reverse-engineered in many cases. (Hanieh Hashemi · Wenjie Xiong · Liu Ke · Kiwan Maeng · Murali Annavaram · G. Edward Suh · Hsien-Hsin Lee)
- A Brief Overview of AI Governance for Responsible Machine Learning Systems (Poster): Organizations of all sizes, across all industries and domains, are leveraging artificial intelligence (AI) technologies to solve some of their biggest challenges around operations, customer experience, and much more. However, due to the probabilistic nature of AI, the risks associated with it are far greater than with traditional technologies. Research has shown that these risks can range anywhere from regulatory/compliance, reputational, user-trust, and societal risks to financial and even existential risks. Depending on the nature and size of the organization, AI technologies can pose a significant risk if not used in a responsible way. This text presents a brief introduction to AI governance, a framework designed to oversee the responsible use of AI with the goal of preventing and mitigating risks. Having such a framework will not only manage risks but also help gain maximum value from AI projects and develop consistency for organization-wide adoption of AI. (Navdeep Gill · Marcos Conde)
- Forgetting Data from Pre-trained GANs (Poster): Large pre-trained generative models are known to occasionally output undesirable samples, which undermines their trustworthiness. The common way to mitigate this is to re-train them from scratch using different data or different regularization, which uses a lot of computational resources and does not always fully address the problem.
In this work, we take a different, more compute-friendly approach and investigate how to post-edit a model after training so that it “forgets”, or refrains from outputting, certain kinds of samples. We show that forgetting is different from data deletion, and that data deletion may not always lead to forgetting. We then consider Generative Adversarial Networks (GANs), and provide three different algorithms for data forgetting that differ in how the samples to be forgotten are described. Extensive evaluations on real-world image datasets show that our algorithms outperform data deletion baselines, and are capable of forgetting data while retaining high generation quality at a fraction of the cost of full re-training. (Zhifeng Kong · Kamalika Chaudhuri)
- Training Differentially Private Graph Neural Networks with Random Walk Sampling (Poster): Deep learning models are known to put the privacy of their training data at risk, which poses challenges for their safe and ethical release to the public. Differentially private stochastic gradient descent is the de facto standard for training neural networks without leaking sensitive information about the training data. However, applying it to models for graph-structured data poses a novel challenge: unlike with i.i.d. data, sensitive information about a node in a graph can leak not only through its own gradients but also through the gradients of all nodes within a larger neighborhood. In practice, this limits privacy-preserving deep learning on graphs to very shallow graph neural networks. We propose to solve this issue by training graph neural networks on disjoint subgraphs of a given training graph. We develop three random-walk-based methods for generating such disjoint subgraphs and perform a careful analysis of the data-generating distributions to provide strong privacy guarantees.
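The idea of carving a training graph into disjoint random-walk subgraphs can be sketched as follows; this is a simplified stand-in for the paper's three sampling methods, not their exact algorithms:

```python
import random

def disjoint_random_walk_subgraphs(adj, walk_len=3, seed=0):
    # Partition the node set into disjoint subgraphs by running random walks
    # that may only visit nodes not yet assigned to any subgraph.
    rng = random.Random(seed)
    unused = set(adj)
    subgraphs = []
    while unused:
        v = min(unused)  # deterministic start node for reproducibility
        walk = [v]
        unused.discard(v)
        for _ in range(walk_len - 1):
            candidates = [u for u in adj[walk[-1]] if u in unused]
            if not candidates:
                break
            v = rng.choice(candidates)
            walk.append(v)
            unused.discard(v)
        subgraphs.append(walk)
    return subgraphs
```

Because the subgraphs are disjoint, each node's data influences only one subgraph's gradients, which is what makes the per-node privacy analysis tractable.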
Through extensive experiments, we show that our method greatly outperforms the state-of-the-art baseline on three large graphs, and matches or outperforms it on four smaller ones. (Morgane Ayle · Jan Schuchardt · Lukas Gosch · Daniel Zügner · Stephan Günnemann)
- Physically-Constrained Adversarial Attacks on Brain-Machine Interfaces (Poster): Deep learning (DL) has been widely employed in brain-machine interfaces (BMIs) to decode subjects' intentions based on recorded brain activities, enabling direct interaction with machines. BMI systems play a crucial role in medical applications and have recently gained increasing interest as consumer-grade products. Failures in such systems might cause medical misdiagnoses, physical harm, and financial loss. Especially with the current market boost of such devices, it is of utmost importance to analyze and understand in depth potential malicious attacks in order to develop countermeasures and avoid future damage. This work presents the first study that analyzes and models adversarial attacks based on physical-domain constraints in DL-based BMIs. Specifically, we assess the robustness of EEGNet, the current state-of-the-art network embedded in a real-world, wearable BMI. We propose new methods that incorporate domain-specific insights and constraints to design natural and imperceptible attacks and to realistically model signal propagation over the human scalp. Our results show that EEGNet is significantly vulnerable to adversarial attacks, with an attack success rate of more than 50%.
(Xiaying Wang · Rodolfo Octavio Siller Quintanilla · Michael Hersche · Luca Benini · Gagandeep Singh)
- Striving for data-model efficiency: Identifying data externalities on group performance (Poster): Building trustworthy, effective, and responsible machine learning systems hinges on understanding how differences in training data and modeling decisions interact to impact predictive performance. In this work, we seek to better understand how we might characterize, detect, and design for data-model synergies. We focus on a particular type of data-model inefficiency, in which adding training data from some sources can actually lower performance evaluated on key sub-groups of the population, a phenomenon we refer to as negative data externalities on group performance. Such externalities can arise in standard learning settings and can manifest differently depending on the relationship between training set size and model size. Data externalities directly imply a lower bound on feasible model improvements, yet improving models efficiently requires understanding the underlying data-model tensions. From a broader perspective, our results indicate that data efficiency is a key component of both accurate and trustworthy machine learning. (Esther Rolf · Ben Packer · Alex Beutel · Fernando Diaz)
- A Closer Look at the Intervention Procedure of Concept Bottleneck Models (Poster): Concept bottleneck models (CBMs) are a class of interpretable neural network models that predict the target label of a given input based on its high-level concepts. Unlike other end-to-end deep learning models, CBMs enable domain experts to intervene on the predicted concepts at test time so that more accurate and reliable target predictions can be made. While this intervenability provides a powerful avenue of control, many aspects of the intervention procedure remain underexplored.
In this work, we inspect the current intervention practice for its efficiency and reliability. Specifically, we first present an array of new intervention methods that significantly improve the target prediction accuracy for a given budget of intervention expense. We also bring attention to non-trivial yet previously unknown issues related to the reliability and fairness of the intervention, and discuss how these problems can be fixed in practice. (Sungbin Shin · Yohan Jo · Sungsoo Ahn · Namhoon Lee)
- Revisiting Robustness in Graph Machine Learning (Poster): Many works show that node-level predictions of Graph Neural Networks (GNNs) are not robust to small, often termed adversarial, changes to the graph structure. However, because manual inspection of a graph is difficult, it is unclear whether the studied perturbations always preserve a core assumption of adversarial examples: that of unchanged semantic content. To address this problem, we introduce a more principled notion of an adversarial graph, which is aware of semantic content change. Using Contextual Stochastic Block Models (CSBMs) and real-world graphs, our results uncover: $i)$ for a majority of nodes, the prevalent perturbation models include a large fraction of perturbed graphs violating the unchanged-semantics assumption; $ii)$ surprisingly, all assessed GNNs show over-robustness, that is, robustness beyond the point of semantic change. We find this to be a phenomenon complementary to adversarial robustness, related to the small degree of nodes and the dependence of their class membership on the neighbourhood structure. (Lukas Gosch · Daniel Sturm · Simon Geisler · Stephan Günnemann)
- Cooperation or Competition: Avoiding Player Domination for Multi-target Robustness by Adaptive Budgets (Poster): Despite incredible advances, deep learning has been shown to be susceptible to adversarial attacks.
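The test-time concept intervention at the heart of "A Closer Look at the Intervention Procedure of Concept Bottleneck Models" above can be sketched as follows; `concept_model` and `label_model` are hypothetical placeholders for the two halves of a CBM:

```python
import numpy as np

def cbm_predict(x, concept_model, label_model, interventions=None):
    # Concept bottleneck: input -> concepts -> label. At test time a domain
    # expert may overwrite selected predicted concepts with known true values.
    c = np.asarray(concept_model(x), dtype=float)
    if interventions:
        for idx, true_val in interventions.items():
            c[idx] = true_val
    return label_model(c)
```

The paper's contribution is in choosing which concepts to intervene on for a given budget; the sketch only shows the mechanism being intervened upon.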
Numerous approaches have been proposed to train robust networks both empirically and certifiably. However, most of them defend against only a single type of attack, while recent work takes steps toward defending against multiple attacks. In this paper, to understand multi-target robustness, we view this problem as a bargaining game in which different players (adversaries) negotiate to reach an agreement on a joint direction of parameter updating. We identify a phenomenon named \emph{player domination} in the bargaining game, and show that with this phenomenon, some of the existing max-based approaches, such as MAX and MSD, do not converge. Based on our theoretical results, we design a novel framework that adjusts the budgets of different adversaries to avoid player domination. Experiments on two benchmarks show that applying the proposed framework to the existing approaches significantly advances multi-target robustness. (Yimu Wang · Dinghuai Zhang · Yihan Wu · Heng Huang · Hongyang Zhang)
- GFairHint: Improving Individual Fairness for Graph Neural Networks via Fairness Hint (Poster): Graph Neural Networks (GNNs) have proven their versatility over diverse scenarios. With increasing consideration of societal fairness, many studies focus on algorithmic fairness in GNNs. Most of them aim to improve fairness at the group level, while only a few works focus on individual fairness, which attempts to give similar predictions to similar individuals for a specific task. We expect that such an individual fairness promotion framework should be compatible with both discrete and continuous task-specific similarity measures for individual fairness and balance utility (e.g., classification accuracy) against fairness. Fairness promotion frameworks are generally desired to be computationally efficient and compatible with various GNN model designs.
With previous work failing to achieve all of these goals, we propose a novel method, $\textbf{GFairHint}$, for promoting individual fairness in GNNs, which learns a "fairness hint" through an auxiliary link prediction task. We empirically evaluate our method on five real-world graph datasets that cover both discrete and continuous settings for individual fairness similarity measures, with three popular backbone GNN models. The proposed method achieves the best fairness results in almost all combinations of datasets and backbone models, while generating comparable utility results, with much less computational cost than the previous state-of-the-art (SoTA) model. (Paiheng Xu · Yuhang Zhou · Bang An · Wei Ai · Furong Huang)
- Uncertainty-aware predictive modeling for fair data-driven decisions (Poster): Both industry and academia have made considerable progress in developing trustworthy and responsible machine learning (ML) systems. While critical concepts like fairness and explainability are often addressed, the safety of systems is typically not sufficiently taken into account. By viewing data-driven decision systems as socio-technical systems, we draw on the uncertainty-in-ML literature to show how fairML systems can also be safeML systems. We posit that a fair model needs to be an uncertainty-aware model, e.g., by drawing on distributional regression. For fair decisions, we argue that a safe fail option should be used for individuals with uncertain categorization. We introduce semi-structured deep distributional regression as a modeling framework that addresses multiple concerns brought against standard ML models, and show its use in a real-world example of algorithmic profiling of job seekers.
(Patrick Kaiser · Christoph Kern · David Rügamer)
- Information-Theoretic Evaluation of Free-Text Rationales with Conditional $\mathcal{V}$-Information (Poster): Free-text rationales are a promising step towards explainable AI, yet their evaluation remains an open research problem. While existing metrics have mostly focused on measuring the direct association between the rationale and a given label, we argue that an ideal metric should also be able to focus on the new information uniquely provided in the rationale that is otherwise not provided in the input or the label. We investigate this research problem from an information-theoretic perspective using the conditional $\mathcal{V}$-information (Hewitt et al., 2021). More concretely, we propose a metric called REV (Rationale Evaluation with conditional $\mathcal{V}$-information) that can quantify the new information in a rationale supporting a given label beyond the information already available in the input or the label. Experiments on reasoning tasks across four benchmarks, including few-shot prompting with GPT-3, demonstrate the effectiveness of REV in evaluating different types of rationale-label pairs compared to existing metrics. Through several quantitative comparisons, we demonstrate the capability of REV to provide more sensitive measurements of new information in free-text rationales with respect to a label. Furthermore, REV is consistent with human judgments on rationale evaluations. Overall, when used alongside traditional performance metrics, REV provides deeper insights into a model's reasoning and prediction processes. (Hanjie Chen · Faeze Brahman · Xiang Ren · Yangfeng Ji · Yejin Choi · Swabha Swayamdipta)
- Few-shot Backdoor Attacks via Neural Tangent Kernels (Poster): In a backdoor attack, an attacker injects corrupted examples into the training set.
The goal of the attacker is to cause the final trained model to predict the attacker's desired target label when a predefined trigger is added to test inputs. Central to these attacks is the trade-off between the success rate of the attack and the number of corrupted training examples injected. We pose this attack as a novel bilevel optimization problem: construct strong poison examples that maximize the attack success rate of the trained model. We use neural tangent kernels to approximate the training dynamics of the model being attacked and automatically learn strong poison examples. We experiment on subclasses of CIFAR-10 and ImageNet with WideResNet-34 and ConvNeXt architectures on periodic and patch trigger attacks, and show that NTBA-designed poisoned examples achieve, for example, an attack success rate of 90% with ten times fewer poison examples injected than the baseline. We provide an interpretation of the NTBA-designed attacks using an analysis of kernel linear regression. We further demonstrate a vulnerability in overparametrized deep neural networks, which is revealed by the shape of the neural tangent kernel. (Jonathan Hayase · Sewoong Oh)
- Controllable Attack and Improved Adversarial Training in Multi-Agent Reinforcement Learning (Poster): Deep reinforcement learning policies have been shown to be vulnerable to adversarial attacks due to the inherent fragility of neural networks. Current attack methods mainly focus on adversarial state or action perturbations, but such direct manipulations of a reinforcement learning system may not always be feasible or realizable in the real world.
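The kernel view used in "Few-shot Backdoor Attacks via Neural Tangent Kernels" above, where training dynamics are approximated by the NTK, reduces at its simplest to kernel ridge regression; a minimal sketch, with the kernel matrices assumed given (the paper's bilevel poison optimization is not shown):

```python
import numpy as np

def ntk_predict(K_train, y_train, K_test, ridge=1e-6):
    # Kernel ridge regression as a stand-in for NTK-approximated training:
    # solve (K + ridge*I) alpha = y, then predict with test-train kernel rows.
    alpha = np.linalg.solve(K_train + ridge * np.eye(len(y_train)), y_train)
    return K_test @ alpha
```

Because this surrogate is differentiable in the poison inputs (through the kernel), gradients of the attack objective can be taken without retraining the network.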
In this paper, we consider the more practical adversarial attacks realized through the actions of an adversarial agent in the same environment. It has been shown in prior work that a victim agent is vulnerable to the behaviors of an adversarial agent that targets the victim, at the cost of introducing perceivable abnormal behaviors for the adversarial agent itself. To address this, we propose to constrain the state distribution shift caused by the adversarial policy and offer a more controllable attack scheme by building connections among policy space variations, state distribution shift, and the value function difference. To provide provable defense, we revisit the cycling behavior of common adversarial training methods in Markov games, which has been a well-known issue in general differential games, including Generative Adversarial Networks (GANs) and adversarial training in supervised learning. We propose to fix the non-converging behavior through a simple timescale separation mechanism. In sharp contrast to general differential games, where timescale separation may only converge to stationary points, two-timescale training methods in Markov games can converge to the Nash Equilibrium (NE). Using the Robosumo competition experiments, we demonstrate that the controllable attack is much more efficient, in the sense that it can introduce much less state distribution shift while achieving the same winning rate as the unconstrained attack. Furthermore, in both Kuhn Poker and the Robosumo competition, we verify that the rule of timescale separation leads to stable learning dynamics and less exploitable victim policies. (Xiangyu Liu · Souradip Chakraborty · Furong Huang)
- Interactive Rationale Extraction for Text Classification (Poster): Deep neural networks show superior performance in text classification tasks, but their poor interpretability and explainability can cause trust issues.
For text classification problems, the identification of textual sub-phrases, or "rationales", is one strategy for finding the most influential portions of text, which can be viewed as critical to the classification decision. Selective models for rationale extraction faithfully explain a neural classifier's predictions by training a rationale generator and a text classifier jointly: the generator identifies rationales and the classifier predicts a category solely based on the rationales. The selected rationales are then viewed as the explanations for the classifier's predictions. Through the exchange of such explanations, humans interact to achieve higher performance in problem solving. To imitate this interactive human process, we propose a simple interactive rationale extraction architecture that selects a pair of rationales and then makes predictions from two independently trained selective models. We show how this architecture outperforms both base models for text classification on the IMDB movie reviews and 20 Newsgroups datasets in terms of predictive performance. (Jiayi Dai · Mi-Young Kim · Randolph Goebel)
- Just Avoid Robust Inaccuracy: Boosting Robustness Without Sacrificing Accuracy (Poster): While current methods for training robust deep learning models optimize robust accuracy, they significantly reduce natural accuracy, hindering their adoption in practice. Further, the resulting models are often both robust and inaccurate on numerous samples, providing a false sense of safety for those samples. In this work, we extend prior works in three main directions. First, we explicitly train the models to jointly maximize robust accuracy and minimize robust inaccuracy. Second, since the resulting models are trained to be robust only if they are accurate, we leverage robustness as a principled abstain mechanism.
Finally, this abstain mechanism allows us to combine models in a compositional architecture that significantly boosts overall robustness without sacrificing accuracy. We demonstrate the effectiveness of our approach for empirical robustness on four recent state-of-the-art models and four datasets. For example, on CIFAR-10 with $\epsilon_\infty = 1/255$, we successfully enhanced the robust accuracy of a pre-trained model from 26.2% to 87.8% while even slightly increasing its natural accuracy from 97.8% to 98.0%. (Yannick Merkli · Pavol Bielik · Petar Tsankov · Martin Vechev)
- Honest Students from Untrusted Teachers: Learning an Interpretable Question-Answering Pipeline from a Pretrained Language Model (Poster): Explainable question answering systems should produce not only accurate answers but also rationales that justify their reasoning and allow humans to check their work. But what sorts of rationales are useful, and how can we train systems to produce them? We propose a new style of rationale for open-book question answering, called \emph{markup-and-mask}, which combines aspects of extractive and free-text explanations. In the markup phase, the passage is augmented with free-text markup that enables each sentence to stand on its own outside the discourse context. In the masking phase, a sub-span of the marked-up passage is selected. To train a system to produce markup-and-mask rationales without annotations, we leverage in-context learning. Specifically, we generate silver annotated data by sending a series of prompts to a frozen pretrained language model, which acts as a teacher. We then fine-tune a smaller student model by training on the subset of rationales that led to correct answers. The student is "honest" in the sense that it is a pipeline: the rationale acts as a bottleneck between the passage and the answer, while the "untrusted" teacher operates under no such constraints.
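The compositional abstain mechanism from "Just Avoid Robust Inaccuracy" above (use a model's answer only when it is certifiably robust on the input, otherwise fall through to the next model) can be sketched as follows; each `predict` callable is a hypothetical model returning a label and a robustness flag:

```python
def compositional_predict(x, models):
    # Try the models in order; each returns (label, is_robust). The first
    # model that is robust on x decides; otherwise the last model answers.
    for predict in models[:-1]:
        label, robust = predict(x)
        if robust:
            return label
    return models[-1](x)[0]
```

The design choice is that robustness doubles as a selection signal: a model abstains exactly on the inputs where its robustness certificate fails.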
Thus, we offer a new way to build trustworthy pipeline systems from a combination of end-task annotations and frozen pretrained language models. (Jacob Eisenstein · Daniel Andor · Bernd Bohnet · Michael Collins · David Mimno)
- FL-Talk: Covert Communication in Federated Learning via Spectral Steganography (Poster): Federated Learning (FL) allows edge users to collaboratively train a global model without sharing their private data. We propose FL-Talk, the first spectral-steganography-based covert communication framework in FL that enables stealthy information sharing between local clients while preserving FL convergence. We demonstrate that the sender can encode a secret message strategically in the spectrum of his local model parameters such that, after model aggregation, the receiver can extract the message correctly from the ‘encoded’ global model. Furthermore, we design a robust spectral message detection scheme for the receiver. Extensive evaluation results show that FL-Talk can establish a stealthy and reliable covert communication channel between clients without interfering with FL training. (Huili Chen · Farinaz Koushanfar)
- Accelerating Open Science for AI in Heliophysics (Poster): Rarely are Artificial Intelligence (AI) projects packaged in a way that lets scientists and non-AI specialists easily pick up advanced Machine Learning (ML) workflows. Similarly, AI engineers are not always able to contribute meaningfully to a science domain without being provided with useful application context or analysis-ready data. Because of this, and other factors, applied AI research often stalls at the research-paper stage, where the often complex logistics of replicating and building on the work of others impede substantive progress. This state of affairs has been identified by the community as the 'reproducibility' problem (see "1,500 scientists lift the lid on reproducibility", Nature).
Potential gains in AI are therefore hampered by the “expertise gap” between ML specialists and domain scientists. Moreover, the reputation of AI as a transformative tool for science lags due to the lack of deployed, trusted solutions in the wild, as projects struggle to migrate from mid-TRL (Technical Readiness Level) to high TRL. Another key concept is that AI projects are never really finished. Improvements can be made in both the model choice (the selection of which improves annually) and the training data, the latter often being the key factor in improving outcomes. Once built, workflows can easily grow to accommodate more data over time. In this paper we present the learnings from a study conducted to tackle findings informed by the 2021 SMD AI Workshop, showcasing best practice in the adoption of trusted and maintained open science in AI for Heliophysics and in scaling lower-TRL applications to higher TRLs. We also present an example of rapid derivative Heliophysics research conducted by a non-subject-matter expert, showing the value of these kinds of open science approaches. (Dolores Garcia · Paul Wright · Mark Cheung · Meng Jin · James Parr)
- A Theory of Learning with Competing Objectives and User Feedback (Poster): Large-scale deployed learning systems are often evaluated along multiple objectives or criteria. But how can we learn or optimize such complex systems, with potentially conflicting or even incompatible objectives? How can we improve the system when user feedback becomes available, feedback possibly alerting us to issues not previously optimized for by the system? We present a new theoretical model for learning and optimizing such complex systems.
Rather than committing to a static or pre-defined tradeoff for the multiple objectives, our model is guided by the feedback received, which is used to update its internal state. Our model supports multiple objectives that can be of very general form and takes into account their potential incompatibilities. We consider both a stochastic and an adversarial setting. In the stochastic setting, we show that our framework can be naturally cast as a Markov Decision Process with stochastic losses, for which we give efficient vanishing-regret algorithmic solutions. In the adversarial setting, we design efficient algorithms with competitive-ratio guarantees. We also report the results of experiments with our stochastic algorithms validating their effectiveness. (Pranjal Awasthi · Corinna Cortes · Yishay Mansour · Mehryar Mohri)
- zPROBE: Zero Peek Robustness Checks for Federated Learning (Poster): Privacy-preserving federated learning allows multiple users to jointly train a model with the coordination of a central server. The server only learns the final aggregation result, thereby preventing leakage of the users' (private) training data from the individual model updates. However, keeping the individual updates private allows malicious users to perform Byzantine attacks and degrade the model accuracy without being detected. The best existing defenses against Byzantine workers rely on robust rank-based statistics, e.g., the median, to find malicious updates. However, implementing privacy-preserving rank-based statistics is nontrivial and unscalable in the secure domain, as it requires sorting all individual updates. We establish the first private robustness check that uses high-breakdown-point rank-based statistics on aggregated model updates. By exploiting randomized clustering, we significantly improve the scalability of our defense without compromising privacy.
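A rank-based robustness check of the kind zPROBE builds on can be sketched in the clear as follows; this plaintext version omits the paper's zero-knowledge proofs and secure aggregation, and the median/MAD threshold rule is an illustrative stand-in for their derived statistical bounds:

```python
import numpy as np

def robust_filter_updates(updates, tau=2.0):
    # Flag client updates whose deviation from the coordinate-wise median
    # exceeds tau times the median absolute deviation, then average the rest.
    U = np.asarray(updates, dtype=float)
    med = np.median(U, axis=0)
    dev = np.abs(U - med)
    mad = np.median(dev, axis=0) + 1e-12  # avoid a zero threshold
    keep = np.all(dev <= tau * mad, axis=1)
    return U[keep].mean(axis=0), keep
```

Medians have a high breakdown point, which is why such checks tolerate a large fraction of Byzantine clients.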
We leverage the derived statistical bounds in zero-knowledge proofs to detect and remove malicious updates without revealing the private user updates. Our novel framework, zPROBE, enables Byzantine-resilient and secure federated learning. Empirical evaluations demonstrate that zPROBE provides a low-overhead solution to defend against state-of-the-art Byzantine attacks while preserving privacy. (Zahra Ghodsi · Mojan Javaheripi · Nojan Sheybani · Xinqiao Zhang · Ke Huang · Farinaz Koushanfar)
- PINTO: Faithful Language Reasoning Using Prompt-Generated Rationales (Poster): Neural language models (LMs) have achieved impressive results on various language-based reasoning tasks by utilizing latent knowledge encoded in their own pretrained parameters. To make this reasoning process more explicit, recent works retrieve a rationalizing LM's internal knowledge by training or prompting it to generate free-text rationales, which can be used to guide task predictions made by either the same LM or a separate reasoning LM. However, rationalizing LMs require expensive rationale annotation, without any assurance that the generated rationales improve LM task performance or faithfully reflect LM decision-making. In this paper, we propose PINTO, an LM pipeline that rationalizes via prompt-based learning and learns to faithfully reason over rationales via counterfactual regularization. First, PINTO maps out a suitable reasoning process for the task input by prompting a frozen rationalizing LM to generate a free-text rationale. Second, PINTO's reasoning LM is fine-tuned to solve the task using the generated rationale as context, while being regularized to output less confident predictions when the rationale is perturbed. Across four datasets, we show that PINTO significantly improves the generalization ability of the reasoning LM, yielding higher performance on both in-distribution and out-of-distribution test sets.
Also, PINTO leverages the rationales more faithfully than competitive baselines do. (Peifeng Wang · Aaron Chan · Filip Ilievski · Muhao Chen · Xiang Ren)
- Hidden Poison: Machine Unlearning Enables Camouflaged Poisoning Attacks (Poster): We introduce camouflaged data poisoning attacks, a new attack vector that arises in the context of machine unlearning and other settings where model retraining may be induced. An adversary first adds a few carefully crafted points to the training dataset such that the impact on the model's predictions is minimal. The adversary subsequently triggers a request to remove a subset of the introduced points, at which point the attack is unleashed and the model's predictions are negatively affected. In particular, we consider clean-label targeted attacks (in which the goal is to cause the model to misclassify a specific test point) on datasets including CIFAR-10, Imagenette, and Imagewoof. This attack is realized by constructing camouflage datapoints that mask the effect of a poisoned dataset. (Jimmy Di · Jack Douglas · Jayadev Acharya · Gautam Kamath · Ayush Sekhari)
- Individual Privacy Accounting with Gaussian Differential Privacy (Poster): Individual privacy accounting enables bounding the differential privacy (DP) loss individually for each participant involved in the analysis. This can be informative, as the individual privacy losses are often considerably smaller than those indicated by DP bounds based on worst-case bounds at each data access. In order to account for the individual privacy losses in a principled manner, we need a privacy accountant for adaptive compositions of randomised mechanisms, where the loss incurred at a given data access is allowed to be smaller than the worst-case loss.
This kind of analysis has been carried out for Rényi differential privacy (RDP) by Feldman and Zrnic (2021), but not yet for the so-called optimal privacy accountants. We make first steps in this direction by providing a careful analysis using Gaussian differential privacy, which gives optimal bounds for the Gaussian mechanism, one of the most versatile DP mechanisms. This approach is based on determining a certain supermartingale for the hockey-stick divergence and on extending the Rényi divergence-based fully adaptive composition results of Feldman and Zrnic (2021). We also consider measuring the individual $(\varepsilon,\delta)$-privacy losses using the so-called privacy loss distributions. With the help of the Blackwell theorem, we can then make use of the RDP analysis to construct an approximative individual $(\varepsilon,\delta)$-accountant. As an observation of independent interest, we experimentally illustrate that individual filtering leads to a disparate loss of accuracies among subgroups when training a neural network using DP gradient descent. Antti Koskela · Marlon Tobaben · Antti Honkela - Distributed Differential Privacy in Multi-Armed Bandits (Poster) We consider the standard $K$-armed bandit problem under a distributed trust model of differential privacy (DP), which enables privacy guarantees without a trustworthy server. Under this trust model, previous work largely focuses on achieving privacy using a shuffle protocol, where a batch of user data is randomly permuted before being sent to a central server. This protocol achieves an ($\epsilon,\delta$) or approximate-DP guarantee by sacrificing an additive $O\!\left(\!\frac{K\log T\sqrt{\log(1/\delta)}}{\epsilon}\!\right)\!$ factor in $T$-step cumulative regret.
In contrast, the optimal privacy cost to achieve a stronger ($\epsilon,0$) or pure-DP guarantee under the widely used central trust model is only $\Theta\!\left(\!\frac{K\log T}{\epsilon}\!\right)\!$, where, however, a trusted server is required. In this work, we aim to obtain a pure-DP guarantee under the distributed trust model while sacrificing no more regret than under the central trust model. We achieve this by designing a generic bandit algorithm based on successive arm elimination, where privacy is guaranteed by corrupting rewards with an equivalent discrete Laplace noise ensured by a secure computation protocol. We numerically simulate the regret performance of our algorithm; the results corroborate our theoretical findings. Sayak Ray Chowdhury · Xingyu Zhou - Explainability in Practice: Estimating Electrification Rates from Mobile Phone Data in Senegal (Poster) Explainable artificial intelligence (XAI) provides explanations for machine learning (ML) models that are not themselves interpretable. While many technical approaches exist, there is a lack of validation of these techniques on real-world datasets. In this work, we present a use case of XAI: an ML model trained to estimate electrification rates based on mobile phone data in Senegal. The data originate from the Data for Development challenge by Orange in 2014/15. We apply two model-agnostic, local explanation techniques and find that while the model can be verified, it is biased with respect to the population density. We conclude our paper by pointing to the two main challenges we encountered during our work: data processing and model design that might be restricted by currently available XAI methods, and the importance of domain knowledge for interpreting explanations.
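As a concrete illustration of the discrete Laplace noise mentioned in the distributed-bandit entry above: it can be sampled as the difference of two geometric random variables. The sketch below is a minimal illustration under the assumption of binary rewards; the function names and the secure-aggregation context are not from the paper.

```python
import math
import random


def discrete_laplace(scale: float, rng: random.Random) -> int:
    """Sample the discrete Laplace distribution: P(X = k) proportional to exp(-|k|/scale).

    Realized as the difference of two i.i.d. geometric variables with
    success probability p = 1 - exp(-1/scale).
    """
    p = 1.0 - math.exp(-1.0 / scale)

    def geometric() -> int:
        # number of failures before the first success
        k = 0
        while rng.random() > p:
            k += 1
        return k

    return geometric() - geometric()


def privatize_sum(rewards, epsilon: float, rng=None) -> int:
    """Release a noisy sum of {0, 1} rewards with a pure (epsilon, 0)-DP guarantee.

    The sum has sensitivity 1, so discrete Laplace noise of scale
    1/epsilon suffices (an illustrative assumption; the paper adds the
    noise inside a secure computation protocol).
    """
    rng = rng or random.Random(0)
    return sum(rewards) + discrete_laplace(1.0 / epsilon, rng)
```

Because the noise is integer-valued, the released statistic stays in the same domain as the true reward sum, which is what makes it compatible with secure computation over integers.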
Laura State · Hadrien Salat · Stefania Rubrichi · Zbigniew Smoreda - Learning to Take a Break: Sustainable Optimization of Long-Term User Engagement (Poster) Optimizing user engagement is a key goal for modern recommendation systems, but blindly pushing users towards increased consumption risks burn-out, churn, or even addictive habits. To promote digital well-being, most platforms now offer a service that periodically prompts users to take a break. These, however, must be set up manually, and so may be suboptimal for both users and the system. In this paper, we propose a framework for optimizing long-term engagement by learning individualized breaking policies. Using Lotka-Volterra dynamics, we model users as acting based on two balancing latent states, drive and interest, which must be conserved. We then give an efficient learning algorithm, provide theoretical guarantees, and empirically evaluate its performance on semi-synthetic data. Eden Saig · Nir Rosenfeld - Addressing Bias in Face Detectors using Decentralised Data collection with incentives (Poster) Recent developments in machine learning have shown that successful models rely not only on huge amounts of data but also on the right kind of data. We show in this paper how this data-centric approach can be facilitated in a decentralised manner to enable efficient data collection for algorithms. Face detectors are a class of models that suffer heavily from bias issues as they have to work on a large variety of different data. We also propose a face detection and anonymisation approach using a hybrid Multi-Task Cascaded CNN with FaceNet embeddings to benchmark multiple datasets, describing and evaluating the bias of the models towards different ethnicities, genders, and age groups, along with ways to enrich fairness in a decentralised system of data labelling, correction, and verification by users, creating a robust pipeline for model retraining.
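The Lotka-Volterra user model in the Take-a-Break entry above couples two latent states that feed on and deplete each other. The toy forward simulation below is a sketch; the coefficients, the Euler discretization, and the way breaks suppress engagement are all illustrative assumptions, not the paper's calibrated model.

```python
def simulate_user(steps, dt=0.01, drive=1.0, interest=1.0,
                  a=1.0, b=0.8, c=0.5, d=0.4, break_every=None):
    """Euler-integrate Lotka-Volterra-style drive/interest dynamics.

    Drive grows on its own but is depleted by engagement (proportional
    to interest); interest is fed by engaged drive and decays otherwise.
    A periodic break freezes engagement for one step, letting drive recover.
    """
    history = []
    for t in range(steps):
        on_break = break_every is not None and t % break_every == 0
        engage = 0.0 if on_break else interest
        d_drive = a * drive - b * drive * engage
        d_interest = d * drive * engage - c * interest
        drive = max(drive + dt * d_drive, 0.0)
        interest = max(interest + dt * d_interest, 0.0)
        history.append((drive, interest))
    return history
```

Comparing trajectories with and without `break_every` set gives a quick feel for why a well-timed break can sustain, rather than reduce, long-term engagement in such a model.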
Ahan M R · Robin Lehmann · Richard Blythman - Strategy-Aware Contextual Bandits (Poster) Algorithmic tools are often used to make decisions about people in high-stakes domains. In the presence of such automated decision making, there is an incentive for strategic agents to modify their input to the algorithm in order to receive a more desirable outcome. While previous work on strategic classification attempts to capture this phenomenon, these models fail to take into account the multiple actions a decision maker usually has at their disposal, and the fact that they often have access only to bandit feedback. Indeed, in standard strategic classification, the decision maker's action is to assign either a positive or a negative prediction to the agent, and they are assumed to have access to the agent's true label after the fact. In contrast, we study a setting where the decision maker has access to multiple actions but can only see the outcome of the action they assign. We formalize this setting as a contextual bandit problem, in which a decision maker must take actions based on a sequence of strategically modified contexts. We provide an algorithm with no regret compared to the best fixed policy in hindsight had the agents been truthful when revealing their contexts (i.e., no strategic regret) for the two-action setting, and prove that sublinear strategic regret is generally not possible for settings in which the number of actions is greater than two. Along the way, we obtain impossibility results for multi-class strategic classification which may be of independent interest.
Keegan Harris · Chara Podimata · Steven Wu - Group Excess Risk Bound of Overparameterized Linear Regression with Constant-Stepsize SGD (Poster) It has been observed that machine learning models trained using stochastic gradient descent (SGD) exhibit poor generalization to certain groups within and outside the population from which training instances are sampled. This has serious ramifications for the fairness, privacy, robustness, and out-of-distribution (OOD) generalization of machine learning. Hence, we theoretically characterize the inherent generalization of SGD-learned overparameterized linear regression to intra- and extra-population groups. We do this by proving an excess risk bound for an arbitrary group in terms of the full eigenspectra of the data covariance matrices of the group and population. We additionally provide a novel interpretation of the bound in terms of how the group and population data distributions differ and the effective dimension of SGD, as well as connect these factors to real-world challenges in practicing trustworthy machine learning. We further empirically validate the tightness of our bound on simulated data. Arjun Subramonian · Levent Sagun · Kai-Wei Chang · Yizhou Sun - Certified Training: Small Boxes are All You Need (Poster) We propose SABR, a novel certified training method that outperforms existing methods across perturbation magnitudes on MNIST, CIFAR-10, and TinyImageNet in terms of both standard and certifiable accuracy. The key insight behind SABR is that propagating interval bounds for a small but carefully selected subset of the adversarial input region is sufficient to approximate the worst-case loss over the whole region while significantly reducing approximation errors.
SABR not only establishes a new state of the art on all commonly used benchmarks but, more importantly, points to a new class of certified training methods that promise to overcome the robustness-accuracy trade-off. Mark Müller · Franziska Eckert · Marc Fischer · Martin Vechev - Evaluating the Practicality of Counterfactual Explanation (Poster) Machine learning models are increasingly used for decisions that directly affect people’s lives. These models are often opaque, meaning that the people affected cannot understand how or why the decision was made. However, according to the General Data Protection Regulation, decision subjects have the right to an explanation. Counterfactual explanations are a way to make machine learning models more transparent by showing how attributes need to be changed to get a different outcome. This type of explanation is considered easy to understand and human-friendly. To be used in real life, explanations must be practical, which means they must go beyond a purely theoretical framework. Research has focused on defining several objective functions to compute practical counterfactuals. However, it has not yet been tested whether people perceive the explanations as such in practice. To address this, we contribute by identifying properties that explanations must satisfy to be practical for human subjects. The properties are then used to evaluate the practicality of two counterfactual explanation methods (CARE and WachterCF) by conducting a user study. The results show that human subjects consider the explanations by CARE (a multi-objective approach) to be more practical than the WachterCF (baseline) explanations. We also show that the perception of explanations differs depending on the classification task by exploring multiple datasets.
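The interval-bound propagation underlying SABR (described above) can be sketched for a single linear layer: propagating a box around the input yields sound elementwise output bounds. The function below is a minimal illustration, not the paper's implementation.

```python
def ibp_linear(W, b, lower, upper):
    """Propagate interval bounds [lower, upper] through y = W x + b.

    For each output unit, the lower bound pairs negative weights with the
    upper input bound and positive weights with the lower input bound
    (and vice versa for the upper bound), which is exactly the
    interval-arithmetic worst case.
    """
    out_lower, out_upper = [], []
    for row, bias in zip(W, b):
        lo = bias + sum(w * (lower[j] if w >= 0 else upper[j])
                        for j, w in enumerate(row))
        hi = bias + sum(w * (upper[j] if w >= 0 else lower[j])
                        for j, w in enumerate(row))
        out_lower.append(lo)
        out_upper.append(hi)
    return out_lower, out_upper
```

Shrinking the input box, which is SABR's "small but carefully selected subset" of the adversarial region, directly tightens these propagated bounds, reducing the approximation error that certified training must optimize against.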
Nina Spreitzer · Hinda Haned · Ilse van der Linden - A Deep Dive into Dataset Imbalance and Bias in Face Identification (Poster) As the deployment of automated face recognition (FR) systems proliferates, bias in these systems is not just an academic question, but a matter of public concern. Media portrayals often center imbalance as the main source of bias, i.e., that FR models perform worse on images of non-white people or women because these demographic groups are underrepresented in training data. Recent academic research paints a more nuanced picture of this relationship. However, previous studies of data imbalance in FR have focused exclusively on the face verification setting, while the face identification setting has been largely ignored, despite being deployed in sensitive applications such as law enforcement. This is an unfortunate omission, as 'imbalance' is a more complex matter in identification; imbalance may arise not only in the training data but also in the testing data, and furthermore may affect the proportion of identities belonging to each demographic group or the number of images belonging to each identity. In this work, we address this gap in the research by thoroughly exploring the effects of each kind of imbalance possible in face identification, and discuss other factors which may impact bias in this setting. Valeriia Cherepanova · Steven Reich · Samuel Dooley · Hossein Souri · John Dickerson · Micah Goldblum · Tom Goldstein - Differentially Private Gradient Boosting on Linear Learners for Tabular Data (Poster) Gradient boosting takes linear combinations of weak base learners. Therefore, absent privacy constraints (when we can exactly optimize over the base models), it is not effective when run over base learner classes that are closed under linear combinations (e.g., linear models).
As a result, gradient boosting is typically implemented with tree base learners (e.g., XGBoost), and this has become the state-of-the-art approach in tabular data analysis. Prior work on private gradient boosting focused on taking the state-of-the-art algorithm in the non-private regime, boosting on trees, and making it differentially private. Surprisingly, we find that when we use differentially private learners, gradient boosting over trees is not as effective as gradient boosting over linear learners. In this paper, we propose differentially private gradient-boosted linear models as a private classification method for tabular data. We empirically demonstrate that, under strict privacy constraints, it yields higher F1 scores than the private versions of gradient-boosted trees on five real-world binary classification problems. This work adds to the growing picture that the most effective learning methods under differential privacy may be quite different from the most effective learning methods without privacy. Saeyoung Rho · Shuai Tang · Sergul Aydore · Michael Kearns · Aaron Roth · Yu-Xiang Wang · Steven Wu · Cedric Archambeau - TalkToModel: Explaining Machine Learning Models with Interactive Natural Language Conversations (Poster) Machine Learning (ML) models are increasingly used to make critical decisions in real-world applications, yet they have become more complex, making them harder to understand. To this end, researchers have proposed several techniques to explain model predictions. However, practitioners struggle to use these explainability techniques because they often do not know which one to choose and how to interpret the results of the explanations. In this work, we address these challenges by introducing TalkToModel: an interactive dialogue system for explaining machine learning models through conversations.
TalkToModel comprises 1) a dialogue engine that adapts to any tabular model and dataset, understands language, and generates responses, and 2) an execution component that constructs the explanations. In real-world evaluations with humans, 73% of healthcare workers (e.g., doctors and nurses) agreed they would use TalkToModel over baseline point-and-click systems for explainability in a disease prediction task, and 85% of ML professionals agreed TalkToModel was easier to use for computing explanations. Our findings demonstrate that TalkToModel is more effective for model explainability than existing systems, introducing a new category of explainability tools for practitioners. We release code and a demo for the system (URL anonymized). Dylan Slack · Satyapriya Krishna · Himabindu Lakkaraju · Sameer Singh - Participatory Systems for Personalized Prediction (Poster) Machine learning models often request personal information from users to assign more accurate predictions across a heterogeneous population. Personalized models are not built to support informed consent: users cannot "opt out" of providing personal data, nor understand the effects of doing so. In this work, we introduce a family of personalized prediction models called participatory systems that support informed consent. Participatory systems are interactive prediction models that let users opt into reporting additional personal data at prediction time, and inform them about how their data will improve their predictions. We present a model-agnostic approach for supervised learning tasks where personal data is encoded as "group" attributes (e.g., sex, age group, HIV status). Given a pool of user-specified models, our approach can create a variety of participatory systems that differ in their training requirements and opportunities for informed consent.
We conduct a comprehensive empirical study of participatory systems in clinical prediction tasks and compare them to common approaches for personalization. Our results show that our approach can produce participatory systems that exhibit large improvements in privacy, fairness, and performance at the population and group levels. Hailey James · Chirag Nagpal · Katherine Heller · Berk Ustun - On the Impact of Adversarially Robust Models on Algorithmic Recourse (Poster) The widespread deployment of machine learning models in various high-stakes settings has underscored the need for ensuring that individuals who are adversely impacted by model predictions are provided with a means for recourse. To this end, several algorithms have been proposed in recent literature to generate recourses. Recent research has also demonstrated that the recourses generated by these algorithms often correspond to adversarial examples. This key finding emphasizes the need for a deeper understanding of the impact of adversarially robust models (which are designed to guard against adversarial examples) on algorithmic recourse. In this work, we make one of the first attempts at studying the impact of adversarially robust models on algorithmic recourse. We theoretically and empirically analyze the cost (ease of implementation) and validity (probability of obtaining a positive model prediction) of the recourses output by state-of-the-art algorithms when the underlying models are adversarially robust. More specifically, we construct theoretical bounds on the differences between the cost and the validity of the recourses generated by various state-of-the-art algorithms when the underlying models are adversarially robust vs. non-robust.
We also carry out extensive empirical analysis with multiple real-world datasets to not only validate our theoretical results, but also analyze the impact of varying degrees of model robustness on the cost and validity of the resulting recourses. Our theoretical and empirical analyses demonstrate that adversarially robust models significantly increase the cost and reduce the validity of the resulting recourses, thereby shedding light on the inherent trade-offs between achieving adversarial robustness in predictive models and providing easy-to-implement and reliable algorithmic recourse. Satyapriya Krishna · Chirag Agarwal · Himabindu Lakkaraju - What Makes a Good Explanation?: A Unified View of Properties of Interpretable ML (Poster) Interpretability provides a means for humans to verify aspects of machine learning (ML) models. Different tasks require explanations with different properties. However, presently, there is a lack of standardization in assessing properties of explanations: different papers use the same term to mean different quantities, and different terms to mean the same quantity. This lack of standardization prevents us from rigorously comparing explanation systems. In this work, we survey explanation properties defined in the current interpretable ML literature, synthesize them based on what they measure, and describe the trade-offs between different formulations of these properties. We provide a unifying framework for comparing properties of interpretable ML. Zixi Chen · Varshini Subhash · Marton Havasi · Weiwei Pan · Finale Doshi-Velez - REGLO: Provable Neural Network Repair for Global Robustness Properties (Poster) We present REGLO, a novel methodology for repairing neural networks to satisfy global robustness properties.
In contrast to existing works that focus on local robustness, i.e., robustness of individual inputs, REGLO tackles global robustness, a strictly stronger notion that requires robustness for all inputs within a region. Leveraging the observation that any counterexample to a global robustness property must exhibit a correspondingly large gradient, REGLO first identifies violating regions where the counterexamples reside, and then uses verified robustness bounds on these regions to formulate a robust optimization problem that computes a minimal weight change in the network to provably repair the violations. Experimental results demonstrate the effectiveness of REGLO across a set of benchmarks. Feisi Fu · Zhilu Wang · Jiameng Fan · Yixuan Wang · Chao Huang · Xin Chen · Qi Zhu · Wenchao Li - A View From Somewhere: Human-Centric Face Representations (Poster) We propose to implicitly learn a set of continuous face-varying dimensions, without ever asking an annotator to explicitly categorize a person. We uncover the dimensions by learning on a novel dataset of 638,180 human judgments of face similarity (FAX). We demonstrate the utility of our learned embedding space for predicting face similarity judgments, collecting continuous face attribute values, and attribute classification. Moreover, using a novel conditional framework, we show that an annotator's demographics influence the importance they place on different attributes when judging similarity, underscoring the need for diverse annotator groups to avoid biases. Jerone Andrews · Przemyslaw Joniak · Alice Xiang - On Causal Rationalization (Poster) With recent advances in natural language processing, rationalization has become an essential self-explaining paradigm for disentangling the black box by selecting a subset of input texts to account for the major variation in prediction.
Yet existing association-based approaches to rationalization cannot identify true rationales when two or more rationales are highly intercorrelated and thus contribute similarly to prediction accuracy (so-called spuriousness). To address this limitation, we bring two causal desiderata, non-spuriousness and efficiency, into rationalization from a causal inference perspective. We formally define the probability of causation in the rationale model, with its identification established as the main component of learning necessary and sufficient rationales. The superior performance of our causal rationalization is demonstrated on real-world review and medical datasets with extensive experiments compared to state-of-the-art methods. Wenbo Zhang · TONG WU · Yunlong Wang · Yong Cai · Hengrui Cai - Poisoning Generative Models to Promote Catastrophic Forgetting (Poster) Generative models have grown into the workhorse of many state-of-the-art machine learning methods. However, their vulnerability under poisoning attacks has been largely understudied. In this work, we investigate this issue in the context of continual learning, where generative replayers are utilized to tackle catastrophic forgetting. By developing a novel customization of dirty-label, input-aware backdoors to the online setting, our attacker manages to stealthily promote forgetting while retaining high accuracy at the current task and withstanding strong defenders. Our approach taps into an intriguing property of generative models, namely that they cannot capture input-dependent triggers well. Experiments on four standard datasets corroborate the poisoner’s effectiveness.
Siteng Kang · Xinhua Zhang - On the Importance of Architectures and Hyperparameters for Fairness in Face Recognition (Poster) Face recognition systems are used widely but are known to exhibit bias across a range of sociodemographic dimensions, such as gender and race. An array of works proposing pre-processing, training, and post-processing methods has failed to close these gaps. Here, we take a very different approach to this problem, identifying that both architectures and hyperparameters of neural networks are instrumental in reducing bias. We first run a large-scale analysis of the impact of architectures and training hyperparameters on several common fairness metrics and show that the implicit convention of choosing high-accuracy architectures may be suboptimal for fairness. Motivated by our findings, we run the first neural architecture search for fairness, jointly with a search for hyperparameters. We output a suite of models which Pareto-dominate all other competitive architectures in terms of accuracy and fairness. Furthermore, we show that these models transfer well to other face recognition datasets with similar and distinct protected attributes. We release our code and raw result files so that researchers and practitioners can replace our fairness metrics with a bias measure of their choice. Samuel Dooley · Rhea Sukthanker · John Dickerson · Colin White · Frank Hutter · Micah Goldblum - Finding Safe Zones of Markov Decision Processes Policies (Poster) Safety is essential for gaining trust in Markov decision process policies. We suggest a new method to improve safety, using Safe Zones. Given a policy, we define its Safe Zone as a subset of states such that most of the policy’s trajectories are confined to this subset. A trajectory not entirely inside the Safe Zone is potentially unsafe and should be examined.
The quality of the Safe Zone is parameterized by the number of states and the escape probability, i.e., the probability that a random trajectory will leave the subset. Safe Zones are especially interesting when they have a small number of states and low escape probability. We study the complexity of finding an optimal Safe Zone and show that, in general, the problem is computationally hard. For this reason, we concentrate on computing approximate Safe Zones. Our main result is a bi-criteria approximation algorithm that gives an almost-2 approximation factor for both the escape probability and the Safe Zone size, with polynomial sample complexity. Michal Moshkovitz · Lee Cohen · Yishay Mansour - Indiscriminate Data Poisoning Attacks on Neural Networks (Poster) Data poisoning attacks, in which a malicious adversary aims to influence a model by injecting "poisoned" data into the training process, have attracted significant recent attention. In this work, we take a closer look at existing poisoning attacks and connect them with old and new algorithms for solving sequential Stackelberg games. By choosing an appropriate loss function for the attacker and optimizing with algorithms that exploit second-order information, we design poisoning attacks that are effective on neural networks. We present efficient implementations by parameterizing the attacker and allowing simultaneous and coordinated generation of tens of thousands of poisoned points, in contrast to existing methods that generate poisoned points one by one. We further perform extensive experiments that empirically explore the effect of data poisoning attacks on deep neural networks. Our paper sets up a new benchmark for the possibility of performing indiscriminate data poisoning attacks on modern neural networks.
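The escape probability that parameterizes a Safe Zone in the entry above is straightforward to estimate empirically from sampled trajectories. A minimal sketch (states are abstract hashable values here; the sampling model is an assumption, not the paper's algorithm):

```python
def escape_probability(trajectories, safe_zone):
    """Estimate P(trajectory leaves safe_zone) from sampled trajectories.

    A trajectory escapes if any visited state falls outside the set;
    the estimate is the empirical fraction of escaping trajectories.
    """
    if not trajectories:
        raise ValueError("need at least one trajectory")
    escaped = sum(1 for traj in trajectories
                  if any(state not in safe_zone for state in traj))
    return escaped / len(trajectories)
```

Minimizing this quantity jointly with the size of `safe_zone` is exactly the bi-criteria objective the abstract describes, which is what makes the exact problem hard and motivates the approximation algorithm.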
Yiwei Lu · Gautam Kamath · Yaoliang Yu - Hybrid-EDL: Improving Evidential Deep Learning for Uncertainty Quantification on Imbalanced Data (Poster) Uncertainty quantification is crucial for many safety-critical applications. Evidential Deep Learning (EDL) has been demonstrated to provide effective and efficient uncertainty estimates on well-curated data. Yet the effect of class imbalance on performance remains poorly understood. Since real-world data is often represented by a skewed class distribution, in this paper we holistically study the behaviour of EDL and further propose Hybrid-EDL, integrating data over-sampling and post-hoc calibration to boost the robustness of EDL. Extensive experiments on synthetic and real-world healthcare datasets with label distribution skew demonstrate the superiority of our Hybrid-EDL in terms of in-domain categorical prediction and confidence estimation, as well as out-of-distribution detection. Our research closes the gap between the theory of uncertainty quantification and the practice of trustworthy applications. Tong Xia · Jing Han · Lorena Qendro · Ting Dang · Cecilia Mascolo - Bias Amplification in Image Classification (Poster) Recent research suggests that predictions made by machine-learning models can amplify biases present in the training data. Mitigating such bias amplification requires a deep understanding of the mechanics in modern machine learning that give rise to that amplification. We perform the first systematic, controlled study into when and how bias amplification occurs. To enable this study, we design a simple image-classification problem in which we can tightly control (synthetic) biases. Our study of this problem reveals that the strength of bias amplification is correlated with measures such as model accuracy, model capacity, and amount of training data. We also find that bias amplification can vary greatly during training.
Finally, we find that bias amplification may depend on the difficulty of the classification task relative to the difficulty of recognizing group membership: bias amplification appears to occur primarily when it is easier to recognize group membership than class membership. Our results suggest best practices for training machine-learning models that we hope will help pave the way for the development of better mitigation strategies. Melissa Hall · Laurens van der Maaten · Laura Gustafson · Maxwell Jones · Aaron Adcock - Certified Defences Against Adversarial Patch Attacks on Semantic Segmentation (Poster) Adversarial patch attacks are an emerging security threat for real-world deep learning applications. We present Demasked Smoothing, the first approach (to our knowledge) to certify the robustness of semantic segmentation models against this threat model. Previous work on certifiably defending against patch attacks has mostly focused on the image classification task and often required changes to the model architecture and additional training, which is undesirable and computationally expensive. In Demasked Smoothing, any segmentation model can be applied without particular training, fine-tuning, or restriction of the architecture. Using different masking strategies, Demasked Smoothing can be applied both for certified detection and certified recovery. In extensive experiments we show that Demasked Smoothing can on average certify 63% of the pixel predictions for a 1% patch in the detection task and 46% against a 0.5% patch for the recovery task on the ADE20K dataset. Maksym Yatsura · Kaspar Sakmann · N. Grace Hua · Matthias Hein · Jan Hendrik Metzen - Cold Posteriors through PAC-Bayes (Poster) We investigate the cold posterior effect through the lens of PAC-Bayes generalization bounds.
We argue that in the non-asymptotic setting, when the number of training samples is (relatively) small, discussions of the cold posterior effect should take into account that approximate Bayesian inference does not readily provide guarantees of performance on out-of-sample data. Instead, out-of-sample error is better described through a generalization bound. In this context, we explore the connections between the ELBO objective from variational inference and PAC-Bayes objectives. We note that, while the ELBO and PAC-Bayes objectives are similar, the latter naturally contain a temperature parameter $\lambda$ which is not restricted to be $\lambda=1$. For realistic classification tasks, in the case of Laplace approximations to the posterior, we show how this PAC-Bayesian interpretation of the temperature parameter captures important aspects of the cold posterior effect. Konstantinos Pitas · Julyan Arbel - Men Also Do Laundry: Multi-Attribute Bias Amplification (Poster) As computer vision systems become more widely deployed, there is growing concern from both the research community and the public that these systems are not only reproducing but also amplifying harmful social biases. The phenomenon of bias amplification, which is the focus of this work, refers to models amplifying inherent training set biases at test time. Existing metrics measure bias amplification with respect to single annotated attributes (e.g., $\texttt{computer}$). However, several visual datasets consist of images with multiple attribute annotations. We show models can exploit correlations with multiple attributes (e.g., {$\texttt{computer}$, $\texttt{keyboard}$}), which are not accounted for by current metrics. In addition, we show current metrics can give the impression that minimal or no bias amplification has occurred, as they involve aggregating over positive and negative values.
Further, these metrics lack a clear desired value, making them difficult to interpret. To address these shortcomings, we propose a new metric: Multi-Attribute Bias Amplification. We validate our metric through an analysis of gender bias amplification on the COCO and imSitu datasets. Finally, we benchmark bias mitigation methods using our proposed metric, suggesting possible avenues for future bias mitigation efforts.

Dora Zhao · Jerone Andrews · Alice Xiang

- Anonymization for Skeleton Action Recognition (Poster)

Skeleton-based action recognition attracts practitioners and researchers due to the lightweight, compact nature of its datasets. Compared with RGB-video-based action recognition, skeleton-based action recognition is a safer way to protect the privacy of subjects while achieving competitive recognition performance. However, due to improvements in skeleton estimation algorithms as well as motion and depth sensors, more details of motion characteristics can be preserved in skeleton datasets, leading to potential privacy leakage. To investigate the potential privacy leakage from skeleton datasets, we first train a classifier to categorize sensitive private information from trajectories of joints. Our preliminary experiments show that the gender classifier achieves 87% accuracy on average and the re-identification task achieves 80% accuracy on average for three baseline models: Shift-GCN, MS-G3D, and 2s-AGCN. We propose an adversarial anonymization algorithm to protect against potential privacy leakage from the skeleton dataset. Experimental results show that an anonymized dataset can reduce the risk of privacy leakage while having marginal effects on action recognition performance.
Saemi Moon · Myeonghyeon Kim · Zhenyue Qin · Yang Liu · Dongwoo Kim

- Provable Re-Identification Privacy (Poster)

In applications involving sensitive data, such as finance and healthcare, the necessity for preserving data privacy can be a significant barrier to machine learning model development. Differential privacy (DP) has emerged as one canonical standard for provable privacy. However, DP's strong theoretical guarantees often come at the cost of a large drop in its utility for machine learning, and DP guarantees themselves can be difficult to interpret. As a result, standard DP has encountered deployment challenges in practice. In this work, we propose a different privacy notion, re-identification privacy (RIP), to address these challenges. RIP guarantees are easily interpretable in terms of the success rate of membership inference attacks. We give a precise characterization of the relationship between RIP and DP, and show that RIP can be achieved using less randomness than is required for guaranteeing DP, leading to a smaller drop in utility. Our theoretical results also give rise to a simple algorithm for guaranteeing RIP which can be used as a wrapper around any algorithm with a continuous output, including parametric model training.

Zachary Izzo · Jinsung Yoon · Sercan Arik · James Zou

- Attack-Agnostic Adversarial Detection (Poster)

The growing number of adversarial attacks in recent years gives attackers an advantage over defenders, as defenders must train detectors after learning the types of attacks, and many models need to be maintained to ensure good performance in detecting any upcoming attacks. We propose a way to end this tug-of-war between attackers and defenders by treating adversarial attack detection as an anomaly detection problem, so that the detector is agnostic to the attack. We quantify the statistical deviation caused by adversarial perturbations in two aspects.
The Least Significant Component Feature (LSCF) quantifies the deviation of adversarial examples from the statistics of benign samples, and the Hessian Feature (HF) reflects how adversarial examples distort the landscape of the model's optima by measuring the local loss curvature. Empirical results show that our method achieves an overall ROC AUC of 94.9%, 89.7%, and 97.9% on CIFAR10, CIFAR100, and SVHN, respectively, and has performance comparable to adversarial detectors trained with adversarial examples on most of the attacks.

Jiaxin Cheng · Mohamed Hussein · Jayadev Billa · Wael Abd-Almageed

- Is the Next Winter Coming for AI? The Elements of Making Secure and Robust AI (Poster)

While the recent boom in Artificial Intelligence (AI) has given rise to the technology's use and popularity across many domains, the same boom has exposed vulnerabilities of the technology to many threats that could cause the next "AI winter". AI is no stranger to "winters", or drops in funding and interest in the technology and its applications. Many in the field consider the early 1970s the first AI winter, with another following in the late 1990s and early 2000s. There is some consensus that another AI winter is all but inevitable in some shape or form; however, current thinking on the next winter does not consider secure and robust AI and the implications of the success or failure of these areas. The emergence of AI as an operational technology introduces potential vulnerabilities to AI's longevity. The National Security Commission on AI (NSCAI) report outlines recommendations for building secure and robust AI, particularly in government and Department of Defense (DoD) applications. However, are they enough to help us fully secure AI systems and prevent the next "AI winter"? An approaching AI winter would have a tremendous impact on DoD systems as well as those of our adversaries.
Understanding and analyzing the potential of this event would better prepare us for such an outcome, as well as help us understand the tools needed to counter and prevent this "winter" by securing and robustifying our AI systems. In this paper, we introduce four pillars of AI assurance that, if implemented, will help us avoid the next AI winter: security, fairness, trust, and resilience.

Josh Harguess

- Visual Prompting for Adversarial Robustness (Poster)

In this work, we leverage visual prompting (VP) to improve the adversarial robustness of a fixed, pre-trained model at test time. Compared to conventional adversarial defenses, VP allows us to design universal (i.e., data-agnostic) input prompting templates, which have plug-and-play capabilities at test time to achieve desired model performance without introducing much computational overhead. Although VP has been successfully applied to improving model generalization, it remains elusive whether and how it can be used to defend against adversarial attacks. We investigate this problem and show that the vanilla VP approach is not effective for adversarial defense, since a universal input prompt lacks the capacity for robust learning against sample-specific adversarial perturbations. To circumvent this, we propose a new VP method, termed Class-wise Adversarial Visual Prompting (C-AVP), which generates class-wise visual prompts so as to not only leverage the strengths of ensemble prompts but also optimize their interrelations to improve model robustness. Our experiments show that C-AVP outperforms the conventional VP method, with a 2.1$\times$ standard accuracy gain and a 2$\times$ robust accuracy gain. Compared to classical test-time defenses, C-AVP also yields a 42$\times$ inference-time speedup.
Aochuan Chen · Peter Lorenz · Yuguang Yao · Pin-Yu Chen · Sijia Liu

- When Fairness Meets Privacy: Fair Classification with Semi-Private Sensitive Attributes (Poster)

Machine learning models have demonstrated promising performance in many areas. However, concerns that they can be biased against specific groups hinder their adoption in high-stakes applications. Thus, it is essential to ensure fairness in machine learning models. Most previous efforts require access to sensitive attributes for mitigating bias. Nevertheless, it is often infeasible to obtain large-scale data with sensitive attributes due to people's increasing awareness of privacy and legal compliance. Therefore, an important research question is how to make fair predictions under privacy. In this paper, we study a novel problem of fair classification in a semi-private setting, where most of the sensitive attributes are private and only a small number of clean ones are available. To this end, we propose a novel framework, FairSP, that first learns to correct the noisy sensitive attributes under a privacy guarantee by exploiting the limited clean ones. Then, it jointly models the corrected and clean data in an adversarial way for debiasing and prediction. Theoretical analysis shows that the proposed model can ensure fairness when most sensitive attributes are private. Extensive experimental results on real-world datasets demonstrate the effectiveness of the proposed model for making fair predictions under privacy while maintaining high accuracy.

Canyu Chen · Yueqing Liang · Xiongxiao Xu · Shangyu Xie · Yuan Hong · Kai Shu

- Differentially Private Bias-Term only Fine-tuning of Foundation Models (Poster)

We study the problem of differentially private (DP) fine-tuning of large pre-trained models, a recent privacy-preserving approach suitable for solving downstream tasks with sensitive data.
Existing work has demonstrated that high accuracy is possible under strong privacy constraints, yet requires significant computational overhead or modifications to the network architecture. We propose differentially private bias-term fine-tuning (DP-BiTFiT), which matches the state-of-the-art accuracy of DP algorithms and the efficiency of standard BiTFiT. DP-BiTFiT is model-agnostic (it does not modify the network architecture), parameter-efficient (training only about $0.1\%$ of the parameters), and computation-efficient (almost removing the overhead caused by DP, in both time and space complexity). On a wide range of tasks, DP-BiTFiT is $2\sim 30\times$ faster and uses $2\sim 8\times$ less memory than DP full fine-tuning, and is even faster than standard full fine-tuning. This efficiency enables us to conduct DP fine-tuning on language and vision tasks with long-sequence texts and high-resolution images, which were computationally difficult with existing methods.

Zhiqi Bu · Yu-Xiang Wang · Sheng Zha · George Karypis

- On the Feasibility of Compressing Certifiably Robust Neural Networks (Poster)

Knowledge distillation is a popular approach to compress high-performance neural networks for use in resource-constrained environments. However, the threat of adversarial machine learning poses the question: Is it possible to compress adversarially robust networks and achieve similar or better adversarial robustness than the original network? In this paper, we explore this question with respect to *certifiable robustness defenses*, in which the defense establishes a formal robustness guarantee irrespective of the adversarial attack methodology. We present our preliminary findings answering two main questions: 1) Is traditional knowledge distillation sufficient to compress certifiably robust neural networks? and 2) What aspects of the transfer process can we modify to improve compression effectiveness?
Our work represents the first study of the interaction between machine learning model compression and certifiable robustness.

Pratik Vaishnavi · Veena Krish · Farhan Ahmed · Kevin Eykholt · Amir Rahmati

- COVID-Net Biochem: An Explainability-driven Framework to Building Machine Learning Models for Predicting Survival and Kidney Injury of COVID-19 Patients from Clinical and Biochemistry Data (Poster)

A major challenge faced during the pandemic has been the prediction of survival and the risk of additional injuries in individual patients, which requires significant clinical expertise and additional resources to avoid further complications. In this study we propose COVID-Net Biochem, an explainability-driven framework for building machine learning models to predict patient survival and the chance of developing kidney injury during hospitalization from clinical and biochemistry data in a transparent and systematic manner. In the first, "clinician-guided initial design" phase, we prepared a benchmark dataset of carefully selected clinical and biochemistry data based on clinician assessment, curated from a cohort of 1366 patients at Stony Brook University. A collection of machine learning models with a diversity of gradient-based boosting tree architectures and deep transformer architectures was designed and trained specifically for survival and kidney injury prediction based on the carefully selected clinical and biochemical markers. In the second, "explainability-driven design refinement" phase, we harnessed explainability methods not only to gain a deeper understanding of the decision-making process of the individual models, but also to identify the overall impact of the individual clinical and biochemical markers and uncover potential biases. These explainability outcomes were further analyzed by a clinician with over eight years of experience to gain a deeper understanding of the clinical validity of the decisions made.
These explainability-driven insights, alongside the associated clinical feedback, are then leveraged to guide and revise the training policies and architectural design in an iterative manner, improving not just prediction performance but also the clinical validity and trustworthiness of the final machine learning models. Using the proposed explainability-driven framework, we achieved 97.4% accuracy in survival prediction and 96.7% accuracy in predicting kidney injury complications, with the models made available in an open-source manner. While not a production-ready solution, the ultimate goal of this study is to act as a catalyst for clinical scientists, machine learning researchers, and citizen scientists to develop innovative and trustworthy clinical decision support solutions that help clinicians around the world manage the continuing pandemic.

Hossein Aboutalebi · Maya Pavlova · Mohammad Javad Shafiee · Adrian Florea · Andrew Hryniowski · Alexander Wong

- Towards Algorithmic Fairness in Space-Time: Filling in Black Holes (Poster)

New technologies and the availability of geospatial data have drawn attention to spatio-temporal biases present in society. For example: the COVID-19 pandemic highlighted disparities in the availability of broadband service and its role in the digital divide; the environmental justice movement in the United States has raised awareness of health implications for minority populations stemming from historical redlining practices; and studies have found varying quality and coverage in the collection and sharing of open-source geospatial data. Despite the extensive literature on machine learning (ML) fairness, few algorithmic strategies have been proposed to mitigate such biases. In this paper we highlight the unique challenges of quantifying and addressing bias in spatio-temporal data, through the lens of use cases presented in the scientific literature and media.
We envision a roadmap of ML strategies that need to be developed or adapted to quantify and overcome these challenges, including transfer learning, active learning, and reinforcement learning techniques. Further, we discuss the potential role of ML in providing guidance to policy makers on issues related to spatial fairness.

Cheryl Flynn · Aritra Guha · Subhabrata Majumdar · Divesh Srivastava · Zhengyi Zhou

- Just Following AI Orders: When Unbiased People Are Influenced By Biased AI (Poster)

Prior research has shown that artificial intelligence (AI) systems often encode biases against minority subgroups; however, little work has focused on ways to mitigate the harm discriminatory algorithms can cause in high-stakes settings such as medicine. In this study, we experimentally evaluated the impact biased AI recommendations have on emergency decisions, where participants respond to mental health crises by calling for either medical or police assistance. We found that although respondents' decisions were not biased without advice, both clinicians and non-experts were influenced by prescriptive recommendations from a biased algorithm, choosing police help more often in emergencies involving African-American or Muslim men. Crucially, we also found that using descriptive flags rather than prescriptive recommendations allowed respondents to retain their original, unbiased decision-making. Our work demonstrates the practical danger of using biased models in health contexts, and suggests that appropriately framing decision support can mitigate the effects of AI bias. These findings must be carefully considered in the many real-world clinical scenarios where inaccurate or biased models may be used to inform important decisions.
Hammaad Adam · Aparna Balagopalan · Emily Alsentzer · Fotini Christia · Marzyeh Ghassemi

- Not All Knowledge Is Created Equal: Mutual Distillation of Confident Knowledge (Poster)

Mutual knowledge distillation (MKD) improves a model by distilling knowledge from another model. However, *not all knowledge is certain and correct*, especially under adverse conditions. For example, label noise usually leads to less reliable models due to undesired memorization. Wrong knowledge harms learning rather than helping it. This problem can be addressed from two angles: (i) improving the reliability of each model (the knowledge producer); and (ii) selecting reliable knowledge for distillation. Making a model more reliable is widely studied, while selective MKD has received little attention. Therefore, we focus on studying selective MKD and highlight its importance in this work. Concretely, we design a generic MKD framework, Confident knowledge selection followed by Mutual Distillation (CMD). The key component of CMD is a generic knowledge selection formulation, making the selection threshold either static (CMD-S) or progressive (CMD-P). Additionally, CMD covers two special cases, zero knowledge and all knowledge, leading to a unified MKD framework. Extensive experiments are presented to demonstrate the effectiveness of CMD and thoroughly justify its design.

ZIYUN LI · Xinshao Wang · Christoph Meinel · Neil Robertson · David Clifton · Haojin Yang

- Scalable and Improved Algorithms for Individually Fair Clustering (Poster)

We present scalable and improved algorithms for the individually fair ($p$, $k$)-clustering problem introduced by Jung et al. and Mahabadi et al. Given $n$ points $P$ in a metric space, let $\delta(x)$ for $x\in P$ be the radius of the smallest ball around $x$ containing at least $n/k$ points.
In this work, we present two main contributions. We first present local-search algorithms improving prior work along cost and maximum fairness violation. Then we design a fast local-search algorithm that runs in $\tilde{O}(nk^2)$ time and obtains a bicriteria $(O(1), 6)$ approximation. Finally, we show empirically that not only is our algorithm much faster than prior work, but it also produces lower-cost solutions.

Mohammadhossein Bateni · Vincent Cohen-Addad · Alessandro Epasto · Silvio Lattanzi

- Membership Inference Attacks via Adversarial Examples (Poster)

The rise of machine learning and deep learning has led to significant improvements in several domains. This change is supported by both the dramatic rise in computation power and the collection of large datasets. Such massive datasets often include personal data which can represent a threat to privacy. Membership inference attacks are a novel direction of research which aims at recovering training data used by a learning algorithm. In this paper, we develop a means to measure the leakage of training data, leveraging a quantity that acts as a proxy for the total variation of a trained model near its training samples. We extend our work by providing a novel defense mechanism. Our contributions are supported by empirical evidence from convincing numerical experiments.

Hamid Jalalzai · Elie Kadoche · Rémi Leluc · Vincent Plassier

- Take 5: Interpretable Image Classification with a Handful of Features (Poster)

Deep neural networks use thousands of mostly incomprehensible features to identify a single class, a decision no human can follow. We propose an interpretable, sparse, and low-dimensional final decision layer in a deep neural network, with measurable aspects of interpretability, and demonstrate it on fine-grained image classification.
We argue that a human can only understand the decision of a machine learning model if the input features are interpretable and only very few of them are used for a single decision. To that end, the final layer has to be sparse and, to make interpreting the features feasible, low-dimensional. We call a model with a Sparse Low-Dimensional Decision an "SLDD-Model". We show that an SLDD-Model is easier to interpret locally and globally than a dense high-dimensional decision layer, while maintaining competitive accuracy. Additionally, we propose a loss function that improves a model's feature diversity and accuracy. Our interpretable SLDD-Model uses only 5 out of just 50 features per class, while maintaining 97% to 100% of the accuracy on four common benchmark datasets compared to the baseline model with 2048 features.

Thomas Norrenbrock · Marco Rudolph · Bodo Rosenhahn
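
The fairness radius $\delta(x)$ that underpins the individually fair clustering abstract above has a direct brute-force computation. The sketch below is an illustration only, not the authors' algorithm; the function name `fairness_radii` and the use of Euclidean distance on coordinate tuples are my own assumptions.

```python
import math

def fairness_radii(points, k):
    """For each point x, compute delta(x): the radius of the smallest ball
    centred at x containing at least ceil(n/k) of the n points (x included),
    following the definition in the individually fair (p, k)-clustering abstract."""
    n = len(points)
    m = math.ceil(n / k)  # minimum number of points the ball must cover
    radii = []
    for x in points:
        # Distances from x to all points; x itself contributes distance 0.
        dists = sorted(math.dist(x, p) for p in points)
        # The m-th smallest distance is the smallest radius covering m points.
        radii.append(dists[m - 1])
    return radii

# Four points on a line; with k = 2 each ball must cover ceil(4/2) = 2 points,
# so delta(x) is the distance from x to its nearest neighbour.
print(fairness_radii([(0, 0), (1, 0), (2, 0), (10, 0)], 2))  # [1.0, 1.0, 1.0, 8.0]
```

Individual fairness then asks that every point have a cluster center not much farther away than its own $\delta(x)$; the bicriteria guarantee in the abstract bounds both the clustering cost and the maximum per-point violation of this requirement.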

#### Author Information

##### Linyi Li (University of Illinois Urbana-Champaign)

A Ph.D. candidate working on robust machine learning and verification.

##### Chaowei Xiao (ASU/NVIDIA)

I am Chaowei Xiao, a third-year PhD student in the CSE Department at the University of Michigan, Ann Arbor. My advisor is Professor Mingyan Liu. I obtained my bachelor's degree from the School of Software at Tsinghua University in 2015, advised by Professor Yunhao Liu, Professor Zheng Yang, and Dr. Lei Yang. I was also a visiting student at UC Berkeley in 2018, advised by Professor Dawn Song and Professor Bo Li. My research interests include adversarial machine learning.

##### J. Zico Kolter (Carnegie Mellon University / Bosch Center for AI)

Zico Kolter is an Assistant Professor in the School of Computer Science at Carnegie Mellon University, and also serves as Chief Scientist of AI Research for the Bosch Center for Artificial Intelligence. His work focuses on the intersection of machine learning and optimization, with a particular focus on developing more robust, explainable, and rigorous methods in deep learning. In addition, he has worked on a number of application areas, highlighted by work on sustainability and smart energy systems. He is the recipient of the DARPA Young Faculty Award and best paper awards at KDD, IJCAI, and PESGM.