Interestingly, recent experimental results [2, 26, 22] have identified a robust fairness phenomenon in adversarial training (AT): compared with natural training, a robust model well-trained by AT exhibits a remarkable disparity in standard accuracy and robust accuracy across classes. However, the effect of the perturbation radius in AT on robust fairness has not been studied, which raises a natural question: does a tradeoff exist between average robustness and robust fairness? Our extensive experimental results provide an affirmative answer: with an increasing perturbation radius, stronger AT leads to a larger class-wise disparity in robust accuracy. Theoretically, we analyze the class-wise performance of adversarially trained linear models on mixture Gaussian distributions. Our theoretical results support these observations and further show that adversarial training readily leads to a more severe robust fairness problem than natural training. Motivated by the theoretical results, we propose a Fairly Adversarial Training (FAT) method to mitigate the tradeoff between average robustness and robust fairness. Experimental results validate the effectiveness of the proposed method.
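To make the evaluated quantity concrete, the sketch below shows one way to measure class-wise robust accuracy under an $\ell_\infty$ PGD attack and report the best-minus-worst class gap as a disparity measure. This is a minimal illustration of the metric described in the abstract, not the authors' code: the helper names `pgd_attack` and `classwise_robust_accuracy`, the step-size heuristic, and the max-min gap as the disparity statistic are all our assumptions.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps, alpha, steps):
    # L-inf PGD: start from a random point in the eps-ball, take signed
    # gradient-ascent steps on the loss, and project back into the ball.
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = (x + (x_adv - x).clamp(-eps, eps)).clamp(0, 1)
    return x_adv.detach()

def classwise_robust_accuracy(model, loader, num_classes, eps, steps=10):
    # Per-class robust accuracy; the gap between the best and worst class
    # is one simple (assumed) measure of robust fairness.
    alpha = 2.5 * eps / steps  # common PGD step-size heuristic (assumption)
    correct = torch.zeros(num_classes)
    total = torch.zeros(num_classes)
    model.eval()
    for x, y in loader:
        x_adv = pgd_attack(model, x, y, eps, alpha, steps)
        with torch.no_grad():
            pred = model(x_adv).argmax(dim=1)
        for c in range(num_classes):
            mask = y == c
            correct[c] += (pred[mask] == c).sum()
            total[c] += mask.sum()
    acc = correct / total.clamp(min=1)
    return acc, (acc.max() - acc.min()).item()
```

Sweeping `eps` over increasing radii and plotting the returned gap against the average of `acc` would trace the kind of robustness-versus-fairness tradeoff curve the abstract describes.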
Author Information
Xinsong Ma (Wuhan University)
Zekai Wang (Wuhan University)
Weiwei Liu (Wuhan University)
More from the Same Authors
- 2022 Poster: Defending Against Adversarial Attacks via Neural Dynamic System
  Xiyuan Li · Zou Xin · Weiwei Liu
- 2022 Poster: On Robust Multiclass Learnability
  Jingyuan Xu · Weiwei Liu
- 2019 Poster: Copula Multi-label Learning
  Weiwei Liu
- 2017 Poster: Sparse Embedded $k$-Means Clustering
  Weiwei Liu · Xiaobo Shen · Ivor Tsang
- 2015 Poster: On the Optimality of Classifier Chain for Multi-label Classification
  Weiwei Liu · Ivor Tsang