Timezone: »

Poster
The Effects of Regularization and Data Augmentation are Class Dependent
Randall Balestriero · Leon Bottou · Yann LeCun

Thu Dec 01 02:00 PM -- 04:00 PM (PST) @ Hall J #642
Regularization is a fundamental technique to prevent over-fitting and to improve generalization performances by constraining a model's complexity. Current Deep Networks heavily rely on regularizers such as Data-Augmentation (DA) or weight-decay, and employ structural risk minimization, i.e. cross-validation, to select the optimal regularization hyper-parameters. In this study, we demonstrate that techniques such as DA or weight decay produce a model with a reduced complexity that is unfair across classes. The optimal amount of DA or weight decay found from cross-validation over all classes leads to disastrous model performances on some classes e.g. on Imagenet with a resnet50, the barn spider'' classification test accuracy falls from $68\%$ to $46\%$ only by introducing random crop DA during training. Even more surprising, such performance drop also appears when introducing uninformative regularization techniques such as weight decay. Those results demonstrate that our search for ever increasing generalization performance ---averaged over all classes and samples--- has left us with models and regularizers that silently sacrifice performances on some classes. This scenario can become dangerous when deploying a model on downstream tasks e.g. an Imagenet pre-trained resnet50 deployed on INaturalist sees its performances fall from $70\%$ to $30\%$ on class \#8889 when introducing random crop DA during the Imagenet pre-training phase. Those results demonstrate that finding a correct measure of a model's complexity without class-dependent preference remains an open research question.

#### Author Information

##### Leon Bottou (Facebook AI Research)

Léon Bottou received a Diplôme from l'Ecole Polytechnique, Paris in 1987, a Magistère en Mathématiques Fondamentales et Appliquées et Informatiques from Ecole Normale Supérieure, Paris in 1988, and a PhD in Computer Science from Université de Paris-Sud in 1991. He joined AT&T Bell Labs from 1991 to 1992 and AT&T Labs from 1995 to 2002. Between 1992 and 1995 he was chairman of Neuristique in Paris, a small company pioneering machine learning for data mining applications. He has been with NEC Labs America in Princeton since 2002. Léon's primary research interest is machine learning. His contributions to this field address theory, algorithms and large scale applications. Léon's secondary research interest is data compression and coding. His best known contribution in this field is the DjVu document compression technology (http://www.djvu.org.) Léon published over 70 papers and is serving on the boards of JMLR and IEEE TPAMI. He also serves on the scientific advisory board of Kxen Inc .