
A Group-Theoretic Framework for Data Augmentation
Shuxiao Chen · Edgar Dobriban · Jane Lee

Wed Dec 09 09:00 AM -- 11:00 AM (PST) @ Poster Session 3 #991

Data augmentation has become an important part of modern deep learning pipelines and is typically needed to achieve state-of-the-art performance on many learning tasks. It applies transformations under which the data is approximately invariant, such as rotation, scaling, and color shift, and adds the transformed images to the training set. However, these transformations are often chosen heuristically, and no clear theoretical framework explains the performance benefits of data augmentation. In this paper, we develop such a framework, which explains data augmentation as averaging over the orbits of the group that keeps the data distribution approximately invariant, and we show that this averaging leads to variance reduction. We study finite-sample and asymptotic empirical risk minimization and work out, as examples, the variance reduction in certain two-layer neural networks. We further propose a strategy to exploit the benefits of data augmentation for general learning tasks.
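
The orbit-averaging idea can be illustrated in a toy setting that is not taken from the paper and is only a minimal sketch. We assume X ~ N(0, I_d), whose distribution is exactly invariant under the cyclic group of coordinate shifts, and the target is E[X_1^2] = 1. The plain estimator averages the non-invariant statistic f(x) = x_1^2 over n samples and has variance 2/n. Averaging f over each sample's orbit instead gives ||x||^2/d, which remains unbiased and has variance 2/(nd). The helper names `orbit_average`, `cyclic_shift_group`, and `simulate` are hypothetical.

```python
import numpy as np

def orbit_average(f, x, group):
    """Average f over the orbit {g(x) : g in group} of a single point x."""
    return np.mean([f(g(x)) for g in group])

def cyclic_shift_group(d):
    """Hypothetical toy group: the d cyclic coordinate shifts on R^d."""
    # Default argument k=k binds each shift amount at creation time.
    return [lambda x, k=k: np.roll(x, k) for k in range(d)]

def f(x):
    # A statistic that is NOT invariant under the group.
    return x[0] ** 2

def simulate(n=20, d=8, trials=500, seed=0):
    """Compare plain vs. orbit-averaged estimates of E[f(X)] = 1 (toy assumption)."""
    rng = np.random.default_rng(seed)
    group = cyclic_shift_group(d)
    plain, augmented = [], []
    for _ in range(trials):
        # N(0, I_d) is invariant under coordinate shifts, so augmentation keeps the distribution.
        X = rng.standard_normal((n, d))
        plain.append(np.mean([f(x) for x in X]))
        augmented.append(np.mean([orbit_average(f, x, group) for x in X]))
    return np.array(plain), np.array(augmented)
```

For instance, running `plain, augmented = simulate()` and comparing `plain.var()` with `augmented.var()` should give a ratio close to d = 8 under these assumptions. Both estimators center near 1, so here the augmentation reduces variance without introducing bias.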

Author Information

Shuxiao Chen (University of Pennsylvania)
Edgar Dobriban (University of Pennsylvania)
Jane Lee (University of Pennsylvania)
