

Poster

On the Efficiency of ERM in Feature Learning

Ayoub El Hanchi · Chris Maddison · Murat Erdogdu


Abstract: Given a collection of feature maps indexed by a set $\mathcal{T}$, we study the performance of empirical risk minimization (ERM) on regression problems with square loss over the union of the linear classes induced by these feature maps. This setup aims to capture feature learning, where the model is expected to jointly learn from the data an appropriate feature map as well as a linear predictor on top of it. We start by studying the asymptotic quantiles of the excess risk of ERM in this setting. Remarkably, we show that when there is a unique optimal feature map, these quantiles coincide, up to a factor of two, with those of the excess risk of the oracle procedure, which knows this optimal feature map a priori and deterministically outputs an empirical risk minimizer from the optimal linear class. We derive a non-asymptotic upper bound on the excess risk that captures a refined version of this phenomenon. Specifically, under a mild assumption, the upper bound we derive depends on the sample size $n$ as $C_{n}/n$, where $(C_{n})_{n \ge 1}$ is a monotonically decreasing sequence of distribution-dependent constants. Finally, we specialize our analysis to the case where the set $\mathcal{T}$ is finite, in which case we explicitly compute these constants in terms of moments of the feature maps and the target, as well as the size of $\mathcal{T}$.
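To make the finite-$\mathcal{T}$ setting concrete, here is a minimal sketch of the ERM procedure the abstract describes: for each feature map in a finite collection, fit a least-squares linear predictor on its features, then select the pair with the smallest empirical square loss. The function name, the toy feature maps, and the data-generating process below are all illustrative assumptions, not taken from the paper.

```python
import numpy as np

def erm_over_feature_maps(X, y, feature_maps):
    """ERM over the union of the linear classes induced by `feature_maps`:
    fit a least-squares predictor within each class, then return the
    index and weights achieving the smallest empirical square loss."""
    best = None
    for t, phi in enumerate(feature_maps):
        Phi = phi(X)                                  # n x d_t feature matrix
        w, *_ = np.linalg.lstsq(Phi, y, rcond=None)   # ERM within class t
        risk = np.mean((Phi @ w - y) ** 2)            # empirical square loss
        if best is None or risk < best[0]:
            best = (risk, t, w)
    _, t_hat, w_hat = best
    return t_hat, w_hat

# Toy example with |T| = 2: two hypothetical feature maps, where the
# target is linear in the raw features, so index 0 is the optimal class.
rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 2))
feature_maps = [
    lambda X: X,          # identity features (the optimal feature map here)
    lambda X: np.cos(X),  # a competing nonlinear feature map
]
w_star = np.array([1.0, -2.0])
y = X @ w_star + 0.1 * rng.normal(size=n)

t_hat, w_hat = erm_over_feature_maps(X, y, feature_maps)
print(f"selected feature map index: {t_hat}, weights: {w_hat}")
```

The oracle procedure the abstract compares against would skip the selection step entirely and fit only within the optimal class (index 0 above); the paper's result says that, asymptotically and up to a factor of two, the excess-risk quantiles of the selection-based ERM match those of this oracle.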
