Spotlight
Time/Accuracy Tradeoffs for Learning a ReLU with respect to Gaussian Marginals
Surbhi Goel · Sushrut Karmalkar · Adam Klivans

Wed Dec 11th 04:05 -- 04:10 PM @ West Exhibition Hall C + B3

We consider the problem of computing the best-fitting ReLU with respect to square-loss on a training set when the examples have been drawn according to a spherical Gaussian distribution (the labels can be arbitrary). Let $\opt < 1$ be the population loss of the best-fitting ReLU. We prove: \begin{itemize} \item Finding a ReLU with square-loss $\opt + \epsilon$ is as hard as the problem of learning sparse parities with noise, widely thought to be computationally intractable. This is the first hardness result for learning a ReLU with respect to Gaussian marginals, and our results imply --{\em unconditionally}-- that gradient descent cannot converge to the global minimum in polynomial time. \item There exists an efficient approximation algorithm for finding the best-fitting ReLU that achieves error $O(\opt^{2/3})$. The algorithm uses a novel reduction to noisy halfspace learning with respect to $0/1$ loss. \end{itemize} Prior work due to Soltanolkotabi \cite{soltanolkotabi2017learning} showed that gradient descent {\em can} find the best-fitting ReLU with respect to Gaussian marginals, if the training set is {\em exactly} labeled by a ReLU.

Author Information

Surbhi Goel (The University of Texas at Austin)
Sushrut Karmalkar (The University of Texas at Austin)
Adam Klivans (UT Austin)

Related Events (a corresponding poster, oral, or spotlight)

More from the Same Authors