Spotlight: Reliably Learning the ReLU in Polynomial Time
in Workshop: OPT 2016: Optimization for Machine Learning
Abstract
We give the first dimension-efficient algorithms for learning Rectified Linear Units (ReLUs), which are functions of the form max(0, w · x) with w a unit vector (2-norm equal to 1). Our algorithm works in the challenging Reliable Agnostic learning model of Kalai, Kanade, and Mansour, where the learner is given access to a distribution D on labeled examples but the labeling may be arbitrary. We construct a hypothesis that simultaneously minimizes the false-positive rate and the l_p loss (for p = 1 or p >= 2) on inputs given positive labels by D.
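To make the objects concrete, here is a minimal Python sketch (our own illustration, not the paper's algorithm) of a ReLU of this form and of the two quantities a reliable learner must control; the estimator names and the convention that negative examples carry label 0 are assumptions we introduce.

```python
import numpy as np

def relu_unit(w, x):
    # A ReLU of the form max(0, w · x); w is assumed to have unit 2-norm.
    return np.maximum(0.0, np.dot(w, x))

def reliable_errors(h, X, y, p=2):
    # Illustrative sample estimates (our convention, not the paper's notation):
    # - false-positive rate: fraction of zero-labeled inputs where h predicts > 0
    # - l_p loss: average |h(x) - y|^p over inputs given positive labels
    preds = np.array([h(x) for x in X])
    neg = (y == 0)
    pos = ~neg
    fp_rate = float((preds[neg] > 0).mean()) if neg.any() else 0.0
    lp_loss = float((np.abs(preds[pos] - y[pos]) ** p).mean()) if pos.any() else 0.0
    return fp_rate, lp_loss
```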
Our algorithm runs in polynomial time (in n) with respect to any distribution on S^{n-1} (the unit sphere in n dimensions) and for any error parameter \epsilon = \Omega(1/\log n). These results are in contrast to known efficient algorithms for reliably learning linear threshold functions, where \epsilon must be \Omega(1) and strong assumptions are required on the marginal distribution. We can compose our results to obtain the first set of efficient algorithms for learning constant-depth networks of ReLUs.
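For intuition about the composed result, a constant-depth network of ReLUs can be pictured as in the sketch below; the depth-2 form, the unit-norm rows, and all names are illustrative assumptions rather than the paper's formal definition.

```python
import numpy as np

def relu(t):
    return np.maximum(0.0, t)

def depth2_relu_network(x, W, a):
    # Illustrative target: f(x) = sum_i a_i * max(0, w_i · x),
    # where each row w_i of W is a unit vector. The paper's composition
    # result yields learners for constant-depth networks of this kind.
    return a @ relu(W @ x)

# Example usage with random parameters (hypothetical data):
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 10))
W /= np.linalg.norm(W, axis=1, keepdims=True)  # rows on the unit sphere
a = rng.normal(size=4)
x = rng.normal(size=10)
x /= np.linalg.norm(x)
print(depth2_relu_network(x, W, a))
```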
Our techniques combine kernel methods and polynomial approximations with a "dual-loss" approach to convex programming. As a byproduct we also obtain the first set of efficient algorithms for "convex piecewise-linear fitting," and the first efficient algorithms for agnostically learning low-weight multivariate polynomials on the unit sphere.
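As a rough sketch of the polynomial-approximation ingredient: on the unit sphere, w · x lies in [-1, 1], so one can fit a low-degree univariate polynomial to t -> max(0, t) on that interval and then learn the resulting low-degree polynomial implicitly via kernel methods. The Chebyshev fit and the degree below are illustrative choices, not the paper's construction or its degree bound.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

deg = 20  # illustrative degree; the paper ties the degree to the error parameter eps
t = np.linspace(-1.0, 1.0, 2001)
target = np.maximum(0.0, t)

# Least-squares Chebyshev fit of the univariate ReLU on [-1, 1].
approx = C.Chebyshev.fit(t, target, deg)

# Uniform error of the approximation on the grid.
print(f"degree-{deg} max error on [-1, 1]: {np.max(np.abs(approx(t) - target)):.4f}")
```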