We show that the standard stochastic gradient descent (SGD) algorithm is guaranteed to learn, in polynomial time, a function that is competitive with the best function in the conjugate kernel space of the network, as defined in Daniely, Frostig and Singer. The result holds for log-depth networks from a rich family of architectures. To the best of our knowledge, it is the first polynomial-time guarantee for the standard neural network learning algorithm for networks of depth greater than two. As corollaries, it follows that for neural networks of any depth between 2 and log(n), SGD is guaranteed to learn, in polynomial time, constant-degree polynomials with polynomially bounded coefficients. Likewise, it follows that SGD on sufficiently large networks can learn any continuous function (though not in polynomial time), complementing classical expressivity results.
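For illustration only, the sketch below shows the kind of plain training procedure the guarantee refers to: a depth-2 fully connected ReLU network trained with vanilla minibatch SGD on a constant-degree polynomial target. This is not the paper's construction; the architecture, width, step size, and synthetic target are illustrative assumptions.

```python
# Minimal sketch (illustrative assumptions, not the paper's setup): plain SGD
# on a depth-2 ReLU network, fitting a constant-degree polynomial target.
import numpy as np

rng = np.random.default_rng(0)
n, width, steps, lr = 20, 512, 5000, 0.01

def sample_batch(batch=32):
    # Inputs on the unit sphere; labels from a degree-2 polynomial of the inputs.
    x = rng.normal(size=(batch, n))
    x /= np.linalg.norm(x, axis=1, keepdims=True)
    y = x[:, 0] * x[:, 1] + 0.5 * x[:, 2] ** 2
    return x, y

# Standard random initialization; only plain SGD updates below.
W1 = rng.normal(size=(n, width)) / np.sqrt(n)
w2 = rng.normal(size=width) / np.sqrt(width)

for t in range(steps):
    x, y = sample_batch()
    h = np.maximum(x @ W1, 0.0)      # ReLU hidden layer
    pred = h @ w2
    err = pred - y                   # squared-loss residual
    # Backpropagated gradients for the two-layer network.
    grad_w2 = h.T @ err / len(y)
    grad_W1 = x.T @ ((err[:, None] * w2) * (h > 0)) / len(y)
    w2 -= lr * grad_w2
    W1 -= lr * grad_W1

x, y = sample_batch(1000)
print("test MSE:", np.mean((np.maximum(x @ W1, 0.0) @ w2 - y) ** 2))
```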
Author Information
Amit Daniely (Google Research)
More from the Same Authors
- 2020 Poster: Neural Networks Learning and Memorization with (almost) no Over-Parameterization
  Amit Daniely
- 2020 Poster: Most ReLU Networks Suffer from $\ell^2$ Adversarial Perturbations
  Amit Daniely · Hadas Shacham
- 2020 Spotlight: Most ReLU Networks Suffer from $\ell^2$ Adversarial Perturbations
  Amit Daniely · Hadas Shacham
- 2020 Poster: Learning Parities with Neural Networks
  Amit Daniely · Eran Malach
- 2020 Poster: Hardness of Learning Neural Networks with Natural Weights
  Amit Daniely · Gal Vardi
- 2020 Oral: Learning Parities with Neural Networks
  Amit Daniely · Eran Malach
- 2019 Poster: Locally Private Learning without Interaction Requires Separation
  Amit Daniely · Vitaly Feldman
- 2019 Poster: Generalization Bounds for Neural Networks via Approximate Description Length
  Amit Daniely · Elad Granot
- 2019 Spotlight: Generalization Bounds for Neural Networks via Approximate Description Length
  Amit Daniely · Elad Granot
- 2016 Poster: Toward Deeper Understanding of Neural Networks: The Power of Initialization and a Dual View on Expressivity
  Amit Daniely · Roy Frostig · Yoram Singer
- 2013 Poster: More data speeds up training time in learning halfspaces over sparse vectors
  Amit Daniely · Nati Linial · Shai Shalev-Shwartz
- 2013 Spotlight: More data speeds up training time in learning halfspaces over sparse vectors
  Amit Daniely · Nati Linial · Shai Shalev-Shwartz
- 2012 Poster: Multiclass Learning Approaches: A Theoretical Comparison with Implications
  Amit Daniely · Sivan Sabato · Shai Shalev-Shwartz
- 2012 Spotlight: Multiclass Learning Approaches: A Theoretical Comparison with Implications
  Amit Daniely · Sivan Sabato · Shai Shalev-Shwartz