Timezone: »

Non-convexity in the error landscape and the expressive capacity of deep neural networks
Surya Ganguli

Fri Dec 09 02:30 AM -- 03:00 AM (PST) @ None

A variety of recent work has studied saddle points in the error landscape of deep neural networks. A clearer understanding of these saddle points is likely to arise from an understanding of the geometry of deep functions. In particular, what do the generic functions computed by a deep network “look like?” How can we quantify and understand their geometry, and what implications does this geometry have for reducing generalization error as well as training error? We combine Riemannian geometry with the mean field theory of high dimensional chaos to study the nature of generic deep functions. Our results reveal an order-to-chaos expressivity phase transition, with networks in the chaotic phase computing nonlinear functions whose global curvature grows exponentially with depth but not width. Moreover, we formalize and quantitatively demonstrate the long conjectured idea that deep networks can disentangle highly curved manifolds in input space into flat manifolds in hidden space. Our theoretical analysis of the expressive power of deep networks broadly applies to arbitrary nonlinearities, and provides intuition for why initializations at the edge of chaos can lead to both better optimization as well as superior generalization capabilities.

Author Information

Surya Ganguli (Stanford)

More from the Same Authors