Despite the non-convex nature of their loss functions, deep neural networks are known to generalize well when optimized with stochastic gradient descent (SGD). Recent work conjectures that SGD with proper configuration is able to find wide and flat local minima, which are correlated with good generalization performance. In this paper, we observe that the local minima of modern deep networks cannot be described simply as flat or sharp. Instead, at a local minimum there exist many asymmetric directions along which the loss increases abruptly on one side and slowly on the opposite side; we formally define such minima as asymmetric valleys. Under mild assumptions, we first prove that for asymmetric valleys, a solution biased towards the flat side generalizes better than the exact empirical minimizer. Then, we show that performing weight averaging along the SGD trajectory implicitly induces such biased solutions. This provides theoretical explanations for a series of intriguing phenomena observed in recent work [25, 5, 51]. Finally, extensive experiments on both modern deep networks and simple two-layer networks are conducted to validate our assumptions and analyze the intriguing properties of asymmetric valleys.
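The two claims in the abstract can be illustrated with a toy one-dimensional sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the piecewise-linear loss, the slopes `SHARP_SLOPE` and `FLAT_SLOPE`, the modelling of the train/test gap as a symmetric random shift of the loss landscape, and the Gaussian gradient noise. The function names are hypothetical.

```python
import numpy as np

SHARP_SLOPE = 5.0  # assumed slope on the sharp side (w < 0)
FLAT_SLOPE = 0.5   # assumed slope on the flat side (w >= 0)


def train_loss(w):
    """Toy 1-D asymmetric valley whose exact minimizer is w = 0."""
    w = np.asarray(w, dtype=float)
    return np.where(w < 0, -SHARP_SLOPE * w, FLAT_SLOPE * w)


def expected_shifted_loss(w, max_shift=1.0, n=2001):
    """Mean loss at w when the landscape is shifted by a symmetric offset.

    The shift stands in for the train/test mismatch (an assumption of
    this sketch); offsets are taken on a symmetric grid around zero.
    """
    shifts = np.linspace(-max_shift, max_shift, n)
    return float(train_loss(w + shifts).mean())


def grad(w):
    """Gradient of the toy loss (flat-side slope is used at w = 0)."""
    return -SHARP_SLOPE if w < 0 else FLAT_SLOPE


def averaged_sgd(steps=20000, lr=0.05, noise_std=1.0, seed=0):
    """Run noisy gradient descent from w = 0 and return the iterate average."""
    rng = np.random.default_rng(seed)
    w, total = 0.0, 0.0
    for _ in range(steps):
        w -= lr * (grad(w) + noise_std * rng.normal())
        total += w
    return total / steps


if __name__ == "__main__":
    print("train loss at w=0.0 / w=0.5:", train_loss(0.0), train_loss(0.5))
    print("expected shifted loss at w=0.0:", expected_shifted_loss(0.0))
    print("expected shifted loss at w=0.5:", expected_shifted_loss(0.5))
    print("average of noisy SGD iterates:", averaged_sgd())
```

In this toy setting the point `w = 0.5` has a higher training loss than the exact minimizer but a lower expected loss under the shift, and the average of the noisy iterates lands on the flat (positive) side. It only conveys the intuition; it does not reproduce the paper's assumptions or proofs.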
Author Information
Haowei He (Tsinghua University)
Gao Huang (Tsinghua University)
Yang Yuan (Cornell University)
Related Events (a corresponding poster, oral, or spotlight)
- 2019 Spotlight: Asymmetric Valleys: Beyond Sharp and Flat Local Minima
Wed. Dec 11th 12:25 -- 12:30 AM Room West Exhibition Hall A
More from the Same Authors
- 2021 Spotlight: Believe What You See: Implicit Constraint Approach for Offline Multi-Agent Reinforcement Learning
Yiqin Yang · Xiaoteng Ma · Chenghao Li · Zewu Zheng · Qiyuan Zhang · Gao Huang · Jun Yang · Qianchuan Zhao
- 2023 Poster: Train Once, Get a Family: State-Adaptive Balances for Offline-to-Online Reinforcement Learning
Shenzhi Wang · Qisen Yang · Jiawei Gao · Matthieu Lin · Hao Chen · Liwei Wu · Ning Jia · Shiji Song · Gao Huang
- 2023 Poster: STORM: Efficient Stochastic Transformer based World Models for Reinforcement Learning
Weipu Zhang · Gang Wang · Jian Sun · Yetian Yuan · Gao Huang
- 2023 Poster: Rank-DETR for High Quality Object Detection
Yifan Pu · Weicong Liang · Yiduo Hao · Yuhui Yuan · Yukang Yang · Chao Zhang · Han Hu · Gao Huang
- 2023 Poster: Trade-off Between Efficiency and Consistency for Removal-based Explanations
Yifan Zhang · Haowei He · Zhiquan Tan · Yang Yuan
- 2023 Poster: Understanding, Predicting and Better Resolving Q-Value Divergence in Offline-RL
Yang Yue · Rui Lu · Bingyi Kang · Shiji Song · Gao Huang
- 2021 Poster: Believe What You See: Implicit Constraint Approach for Offline Multi-Agent Reinforcement Learning
Yiqin Yang · Xiaoteng Ma · Chenghao Li · Zewu Zheng · Qiyuan Zhang · Gao Huang · Jun Yang · Qianchuan Zhao
- 2021 Poster: Searching Parameterized AP Loss for Object Detection
Tao Chenxin · Zizhang Li · Xizhou Zhu · Gao Huang · Yong Liu · Jifeng Dai
- 2021 Poster: Not All Images are Worth 16x16 Words: Dynamic Transformers for Efficient Image Recognition
Yulin Wang · Rui Huang · Shiji Song · Zeyi Huang · Gao Huang
- 2020 Poster: Glance and Focus: a Dynamic Approach to Reducing Spatial Redundancy in Image Classification
Yulin Wang · Kangchen Lv · Rui Huang · Shiji Song · Le Yang · Gao Huang
- 2019 Poster: Regularized Anderson Acceleration for Off-Policy Deep Reinforcement Learning
Wenjie Shi · Shiji Song · Hui Wu · Ya-Chu Hsu · Cheng Wu · Gao Huang
- 2019 Poster: Implicit Semantic Data Augmentation for Deep Networks
Yulin Wang · Xuran Pan · Shiji Song · Hong Zhang · Gao Huang · Cheng Wu
- 2019 Poster: Learning-Based Low-Rank Approximations
Piotr Indyk · Ali Vakilian · Yang Yuan
- 2018 Poster: Expanding Holographic Embeddings for Knowledge Completion
Yexiang Xue · Yang Yuan · Zhitian Xu · Ashish Sabharwal
- 2017 Poster: Convergence Analysis of Two-layer Neural Networks with ReLU Activation
Yuanzhi Li · Yang Yuan
- 2016 Poster: Exploiting the Structure: Stochastic Gradient Methods Using Raw Clusters
Zeyuan Allen-Zhu · Yang Yuan · Karthik Sridharan