Is normalization indispensable for training deep neural network?
Jie Shao · Kai Hu · Changhu Wang · Xiangyang Xue · Bhiksha Raj

Thu Dec 10 06:00 PM -- 06:15 PM (PST) @ Orals & Spotlights: Deep Learning

Normalization operations are widely used to train deep neural networks, and they can improve both convergence and generalization in most tasks. The theory behind normalization's effectiveness and new forms of normalization have long been active research topics. To better understand normalization, one natural question is whether it is indispensable for training deep neural networks. In this paper, we study what happens when normalization layers are removed from the network, and show how to train deep neural networks without normalization layers and without performance degradation. Our proposed method achieves the same or even slightly better performance on a variety of tasks, including image classification on ImageNet, object detection and segmentation on MS-COCO, video classification on Kinetics, and machine translation on WMT English-German. Our study may help clarify the role of normalization layers, and the proposed method can be a competitive alternative to them. Code is available.

Author Information

Jie Shao (Fudan University)
Kai Hu (Carnegie Mellon University)
Changhu Wang (ByteDance Inc.)
Xiangyang Xue (Fudan University)
Bhiksha Raj (Carnegie Mellon University)
