Zero-Infinity GAN: Stable Dynamics and Implicit Bias of Extragradient
Abstract
In supervised learning, gradient descent drives neural networks to near-zero empirical risk while favoring solutions that generalize well---a phenomenon attributed to the implicit bias of gradient-based optimization. In stark contrast, in generative models such as generative adversarial networks (GANs), gradient methods typically fail to achieve zero empirical risk, leaving the implicit bias both empirically elusive and theoretically unexplored. We bridge this gap by developing new perspectives on the loss landscape of GANs, the gradient dynamics of extragradient, and its implicit bias. On the loss landscape side, we challenge the prevailing preference for the Wasserstein distance and instead propose the zero-infinity distance---a metric that equals zero when two distributions match exactly and infinity otherwise---as more compatible with gradient-based minimax optimization. On the gradient dynamics side, we prove that certain non-optimal stationary points are strict, analogous to strict saddles in nonconvex minimization. This property enables the two-timescale extragradient method to escape such points---just as gradient descent escapes strict saddles---while remaining stable at global solutions, in contrast to existing gradient methods. Lastly, on the implicit bias side, we show that extragradient favors the minimum-norm generator solution when initialized at zero.
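To make the two objects named above concrete, here is a minimal sketch in standard notation; the symbol $d_{0\infty}$, the split into generator parameters $\theta$ and discriminator parameters $\phi$ for a minimax objective $f$, and the step sizes $\eta_\theta, \eta_\phi$ are illustrative assumptions rather than the paper's own notation.

% Zero-infinity distance between distributions p and q, as described in the
% abstract: zero on an exact match, infinity otherwise.
\[
  d_{0\infty}(p, q) \;=\;
  \begin{cases}
    0      & \text{if } p = q, \\
    \infty & \text{otherwise.}
  \end{cases}
\]

% Two-timescale extragradient on \min_\theta \max_\phi f(\theta, \phi):
% an extrapolation step, followed by an update evaluated at the extrapolated
% point, with distinct step sizes \eta_\theta \neq \eta_\phi (the two timescales).
\begin{align*}
  \theta_{t+1/2} &= \theta_t - \eta_\theta \,\nabla_\theta f(\theta_t, \phi_t), &
  \phi_{t+1/2}   &= \phi_t + \eta_\phi \,\nabla_\phi f(\theta_t, \phi_t), \\
  \theta_{t+1}   &= \theta_t - \eta_\theta \,\nabla_\theta f(\theta_{t+1/2}, \phi_{t+1/2}), &
  \phi_{t+1}     &= \phi_t + \eta_\phi \,\nabla_\phi f(\theta_{t+1/2}, \phi_{t+1/2}).
\end{align*}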