Training Language GANs from Scratch
Cyprien de Masson d'Autume · Shakir Mohamed · Mihaela Rosca · Jack Rae

Thu Dec 12th 10:45 AM -- 12:45 PM @ East Exhibition Hall B + C #127

Generative Adversarial Networks (GANs) enjoy great success at image generation, but have proven difficult to train in the domain of natural language. Challenges with gradient estimation, optimization instability, and mode collapse have led practitioners to resort to maximum likelihood pre-training, followed by small amounts of adversarial fine-tuning. The benefits of GAN fine-tuning for language generation are unclear, as the resulting models produce comparable or worse samples than traditional language models. We show it is in fact possible to train a language GAN from scratch, without maximum likelihood pre-training. We combine existing techniques such as large batch sizes, dense rewards and discriminator regularization to stabilize and improve language GANs. The resulting model, ScratchGAN, performs comparably to maximum likelihood training on the EMNLP2017 News and WikiText-103 corpora according to quality and diversity metrics.

Author Information

Cyprien de Masson d'Autume (Google DeepMind)
Shakir Mohamed (DeepMind)

Shakir Mohamed is a senior staff scientist at DeepMind in London. Shakir's main interests lie at the intersection of approximate Bayesian inference, deep learning and reinforcement learning, and the role that machine learning systems at this intersection have in the development of more intelligent and general-purpose learning systems. Before moving to London, Shakir held a Junior Research Fellowship from the Canadian Institute for Advanced Research (CIFAR), based in Vancouver at the University of British Columbia with Nando de Freitas. Shakir completed his PhD with Zoubin Ghahramani at the University of Cambridge, where he was a Commonwealth Scholar to the United Kingdom. Shakir is from South Africa and completed his previous degrees in Electrical and Information Engineering at the University of the Witwatersrand, Johannesburg.

Mihaela Rosca (Google DeepMind)
Jack Rae (DeepMind, UCL)

More from the Same Authors