Exploring the Limits of Domain-Adaptive Training for Detoxifying Large-Scale Language Models
Boxin Wang · Wei Ping · Chaowei Xiao · Peng Xu · Mostofa Patwary · Mohammad Shoeybi · Bo Li · Anima Anandkumar · Bryan Catanzaro

Tue Nov 29 02:00 PM -- 04:00 PM (PST) @ Hall J #231

Pre-trained language models (LMs) are shown to easily generate toxic language. In this work, we systematically explore domain-adaptive training to reduce the toxicity of language models. We conduct this study on three dimensions: training corpus, model size, and parameter efficiency. For the training corpus, we demonstrate that using self-generated datasets consistently outperforms the existing baselines across various model sizes on both automatic and human evaluations, even when it uses a 1/3 smaller training corpus. We then comprehensively study detoxifying LMs with parameter sizes ranging from 126M up to 530B (3× larger than GPT-3), a scale that has never been studied before. We find that i) large LMs have similar toxicity levels as smaller ones given the same pre-training corpus, and ii) large LMs require more endeavor to unlearn the toxic content seen at pretraining. We also explore parameter-efficient training methods for detoxification. We demonstrate that adding and training adapter-only layers in LMs not only saves a lot of parameters but also achieves a better trade-off between toxicity and perplexity than whole model adaptation for large-scale models. Our code will be available at: https://github.com/NVIDIA/Megatron-LM/.

Author Information

Boxin Wang (Department of Computer Science, University of Illinois, Urbana Champaign)
Wei Ping (NVIDIA)
Chaowei Xiao (ASU/NVIDIA)

I am Chaowei Xiao, a third-year PhD student in the CSE Department, University of Michigan, Ann Arbor. My advisor is Professor Mingyan Liu. I obtained my bachelor's degree from the School of Software at Tsinghua University in 2015, advised by Professor Yunhao Liu, Professor Zheng Yang and Dr. Lei Yang. I was also a visiting student at UC Berkeley in 2018, advised by Professor Dawn Song and Professor Bo Li. My research interests include adversarial machine learning.

Peng Xu (NVIDIA)
Mostofa Patwary (NVIDIA)
Mohammad Shoeybi (NVIDIA)
Bo Li (UIUC)
Anima Anandkumar (NVIDIA / Caltech)
Bryan Catanzaro (NVIDIA)

More from the Same Authors