Poster
Incorporating BERT into Parallel Sequence Decoding with Adapters
Junliang Guo · Zhirui Zhang · Linli Xu · Hao-Ran Wei · Boxing Chen · Enhong Chen
While large-scale pre-trained language models such as BERT have achieved great success on various natural language understanding tasks, how to efficiently and effectively incorporate them into sequence-to-sequence models and the corresponding text generation tasks remains a non-trivial problem. In this paper, we propose to address this problem by taking two different BERT models as the encoder and decoder respectively, and fine-tuning them by introducing simple and lightweight adapter modules, which are inserted between BERT layers and tuned on the task-specific dataset. In this way, we obtain a flexible and efficient model which is able to jointly leverage the information contained in the source-side and target-side BERT models, while bypassing the catastrophic forgetting problem. Each component in the framework can be considered as a plug-in unit, making the framework flexible and task-agnostic. Our framework is based on a parallel sequence decoding algorithm named Mask-Predict, chosen because of the bi-directional and conditionally independent nature of BERT, and can be adapted to traditional autoregressive decoding easily. We conduct extensive experiments on neural machine translation tasks where the proposed method consistently outperforms autoregressive baselines while reducing the inference latency by half, and achieves $36.49$/$33.57$ BLEU scores on IWSLT14 German-English/WMT14 German-English translation. When adapted to autoregressive decoding, the proposed method achieves $30.60$/$43.56$ BLEU scores on WMT14 English-German/English-French translation, on par with the state-of-the-art baseline models.
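The abstract names Mask-Predict as the parallel decoding algorithm but does not spell it out. The following is a minimal Python sketch of the general Mask-Predict loop, not the authors' implementation: all target positions start masked, every masked position is predicted in parallel, and the least confident positions are re-masked for the next pass. The `predict` interface, the `toy_predict` stand-in, and the linear re-masking schedule are assumptions made for illustration; a real system would call the adapter-equipped BERT decoder at that point.

```python
from typing import Callable, List, Tuple

MASK = "<mask>"

# Hypothetical model interface (an assumption, not from the paper): given the
# source tokens and a partially masked target, return one (token, confidence)
# pair per target position.
Predictor = Callable[[List[str], List[str]], List[Tuple[str, float]]]


def mask_predict(source: List[str], target_len: int,
                 predict: Predictor, iterations: int = 4) -> List[str]:
    """Iteratively fill a fully masked target, re-masking weak predictions."""
    tokens = [MASK] * target_len
    probs = [0.0] * target_len
    for t in range(iterations):
        # Predict all positions in parallel; only masked ones are updated.
        preds = predict(source, tokens)
        for i, (tok, p) in enumerate(preds):
            if tokens[i] == MASK:
                tokens[i], probs[i] = tok, p
        # Assumed linear schedule: fewer positions are re-masked each pass.
        n_mask = int(target_len * (iterations - 1 - t) / iterations)
        if n_mask == 0:
            break
        # Re-mask the positions with the lowest confidence.
        worst = sorted(range(target_len), key=lambda i: probs[i])[:n_mask]
        for i in worst:
            tokens[i] = MASK
    return tokens


def toy_predict(source: List[str], target: List[str]) -> List[Tuple[str, float]]:
    """Toy stand-in for a model: upper-cases the aligned source token.

    Confidence grows with the number of already filled target positions,
    mimicking a model that benefits from more target-side context.
    """
    known = sum(tok != MASK for tok in target)
    return [(source[i].upper(), min(1.0, 0.5 + 0.1 * known + 0.01 * i))
            for i in range(len(target))]


if __name__ == "__main__":
    print(mask_predict(["ich", "bin", "hier", "."], 4, toy_predict))
```

Because each pass predicts all masked positions at once, the number of model calls is bounded by `iterations` rather than by the target length, which is the general source of the latency advantage over token-by-token autoregressive decoding.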
Author Information
Junliang Guo (University of Science and Technology of China)
Zhirui Zhang (Alibaba Group Inc.)
Linli Xu (University of Science and Technology of China)
Hao-Ran Wei (Alibaba DAMO Academy)
Boxing Chen (Alibaba Group)
Enhong Chen (University of Science and Technology of China)
More from the Same Authors
- 2022 Poster: DARE: Disentanglement-Augmented Rationale Extraction
  Linan Yue · Qi Liu · Yichao Du · Yanqing An · Li Wang · Enhong Chen
- 2022 Spotlight: Lightning Talks 5B-4
  Yuezhi Yang · Zeyu Yang · Yong Lin · Yi.shi Xu · Linan Yue · Tao Yang · Weixin Chen · Qi Liu · Jiaqi Chen · Dongsheng Wang · Baoyuan Wu · Yuwang Wang · Hao Pan · Shengyu Zhu · Zhenwei Miao · Yan Lu · Lu Tan · Bo Chen · Yichao Du · Haoqian Wang · Wei Li · Yanqing An · Ruiying Lu · Peng Cui · Nanning Zheng · Li Wang · Zhibin Duan · Xiatian Zhu · Mingyuan Zhou · Enhong Chen · Li Zhang
- 2022 Spotlight: DARE: Disentanglement-Augmented Rationale Extraction
  Linan Yue · Qi Liu · Yichao Du · Yanqing An · Li Wang · Enhong Chen
- 2022 Spotlight: Lightning Talks 4B-4
  Ziyue Jiang · Zeeshan Khan · Yuxiang Yang · Chenze Shao · Yichong Leng · Zehao Yu · Wenguan Wang · Xian Liu · Zehua Chen · Yang Feng · Qianyi Wu · James Liang · C.V. Jawahar · Junjie Yang · Zhe Su · Songyou Peng · Yufei Xu · Junliang Guo · Michael Niemeyer · Hang Zhou · Zhou Zhao · Makarand Tapaswi · Dongfang Liu · Qian Yang · Torsten Sattler · Yuanqi Du · Haohe Liu · Jing Zhang · Andreas Geiger · Yi Ren · Long Lan · Jiawei Chen · Wayne Wu · Dahua Lin · Dacheng Tao · Xu Tan · Jinglin Liu · Ziwei Liu · 振辉 叶 · Danilo Mandic · Lei He · Xiangyang Li · Tao Qin · sheng zhao · Tie-Yan Liu
- 2022 Spotlight: BinauralGrad: A Two-Stage Conditional Diffusion Probabilistic Model for Binaural Audio Synthesis
  Yichong Leng · Zehua Chen · Junliang Guo · Haohe Liu · Jiawei Chen · Xu Tan · Danilo Mandic · Lei He · Xiangyang Li · Tao Qin · sheng zhao · Tie-Yan Liu
- 2022: Interactive Industrial Panel
  Jiahao Sun · Ahmed Ibrahim · Marjan Ghazvininejad · Yu Cheng · Boxing Chen · Mohammad Norouzi · Rahul Gupta
- 2022 Poster: BinauralGrad: A Two-Stage Conditional Diffusion Probabilistic Model for Binaural Audio Synthesis
  Yichong Leng · Zehua Chen · Junliang Guo · Haohe Liu · Jiawei Chen · Xu Tan · Danilo Mandic · Lei He · Xiangyang Li · Tao Qin · sheng zhao · Tie-Yan Liu
- 2022 Poster: Graph Convolution Network based Recommender Systems: Learning Guarantee and Item Mixture Powered Strategy
  Leyan Deng · Defu Lian · Chenwang Wu · Enhong Chen
- 2022 Poster: Cache-Augmented Inbatch Importance Resampling for Training Recommender Retriever
  Jin Chen · Defu Lian · Yucheng Li · Baoyun Wang · Kai Zheng · Enhong Chen
- 2022 Poster: Recommender Forest for Efficient Retrieval
  Chao Feng · Wuchao Li · Defu Lian · Zheng Liu · Enhong Chen
- 2021: Efficient Multi-lingual Neural Machine Translation
  Boxing Chen
- 2020 Poster: Semi-Supervised Neural Architecture Search
  Renqian Luo · Xu Tan · Rui Wang · Tao Qin · Enhong Chen · Tie-Yan Liu
- 2020 Poster: Sampling-Decomposable Generative Adversarial Recommender
  Binbin Jin · Defu Lian · Zheng Liu · Qi Liu · Jianhui Ma · Xing Xie · Enhong Chen
- 2019 Poster: Efficient Pure Exploration in Adaptive Round Model
  Tianyuan Jin · Jieming SHI · Xiaokui Xiao · Enhong Chen
- 2018 Poster: Neural Architecture Optimization
  Renqian Luo · Fei Tian · Tao Qin · Enhong Chen · Tie-Yan Liu
- 2012 Poster: Image Denoising and Inpainting with Deep Neural Networks
  Junyuan Xie · Linli Xu · Enhong Chen