Multi-lingual fine-tuning (MLF), which fine-tunes a multi-lingual language model (MLLM) on multiple source languages, aims to achieve good zero-shot performance on target languages. In MLF, the fine-tuned model tends to fit the source languages while forgetting the cross-lingual knowledge obtained during pre-training. This forgetting degrades the zero-shot performance of MLF, yet it remains under-explored. To fill this gap, this paper proposes a multi-lingual fine-tuning method, dubbed Less-forgetting Multi-lingual Fine-tuning (LF-MLF). In LF-MLF, we cast multi-lingual fine-tuning as a constrained optimization problem, where the objective is to minimize forgetting and the constraints require the fine-tuning loss to decrease. The proposed method achieves superior zero-shot performance; furthermore, it can achieve Pareto stationarity. Extensive experiments on Named Entity Recognition, Question Answering and Natural Language Inference support our theoretical analysis and validate the superiority of our proposals.
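To make the constrained formulation concrete, below is a minimal PyTorch sketch of one way such an update could be realized: forgetting is proxied by the L2 distance to the frozen pre-trained weights, and the forgetting and fine-tuning gradients are combined with the standard two-objective min-norm (MGDA-style) rule, whose fixed points are Pareto stationary. The function names, the forgetting proxy, and the combination rule here are illustrative assumptions, not the paper's actual LF-MLF algorithm.

```python
import torch
import torch.nn.functional as F


def min_norm_direction(g_a, g_b):
    """Two-objective min-norm combination (MGDA closed form):
    returns alpha*g_a + (1-alpha)*g_b with alpha in [0, 1].
    A (near-)zero combined direction indicates Pareto stationarity."""
    diff = g_a - g_b
    denom = diff.dot(diff).clamp_min(1e-12)
    alpha = ((g_b - g_a).dot(g_b) / denom).clamp(0.0, 1.0)
    return alpha * g_a + (1.0 - alpha) * g_b


def flat_grad(loss, params):
    """Gradient of `loss` w.r.t. `params`, flattened into one vector."""
    grads = torch.autograd.grad(loss, params, allow_unused=True)
    return torch.cat([(g if g is not None else torch.zeros_like(p)).reshape(-1)
                      for g, p in zip(grads, params)])


def lf_mlf_style_step(model, frozen_pretrained, batch, optimizer):
    """One illustrative update: trade off the fine-tuning loss against a
    forgetting proxy (distance to pre-trained weights, assumed here)."""
    params = [p for p in model.parameters() if p.requires_grad]
    x, y = batch

    # Objective 1: fine-tuning loss on the multi-lingual source batch.
    task_loss = F.cross_entropy(model(x), y)
    g_task = flat_grad(task_loss, params)

    # Objective 2 (assumed forgetting proxy): stay close to the frozen
    # pre-trained weights; `frozen_pretrained` is a list of tensor copies
    # aligned with `params`.
    forget_loss = sum(((p - q.detach()) ** 2).sum()
                      for p, q in zip(params, frozen_pretrained))
    g_forget = flat_grad(forget_loss, params)

    # Combined direction that, to first order, does not increase either
    # objective; write it back into .grad and take an optimizer step.
    d = min_norm_direction(g_forget, g_task)
    offset = 0
    for p in params:
        n = p.numel()
        p.grad = d[offset:offset + n].view_as(p).clone()
        offset += n
    optimizer.step()
    optimizer.zero_grad()
```

Under this rule, alpha collapsing to 0 or 1 recovers a plain step on a single objective, while a near-zero combined direction signals that no update can improve one objective without hurting the other, i.e., approximate Pareto stationarity.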
Author Information
Yuren Mao (Zhejiang University)
Yuren Mao received his PhD degree in computer science from the University of New South Wales, Australia, in 2022. He is currently an assistant professor with the School of Software Technology, Zhejiang University, China. His current research interests include multi-task learning and its applications. His research has been published in leading venues such as ICML, NeurIPS, ACL and TKDE.
Yaobo Liang (Microsoft)
Nan Duan (Microsoft Research Asia)
Haobo Wang (Zhejiang University)
Kai Wang (University of New South Wales)
Lu Chen (Zhejiang University)
Yunjun Gao (Zhejiang University)
More from the Same Authors
- 2021: CodeXGLUE: A Machine Learning Benchmark Dataset for Code Understanding and Generation
  Shuai Lu · Daya Guo · Shuo Ren · Junjie Huang · Alexey Svyatkovskiy · Ambrosio Blanco · Colin Clement · Dawn Drain · Daxin Jiang · Duyu Tang · Ge Li · Lidong Zhou · Linjun Shou · Long Zhou · Michele Tufano · MING GONG · Ming Zhou · Nan Duan · Neel Sundaresan · Shao Kun Deng · Shengyu Fu · Shujie LIU
- 2022 Poster: SoLar: Sinkhorn Label Refinery for Imbalanced Partial-Label Learning
  Haobo Wang · Mingxuan Xia · Yixuan Li · Yuren Mao · Lei Feng · Gang Chen · Junbo Zhao
- 2022 Poster: NUWA-Infinity: Autoregressive over Autoregressive Generation for Infinite Visual Synthesis
  Jian Liang · Chenfei Wu · Xiaowei Hu · Zhe Gan · Jianfeng Wang · Lijuan Wang · Zicheng Liu · Yuejian Fang · Nan Duan
- 2022 Poster: LogiGAN: Learning Logical Reasoning via Adversarial Pre-training
  Xinyu Pi · Wanjun Zhong · Yan Gao · Nan Duan · Jian-Guang Lou
- 2021 Poster: Learning from Inside: Self-driven Siamese Sampling and Reasoning for Video Question Answering
  Weijiang Yu · Haoteng Zheng · Mengfei Li · Lei Ji · Lijun Wu · Nong Xiao · Nan Duan
- 2019 Poster: A Tensorized Transformer for Language Modeling
  Xindian Ma · Peng Zhang · Shuai Zhang · Nan Duan · Yuexian Hou · Ming Zhou · Dawei Song
- 2019 Poster: PasteGAN: A Semi-Parametric Method to Generate Image from Scene Graph
  Yikang LI · Tao Ma · Yeqi Bai · Nan Duan · Sining Wei · Xiaogang Wang
- 2018 Poster: Dialog-to-Action: Conversational Question Answering Over a Large-Scale Knowledge Base
  Daya Guo · Duyu Tang · Nan Duan · Ming Zhou · Jian Yin