CodeXGLUE: A Machine Learning Benchmark Dataset for Code Understanding and Generation
Shuai Lu · Daya Guo · Shuo Ren · Junjie Huang · Alexey Svyatkovskiy · Ambrosio Blanco · Colin Clement · Dawn Drain · Daxin Jiang · Duyu Tang · Ge Li · Lidong Zhou · Linjun Shou · Long Zhou · Michele Tufano · MING GONG · Ming Zhou · Nan Duan · Neel Sundaresan · Shao Kun Deng · Shengyu Fu · Shujie LIU
Benchmark datasets have a significant impact on accelerating research in programming language tasks. In this paper, we introduce CodeXGLUE, a benchmark dataset to foster machine learning research for program understanding and generation. CodeXGLUE includes a collection of 10 tasks across 14 datasets and a platform for model evaluation and comparison. CodeXGLUE also features three baseline systems, including the BERT-style, GPT-style, and Encoder-Decoder models, to make it easy for researchers to use the platform. The availability of such data and baselines can help the development and validation of new methods that can be applied to various program understanding and generation problems.
Author Information
Shuai Lu (Microsoft Research China)
Daya Guo (Sun Yat-Sen University)
Shuo Ren (Beihang University)
Junjie Huang (Beihang University)
Alexey Svyatkovskiy (Microsoft)
Ambrosio Blanco
Colin Clement (Microsoft)
Dawn Drain (Microsoft)
Daxin Jiang (Microsoft)
Duyu Tang (Microsoft Research)
Ge Li (Peking University)
Lidong Zhou
Linjun Shou (Microsoft)
Long Zhou (Microsoft Research Asia)
Michele Tufano (Microsoft)
MING GONG (Microsoft)
Ming Zhou (Microsoft Research)
Nan Duan (Microsoft Research Asia)
Neel Sundaresan (Microsoft)
Shao Kun Deng (Microsoft)
Shengyu Fu
Shujie LIU (Microsoft)
More from the Same Authors
- 2021 Poster: Integrating Tree Path in Transformer for Code Representation
  Han Peng · Ge Li · Wenhan Wang · YunFei Zhao · Zhi Jin
- 2021 Poster: Neural Rule-Execution Tracking Machine For Transformer-Based Text Generation
  Yufei Wang · Can Xu · Huang Hu · Chongyang Tao · Stephen Wan · Mark Dras · Mark Johnson · Daxin Jiang
- 2021 Poster: Learning from Inside: Self-driven Siamese Sampling and Reasoning for Video Question Answering
  Weijiang Yu · Haoteng Zheng · Mengfei Li · Lei Ji · Lijun Wu · Nong Xiao · Nan Duan
- 2020 Poster: MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers
  Wenhui Wang · Furu Wei · Li Dong · Hangbo Bao · Nan Yang · Ming Zhou
- 2019 Poster: Code Generation as a Dual Task of Code Summarization
  Bolin Wei · Ge Li · Xin Xia · Zhiyi Fu · Zhi Jin
- 2019 Poster: A Tensorized Transformer for Language Modeling
  Xindian Ma · Peng Zhang · Shuai Zhang · Nan Duan · Yuexian Hou · Ming Zhou · Dawei Song
- 2019 Poster: PasteGAN: A Semi-Parametric Method to Generate Image from Scene Graph
  Yikang LI · Tao Ma · Yeqi Bai · Nan Duan · Sining Wei · Xiaogang Wang
- 2018 Poster: Dialog-to-Action: Conversational Question Answering Over a Large-Scale Knowledge Base
  Daya Guo · Duyu Tang · Nan Duan · Ming Zhou · Jian Yin