We introduce the Korean Language Understanding Evaluation (KLUE) benchmark. KLUE is a collection of eight Korean natural language understanding (NLU) tasks: Topic Classification, Semantic Textual Similarity, Natural Language Inference, Named Entity Recognition, Relation Extraction, Dependency Parsing, Machine Reading Comprehension, and Dialogue State Tracking. We build all of the datasets from scratch in a principled way. We design the tasks to have diverse formats, and we build each task upon various source corpora that respect copyrights. We also propose suitable evaluation metrics and organize annotation protocols to ensure quality. To prevent ethical risks in KLUE, we proactively remove examples that reflect social biases or contain toxic content or personally identifiable information (PII). Along with the benchmark datasets, we release pre-trained language models (PLMs) for Korean, KLUE-BERT and KLUE-RoBERTa, and find that KLUE-RoBERTa-large outperforms other baselines, including multilingual PLMs and existing open-source Korean PLMs. The fine-tuning recipes are publicly available so that anyone can reproduce our baseline results. We believe our work will facilitate future research on cross-lingual as well as Korean language models, and the creation of similar resources for other languages. KLUE is available at https://klue-benchmark.com.
Author Information
Sungjoon Park (KAIST)
Jihyung Moon (Upstage)
* Natural Language Processing * Question Answering
Sungdong Kim (NAVER)
Won Ik Cho (Seoul National University)
Ji Yoon Han (Yonsei University)
Jangwon Park (Yonsei University)
Chisung Song
Junseong Kim (Scatter Lab)
Youngsook Song (Ubiquitous Computing Laboratory)
Taehwan Oh (Yonsei University)
Joohong Lee (Scatter Lab)
Juhyun Oh (Upstage AI Research)
Sungwon Lyu (Seoul National University)
Younghoon Jeong
Inkwon Lee (Sogang University)
Sangwoo Seo
Dongjun Lee (LBox)
Hyunwoo Kim (Seoul National University)
Myeonghwa Lee (Korea Advanced Institute of Science and Technology)
Seongbo Jang (POSTECH)
Seungwon Do (ETRI)
Sunkyoung Kim (Korea Advanced Institute of Science and Technology)
Kyungtae Lim
Jongwon Lee (Samsung)
Kyumin Park (Korea Advanced Institute of Science and Technology)
Jamin Shin (Riiid AI Research)
Seonghyun Kim (Korea University)
Lucy Park (Upstage)
Alice Oh (KAIST)
Jung-Woo Ha (NAVER CLOVA AI Lab)
Kyunghyun Cho (Genentech | New York University)