KLUE: Korean Language Understanding Evaluation
Sungjoon Park · Jihyung Moon · Sungdong Kim · Won Ik Cho · Ji Yoon Han · Jangwon Park · Chisung Song · Junseong Kim · Youngsook Song · Taehwan Oh · Joohong Lee · Juhyun Oh · Sungwon Lyu · Younghoon Jeong · Inkwon Lee · Sangwoo Seo · Dongjun Lee · Hyunwoo Kim · Myeonghwa Lee · Seongbo Jang · Seungwon Do · Sunkyoung Kim · Kyungtae Lim · Jongwon Lee · Kyumin Park · Jamin Shin · Seonghyun Kim · Lucy Park · Alice Oh · Jung-Woo Ha · Kyunghyun Cho

We introduce the Korean Language Understanding Evaluation (KLUE) benchmark. KLUE is a collection of eight Korean natural language understanding (NLU) tasks: Topic Classification, Semantic Textual Similarity, Natural Language Inference, Named Entity Recognition, Relation Extraction, Dependency Parsing, Machine Reading Comprehension, and Dialogue State Tracking. We create all of the datasets from scratch in a principled way. We design the tasks to have diverse formats, and we build each task upon various source corpora that respect copyrights. We also propose suitable evaluation metrics and design annotation protocols to ensure quality. To prevent ethical risks in KLUE, we proactively remove examples that reflect social biases or contain toxic content or personally identifiable information (PII). Along with the benchmark datasets, we release pre-trained language models (PLMs) for Korean, KLUE-BERT and KLUE-RoBERTa, and find that KLUE-RoBERTa-large outperforms other baselines, including multilingual PLMs and existing open-source Korean PLMs. The fine-tuning recipes are publicly available so that anyone can reproduce our baseline results. We believe our work will facilitate future research on cross-lingual as well as Korean language models and the creation of similar resources for other languages. KLUE is available at https://klue-benchmark.com.

Author Information

Sungjoon Park (KAIST)
Jihyung Moon (Upstage)

* Natural Language Processing
* Question Answering

Sungdong Kim (NAVER)
Won Ik Cho (Seoul National University)
Ji Yoon Han (Yonsei University)
Jangwon Park (Yonsei University)
Chisung Song
Junseong Kim (Scatter Lab)
Youngsook Song (Ubiquitous Computing Laboratory)
Taehwan Oh (Yonsei University)
Joohong Lee (Scatter Lab)
Juhyun Oh (Upstage AI Research)
Sungwon Lyu (Seoul National University)
Younghoon Jeong
Inkwon Lee (Sogang Univ.)
Sangwoo Seo
Dongjun Lee (LBox)
Hyunwoo Kim (Seoul National University)
Myeonghwa Lee (Korea Advanced Institute of Science and Technology)
Seongbo Jang (POSTECH)
Seungwon Do (ETRI)
Sunkyoung Kim (Korea Advanced Institute of Science and Technology)
Kyungtae Lim
Jongwon Lee (Samsung)
Kyumin Park (Korea Advanced Institute of Science and Technology)
Jamin Shin (Riiid AI Research)
Seonghyun Kim (Korea University)
Lucy Park (Upstage)
Alice Oh (KAIST)

I am a professor at KAIST in the School of Computing, with a joint appointment in the Graduate School of AI. My research interests are in developing and applying machine learning models for natural language processing. In our research group, we study various kinds of data, such as news, social media, Wikipedia, and programming education.

Jung-Woo Ha (NAVER CLOVA AI Lab)

- Head, AI Innovation, NAVER Cloud
- Research Fellow, NAVER AI Lab
- Datasets and Benchmarks Co-Chair, NeurIPS 2023
- Socials Co-Chair, ICML 2023
- Socials Co-Chair, NeurIPS 2022
- BS, Seoul National University
- PhD, Seoul National University

Kyunghyun Cho (Genentech | New York University)
