
Workshop: Synthetic Data Generation with Generative AI

Carpe Diem: On the Evaluation of World Knowledge in Lifelong Language Models

Yujin Kim · Jaehong Yoon · Seonghyeon Ye · Sung Ju Hwang · Se-Young Yun

Keywords: [ language model ] [ continual learning ] [ Natural Language Processing ] [ question answering ]


In an ever-evolving world, the dynamic nature of knowledge poses challenges for language models trained on static data, whose encoded information becomes outdated. Real-world scenarios require models not only to acquire new knowledge but also to overwrite outdated information with its updated counterpart. To address this, we introduce EvolvingQA, a novel temporally evolving question answering benchmark for training and evaluating LMs on an evolving Wikipedia database; its construction is automated by our pipeline, which uses large language models. The benchmark uses question answering as a downstream task to emulate real-world applications. Through EvolvingQA, we find that existing continual learning baselines have difficulty updating and forgetting outdated knowledge. Our findings suggest that the models fail to learn updated knowledge properly because of small weight gradients. Furthermore, we show that the models struggle most with providing numerical or temporal answers to questions asking for updated knowledge. Our work aims to model the dynamic nature of real-world information, offering a robust measure of the evolution-adaptability of language models. Our data construction code and dataset files are available at
