Poster in Workshop: AI That Keeps Up: Workshop on Continual and Compatible Foundation Model Updates (CCFM)

Continuous Self-Improvement of Large Language Models by Test-time Training with Verifier-Driven Sample Selection

Mohammad Mahdi Moradi ⋅ Hossam Amer ⋅ Sudhir Mudur ⋅ Weiwei Zhang ⋅ Yang Liu ⋅ Walid Ahmed


Abstract

Adapting pretrained LLMs to unlabeled, out-of-distribution data remains challenging, especially for structurally novel reasoning tasks. We present VDS-TTT (Verifier-Driven Sample Selection for Test-Time Training), a self-supervised framework that uses a learned verifier to score multiple generated responses and keeps only high-confidence pseudo-labeled examples for on-the-fly adaptation. For each query, the LLM generates N candidate answers. The verifier selects the most reliable one, and if its confidence exceeds a threshold, that answer is paired with the query as a fine-tuning example. Only low-rank adapters (LoRA) are updated, which keeps adaptation fast and efficient. Across three benchmarks and three state-of-the-art LLMs, VDS-TTT achieves up to a 32.29% relative improvement over the base model, demonstrating its effectiveness for continuous test-time self-improvement.
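The per-query selection step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation. The `generate` and `verify` callables, the `toy_generate`/`toy_verify` stand-ins, and the default values `n=8` and `threshold=0.9` are all hypothetical. The sketch assumes the verifier returns a confidence score in [0, 1] and that an answer is kept only if its score is strictly above the threshold.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Candidate:
    answer: str
    score: float  # verifier confidence, assumed to lie in [0, 1]


def select_pseudo_label(
    query: str,
    generate: Callable[[str, int], List[str]],  # hypothetical: samples n answers from the LLM
    verify: Callable[[str, str], float],        # hypothetical: learned verifier score
    n: int = 8,
    threshold: float = 0.9,
) -> Optional[Tuple[str, str]]:
    """Return (query, best_answer) if the top-scored answer clears the threshold, else None."""
    candidates = [Candidate(a, verify(query, a)) for a in generate(query, n)]
    if not candidates:
        return None
    best = max(candidates, key=lambda c: c.score)
    if best.score <= threshold:
        return None  # low confidence: skip this query rather than adapt on a noisy label
    return (query, best.answer)


def build_adaptation_batch(queries, generate, verify, n=8, threshold=0.9):
    """Collect the high-confidence (query, answer) pairs used for test-time fine-tuning."""
    batch = []
    for q in queries:
        pair = select_pseudo_label(q, generate, verify, n, threshold)
        if pair is not None:
            batch.append(pair)
    return batch


# Hypothetical toy stand-ins so the sketch runs without a model.
def toy_generate(query: str, n: int) -> List[str]:
    return [str(i) for i in range(n)]


def toy_verify(query: str, answer: str) -> float:
    # Confident only about the answer "4" to the query "2+2"; unsure about everything else.
    if query == "2+2":
        return 0.97 if answer == "4" else 0.05
    return 0.4


batch = build_adaptation_batch(["2+2", "hard query"], toy_generate, toy_verify)
print(batch)  # [('2+2', '4')]
```

Queries whose best answer does not clear the threshold are simply dropped, so the model never adapts on low-confidence pseudo-labels. In the framework as described, the surviving pairs would then drive a fine-tuning update restricted to LoRA adapter weights. That update is outside the scope of this sketch.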
