Skill-Aware Data Selection and Fine-Tuning for Data-Efficient Reasoning Distillation
Abstract
Large reasoning models such as DeepSeek-R1 achieve strong performance on complex reasoning tasks, but their size and computational demands limit practical use. Distilling their reasoning capabilities into smaller models via supervised fine-tuning offers a way to democratize reasoning ability, but resource constraints demand data-efficient training strategies. We propose a skill-centric distillation framework with two components: (1) skill-based data selection, which draws from a large pool of expert reasoning traces and preferentially samples more examples for skills where the model shows lower proficiency, and (2) skill-aware fine-tuning, which trains models to explicitly articulate the sequence of skills they will apply before solving a problem, reinforcing skill composition and improving generalization. Operating within a budget of 1,000 training examples, our distillation framework consistently outperforms the standard baseline of fine-tuning on randomly sampled data. Our approach yields average absolute accuracy improvements of +1.6% with Qwen3-4B and +1.4% with Qwen3-8B across five mathematical reasoning benchmarks. Further analysis confirms that these gains align with the emphasized skills, validating the efficacy of targeted training for data-efficient reasoning distillation.
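As a rough illustration of the selection idea described above, the first component can be sketched as inverse-proficiency budget allocation. This is a minimal sketch, not the paper's actual procedure: the skill names, proficiency scores, and pool contents are hypothetical placeholders.

```python
import random

# Hypothetical per-skill proficiency scores in [0, 1], e.g. estimated from
# the student model's accuracy on a small probe set (illustrative values only).
proficiency = {"algebra": 0.9, "geometry": 0.6, "combinatorics": 0.3}

# Hypothetical pool of expert reasoning traces, tagged by the skill exercised.
pool = {skill: [f"{skill}_{i}" for i in range(100)] for skill in proficiency}

def select(budget):
    """Allocate the example budget inversely to skill proficiency."""
    # Weaker skills (lower proficiency) receive larger weights,
    # and hence a larger share of the training budget.
    weights = {s: 1.0 - p for s, p in proficiency.items()}
    total = sum(weights.values())
    selected = []
    for skill, w in weights.items():
        k = min(round(budget * w / total), len(pool[skill]))
        selected.extend(random.sample(pool[skill], k))
    return selected

subset = select(budget=100)
```

Under these toy numbers, the weakest skill (combinatorics, proficiency 0.3) receives the largest share of the 100-example budget and the strongest (algebra, 0.9) the smallest, matching the paper's stated goal of oversampling low-proficiency skills.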