DA-CoTD: Efficient Chain-of-Thought Reasoning with Difficulty-Aware CoT Distillation
Abstract
Chain-of-thought (CoT) prompting improves reasoning in large language models (LLMs) but often produces overly verbose traces, inflating inference cost. This issue is amplified in multimodal reasoning, where simple problems require little reasoning while complex ones demand detailed cross-modal chains. We propose \textit{Difficulty-Aware CoT Distillation} (DA-CoTD), a framework that adapts reasoning length to input complexity. Using an LLM-based grader aligned with AoPS difficulty ratings, we compress verbose CoT traces into difficulty-aligned ones and train multimodal models with supervised fine-tuning (SFT) and direct preference optimization (DPO). Experiments on seven multimodal math benchmarks show that DA-CoTD reduces reasoning tokens by up to 30\% while maintaining or improving accuracy, outperforming strong baselines.
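To make the core idea concrete, the following is a minimal sketch of difficulty-aware trace compression. It is an illustration only, not the paper's implementation: the actual grader is an LLM aligned with AoPS difficulty ratings, whereas here `token_budget` is a hypothetical linear mapping from a difficulty score to a reasoning-token budget, and `compress_trace` stands in for LLM-based compression with a simple greedy step filter.

```python
# Hypothetical sketch of difficulty-aware CoT compression (NOT the paper's
# actual method, which uses an LLM grader and LLM-based rewriting).

def token_budget(difficulty: int, base: int = 64, step: int = 48) -> int:
    """Map an AoPS-style difficulty rating (1-10) to a reasoning-token budget.

    Easy problems get a short budget; hard problems get a longer one.
    The linear schedule here is an illustrative assumption.
    """
    clamped = max(1, min(difficulty, 10))
    return base + step * (clamped - 1)

def compress_trace(steps: list[str], budget: int) -> list[str]:
    """Greedily keep reasoning steps, in order, until the budget is spent.

    Token counts are approximated by whitespace splitting; a real system
    would use the model's tokenizer and semantics-aware compression.
    """
    kept, used = [], 0
    for s in steps:
        n = len(s.split())
        if used + n > budget:
            break
        kept.append(s)
        used += n
    return kept

# Example: an easy problem (difficulty 1) keeps only what fits its small budget.
trace = ["Compute 2 + 3 = 5 .", "Double it : 2 * 5 = 10 ."]
short = compress_trace(trace, budget=6)
```

In this toy setup, the difficulty-aligned target trace for SFT (and the preferred response for DPO pairs) would be the compressed trace, while the original verbose trace serves as the dispreferred one.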