Enhancing Diversity in Large Language Models via Determinantal Point Processes
Abstract
Supervised fine-tuning and reinforcement learning improve large language model (LLM) quality but often reduce output diversity, leading to narrow, canonical responses. Existing methods to enhance diversity are limited: they either operate only at inference time or target lexical rather than semantic differences. We propose a novel training method based on determinantal point processes (DPPs) that jointly optimizes LLMs for quality and semantic diversity. Our approach samples a set of responses, embeds them, and measures their diversity as the volume spanned by the embeddings, computed via the determinant of a kernel-based similarity matrix. Experiments across instruction-following, story generation, and reasoning tasks demonstrate that our method substantially improves semantic diversity without sacrificing model quality.
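To make the determinant-as-volume idea concrete, below is a minimal sketch of a DPP-style diversity score, assuming responses have already been embedded (e.g., by a sentence encoder). The function name `dpp_diversity`, the cosine-similarity kernel, and the jitter term `eps` are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def dpp_diversity(embeddings: np.ndarray, eps: float = 1e-6) -> float:
    """Log-determinant diversity score for a set of response embeddings.

    Unit-normalizing the rows makes the Gram matrix a cosine-similarity
    kernel; its determinant equals the squared volume spanned by the
    embeddings, so near-duplicate responses drive the score toward -inf.
    """
    # Unit-normalize rows so K[i, j] is the cosine similarity of responses i, j.
    X = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    K = X @ X.T
    # A small jitter keeps the log-determinant finite for (near-)duplicate rows.
    sign, logdet = np.linalg.slogdet(K + eps * np.eye(K.shape[0]))
    return logdet

# Example: three diverse embeddings vs. three near-duplicates.
rng = np.random.default_rng(0)
diverse = rng.normal(size=(3, 8))
redundant = np.tile(rng.normal(size=(1, 8)), (3, 1)) + 1e-3 * rng.normal(size=(3, 8))
print(dpp_diversity(diverse))    # near 0: large spanned volume
print(dpp_diversity(redundant))  # strongly negative: tiny spanned volume
```

Maximizing such a determinant during training rewards response sets that span a larger semantic volume, rather than merely differing in surface wording.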