Poster
in
Workshop: The First Workshop on Efficient Reasoning

Rubrics as Rewards: Reinforcement Learning Beyond Verifiable Domains

Anisha Gunjal ⋅ Anthony Wang ⋅ Elaine Lau ⋅ Vaskar Nath ⋅ Yunzhong He ⋅ Bing Liu ⋅ Sean Hendryx

Project Page [ OpenReview]

Abstract

Reinforcement Learning with Verifiable Rewards (RLVR) has proven effective for complex reasoning tasks with clear correctness signals such as math and coding. However, extending it to real-world reasoning tasks is challenging, as evaluation depends on nuanced, multi-criteria judgments rather than binary correctness. Instance-specific rubrics have recently been used in evaluation benchmarks to capture such judgments, but their potential as reward signals for on-policy post-training remains underexplored. We introduce $\textbf{Rubrics as Rewards (\textit{RaR})}$, an on-policy reinforcement learning method that extends RLVR beyond verifiable domains by using rubric-based feedback. Across both medical and science domains, we evaluate multiple strategies for aggregating rubric feedback into rewards. The best RaR variant achieves relative improvements of up to 31\% on HealthBench and 7\% on GPQA-Diamond over popular LLM-as-judge baselines that rely on direct Likert-based rewards. These results demonstrate that RaR-trained policies adapt well to diverse evaluation formats, performing strongly on both rubric-based and multiple-choice tasks. Moreover, we find that using rubrics as structured reward signals yields better alignment for smaller judges and reduces performance variance across judge scales.

Chat is not available.