DreamPRM: Domain-Reweighted Process Reward Model for Multimodal Reasoning
Qi Cao · Ruiyi Wang · Ruiyi Zhang · Pengtao Xie
Abstract
Extending process reward models (PRMs) to multimodal large language models (MLLMs) is hindered by broad domain coverage, train–test distribution shift, and severe dataset quality imbalance. We propose DreamPRM, a bi-level, domain-reweighted framework: lower-level fine-tuning weights each training domain's loss to prioritize high-quality reasoning signals, while upper-level evaluation on a held-out meta set updates these domain weights via an aggregation loss. Across mathematical and general multimodal benchmarks, test-time scaling with DreamPRM consistently boosts state-of-the-art MLLMs and outperforms competing data-selection and test-time-scaling methods.
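To make the bi-level scheme concrete, below is a minimal toy sketch of domain reweighting on synthetic linear-regression data, not the paper's actual method: the lower level fits parameters under per-domain loss weights, and the upper level adjusts those weights to reduce loss on a held-out meta set (here via a simple finite-difference hypergradient). All names (`fit_lower`, `meta_loss`, `alpha`) are illustrative assumptions; DreamPRM instead fine-tunes a multimodal PRM and uses an aggregation loss at the upper level.

```python
# Toy bi-level domain reweighting: one noisy (low-quality) and one clean
# (high-quality) domain; the upper level should learn to upweight the clean one.
import numpy as np

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])

def make_domain(n, noise):
    X = rng.normal(size=(n, 2))
    y = X @ true_w + noise * rng.normal(size=n)
    return X, y

domains = [make_domain(200, 3.0), make_domain(200, 0.1)]  # low-, high-quality
X_meta, y_meta = make_domain(100, 0.1)                    # clean meta set

def fit_lower(alpha):
    # Lower level: weighted least squares across domains (closed form).
    XtX = sum(a * X.T @ X for a, (X, _) in zip(alpha, domains))
    Xty = sum(a * X.T @ y for a, (_, y) in zip(alpha, domains))
    return np.linalg.solve(XtX + 1e-6 * np.eye(2), Xty)

def meta_loss(alpha):
    # Upper level: evaluate the lower-level solution on the meta set.
    theta = fit_lower(alpha)
    return np.mean((X_meta @ theta - y_meta) ** 2)

alpha, lr, eps = np.ones(2), 0.5, 1e-4
for step in range(50):
    # Finite-difference hypergradient of the meta loss w.r.t. domain weights.
    grad = np.array([
        (meta_loss(alpha + eps * e) - meta_loss(alpha - eps * e)) / (2 * eps)
        for e in np.eye(2)
    ])
    alpha = np.clip(alpha - lr * grad, 1e-3, None)  # keep weights positive

print("learned domain weights:", alpha)  # clean domain should dominate
```

Running this drives the noisy domain's weight toward zero while the clean domain's weight grows, mirroring the intuition that meta-set feedback suppresses low-quality training signals.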