Similar: A Step-Wise, Multi-Dimensional Reward Model for Virtual Agent Learning and Reasoning

Bingchen Miao ⋅ Yang Wu ⋅ Minghe Gao ⋅ Qifan Yu ⋅ Wendong Bu ⋅ Wenqiao Zhang ⋅ Yunfei Li ⋅ Siliang Tang ⋅ Tat-Seng Chua ⋅ Juncheng Li

2025 Poster in Workshop: Workshop on Scaling Environments for Agents

Project Page [OpenReview]

Abstract

The development of Generalist Virtual Agents (GVAs) has shown significant promise in autonomous task execution. However, current training paradigms face critical limitations, including reliance on outcome supervision and labor-intensive human annotation. To address these challenges, we propose Similar, a step-wise, multi-dimensional generalist reward model that offers fine-grained signals for agent training and can select better actions for inference-time scaling. Specifically, we begin by systematically defining five dimensions for evaluating agent actions. Building on this framework, we design an MCTS-P algorithm to automatically collect and annotate step-wise, five-dimensional agent execution data. Using this data, we train Similar with our Triple-M strategy. Furthermore, we introduce SRM, the first benchmark in the virtual agent domain for step-wise, multi-dimensional reward model training and evaluation. The benchmark consists of two components: SRMTrain, which serves as the training set for Similar, and SRMEval, a manually selected test set for evaluating the reward model. Experimental results demonstrate that Similar, through its step-wise, multi-dimensional assessment and synergistic gain, provides GVAs with effective intermediate signals during both training and inference-time scaling.
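
As a rough illustration of how a step-wise, multi-dimensional reward model could select actions at inference time, the sketch below scores each candidate action on five dimensions and keeps the highest-scoring one. It is not the authors' implementation: the dimension names are placeholders (the abstract does not name them), the unweighted mean used to combine dimensions is an assumption, and `choose_action`, `aggregate`, and `toy_score_fn` are hypothetical helpers standing in for a trained reward model.

```python
from typing import Callable, Dict, Sequence

# Placeholder names: the abstract states there are five dimensions
# but does not name them, so these labels are purely illustrative.
DIMENSIONS = ("dim_1", "dim_2", "dim_3", "dim_4", "dim_5")

# A scoring function maps (task, history, candidate action) to one
# score per dimension. In practice this would be a trained reward model.
ScoreFn = Callable[[str, Sequence[str], str], Dict[str, float]]


def aggregate(scores: Dict[str, float]) -> float:
    """Combine per-dimension scores into one value.

    Assumption: an unweighted mean. The source does not specify
    how the dimensions are combined.
    """
    return sum(scores[d] for d in DIMENSIONS) / len(DIMENSIONS)


def choose_action(
    task: str,
    history: Sequence[str],
    candidates: Sequence[str],
    score_fn: ScoreFn,
) -> str:
    """Return the candidate action with the highest aggregated score."""
    if not candidates:
        raise ValueError("at least one candidate action is required")
    return max(candidates, key=lambda a: aggregate(score_fn(task, history, a)))


def toy_score_fn(task: str, history: Sequence[str], action: str) -> Dict[str, float]:
    """Hypothetical stand-in for a reward model.

    Scores an action by how many words it shares with the task and
    reports that same value for every dimension.
    """
    overlap = len(set(task.lower().split()) & set(action.lower().split()))
    return {d: float(overlap) for d in DIMENSIONS}


if __name__ == "__main__":
    best = choose_action(
        "open settings menu",
        history=[],
        candidates=["click settings", "scroll down", "open settings menu"],
        score_fn=toy_score_fn,
    )
    print(best)
```

Because the scorer runs on every step rather than only on the final outcome, the same function could in principle also supply intermediate training signals. That reuse is a reading of the abstract, not a detail it spells out.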