Tree-OPO: Off-policy Monte Carlo Tree-Guided Advantage Optimization for Multistep Reasoning

Bingning Huang · Tu Nguyen · Matthieu Zimmer

Abstract
