Value Matching: Scalable and Gradient-Free Reward-Guided Flow Adaptation
Abstract
Adapting large-scale flow and diffusion models to downstream applications is essential for making them practically useful, but current fine-tuning methods are increasingly impractical due to memory demands that scale with model size. This raises the question of whether alignment with downstream rewards can be achieved with resource requirements that are independent of model complexity. We propose Value Matching (VM), a scalable method that sidesteps fine-tuning by training a lightweight value function to guide generation. VM learns the value function through Monte Carlo estimation, entirely decoupled from the gradients of the base model, enabling efficient adaptation. Further, VM supports non-differentiable rewards and is more stable, sample-efficient, and expressive than classifier guidance. Experiments on image and molecular generation show that VM successfully steers the pre-trained model density toward high-reward regions while requiring only a small fraction of the memory used by current fine-tuning methods based on reinforcement learning or control-theoretic techniques.
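Since the abstract only sketches the approach, the following toy example illustrates one plausible reading of the pipeline it describes: a frozen pre-trained flow model, a lightweight value network regressed via Monte Carlo onto terminal rewards (with no gradients flowing through the base model), and value-based guidance at sampling time. All names, the Euler integration, the gradient-based guidance rule, and the hyperparameters are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
DIM, STEPS, GUIDANCE = 2, 20, 2.0

# Frozen stand-in for a large pre-trained flow model predicting a velocity
# field v(x_t, t). In practice this would be the large base model; it is
# never updated and its gradients are never needed.
base_model = nn.Sequential(nn.Linear(DIM + 1, 64), nn.SiLU(), nn.Linear(64, DIM))
for p in base_model.parameters():
    p.requires_grad_(False)

def velocity(x, t):
    t_col = torch.full((x.shape[0], 1), t)
    return base_model(torch.cat([x, t_col], dim=-1))

# Black-box, possibly non-differentiable reward on terminal samples.
def reward(x1):
    return (x1[:, 0] > 0.5).float()  # hypothetical reward; no useful gradient

# Lightweight value network V_theta(x_t, t): the only trainable component.
value_net = nn.Sequential(nn.Linear(DIM + 1, 64), nn.SiLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(value_net.parameters(), lr=1e-3)

def rollout(n):
    """Simulate the base flow ODE with Euler steps; record intermediate states."""
    x = torch.randn(n, DIM)
    states, dt = [], 1.0 / STEPS
    for k in range(STEPS):
        t = k * dt
        states.append((x.clone(), t))
        x = x + dt * velocity(x, t)
    return states, x  # intermediate states and terminal samples x_1

# Monte Carlo value matching (as this sketch interprets it): regress
# V_theta(x_t, t) onto the terminal reward observed on the same trajectory.
for it in range(200):
    with torch.no_grad():
        states, x1 = rollout(256)
        r = reward(x1)
    loss = 0.0
    for x_t, t in states:
        t_col = torch.full((x_t.shape[0], 1), t)
        v_pred = value_net(torch.cat([x_t, t_col], dim=-1)).squeeze(-1)
        loss = loss + ((v_pred - r) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

# Guided sampling (one plausible use of the learned value function): add the
# gradient of V_theta w.r.t. x_t to the frozen base velocity. The exact
# guidance rule used by VM is an assumption here.
def guided_sample(n):
    x = torch.randn(n, DIM)
    dt = 1.0 / STEPS
    for k in range(STEPS):
        t = k * dt
        x_req = x.detach().requires_grad_(True)
        t_col = torch.full((n, 1), t)
        v_val = value_net(torch.cat([x_req, t_col], dim=-1)).sum()
        grad = torch.autograd.grad(v_val, x_req)[0]
        with torch.no_grad():
            x = x + dt * (velocity(x, t) + GUIDANCE * grad)
    return x

samples = guided_sample(64)
print("mean reward of guided samples:", reward(samples).mean().item())
```

Note that only the small value network is trained and backpropagated through, which is consistent with the abstract's claim that memory requirements are decoupled from the size of the base model.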