Oracle-RLAIF: An Improved Fine-Tuning Framework for Multi-modal Video Models through Reinforcement Learning from Oracle Ranking Feedback
Derek Shi
Abstract