

Poster

HelpSteer2: Open-source dataset for training top-performing reward models

Zhilin Wang · Yi Dong · Olivier Delalleau · Jiaqi Zeng · Gerald Shen · Daniel Egert · Jimmy Zhang · Makesh Narsimhan Sreedhar · Oleksii Kuchaiev


Abstract:

High-quality preference datasets are essential for training reward models that can effectively guide large language models (LLMs) in generating high-quality responses aligned with human preferences. As LLMs become stronger and better aligned, permissively licensed preference datasets, such as Open Assistant, HH-RLHF, and HelpSteer, need to be updated to remain effective for reward modeling. Methods that distill preference data from proprietary LLMs such as GPT-4 have restrictions on commercial usage imposed by model providers. To improve upon both generated responses and attribute labeling quality, we release HelpSteer2, a permissively licensed preference dataset (CC-BY-4.0). Using a powerful internal base model trained on HelpSteer2, we are able to achieve the SOTA score (91.6%) on Reward-Bench's primary dataset, outperforming currently listed open and proprietary models, as of 5 June 2024. Notably, HelpSteer2 consists of only ten thousand response pairs, an order of magnitude fewer than existing preference datasets (e.g., HH-RLHF), which makes it highly efficient for training reward models. Our extensive experiments demonstrate that reward models trained with HelpSteer2 are effective in aligning LLMs. HelpSteer2 is available at https://huggingface.co/datasets/nvidia/HelpSteer2 and code is available at https://github.com/NVIDIA/NeMo-Aligner.
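
For reference, a minimal Python sketch of loading the released dataset with the Hugging Face datasets library. The split names and per-response attribute fields shown below (helpfulness, correctness, coherence, complexity, verbosity) follow the public dataset card and should be verified against it; this is an illustrative snippet, not code from the paper.

    # Minimal sketch: inspect HelpSteer2 with the Hugging Face `datasets` library.
    # Field and split names are taken from the dataset card and may change.
    from datasets import load_dataset

    ds = load_dataset("nvidia/HelpSteer2")   # expected splits: "train", "validation"
    example = ds["train"][0]

    print(example["prompt"][:200])           # the user prompt
    print(example["response"][:200])         # one annotated model response

    # Each response is annotated with five attributes (per the dataset card):
    for attr in ["helpfulness", "correctness", "coherence", "complexity", "verbosity"]:
        print(attr, example[attr])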
