Competition

The MineRL BASALT Competition on Fine-tuning from Human Feedback

Anssi Kanervisto · Stephanie Milani · Karolis Ramanauskas · Byron Galbraith · Steven Wang · Brandon Houghton · Sharada Mohanty · Rohin Shah

Virtual
Tue 6 Dec 3 a.m. PST — 6 a.m. PST

Abstract:

Given the impressive capabilities demonstrated by pre-trained foundation models, we must now grapple with how to harness these capabilities towards useful tasks. Since many such tasks are hard to specify programmatically, researchers have turned towards a different paradigm: fine-tuning from human feedback. The MineRL BASALT competition aims to spur research on this important class of techniques, in the domain of the popular video game Minecraft. The competition consists of a suite of four tasks with hard-to-specify reward functions. We define these tasks by a paragraph of natural language: for example, "create a waterfall and take a scenic picture of it", with additional clarifying details. Participants train a separate agent for each task, using any method they want; we expect participants will choose to fine-tune the provided pre-trained models. Agents are then evaluated by humans who have read the task description. To help participants get started, we provide a dataset of human demonstrations of the four tasks, as well as an imitation learning baseline that leverages these demonstrations. We believe this competition will improve our ability to build AI systems that do what their designers intend them to do, even when intent cannot be easily formalized. This achievement will allow AI to solve more tasks, enable more effective regulation of AI systems, and make progress on the AI alignment problem.

Schedule