NeurIPS 2023


Foundation Models for Decision Making

Sherry Yang · Ofir Nachum · Yilun Du · Stephen McAleer · Igor Mordatch · Linxi Fan · Jeannette Bohg · Dale Schuurmans

Hall E2 (level 1)
Fri 15 Dec, 6:15 a.m. PST

Foundation models pretrained on diverse vision and language datasets have demonstrated exceptional capabilities across a wide range of downstream vision and language tasks. As these models are deployed in real-world applications such as dialogue, autonomous driving, healthcare, and robotics, they inevitably face new challenges: learning from external feedback, adapting to different task modalities, and performing long-term reasoning and planning. Such challenges have traditionally been at the core of sequential decision making, which encompasses reinforcement learning, imitation learning, planning, search, and optimal control. These fields have traditionally focused on task-specific settings with limited prior knowledge, yet they have made significant progress in surpassing human performance on tasks such as board games and Atari video games, as well as in operating robots for navigation and manipulation. However, because these methods generally learn to solve a specific task from scratch, without broad knowledge from vision and language, they can struggle with generalization and sample efficiency.

The goal of this workshop is to bring together the sequential decision making community (planning, search, RL, and optimal control) and the foundation models community in vision and language to confront the challenges of decision making at scale. The workshop will span high-level discussions of how foundation models and decision making can benefit each other when jointly considered, as well as low-level algorithmic details of decision making algorithms and vision-language architectures, which may present both opportunities and challenges. Specific topics will include, for example, foundation model agents interacting with humans, computers, tools, simulators, the physical world, and one another.
