Keynote Speaker: Vision-Language-Action Models
Abstract
Vision-language-action (VLA) models enable multimodal language models to address challenges in robotics and control. While the basic foundation of VLAs is relatively simple, namely integrating action outputs into vision-language backbones, these models open up a wide range of new research topics. VLAs can use complex reasoning to solve temporally extended problems, refine their behavior through in-context learning, and leverage reinforcement learning to improve from experience. These new capabilities present exciting opportunities as well as challenges, which I will discuss in this talk.
Video