Workshop: 5th Robot Learning Workshop: Trustworthy Robotics

DALL-E-Bot: Introducing Web-Scale Diffusion Models to Robotics

Ivan Kapelyukh · Vitalis Vosylius · Edward Johns


We introduce the first work to explore web-scale diffusion models for robotics. DALL-E-Bot enables a robot to rearrange objects in a scene in three steps: it first infers a text description of those objects, then generates an image representing a human-like arrangement of them, and finally physically arranges the objects according to that image. Our implementation achieves this zero-shot using DALL-E, without any further data collection or training. Strong real-world results with human studies show that this is an exciting direction for future generations of robot learning algorithms. We propose a list of recommendations to the community for further developments in this direction. Videos:

