Acting and Interacting in the Real World: Challenges in Robot Learning
Ingmar Posner · Raia Hadsell · Martin Riedmiller · Markus Wulfmeier · Rohan Paul

Fri Dec 08 08:00 AM -- 06:30 PM (PST) @ 104 B
Event URL: http://sites.google.com/view/nips17robotlearning/home

In recent years robotics has made significant strides towards applications of real value to the public domain. Robots are now increasingly expected to work for and alongside us in complex, dynamic environments. Machine learning has been a key enabler of this success, particularly in the realm of robot perception where, due to substantial overlap with the machine vision community, methods and training data can be readily leveraged.

Recent advances in reinforcement learning and learning from demonstration, geared towards teaching agents how to act, provide a tantalising glimpse at a promising future trajectory for robot learning. Mastery of challenges such as the Atari suite and AlphaGo builds significant excitement as to what our robots may be able to do for us in the future. However, this success relies on the ability to learn cheaply, often within the confines of a virtual environment, by trial and error over as many episodes as required. This presents a significant challenge for embodied systems acting and interacting in the real world. Not only is there a cost (either monetary or in terms of execution time) associated with a particular trial, thus limiting the amount of training data obtainable, but there also exist safety constraints which make an exploration of the state space simply unrealistic: teaching a real robot to cross a real road via reinforcement learning for now seems a noble yet somewhat far-fetched goal. A significant gulf therefore exists between prior art on teaching agents to act and effective approaches to real-world robot learning. This, we posit, is currently one of the principal impediments to advancing real-world robotics science.

In order to bridge this gap, researchers and practitioners in robot learning have to address a number of key challenges to allow real-world systems to be trained in a safe and data-efficient manner. This workshop aims to bring together experts in reinforcement learning, learning from demonstration, deep learning, field robotics and beyond to discuss what the principal challenges are and how they might be addressed. With a particular emphasis on data-efficient learning, of particular interest will be contributions in representation learning, curriculum learning, task transfer, one-shot learning, domain transfer (in particular from simulation to real-world tasks), reinforcement learning for real-world applications, learning from demonstration for real-world applications, knowledge learning from observation and interaction, active concept acquisition and learning causal models.

Fri 8:50 a.m. - 9:00 a.m.
Welcome (Introduction)
Fri 9:00 a.m. - 9:30 a.m.

Reinforcement learning and imitation learning have seen success in many domains, including autonomous helicopter flight, Atari, simulated locomotion, Go, and robotic manipulation. However, the sample complexity of these methods remains very high. In this talk I will present several ideas towards reducing sample complexity: (i) Hindsight Experience Replay, which infuses learning signal into (traditionally) zero-reward runs, and is compatible with existing off-policy algorithms; (ii) some recent advances in Model-based Reinforcement Learning, which achieve 100x sample complexity gain over the more widely studied model-free methods; (iii) Meta-Reinforcement Learning, which can significantly reduce sample complexity by building off other skills acquired in the past; (iv) Domain Randomization, a simple idea that can often enable training fully in simulation, yet still recover policies that perform well in the real world.
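
The idea in (i) of turning a zero-reward run into a useful one can be illustrated with a minimal sketch. It assumes a toy grid state, a hypothetical `Transition` record, a hypothetical sparse reward (1 on reaching the goal, 0 otherwise), and one simple relabeling choice: treating the state the episode actually ended in as if it had been the goal. None of these names come from the talk; they are illustrative only.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple

State = Tuple[int, int]  # hypothetical 2-D grid position


@dataclass(frozen=True)
class Transition:
    """One step of experience, stored together with the goal it was collected for."""
    state: State
    action: str
    next_state: State
    goal: State
    reward: float


def sparse_reward(achieved: State, goal: State) -> float:
    """Hypothetical sparse reward: 1 only when the goal is reached, else 0."""
    return 1.0 if achieved == goal else 0.0


def relabel_with_final_state(episode: List[Transition]) -> List[Transition]:
    """Copy the episode, pretending the last achieved state was the intended goal."""
    if not episode:
        return []
    hindsight_goal = episode[-1].next_state
    return [
        replace(
            t,
            goal=hindsight_goal,
            reward=sparse_reward(t.next_state, hindsight_goal),
        )
        for t in episode
    ]


# A failed episode: the agent never reaches (5, 5), so every reward is zero.
goal = (5, 5)
episode = [
    Transition((0, 0), "right", (1, 0), goal, sparse_reward((1, 0), goal)),
    Transition((1, 0), "up", (1, 1), goal, sparse_reward((1, 1), goal)),
]

# The relabeled copy contains a non-zero reward for reaching (1, 1).
relabeled = relabel_with_final_state(episode)

# Both versions can be stored in the replay buffer of an off-policy learner.
replay_buffer = episode + relabeled
```

Because the relabeled transitions are ordinary (state, action, next state, goal, reward) tuples, a goal-conditioned off-policy learner could consume them unchanged, which is the compatibility the abstract refers to.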

Fri 9:30 a.m. - 10:00 a.m.

We propose a self-supervised approach for learning representations and robotic behaviors entirely from unlabeled videos recorded from multiple viewpoints. We study how these representations can be used in two robotic imitation settings: imitating object interactions from videos of humans, and imitating human poses. Imitation of human behavior requires a viewpoint-invariant representation that captures the relationships between end-effectors (hands or robot grippers) and the environment, object attributes, and body pose. We train our representations using a triplet loss, where multiple simultaneous viewpoints of the same observation are attracted in the embedding space, while being repelled from temporal neighbors which are often visually similar but functionally different. This signal causes our model to discover attributes that do not change across viewpoint, but do change across time, while ignoring nuisance variables such as occlusions, motion blur, lighting and background. We demonstrate that this representation can be used by a robot to directly mimic human poses without an explicit correspondence, and that it can be used as a reward function within a reinforcement learning algorithm. While representations are learned from an unlabeled collection of task-related videos, robot behaviors such as pouring are learned by watching a single 3rd-person demonstration by a human. Reward functions obtained by following the human demonstrations under the learned representation enable efficient reinforcement learning that is practical for real-world robotic systems. Video results, open-source code and dataset are available at https://sermanet.github.io/imitate

Pierre Sermanet
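
The triplet objective described in this abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the squared Euclidean distance, the margin value, and the two-dimensional embeddings below are all assumptions chosen for readability. The anchor and positive are embeddings of the same moment seen from two viewpoints; the negative is an embedding of a nearby but different moment from the anchor's viewpoint.

```python
from typing import Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two embedding vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(
    anchor: Sequence[float],
    positive: Sequence[float],
    negative: Sequence[float],
    margin: float = 0.2,  # assumed value, for illustration only
) -> float:
    """Hinge-style triplet loss.

    The loss is zero once the anchor is closer to the positive than to the
    negative by at least `margin`; otherwise it grows with the violation.
    """
    return max(
        0.0,
        squared_distance(anchor, positive)
        - squared_distance(anchor, negative)
        + margin,
    )


# Hypothetical embeddings of video frames.
view1_at_t = [0.0, 1.0]        # anchor: camera 1 at time t
view2_at_t = [0.1, 0.9]        # positive: camera 2 at the same time t
view1_at_t_later = [1.0, 0.0]  # negative: camera 1 a few frames later

# Well-arranged embeddings: same moment close, temporal neighbour far away.
loss_good = triplet_loss(view1_at_t, view2_at_t, view1_at_t_later)

# Badly arranged embeddings (roles swapped): the loss is large.
loss_bad = triplet_loss(view1_at_t, view1_at_t_later, view2_at_t)
```

Minimising this quantity over many such triplets pulls simultaneous views together and pushes temporal neighbours apart, which is the attract/repel behaviour the abstract describes.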
Fri 11:30 a.m. - 12:00 p.m.

Raquel Urtasun is the Head of Uber ATG Toronto. She is also an Associate Professor in the Department of Computer Science at the University of Toronto, a Canada Research Chair in Machine Learning and Computer Vision and a co-founder of the Vector Institute for AI. Prior to this, she was an Assistant Professor at the Toyota Technological Institute at Chicago (TTIC), an academic computer science institute affiliated with the University of Chicago. She was also a visiting professor at ETH Zurich during the spring semester of 2010. She received her Bachelor's degree from Universidad Publica de Navarra in 2000, her Ph.D. degree from the Computer Science department at Ecole Polytechnique Federale de Lausanne (EPFL) in 2006 and did her postdoc at MIT and UC Berkeley. She is a world-leading expert in machine perception for self-driving cars. Her research interests include machine learning, computer vision, robotics and remote sensing. Her lab was selected as an NVIDIA NVAIL lab. She is a recipient of an NSERC EWR Steacie Award, an NVIDIA Pioneers of AI Award, a Ministry of Education and Innovation Early Researcher Award, three Google Faculty Research Awards, an Amazon Faculty Research Award, a Connaught New Researcher Award and two Best Paper Runner-up Prizes awarded at the Conference on Computer Vision and Pattern Recognition (CVPR) in 2013 and 2017 respectively. She is also an Editor of the International Journal of Computer Vision (IJCV) and has served as Area Chair of multiple machine learning and vision conferences (e.g., NIPS, UAI, ICML, ICLR, CVPR, ECCV).

Raquel Urtasun
Fri 2:00 p.m. - 2:30 p.m.

I will describe recent results from my group on visually guided manipulation and navigation. We are guided considerably by insights from human development and cognition. In manipulation, our work is based on object-oriented task models acquired by experimentation. In navigation, we show the benefits of architectures based on cognitive maps and landmarks.

Jitendra Malik
Fri 2:30 p.m. - 3:00 p.m.

A key limitation of current learning methods, in particular for computer vision tasks, is their reliance on vast amounts of strongly supervised data. This limits scalability, prevents rapid acquisition of new concepts, and limits adaptability to new tasks or new conditions. To address this limitation, I will explore ideas in learning visual models from limited data. The basic insight behind all of these ideas is that it is possible to learn from a large corpus of vision tasks how to learn models for new tasks with limited data, by representing the way visual models vary across tasks, also called model dynamics. The talk will also show examples from common visual classification tasks.

Martial Hebert
Fri 3:00 p.m. - 4:00 p.m.

Spotlights:

Deep Object-Centric Representations for Generalizable Robot Learning < Coline Devin >

Using Simulation and Domain Adaptation to Improve Efficiency of Deep Robotic Grasping

Learning Deep Composable Maximum-Entropy Policies for Real-World Robotic Manipulation

SE3-Pose-Nets: Structured Deep Dynamics Models for Visuomotor Control

Learning Flexible and Reusable Locomotion Primitives for a Microrobot

Policy Search using Robust Bayesian Optimization

Learning Robotic Assembly from CAD

Learning Robot Skill Embeddings

Self-Supervised Visual Planning with Temporal Skip Connections

Overcoming Exploration in Reinforcement Learning with Demonstrations

Deep Reinforcement Learning for Vision-Based Robotic Grasping

Soft Value Iteration Networks for Planetary Rover Path Planning

Posters:

One-Shot Visual Imitation Learning via Meta-Learning

One-Shot Reinforcement Learning for Robot Navigation with Interactive Replay < Jake Bruce; Niko Suenderhauf; Piotr Mirowski; Raia Hadsell; Michael Milford >

Bayesian Active Edge Evaluation on Expensive Graphs < Sanjiban Choudhury >

Sim-to-Real Transfer of Accurate Grasping with Eye-In-Hand Observations and Continuous Control < Mengyuan Yan; Iuri Frosio*; Stephen Tyree; Jan Kautz >

Learning Robotic Manipulation of Granular Media < Connor Schenck*; Jonathan Tompson; Dieter Fox; Sergey Levine >

End-to-End Learning of Semantic Grasping < Eric Jang >

Self-supervised Deep Reinforcement Learning with Generalized Computation Graphs for Robot Navigation

Efficient Robot Task Learning and Transfer via Informed Search in Movement Parameter Space < Nemanja Rakicevic*; Petar Kormushev >

Metrics for Deep Generative Models based on Learned Skills

Unsupervised Hierarchical Video Prediction < Nevan Wichers*; Dumitru Erhan; Honglak Lee >

Imitation from Observation: Learning to Imitate Behaviors from Raw Video via Context Translation

Domain Randomization and Generative Models for Robotic Grasping

Learning to Grasp from Vision and Touch

Neural Network Dynamics Models for Control of Under-actuated Legged Millirobots

On the Importance of Uncertainty for Control with Deep Dynamics Models

Increasing Sample-Efficiency via Online Meta-Learning

Stochastic Variational Video Prediction

(Author information copied from CMT; please contact the workshop organisers at nips17robotlearning@gmail.com for any changes.)

Jake Bruce, Deirdre Quillen, Nemanja Rakicevic, Kurtland Chua, Connor Schenck, Melissa Chien, Mohammad Babaeizadeh, Nevan Wichers, Mengyuan Yan, Paul Wohlhart, Julian Ibarz, Kurt Konolige

Author Information

Ingmar Posner (University of Oxford)
Raia Hadsell (DeepMind)
Martin Riedmiller (DeepMind)
Markus Wulfmeier (University of Oxford)
Rohan Paul (MIT)

Rohan Paul (MIT) is a Postdoctoral Associate at the Robust Robotics Group at CSAIL, MIT. His doctoral research contributed to appearance-based topological mapping and life-long learning. His recent work focuses on grounding natural language instructions and concept acquisition from language and vision, aimed towards capable service robots that interact seamlessly with humans. His work has received Best Paper awards/nominations at leading robotics conferences: RSS ’16, IROS ’13 and ICRA ’10. Google scholar: https://scholar.google.com/citations?user=Cgp-L2UAAAAJ&hl=en