Applying machine learning to real-world systems such as robots has been an important theme in the NeurIPS community in recent years. Progress in machine learning has enabled robots to demonstrate strong performance in helping humans with household and care-taking tasks, manufacturing, logistics, transportation, and many other unstructured, human-centric environments. While these results are promising, access to high-quality, task-relevant data remains one of the largest bottlenecks for successfully deploying such technologies in the real world.
Methods to generate, re-use, and integrate more sources of valuable data, such as lifelong learning, transfer, and continuous improvement, could unlock the next level of performance. However, accessing these data sources comes with fundamental challenges, including safety, stability, and the daunting issue of providing supervision for learning while the robot is in operation. Today, unique new opportunities are presenting themselves in this quest for robust, continuous learning: large-scale, self-supervised, and multimodal approaches to learning are matching and often exceeding state-of-the-art supervised learning approaches; reinforcement and imitation learning are becoming more stable and data-efficient in real-world settings; and new approaches that combine strong, principled safety and stability guarantees with the expressive power of machine learning are emerging.
This workshop aims to discuss how these emerging trends in machine learning of self-supervision and lifelong learning can be best utilized in real-world robotic systems. We bring together experts with diverse perspectives on this topic to highlight the ways current successes in the field are changing the conversation around lifelong learning, and how this will affect the future of robotics, machine learning, and our ability to deploy intelligent, self-improving agents to enhance people's lives.
Our speaker talks have been prerecorded and are available on YouTube. The talks will NOT be replayed during the workshop. We encourage all participants to watch them ahead of time to make the panel discussions with the speakers more engaging and insightful.
More information can be found on the website: http://www.robot-learning.ml/2021/.
Tue 7:00 a.m. - 7:15 a.m.
Opening Remarks (Introduction)
Tue 7:15 a.m. - 7:30 a.m.
Continual Learning of Semantic Segmentation using Complementary 2D-3D Data Representations (Contributed Talk 1: Best Paper Runner-Up)
Semantic segmentation networks are usually pre-trained and not updated during deployment. As a consequence, misclassifications commonly occur if the distribution of the training data deviates from the one encountered during the robot's operation. We propose to mitigate this problem by adapting the neural network to the robot's environment during deployment, without any need for external supervision. Leveraging complementary data representations, we generate a supervision signal by probabilistically accumulating consecutive 2D semantic predictions in a volumetric 3D map. We then retrain the network on renderings of the accumulated semantic map, effectively resolving ambiguities and enforcing multi-view consistency through the 3D representation. To preserve the previously learned knowledge while performing network adaptation, we employ a continual learning strategy based on experience replay. Through extensive experimental evaluation, we show successful adaptation to real-world indoor scenes both on the ScanNet dataset and on in-house data recorded with an RGB-D sensor. Our method increases the segmentation performance on average by 11.8% compared to the fixed pre-trained neural network, while effectively retaining knowledge from the pre-training dataset.
Jonas Frey · Hermann Blum · Francesco Milano · Roland Siegwart · Cesar Cadena
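To make the replay strategy concrete, here is a minimal sketch of a per-step update that mixes pseudo-labelled renderings of the 3D map with replayed pre-training samples. The toy model, random stand-in data, and equal loss weighting are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

model = nn.Conv2d(3, 21, kernel_size=1)   # stand-in for the segmentation net
optim = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

def make_batch():
    # stand-in data; in the paper, new-scene batches carry pseudo-labels
    # rendered from the accumulated 3D semantic map
    return torch.randn(4, 3, 64, 64), torch.randint(0, 21, (4, 64, 64))

for step in range(100):
    new_imgs, pseudo_labels = make_batch()   # pseudo-labelled new scene
    old_imgs, old_labels = make_batch()      # experience-replay samples
    # training on both adapts to the new environment while retaining
    # knowledge from the pre-training dataset
    loss = loss_fn(model(new_imgs), pseudo_labels) \
         + loss_fn(model(old_imgs), old_labels)
    optim.zero_grad()
    loss.backward()
    optim.step()
```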
Tue 7:30 a.m. - 8:15 a.m.
Learning from and Interacting with Humans (Q&A 1) (Panel)
In Zoom webinar; password: lightbig
Tue 8:15 a.m. - 8:45 a.m.
Coffee Break
Tue 8:45 a.m. - 9:45 a.m.
Poster Session 1 (Poster Session)
Tue 9:45 a.m. - 10:30 a.m.
Domains and Applications (Q&A 2) (Panel)
In Zoom webinar; password: lightbig
Tue 10:30 a.m. - 3:15 p.m.
Long Break
Tue 3:15 p.m. - 4:15 p.m.
End2End or Modular Systems (Q&A 3) (Panel)
In Zoom webinar; password: lightbig
Tue 4:15 p.m. - 4:30 p.m.
Lifelong Robotic Reinforcement Learning by Retaining Experiences (Contributed Talk 2: Best Paper)
Multi-task learning ideally allows robots to acquire a diverse repertoire of useful skills. However, many multi-task reinforcement learning efforts assume the robot can collect data from all tasks at all times. In reality, the tasks that the robot learns arrive sequentially, depending on the user and the robot's current environment. In this work, we study a practical sequential multi-task RL problem motivated by the constraints of physical robotic systems, and derive an approach that effectively leverages the data and policies learned for previous tasks to cumulatively grow the robot's skill set. In a series of simulated robotic manipulation experiments, our approach requires less than half the samples needed to learn each task from scratch, while avoiding impractical round-robin data collection. On a Franka Emika Panda robot arm, our approach incrementally learns ten challenging tasks, including bottle capping and block insertion.
Annie Xie · Chelsea Finn
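As a rough illustration of retaining experiences across sequentially arriving tasks, the sketch below keeps one buffer per task and mixes retained prior-task data into each new task's batches instead of collecting data round-robin. The buffer structure and 50/50 mixing ratio are assumptions for illustration, not the paper's algorithm.

```python
import random

class TaskBuffers:
    """Retain experience from every task the robot has seen so far."""
    def __init__(self):
        self.buffers = {}

    def add(self, task_id, transitions):
        self.buffers.setdefault(task_id, []).extend(transitions)

    def sample(self, current_task, batch_size, prior_frac=0.5):
        # mix current-task data with retained prior-task data, so a new
        # task benefits from everything learned before
        prior = [t for tid, buf in self.buffers.items()
                 if tid != current_task for t in buf]
        cur = self.buffers.get(current_task, [])
        n_prior = min(int(batch_size * prior_frac), len(prior))
        return (random.sample(prior, n_prior) +
                random.sample(cur, min(batch_size - n_prior, len(cur))))

buffers = TaskBuffers()
buffers.add("cap_bottle", [("s", "a", 1.0)] * 100)    # earlier task
buffers.add("insert_block", [("s", "a", 0.0)] * 100)  # current task
batch = buffers.sample("insert_block", batch_size=8)
```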
Tue 4:30 p.m. - 5:00 p.m.
Break
Tue 5:00 p.m. - 6:00 p.m.
Poster Session 2 (Poster Session)
Tue 6:00 p.m. - 6:55 p.m.
Self- and Unsupervised Learning (Debate) (Panel)
In Zoom webinar; password: lightbig
Tue 6:55 p.m. - 7:00 p.m.
Concluding Remarks (Wrap Up)
-
Solving Occlusion in Terrain Mapping using Neural Networks (Poster)
Accurate and complete terrain maps enhance the awareness of autonomous robots and enable safe and optimal path planning. Rocks and topography often create occlusions and lead to missing elevation information in Digital Elevation Maps (DEMs). Currently, autonomous mobile robots mostly use traditional inpainting techniques based on diffusion or patch-matching to fill in incomplete DEMs. These methods cannot leverage the high-level terrain characteristics and the line-of-sight geometric constraints that we humans use intuitively to predict occluded areas. We propose to use neural networks to reconstruct the occluded areas in DEMs. We introduce a self-supervised learning approach capable of training on real-world data without any need for ground-truth information. We accomplish this by adding artificial occlusion, via ray casting, to incomplete elevation maps constructed on a real robot. We evaluate our self-supervised learning approach on several real-world datasets recorded during autonomous exploration of both structured and unstructured terrain with a legged robot, and additionally in a planetary scenario on Lunar analogue terrain. We demonstrate a significant improvement compared to the Telea and Navier-Stokes inpainting baselines.
Maximilian Stölzle · Martin Azkarate · Marco Hutter
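The self-supervision trick above (masking out cells whose true elevation is known so they become free training targets) might look roughly like this sketch; random masking stands in for the paper's ray-casting occlusion, and all names are illustrative.

```python
import numpy as np

def add_artificial_occlusion(dem, frac=0.2, seed=0):
    """Mask out extra cells of an already-incomplete DEM so their known
    elevations become free reconstruction targets (random masking here
    stands in for the ray-casting occlusion used in the paper)."""
    rng = np.random.default_rng(seed)
    occluded = dem.copy()
    known = ~np.isnan(dem)
    idx = np.argwhere(known)
    drop = idx[rng.choice(len(idx), int(frac * len(idx)), replace=False)]
    occluded[drop[:, 0], drop[:, 1]] = np.nan
    target_mask = known & np.isnan(occluded)   # supervise only these cells
    return occluded, target_mask

dem = np.random.rand(64, 64)
dem[10:20, 30:40] = np.nan                     # real occlusion from the sensor
net_input, target_mask = add_artificial_occlusion(dem)
# training loss for some reconstruction network `net` would be, e.g.:
# loss = ((net(net_input) - dem)[target_mask] ** 2).mean()
```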
-
Assistive Tele-op: Leveraging Transformers to Collect Robotic Task Demonstrations (Poster)
Sharing control of autonomous robots with a human operator could facilitate the collection of robotic task demonstrations to continuously improve learned models. Yet, the means to communicate intent and reason about the future are disparate between humans and robots. Recent advancements in NLP with Transformers lend both insight and specific tools to tackle this. The self-attention mechanism in Transformers aims to holistically understand a sequence of words, rather than emphasizing adjacent connections. The same holds when Transformers are applied to robotic task trajectories: given an environment state and task goal, the model can quickly update its plan with new information at every step while maintaining holistic knowledge of the past. A key insight is that human intent can be injected at any location within the time sequence if the user decides that the model-predicted actions are inappropriate. At every time step, the user can (1) do nothing and allow autonomous operation to continue while observing the robot's future plan sequence, or (2) take over and momentarily prescribe a different set of actions to nudge the model back on track and let it continue autonomously from there onwards. Virtual reality (VR) offers an ideal ground to communicate these intents on a robot, and to accumulate knowledge from human demonstrations. We develop Assistive Tele-op, a VR system that allows users to collect robot task demonstrations with both a high success rate and greater ease than manual teleoperation systems.
Henry Clever · Ankur Handa · Hammad Mazhar · Qian Wan · Yashraj Narang · Maya Cakmak · Dieter Fox
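The shared-control loop the abstract describes can be sketched as follows; the function names, trivial dynamics, and single-override user model are hypothetical stand-ins, not the Assistive Tele-op system itself.

```python
import numpy as np

def assisted_rollout(model_step, user_input, state, horizon=100):
    """Shared-control sketch: the model acts autonomously unless the user
    intervenes; an intervention is both executed and fed back so the model
    replans from it at the next step."""
    trajectory = [state]
    for t in range(horizon):
        action = model_step(trajectory)   # model predicts the next action
        override = user_input(t)          # None = let autonomy continue
        if override is not None:
            action = override             # human nudges the plan back on track
        state = state + action            # stand-in environment dynamics
        trajectory.append(state)
    return trajectory

traj = assisted_rollout(
    model_step=lambda traj: np.zeros(3) + 0.01,                  # dummy policy
    user_input=lambda t: np.array([0.1, 0.0, 0.0]) if t == 50 else None,
    state=np.zeros(3))
```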
-
Sample-Efficient Policy Search with a Trajectory Autoencoder (Poster)
We introduce a trajectory generator that can be used to perform sample-efficient policy search with Bayesian optimization (BO). BO is a sample-efficient approach to direct policy search that usually does not scale well with the number of parameters. Our trajectory generator maps a compact representation of trajectories to a high-dimensional trajectory space, so that BO can search in the low-dimensional space. The trajectory generator is trained as part of a variational autoencoder on demonstrations from an expert. It contains a trajectory layer, a new building block for neural networks that enforces smoothness on generated trajectories. We evaluate our approach with grasping on a real robot.
Alexander Fabisch · Frank Kirchner
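A minimal sketch of the idea, assuming a toy 1-D decoder in place of the trained VAE and a simulated reward in place of robot rollouts: Bayesian optimization with a Gaussian process and expected improvement searches the 2-D latent space rather than the 50-dimensional trajectory space.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

def decode(z):   # stand-in for the trained VAE trajectory decoder
    t = np.linspace(0, 1, 50)
    return z[0] * np.sin(2 * np.pi * t) + z[1] * t   # smooth 1-D trajectory

def reward(traj):   # stand-in for a rollout on the robot
    return -np.mean((traj - np.sin(2 * np.pi * np.linspace(0, 1, 50))) ** 2)

rng = np.random.default_rng(0)
Z = rng.uniform(-2, 2, size=(5, 2))                  # initial latent samples
R = np.array([reward(decode(z)) for z in Z])

for _ in range(20):                                  # BO loop in latent space
    gp = GaussianProcessRegressor(normalize_y=True).fit(Z, R)
    cand = rng.uniform(-2, 2, size=(256, 2))
    mu, sd = gp.predict(cand, return_std=True)
    imp = mu - R.max()
    ei = imp * norm.cdf(imp / (sd + 1e-9)) + sd * norm.pdf(imp / (sd + 1e-9))
    z = cand[np.argmax(ei)]                          # expected improvement
    Z = np.vstack([Z, z])
    R = np.append(R, reward(decode(z)))

print("best latent:", Z[np.argmax(R)], "best reward:", R.max())
```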
-
Learning Design and Construction with Varying-Sized Materials via Prioritized Memory Resets (Poster)
Can a robot autonomously learn to design and construct a bridge from varying-sized blocks without a blueprint? This is a challenging task with a long horizon and sparse reward: the robot has to figure out physically stable design schemes and feasible actions to manipulate and transport blocks. Due to the diverse block sizes, the space of states and action trajectories is vast to explore. In this paper, we propose a hierarchical approach to this problem. It consists of a reinforcement-learning designer that proposes high-level building instructions and a motion-planning-based action generator that manipulates blocks at the low level. For high-level learning, we develop a novel technique, prioritized memory resetting (PMR), to improve exploration. PMR adaptively resets the state to the most critical configurations from a replay buffer, so that the robot can resume training on partial architectures instead of from scratch. Furthermore, we augment PMR with auxiliary training objectives and fine-tune the designer together with the low-level action generator. Our experiments in simulation and on a real deployed robotic system demonstrate that our method effectively constructs bridges from blocks of varying sizes at a high success rate. Demos can be found at https://sites.google.com/view/bridge-pmr.
Yunfei Li · Lei Li · YI WU
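A sketch of the reset mechanism: store visited configurations with a scalar priority and sample episode resets proportionally. The dictionary states and the value-style priorities are illustrative assumptions; the paper's exact priority measure may differ.

```python
import numpy as np

class PrioritizedResetBuffer:
    """Sketch of prioritized memory resetting (PMR)."""
    def __init__(self, seed=0):
        self.rng = np.random.default_rng(seed)
        self.states, self.priorities = [], []

    def add(self, state, priority):
        self.states.append(state)
        self.priorities.append(float(priority))

    def sample_reset_state(self):
        # reset to critical configurations with probability proportional to
        # priority, so training resumes on partial architectures
        p = np.asarray(self.priorities)
        return self.states[self.rng.choice(len(self.states), p=p / p.sum())]

buf = PrioritizedResetBuffer()
buf.add({"blocks_placed": 2}, priority=0.9)   # promising partial bridge
buf.add({"blocks_placed": 0}, priority=0.1)   # empty workspace
reset_state = buf.sample_reset_state()
```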
-
Visual Affordance-guided Policy Optimization (Poster)
Robots operating in human-centered environments need the ability to understand how objects function: what can be done with each object, where this interaction may occur, and how the object is used to achieve a goal. To this end, we propose a novel approach that extracts a self-supervised visual affordance model from human teleoperated play data and leverages it to enable efficient policy learning and motion planning. We combine model-based planning with model-free deep reinforcement learning (RL) to learn grasping policies that favor the same object regions favored by people, while requiring minimal interactions with the environment. We evaluate our algorithm, Visual Affordance-guided Policy Optimization (VAPO), on both diverse simulated manipulation tasks and real-world robot tidy-up experiments to demonstrate the effectiveness of our affordance-guided policies.
Oier Mees · Jessica Borja · Gabriel Kalweit · Lukas Hermann · Joschka Boedecker · Wolfram Burgard
-
panda-gym: Open-source goal-conditioned environments for robotic learning (Poster)
This paper presents panda-gym, a set of Reinforcement Learning (RL) environments for the Franka Emika Panda robot integrated with OpenAI Gym. Five tasks are included: reach, push, slide, pick & place, and stack. They all follow a Multi-Goal RL framework, allowing the use of goal-oriented RL algorithms. To foster open research, we chose the open-source physics engine PyBullet. The chosen implementation makes it very easy to define new tasks or new robots. This paper also presents a baseline of results obtained with state-of-the-art model-free off-policy algorithms. panda-gym is open-source and freely available at [anonymized link].
Quentin Gallouédec · Emmanuel Dellandrea · Liming Chen
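Since the environments follow the standard OpenAI Gym interface, usage presumably looks like the sketch below; the environment id ("PandaReach-v1") and the random-action loop are assumptions based on the abstract, not verified against the released package.

```python
import gym
import panda_gym  # importing registers the Panda tasks with Gym

env = gym.make("PandaReach-v1")
obs = env.reset()   # goal-conditioned dict observation (Multi-Goal RL style)
for _ in range(100):
    action = env.action_space.sample()          # random policy as placeholder
    obs, reward, done, info = env.step(action)
    if done:
        obs = env.reset()
env.close()
```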
-
Versatile Inverse Reinforcement Learning via Cumulative Rewards (Poster)
Inverse Reinforcement Learning infers a reward function from expert demonstrations, aiming to encode the behavior and intentions of the expert. Current approaches usually do this with generative and uni-modal models, meaning that they encode a single behavior. In the common setting where there are various solutions to a problem and the experts show versatile behavior, this severely limits the generalization capabilities of these methods. We propose a novel method for Inverse Reinforcement Learning that overcomes these problems by formulating the recovered reward as a sum of iteratively trained discriminators. We show on simulated tasks that our approach is able to recover general, high-quality reward functions and produces policies of the same quality as behavioral cloning approaches designed for versatile behavior.
Niklas Freymuth · Philipp Becker · Gerhard Neumann
-
Is Curiosity All You Need? On the Utility of Emergent Behaviours from Curious Exploration (Poster)
Curiosity-based reward schemes can provide powerful exploration mechanisms that facilitate the discovery of solutions for complex, sparse, or long-horizon tasks. However, as the agent learns to reach previously unexplored spaces and the objective adapts to reward new areas, many behaviours emerge only to disappear, overwritten by the constantly shifting objective. We argue that merely using curiosity for fast environment exploration or as a bonus reward for a specific task does not harness the full potential of this technique and misses useful skills. Instead, we propose to shift the focus towards retaining the behaviours which emerge during curiosity-based learning. We posit that these self-discovered behaviours serve as valuable skills in an agent's repertoire to solve related tasks. Our experiments demonstrate the continuous shift in behaviour throughout training and the benefits of a simple policy-snapshot method for reusing discovered behaviour in transfer tasks.
Oliver Groth · Markus Wulfmeier · Giulia Vezzani · Vibhavari Dasagi · Tim Hertweck · Roland Hafner · Nicolas Heess · Martin Riedmiller
-
Using Dense Object Descriptors for Picking Cluttered General Objects with Reinforcement Learning (Poster)
We propose a reinforcement learning method for picking cluttered general objects with suction grasps, using visual descriptors. In this paper, we learn cluttered object descriptors (CODs), which can represent rich object structure, and use the pre-trained CODs network along with its intermediate outputs to train a picking policy. We conduct experiments to evaluate our method. Our CODs consistently represented known and unknown cluttered general objects, which allowed the picking policy to robustly pick cluttered general objects. The resulting policy could pick 96.69% of unseen objects in scenes twice as cluttered as the training scenarios.
Hoang-Giang Cao · Weihao Zeng · I-Chen Wu
-
ADHERENT: Learning Human-like Trajectory Generators for Whole-body Control of Humanoid Robots (Poster)
Human-like trajectory generation and footstep planning have been longstanding open problems in humanoid robotics. Meanwhile, research in computer graphics has kept developing machine-learning methods for character animation, training human-like models directly on motion capture data. Such methods have proved effective in virtual environments, mainly focusing on trajectory visualization. This paper presents ADHERENT, a system architecture that integrates machine-learning methods used in computer graphics with whole-body control methods employed in robotics to generate and stabilize human-like trajectories for humanoid robots. Leveraging human motion capture locomotion data, ADHERENT yields a general footstep planner, including forward, sideways, and backward walking trajectories that blend smoothly from one to another. At the joint configuration level, ADHERENT computes data-driven whole-body postural references coherent with the generated footsteps, thus increasing the human likeness of the resulting robot motion. Extensive validations of the proposed architecture are presented with both simulations and real experiments on the iCub humanoid robot. Supplementary video: https://sites.google.com/view/adherent-trajectory-learning.
Paolo Maria Viceconte · Raffaello Camoriano · Giulio Romualdi · Diego Ferigo · Stefano Dafarra · Silvio Traversaro · Giuseppe Oriolo · Lorenzo Rosasco · Daniele Pucci
-
Open-Access Physical Robotics Environment for Real-World Reinforcement Learning Benchmark and Research (Poster)
Success stories of applied machine learning can be traced back to the datasets and environments that were put forward as challenges for the community. The challenge that the community sets as a benchmark is usually the challenge that the community eventually solves. The ultimate challenge of reinforcement learning research is to train real agents to operate in the real environment, but there is no common real-world benchmark to track the progress of RL on physical robotic systems. To address this issue, we have created a physical RL benchmark: a collection of real-world environments for reinforcement learning in robotics with free public remote access. In this work, we introduce four tasks in two environments, along with experimental results on one of them that demonstrate the feasibility of learning on a real robotic system. We train a mobile robot end-to-end to solve a simple navigation task, relying solely on camera input and without access to location information. Close integration into the existing ecosystem allows the community to start using the physical RL benchmark without any prior experience in robotics, and takes away the burden of managing a physical robotics system, abstracting it under a familiar API. To start training, please visit https://anonymized
Ashish Kumar · John Lanier · Qiaozhi Wang · Alicia Kavelaars · Ilya Kuzovkin
-
What Would the Expert do()?: Causal Imitation Learning (Poster)
We develop algorithms for imitation learning from data that was corrupted by unobserved confounders. Sources of such confounding include (a) persistent perturbations to actions or (b) the expert responding to a part of the state that the learner does not have access to. When a confounder affects multiple timesteps of recorded data, it can manifest as spurious correlations between states and actions that a learner might latch onto, leading to poor policy performance. By utilizing the effect of past states on current states, we are able to break up these spurious correlations, an application of the econometric technique of instrumental variable regression. This insight leads to two novel algorithms, one of a generative-modeling flavor (DoubIL) that can utilize access to a simulator and one of a game-theoretic flavor (ResiduIL) that can be run offline. Both approaches are able to find policies that match the result of a query to an unconfounded expert. We find both algorithms compare favorably to non-causal approaches on simulated control problems.
Gokul Swamy · Sanjiban Choudhury · James Bagnell · Steven Wu
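The instrumental-variable idea is easy to see in a toy linear setting: the past state serves as an instrument z for the confounded current state x when regressing the expert action y. Below is a minimal two-stage least squares sketch with arbitrary synthetic numbers; it illustrates the econometric tool the paper builds on, not the DoubIL/ResiduIL algorithms themselves.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
u = rng.normal(size=n)                       # unobserved confounder
z = rng.normal(size=n)                       # instrument: past state
x = z + u + 0.1 * rng.normal(size=n)         # current state, confounded by u
y = 2.0 * x + u + 0.1 * rng.normal(size=n)   # expert action; true effect = 2

# naive regression latches onto the spurious correlation through u
naive = (x @ y) / (x @ x)
# stage 1: project x onto the instrument; stage 2: regress y on the projection
x_hat = z * ((z @ x) / (z @ z))
iv = (x_hat @ y) / (x_hat @ x_hat)
print(f"naive: {naive:.2f} (biased), 2SLS: {iv:.2f} (true effect is 2.0)")
```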
-
Guiding Evolutionary Strategies by Differentiable Robot Simulators (Poster)
In recent years, Evolutionary Strategies have been actively explored for policy search in robotic tasks, as they provide a simpler alternative to reinforcement learning algorithms. However, this class of algorithms is known to be extremely sample-inefficient. On the other hand, there is growing interest in Differentiable Robot Simulators (DRS), as they can potentially find successful policies with only a handful of trajectories. But the resulting gradient is not always useful for first-order optimization. In this work, we demonstrate how the DRS gradient can be used in conjunction with Evolutionary Strategies. Preliminary results suggest that this combination can reduce the sample complexity of Evolutionary Strategies by 3x-5x in both simulation and the real world.
Vladislav Kurenkov
-
Object Representations Guided By Optical Flow (Poster)
Objects are powerful abstractions for representing the complexity of the world, and many computer vision tasks focus on learning to understand objects and their properties in images from annotated examples. Spurred by advances in unsupervised visual representation learning, there is growing interest in learning object-centric image representations without manual object annotations, through reconstruction and contrastive losses. We observe that these existing approaches fail to effectively exploit a long-known key signal for grouping object pixels, namely, motion in time. To address this, we propose to guide object representations during training to be consistent with optical flow correspondences between consecutive images in video sequences of moving objects. At test time, our approach generates object representations of individual images without requiring any correspondences. Through experiments across three datasets including a real-world robotic manipulation dataset, we demonstrate that our method consistently outperforms prior approaches including those that have access to additional information.
Jianing Qian · Dinesh Jayaraman
-
IL-flOw: Imitation Learning from Observation using Normalizing Flows (Poster)
We present an algorithm for Inverse Reinforcement Learning (IRL) from expert state observations only. It decouples reward modelling from policy learning, unlike state-of-the-art adversarial methods, which require updating the reward model during policy search and are known to be unstable and difficult to optimize. Our method, IL-flOw, recovers the expert policy by modelling state-to-state transitions, generating rewards with deep density estimators trained on the demonstration trajectories and thereby avoiding the instability issues of adversarial methods. We demonstrate that using the state-transition log-probability density as a reward signal for forward reinforcement learning translates to matching the trajectory distribution of the expert demonstrations, and we experimentally show good recovery of the true reward signal as well as state-of-the-art results for imitation from observation on locomotion and robotic continuous control tasks.
Wei-Di Chang · Juan Camilo Gamboa Higuera · Scott Fujimoto · David Meger · Gregory Dudek
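The reward construction can be sketched with a density model over (s, s') pairs; in this toy version a Gaussian KDE stands in for the normalizing flow, and the 1-D random-walk "expert" is a made-up example.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
expert_states = np.cumsum(rng.normal(0.1, 0.05, size=500))   # 1-D demo
expert_pairs = np.stack([expert_states[:-1], expert_states[1:]])
density = gaussian_kde(expert_pairs)       # p(s, s') from demonstrations

def reward(s, s_next):
    # high when the transition resembles the expert's state transitions;
    # this log-density is then used as the reward for forward RL
    return float(density.logpdf(np.array([[s], [s_next]])))

print(reward(1.0, 1.1), ">", reward(1.0, 5.0))   # expert-like vs. not
```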
-
Maximum Likelihood Constraint Inference on Continuous State Spaces (Poster)
When a robot observes another agent unexpectedly modifying their behavior, inferring the most likely cause is a valuable tool for maintaining safety and reacting appropriately. In this work, we present a novel method for inferring constraints that works on continuous, possibly sub-optimal demonstrations. We first learn a representation of the continuous-state maximum entropy trajectory distribution using deep reinforcement learning. We then use Monte Carlo sampling from this distribution to generate expected constraint violation probabilities and perform constraint inference. When the agent's dynamics and objective function are known in advance, this process can be performed offline, allowing for real-time constraint inference at the moment demonstrations are observed. We demonstrate our approach on two continuous systems, including a human driving a model car.
Kaylene Stocking · David McPherson · Robert Matthew · Claire Tomlin
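A toy version of the Monte Carlo step: sample trajectories from a nominal (unconstrained) behavior distribution and estimate each candidate constraint's violation probability; candidates with high nominal violation probability that demonstrations nonetheless avoid are the likely inferred constraints. The Gaussian random-walk sampler is a stand-in for the learned maximum entropy distribution.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_trajectory():
    # stand-in for sampling from the learned max-entropy distribution
    return np.cumsum(rng.normal(0.0, 0.3, size=(20, 2)), axis=0)

def violates(traj, center, radius=0.5):
    # candidate constraint: a circular keep-out region
    return bool(np.any(np.linalg.norm(traj - center, axis=1) < radius))

candidates = {"obstacle_A": np.array([1.0, 1.0]),
              "obstacle_B": np.array([5.0, 5.0])}
n = 2000
for name, c in candidates.items():
    p = sum(violates(sample_trajectory(), c) for _ in range(n)) / n
    print(f"P(violate {name} | nominal behavior) = {p:.2f}")
# if demonstrations never enter obstacle_A despite a high nominal violation
# probability, obstacle_A is the maximum-likelihood inferred constraint
```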
-
Task-Independent Causal State Abstraction (Poster)
Learning dynamics models accurately and learning policies sample-efficiently are two important challenges for Model-Based Reinforcement Learning (MBRL). Regarding dynamics accuracy, in contrast to the sparse dynamics exhibited in many real-world environments, most MBRL methods learn a dense dynamics model, which is vulnerable to spurious correlations and therefore generalizes poorly to unseen states. Meanwhile, existing state abstractions can improve sample efficiency, but their dependence on specific reward functions constrains their application to limited tasks. In this paper, we introduce an alternative state abstraction called Task-Independent Causal State Abstraction (TICSA). Exploiting the sparsity exhibited in the real world, the proposed method first learns a causal dynamics model that generalizes to unexplored states. A state abstraction can then be derived from the learned dynamics, which not only improves sample efficiency but also applies to many tasks. Using a simulated manipulation environment and two different tasks, we observe that both the dynamics model and the policies learned by TICSA generalize well to unseen states, and that learning with TICSA also improves sample efficiency.
Zizhao Wang · Xuesu Xiao · Yuke Zhu · Peter Stone
-
Variational Inference MPC for Robot Motion with Normalizing Flows (Poster)
In this paper, we propose an MPC method for robot motion that formulates MPC as Bayesian inference. We propose using amortized variational inference to approximate the posterior with a normalizing flow conditioned on the start, goal, and environment. By using a normalizing flow to represent the posterior, we are able to model complex distributions. This is important for robotics, where real environments impose difficult constraints on trajectories. We also present an approach for generalizing the learned sampling distribution to novel environments outside the training distribution. We demonstrate that our approach generalizes to a difficult novel environment and outperforms a baseline sampling-based MPC method on a navigation problem.
Thomas Power · Dmitry Berenson
-
Bridge Data: Boosting Generalization of Robotic Skills with Cross-Domain Datasets (Poster)
Robot learning holds the promise of learning policies that generalize broadly. However, such generalization requires sufficiently diverse datasets of the task of interest, which can be prohibitively expensive to collect. In this paper, we ask: what would it take to enable practical data reuse in robotics for end-to-end skill learning? We hypothesize that the key is to use datasets with multiple tasks and multiple domains, such that a new user who wants to train their robot to perform a new task in a new domain can include this dataset in their training process and benefit from cross-task and cross-domain generalization. To evaluate this hypothesis, we collect a large multi-domain and multi-task dataset, with 7,200 demonstrations constituting 71 tasks across 10 environments, and empirically study how this data can improve the learning of new tasks in new environments. We find that jointly training with the proposed dataset and 50 demonstrations of a never-before-seen task in a new domain leads, on average, to a 2x improvement in success rate compared to using target-domain data alone. We also find that data for only a few tasks in a new domain can bridge the domain gap and make it possible for a robot to perform a variety of prior tasks that were only seen in other domains.
Frederik Ebert · Yanlai Yang · Karl Schmeckpeper · Bernadette Bucher · Kostas Daniilidis · Chelsea Finn · Sergey Levine
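The joint-training recipe amounts to mixing batches from the large bridge dataset with the few target-domain demonstrations; a minimal sketch, where the 50/50 sampling ratio and the placeholder tuples are assumptions:

```python
import random

bridge_data = [("obs_b", "act_b")] * 7200   # many tasks, many domains
target_demos = [("obs_t", "act_t")] * 50    # new task in a new domain

def sample_batch(batch_size=32, target_frac=0.5):
    # oversample the small target-domain set so it is not drowned out
    n_target = int(batch_size * target_frac)
    return (random.choices(target_demos, k=n_target) +
            random.choices(bridge_data, k=batch_size - n_target))

batch = sample_batch()   # feed to the behavior-cloning update
```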
-
Hybrid Imitative Planning with Geometric and Predictive Costs in Offroad Environments (Poster)
Mobile robots tasked with reaching user-specified goals in open-world outdoor environments must contend with numerous challenges, including complex perception and unexpected obstacles and terrains. Prior work has addressed such problems with geometric methods that reconstruct obstacles, as well as with learning-based methods. While geometric methods generalize well, they can be brittle in outdoor environments that violate their assumptions (e.g., tall grass). On the other hand, learning-based methods can learn to directly select collision-free paths from raw observations, but they are difficult to integrate with standard geometry-based pipelines. This creates an unfortunate "either-or" dichotomy: either use learning and lose out on well-understood geometric navigational components, or forgo it in favor of extensively hand-tuned, geometry-based cost maps. The main idea of our approach is to reject this dichotomy by designing the learning-based and non-learning-based components so that they can be combined easily and effectively, without labeling any data. Both components contribute to a planning criterion: the learned component contributes predicted traversability as rewards, while the geometric component contributes obstacle cost information. We instantiate and comparatively evaluate our system in a high-fidelity simulator. We show that this approach inherits complementary gains from both components: the learning-based component enables the system to quickly adapt its behavior, and the geometric component often prevents the system from making catastrophic errors.
Dhruv Shah · Daniel Shin · Nick Rhinehart · Ali Agha · David D Fan · Sergey Levine
-
Demonstration-Guided Q-Learning (Poster)
In many challenging reinforcement learning (RL) settings, demonstrations are used to assist with exploration by allowing policies or value functions to directly learn from successful experience. In this work, we explore additional ways to utilize expert demonstrations to expedite training in value-based RL. In particular, we propose Demonstration-Guided Q-Learning (DGQL), an algorithm that noisily replays expert demonstrations to guide exploration and enable more efficient Q-value propagation in value-based RL methods. Contrary to common methods that utilize demonstrations in the context of value-based RL, we show that DGQL effectively leverages demonstrations to guide exploration via a replaying curriculum that relaxes common assumptions in simulated environments. In addition to analyzing the empirical benefits of more efficient value propagation, we show that DGQL is able to scale to difficult vision-based robotic manipulation tasks.
Ikechukwu Uchendu · Ted Xiao · Yao Lu · Mengyuan Yan · Karol Hausman
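One way to picture a noisy demonstration-replaying curriculum, offered as a hedged sketch rather than the authors' algorithm: start episodes from perturbed states along an expert trajectory, initially near the goal and progressively earlier, so Q-values can propagate backward from rewarding states. The noise scale, progress schedule, and reset mechanism below are all assumptions.

```python
import numpy as np

# hypothetical expert trajectory of ten 2-D states ending at the goal
demo = [np.array([float(i), 0.0]) for i in range(10)]

def curriculum_reset(progress, noise=0.1, rng=np.random.default_rng(0)):
    """Early in training (progress near 0) replay states late in the demo,
    close to the goal; as the agent improves, move the start earlier."""
    idx = int((1.0 - progress) * (len(demo) - 1))
    return demo[idx] + rng.normal(0.0, noise, size=2)   # noisy replay

start = curriculum_reset(progress=0.2)   # resets near the end of the demo
```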
-
Simultaneous Human Action and Motion Prediction (Poster)
This paper presents a novel approach to solving the problems of human whole-body motion prediction and action recognition simultaneously, for real-time applications. Starting from the dynamics of human motion and motor-system theory, the notion of mixture of experts from the deep learning literature is extended to solve this problem. The work is accompanied by experiments with a 66-DoF human model.
Kourosh Darvish · Daniele Pucci
-
Lifelong Robotic Reinforcement Learning by Retaining Experiences (Poster)
Multi-task learning ideally allows robots to acquire a diverse repertoire of useful skills. However, many multi-task reinforcement learning efforts assume the robot can collect data from all tasks at all times. In reality, the tasks that the robot learns arrive sequentially, depending on the user and the robot's current environment. In this work, we study a practical sequential multi-task RL problem motivated by the constraints of physical robotic systems, and derive an approach that effectively leverages the data and policies learned for previous tasks to cumulatively grow the robot's skill set. In a series of simulated robotic manipulation experiments, our approach requires less than half the samples needed to learn each task from scratch, while avoiding impractical round-robin data collection. On a Franka Emika Panda robot arm, our approach incrementally learns ten challenging tasks, including bottle capping and block insertion.
Annie Xie · Chelsea Finn
-
Continual Learning of Semantic Segmentation using Complementary 2D-3D Data Representations (Poster)
Semantic segmentation networks are usually pre-trained and not updated during deployment. As a consequence, misclassifications commonly occur if the distribution of the training data deviates from the one encountered during the robot's operation. We propose to mitigate this problem by adapting the neural network to the robot's environment during deployment, without any need for external supervision. Leveraging complementary data representations, we generate a supervision signal by probabilistically accumulating consecutive 2D semantic predictions in a volumetric 3D map. We then retrain the network on renderings of the accumulated semantic map, effectively resolving ambiguities and enforcing multi-view consistency through the 3D representation. To preserve the previously learned knowledge while performing network adaptation, we employ a continual learning strategy based on experience replay. Through extensive experimental evaluation, we show successful adaptation to real-world indoor scenes both on the ScanNet dataset and on in-house data recorded with an RGB-D sensor. Our method increases the segmentation performance on average by 11.8% compared to the fixed pre-trained neural network, while effectively retaining knowledge from the pre-training dataset.
Jonas Frey · Hermann Blum · Francesco Milano · Roland Siegwart · Cesar Cadena