
Workshop
Deployable Decision Making in Embodied Systems (DDM)
Angela Schoellig · Animesh Garg · Somil Bansal · SiQi Zhou · Melissa Greeff · Lukas Brunke

Tue Dec 14, 07:00 AM -- 03:00 PM (PST)

Embodied systems are playing an increasingly important role in our lives. Examples include, but are not limited to, autonomous driving, drone delivery, and service robots. In real-world deployments, these systems are required to learn and operate safely under various sources of uncertainty. As noted in the “Roadmap for US Robotics (2020)”, safe learning and adaptation is a key aspect of next-generation robotics. Learning is ingrained in all components of the robotics software stack, including perception, planning, and control. While the safety and robustness of these components have been identified as critical for real-world deployments, open issues and challenges are often discussed separately in the respective communities. In this workshop, we aim to bring together researchers from machine learning, computer vision, robotics, and control to facilitate interdisciplinary discussions on the topic of deployable decision making in embodied systems. Our workshop will focus on two discussion themes: (i) safe learning and decision making in uncertain and unstructured environments and (ii) efficient transfer learning for deployable embodied systems. To facilitate discussions and solicit participation from a broad audience, we plan to have a set of interactive lecture-style presentations, focused discussion panels, and a poster session with contributed paper presentations. By bringing researchers and industry professionals together in our workshop, and through detailed pre- and post-workshop plans, we envision this workshop as an effort towards a long-term, interdisciplinary exchange on this topic.

Tue 7:00 a.m. - 7:10 a.m.  Opening Remarks & Introduction (Live Introduction)  Angela Schoellig · Somil Bansal
Tue 7:10 a.m. - 7:30 a.m.  Reinforcement Learning in Real-World Control Systems (Invited Talk)  Martin Riedmiller
Tue 7:30 a.m. - 7:50 a.m.  Deployable Robot Learning with Self-supervised Spatial Action Maps (Invited Talk)  Shuran Song
Tue 7:50 a.m. - 8:10 a.m.  Learning Abstractions for Robust and Tractable Planning (Invited Talk)  Nick Roy
Tue 8:10 a.m. - 8:30 a.m.  Learning Closed-form Control Law for Adaptive and Fast Decision (Invited Talk)  Aude G Billard
Tue 8:30 a.m. - 8:35 a.m.  Coffee Break (Break)
Tue 8:35 a.m. - 9:35 a.m.  Panel A: Deployable Learning Algorithms for Embodied Systems (Panel Discussion)  Shuran Song · Martin Riedmiller · Nick Roy · Aude G Billard · Angela Schoellig · SiQi Zhou
Tue 9:35 a.m. - 9:40 a.m.  Spotlight Talk Introduction (Spotlight)  Somil Bansal · Lukas Brunke
Tue 9:40 a.m. - 10:08 a.m.  Spotlights (Spotlight)  Hager Radi · Krishan Rana · Yunzhu Li · Shuang Li · Gal Leibovich · Guy Jacob · Ruihan Yang
Tue 10:08 a.m. - 10:15 a.m.  Evaluation as a Process for Engineering Responsibility in AI (Invited Spotlight)  Deborah Raji
Tue 10:15 a.m. - 11:00 a.m.  Poster Session
Tue 11:00 a.m. - 11:10 a.m.  Theme B Introduction (Introduction)  Animesh Garg
Tue 11:10 a.m. - 11:30 a.m.  Robust Embedded Learned Models (Invited Talk)  Dragos Margineantu
Tue 11:30 a.m. - 11:50 a.m.  Human-in-the-loop Bayesian Deep Learning (Invited Talk)  Yarin Gal
Tue 11:50 a.m. - 12:10 p.m.  Learning for Agile Control in the Real World: Challenges and Opportunities (Invited Talk)  Yisong Yue · Ivan D Jimenez Rodriguez
Tue 12:10 p.m. - 12:30 p.m.  Enforcing Robustness for Neural Network Policies (Invited Talk)  J. Zico Kolter
Tue 12:30 p.m. - 12:45 p.m.  Coffee Break (Break)
Tue 12:45 p.m. - 1:45 p.m.  Panel B: Safe Learning and Decision Making in Uncertain and Unstructured Environments (Panel Discussion)  Yisong Yue · J. Zico Kolter · Ivan Dario D Jimenez Rodriguez · Dragos Margineantu · Animesh Garg · Melissa Greeff
Tue 1:45 p.m. - 2:00 p.m.  Concluding Remarks (Conclusion)  Angela Schoellig · Lukas Brunke
Tue 2:00 p.m. - 3:00 p.m.  Post-Workshop Social Event (Social Event)

- Tutorial: Safe Learning for Decision Making (Tutorial)  Angela Schoellig · SiQi Zhou · Lukas Brunke · Animesh Garg · Melissa Greeff · Somil Bansal

- Zero-Shot Uncertainty-Aware Deployment of Simulation Trained Policies on Real-World Robots (Poster)
While deep reinforcement learning (RL) agents have demonstrated incredible potential in attaining dexterous behaviours for robotics, they tend to make errors when deployed in the real world due to mismatches between the training and execution environments. In contrast, the classical robotics community has developed a range of controllers that, given their explicit derivation, can safely operate across most states in the real world. These controllers, however, lack the dexterity required for complex tasks, given the limitations of analytical modelling and approximations. In this paper, we propose Bayesian Controller Fusion (BCF), a novel uncertainty-aware deployment strategy that combines the strengths of deep RL policies and traditional handcrafted controllers. In this framework, we can perform zero-shot sim-to-real transfer, where our uncertainty-based formulation allows the robot to act reliably in out-of-distribution states by leveraging the handcrafted controller, while gaining the dexterity of the learned system otherwise. We show promising results on two real-world continuous control tasks, where BCF outperforms both the standalone policy and controller, surpassing what either can achieve independently. A supplementary video demonstrating our system is provided at https://bit.ly/bcf_deploy.
Krishan Rana · Vibhavari Dasagi · Michael Milford · Niko Suenderhauf

- Zero-Shot Uncertainty-Aware Deployment of Simulation Trained Policies on Real-World Robots (Spotlight)
Krishan Rana · Vibhavari Dasagi · Michael Milford · Niko Suenderhauf

- Validate on Sim, Detect on Real - Model Selection for Domain Randomization (Poster)
A practical approach to learning robot skills, often termed sim2real, is to train control policies in simulation and then deploy them on a real robot. Popular techniques to improve sim2real transfer build on domain randomization (DR): training the policy on a diverse set of randomly generated domains with the hope of better generalization to the real world. Due to the large number of hyper-parameters in both the policy learning and DR algorithms, one often ends up with a large number of trained models, where choosing the best model among them demands costly evaluation on the real robot. In this work we ask: can we rank the policies without running them in the real world? Our main idea is that a predefined set of real-world data can be used to evaluate all policies, using out-of-distribution (OOD) detection techniques. In a sense, this approach can be seen as a "unit test" to evaluate policies before any real-world execution. However, we find that by itself, the OOD score can be inaccurate and very sensitive to the particular OOD method. Our main contribution is a simple-yet-effective policy score that combines OOD detection with an evaluation in simulation. We show that our score - VSDR - can significantly improve the accuracy of policy ranking without requiring additional real-world data. We evaluate the effectiveness of VSDR on sim2real transfer in a robotic grasping task with image inputs. We extensively evaluate different DR parameters and OOD methods, and show that VSDR improves policy selection across the board. More importantly, our method achieves significantly better ranking, and uses significantly less data, compared to baselines.
Guy Jacob · Gal Leibovich · Shadi Endrawis · Gal Novik · Aviv Tamar

- Validate on Sim, Detect on Real - Model Selection for Domain Randomization (Spotlight)
Guy Jacob · Gal Leibovich · Shadi Endrawis · Gal Novik · Aviv Tamar

- Learning Vision-Guided Quadrupedal Locomotion End-to-End with Cross-Modal Transformers (Poster)
We propose to address quadrupedal locomotion tasks using Reinforcement Learning (RL) with a Transformer-based model that learns to combine proprioceptive information and high-dimensional depth sensor inputs.
While learning-based locomotion has made great advances using RL, most methods still rely on domain randomization for training blind agents that generalize to challenging terrains. Our key insight is that proprioceptive states only offer contact measurements for immediate reaction, whereas an agent equipped with visual sensory observations can learn to proactively maneuver through environments with obstacles and uneven terrain by anticipating changes in the environment many steps ahead. In this paper, we introduce LocoTransformer, an end-to-end RL method that leverages both proprioceptive states and visual observations for locomotion control. We evaluate our method in challenging simulated environments with different obstacles and uneven terrain. We transfer our learned policy from simulation to a real robot by running it indoors and in the wild with unseen obstacles and terrain. Our method not only significantly improves over baselines, but also achieves far better generalization performance, especially when transferred to the real robot. Our project page with videos is at https://LocoTransformer.github.io/.
Ruihan Yang · Minghao Zhang · Nicklas Hansen · Huazhe Xu · Xiaolong Wang

- Learning Vision-Guided Quadrupedal Locomotion End-to-End with Cross-Modal Transformers (Spotlight)
Ruihan Yang · Minghao Zhang · Nicklas Hansen · Huazhe Xu · Xiaolong Wang

- 3D Neural Scene Representations for Visuomotor Control (Poster)
Humans have a strong intuitive understanding of the 3D environment around us. The mental model of physics in our brains applies to objects of different materials and enables us to perform a wide range of manipulation tasks that are far beyond the reach of current robots. In this work, we aim to learn models for dynamic 3D scenes purely from 2D visual observations. Our model combines Neural Radiance Fields (NeRF) and time contrastive learning with an autoencoding framework, which learns viewpoint-invariant 3D-aware scene representations. We show that a dynamics model, constructed over the learned representation space, enables visuomotor control for challenging manipulation tasks involving both rigid bodies and fluids, where the target is specified in a viewpoint different from the one the robot operates in. When coupled with an auto-decoding framework, it can even support goal specification from camera viewpoints that are outside the training distribution. We further demonstrate the richness of the learned 3D dynamics model by performing future prediction and novel view synthesis.
Finally, we provide detailed ablation studies regarding different system designs and qualitative analysis of the learned representations.
Yunzhu Li · Shuang Li · Vincent Sitzmann · Pulkit Agrawal · Antonio Torralba

- 3D Neural Scene Representations for Visuomotor Control (Spotlight)
Yunzhu Li · Shuang Li · Vincent Sitzmann · Pulkit Agrawal · Antonio Torralba

- Safe Evaluation For Offline Learning: Are We Ready To Deploy? (Poster)
The world currently offers an abundance of data in multiple domains, from which we can learn reinforcement learning (RL) policies without further interaction with the environment. Learning offline from such data is possible, but deploying RL agents while they are still learning might be dangerous in domains where safety is critical. Therefore, it is essential to estimate how a newly learned agent will perform if deployed in the target environment before actually deploying it, without the risk of overestimating its true performance. To achieve this, we introduce a framework for safe evaluation of offline learning using approximate high-confidence off-policy evaluation (HCOPE) to estimate the performance of offline policies during learning. In our setting, we assume a source of data, which we split into a train-set, to learn an offline policy, and a test-set, to estimate a lower bound on the offline policy using off-policy evaluation with bootstrapping. A lower-bound estimate tells us how well a newly learned target policy would perform before it is deployed in the real environment, and therefore allows us to decide when to deploy our learned policy.
Hager Radi · Josiah Hanna · Peter Stone · Matthew Taylor

- Safe Evaluation For Offline Learning: Are We Ready To Deploy? (Spotlight)
Hager Radi · Josiah Hanna · Peter Stone · Matthew Taylor

- ilpyt: Imitation Learning Research Code Base in PyTorch (Poster)
Imitation learning, or learning by example, is an intuitive way to teach new behaviors to autonomous systems. With the parallel growth of deep reinforcement learning research, a rich taxonomy of imitation learning algorithms has emerged. These imitation learning algorithms show promise for teaching safe robot behaviors in increasingly dynamic environments by (1) implicitly bounding behaviors to lie within the field of human demonstration and (2) tackling the computational scalability issues of modern reinforcement learning methods. In this paper, we present ilpyt, a research code base which implements a variety of imitation learning and reinforcement learning algorithm families in a shared infrastructure. It contains implementations of popular deep imitation learning algorithms, written in a modular fashion for easy user customization, novel implementation, and fast benchmarking. The provided algorithm implementations were done in Python using PyTorch, and the overall library organization is inspired by the popular reinforcement learning research library, rlpyt. This white paper summarizes the key features and basic usage of the ilpyt library, as well as benchmark results for the implemented algorithms in several representative OpenAI Gym environments. We hope ilpyt can serve as a launching point for accelerated development in the imitation learning field. ilpyt is available for download at https://github.com/mitre/ilpyt.
Amanda Vu

- Equidistant Hyperspherical Prototypes Improve Uncertainty Quantification (Poster)
Uncertainty quantification is essential for a robot operating in an open world, not only for known concepts, but especially for unknown concepts that it may encounter and classify. In recent years, prototype-based approaches have proven to be an effective direction for classification in deep networks. In such approaches, each concept is represented by a single vector -- a prototype -- on the output manifold of the network. The starting point of this work is that common choices of prototype positions, whether one-hot vectors, vectors from prior knowledge, vectors from separation, or random vectors, are indeed effective for classification, but fail to quantify when a sample displays an unknown concept. The hypothesis of this work is that, in order to best quantify uncertainty over known and unknown concepts, prototypes should be uniform and equidistant. We introduce Equidistant Hyperspherical Prototype Networks, where arbitrary numbers of concepts are modelled as equidistant prototypes on a hyperspherical output manifold. This results in a distribution of the output space with as much room as possible for unknown concepts to occupy the empty space between the prototypes. We provide initial empirical results on MIT Indoor Places which show that equidistant prototypes can both model known concepts and quantify when samples display unknown concepts. The equidistant prototypes are defined by recursion, are easy to implement, and carry trivial computational overhead, making them suitable for open-world settings.
Gertjan Burghouts · Pascal Mettes

- What Would the Expert $do(\cdot)$?: Causal Imitation Learning (Poster)
We develop algorithms for imitation learning from data that was corrupted by unobserved confounders. Sources of such confounding include (a) persistent perturbations to actions or (b) the expert responding to a part of the state that the learner does not have access to. When a confounder affects multiple timesteps of recorded data, it can manifest as spurious correlations between states and actions that a learner might latch onto, leading to poor policy performance. By utilizing the effect of past states on current states, we are able to break up these spurious correlations, an application of the econometric technique of instrumental variable regression. This insight leads to two novel algorithms, one of a generative-modeling flavor (DoubIL) that can utilize access to a simulator and one of a game-theoretic flavor (ResiduIL) that can be run offline. Both approaches are able to find policies that match the result of a query to an unconfounded expert. We find both algorithms compare favorably to non-causal approaches on simulated control problems.
Gokul Swamy · Sanjiban Choudhury · James Bagnell · Steven Wu

- Conservative and Adaptive Penalty for Model-Based Safe Reinforcement Learning (Poster)
Reinforcement Learning (RL) agents in the real world must satisfy safety constraints in addition to maximizing a reward objective. Model-based RL algorithms hold promise for reducing unsafe real-world actions: they may synthesize policies that obey all constraints using simulated samples from a learned model. However, imperfect models can result in real-world constraint violations even for actions that are predicted to satisfy all constraints. We propose CAP, a model-based safe RL framework that accounts for potential modeling errors by capturing model uncertainty and adaptively exploiting it to balance the reward and the cost objectives. First, CAP inflates predicted costs using an uncertainty-based penalty. Theoretically, we show that policies satisfying this conservative cost constraint are guaranteed to also be feasible in the true environment; we further show that this guarantees the safety of all intermediate solutions during RL training. CAP then adaptively tunes this penalty during training using true cost feedback from the environment. We evaluate this conservative and adaptive penalty-based approach for model-based safe RL extensively on state- and image-based environments. Our results demonstrate substantial gains in sample efficiency while incurring fewer violations than prior safe RL algorithms.
Yecheng Ma · Andrew Shen · Osbert Bastani · Dinesh Jayaraman

- Vision-Guided Quadrupedal Locomotion in the Wild with Multi-Modal Delay Randomization (Poster)
Developing robust vision-guided controllers for quadrupedal robots in complex environments, with various obstacles, dynamic surroundings, and uneven terrain, is very challenging. While Reinforcement Learning (RL) provides a promising paradigm for agile locomotion skills with vision inputs in simulation, it is still very challenging to deploy the RL policy in the real world. Our key insight is that, aside from the domain gap in visual appearance between simulation and the real world, the latency of the control pipeline is also a major cause of difficulty. In this paper, we propose Multi-Modal Delay Randomization (MMDR) to address this issue when training RL agents. Specifically, we simulate the latency of real hardware by using past observations, sampled with randomized periods, for both proprioception and vision. We train the RL policy for end-to-end control in a physical simulator without any predefined controller or reference motion, and directly deploy it on the real A1 quadruped robot running in the wild. We evaluate our method in different outdoor environments with complex terrains and obstacles. We demonstrate that the robot can smoothly maneuver at high speed, avoid obstacles, and show significant improvement over the baselines. Our project page with videos is at https://mmdr-wild.github.io/.
Chieko Imai · Minghao Zhang · Ruihan Yang · Yuzhe Qin · Xiaolong Wang

- Towards Safe Global Optimality in Robot Learning with GoSafe (Poster)
When learning control policies from trial and error directly on hardware systems, ensuring safety is crucial to avoid costly damage to the system. Existing model-free reinforcement learning methods that guarantee safety during exploration are limited to optima within the safe region connected to a safe initialization, which may be worse than the safe globally optimal solution. In this work, we present GoSafe, an algorithm that can search for globally optimal policies while guaranteeing safety, and demonstrate its applicability in experiments on a real robot arm.
Bhavya Sukhija · Matteo Turchetta · Andreas Krause · Sebastian Trimpe · Dominik Baumann

- Learning Impedance Actions for Safe Reinforcement Learning in Contact-Rich Tasks (Poster)
Reinforcement Learning (RL) has the potential to solve complex continuous control tasks, with direct applications to robotics. Nevertheless, current state-of-the-art methods are generally unsafe to learn directly on a physical robot, as exploration by trial and error can cause harm to real-world systems. In this paper, we leverage a framework for learning latent action spaces for RL agents from demonstrated trajectories. We extend this framework by connecting it to a variable-impedance Cartesian-space controller, allowing us to learn contact-rich tasks safely and efficiently. Our method learns from trajectories that incorporate not only positional but, crucially, also impedance-space information. We evaluate our method on a number of peg-in-hole task variants with a Franka Panda arm and demonstrate that learning variable impedance actions for RL in Cartesian space can be safely deployed on the real robot directly, without resorting to learning in simulation and a subsequent policy transfer.
Quantao Yang · Elin Topp · Todor Stoyanov · Johannes A. Stork

- Extraneousness-Aware Imitation Learning (Poster)
Visual imitation learning is an effective approach for intelligent agents to obtain control policies from visual demonstration sequences. However, standard visual imitation learning assumes expert demonstrations that contain only task-relevant frames. While previous works propose to learn from noisy demonstrations, it remains challenging when there are locally consistent yet task-irrelevant subsequences in the demonstration. We term this kind of imitation learning "imitation learning with extraneousness" and introduce Extraneousness-Aware Imitation Learning (EIL), a self-supervised approach that learns visuomotor policies from third-person demonstrations where extraneous subsequences exist. EIL learns action-conditioned self-supervised frame embeddings and aligns task-relevant frames across videos while excluding the extraneous parts. Our method allows agents to learn from extraneousness-rich demonstrations by intelligently ignoring irrelevant components. Experimental results show that EIL significantly outperforms strong baselines and approaches the level of training from perfect demonstrations on various simulated continuous control tasks and a "learning-from-slides" task.
Ray Zheng · Kaizhe Hu · Boyuan Chen · Huazhe Xu

- Reward-Based Environment States for Robot Manipulation Policy Learning (Poster)
Training robot manipulation policies is a challenging and open problem in robotics and artificial intelligence. In this paper, we propose a novel and compact state representation based on the rewards predicted by an image-based task success classifier. Our experiments, using the Pepper robot in simulation with two deep reinforcement learning algorithms on a grab-and-lift task, reveal that our proposed state representation can achieve up to 97% task success using our best policies.
Isabelle Ferrane · Heriberto Cuayahuitl

- Look Closer: Bridging Egocentric and Third-Person Views with Transformers for Robotic Manipulation (Poster)
Learning to solve precision-based manipulation tasks from visual feedback using Reinforcement Learning (RL) could drastically reduce the engineering efforts required by traditional robot systems. However, performing fine-grained motor control from visual inputs alone is challenging, especially with a static third-person camera as often used in previous work. We propose a setting for robotic manipulation in which the agent receives visual feedback from both a third-person camera and an egocentric camera mounted on the robot's wrist. While the third-person camera is static, the egocentric camera enables the robot to actively control its vision to aid in precise manipulation. To fuse visual information from both cameras effectively, we additionally propose to use Transformers with a cross-view attention mechanism that models spatial attention from one view to another (and vice versa), and use the learned features as input to an RL policy. Our method improves learning over strong single-view and multi-view baselines, and successfully transfers to a set of challenging manipulation tasks on a real robot with uncalibrated cameras, no access to state information, and a high degree of task variability. In a hammer manipulation task, our method succeeds in 75% of trials versus 38% and 13% for multi-view and single-view baselines, respectively.
Rishabh Jangir · Nicklas Hansen · Mohit Jain · Xiaolong Wang

- Deep Reinforcement Learning Policies for Underactuated Satellite Attitude Control (Poster)
Autonomy is a key challenge for future space exploration endeavours. Deep reinforcement learning holds the promise of developing agents able to learn complex behaviours simply by interacting with their environment.
This paper investigates the use of Reinforcement Learning for the satellite attitude control problem, namely the angular reorientation of a spacecraft with respect to an inertial frame of reference. In the proposed approach, control policies are implemented as neural networks trained with a custom version of the Proximal Policy Optimization algorithm to maneuver a small satellite from a random starting angle to a given pointing target. In particular, we address the problem for two working conditions: the nominal case, in which all the actuators (a set of 3 reaction wheels) are working properly, and the underactuated case, in which an actuator failure is simulated randomly along one of the axes. We show that the agents learn to effectively perform large-angle slew maneuvers with fast convergence and industry-standard pointing accuracy. Furthermore, we test the proposed method on representative hardware, showing that by taking adequate measures, controllers trained in simulation can perform well in real systems. Matteo El Hariry 🔗 - GRILC: Gradient-based Reprogrammable Iterative Learning Control for Autonomous Systems (Poster) We propose a novel gradient-based reprogrammable iterative learning control (GRILC) framework for autonomous systems. Trajectory-following performance in autonomous systems is often limited by the mismatch between the complex actual model and the simplified nominal model used in controller design. To overcome this issue, we develop the GRILC framework with offline optimization, using the information of the nominal model and the actual trajectory, and online system implementation. In addition, a partial and reprogrammable learning strategy is introduced. The proposed method is applied to an autonomous time-trialing example, and the learned control policies can be stored in a library for future motion planning. The simulation and experimental results illustrate the effectiveness and robustness of the proposed approach.
Kuan-Yu Tseng · Jeff Shamma · Geir Dullerud 🔗 - A Unified Approach to Obstacle Avoidance and Motion Learning (Poster) A dynamical-system-based motion representation for obstacle avoidance and motion learning is proposed. The obstacle avoidance problem can be inverted to enforce that the flow remains enclosed within a given volume. A robot arm can be controlled by using the $\Gamma$-field in combination with the converging dynamical system. The closed-form model is extended to time-varying environments, i.e., moving, expanding, and shrinking obstacles. This is applied to an autonomous robot (QOLO) in a dynamic crowd in the center of Lausanne. Using Gaussian Mixture Regression (GMR), motions can be learned by describing them as combinations of local rotations. The motion can be further refined to create a safe invariant set within the obstacles' hull. Lukas Huber · Aude G Billard · Jean-Jacques Slotine 🔗
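The closed-form obstacle-avoidance idea in the last poster can be illustrated with a minimal sketch: a nominal dynamical system is modulated by a matrix built from a $\Gamma$ function so that the normal component of the flow vanishes on the obstacle boundary, keeping trajectories outside the obstacle. This is an illustrative simplification (a single circular obstacle, linear nominal dynamics, and the function name are assumptions of this sketch, not the poster's implementation, which handles general and time-varying shapes):

```python
import numpy as np

def modulated_velocity(x, attractor, center, radius):
    """Obstacle-avoiding velocity via dynamical-system modulation.

    Minimal 2-D sketch: a circular obstacle with Gamma(x) = ||x - c|| / r,
    so Gamma = 1 on the boundary and Gamma -> inf far away.
    """
    f = attractor - x                       # nominal linear dynamics toward attractor
    d = x - center
    dist = np.linalg.norm(d)
    gamma = dist / radius                   # Gamma function of the obstacle
    n = d / dist                            # outward normal direction
    t = np.array([-n[1], n[0]])             # tangent direction
    lam_n = 1.0 - 1.0 / gamma               # normal eigenvalue -> 0 at the boundary
    lam_t = 1.0 + 1.0 / gamma               # tangent eigenvalue speeds flow around
    E = np.column_stack([n, t])             # orthonormal basis (normal, tangent)
    M = E @ np.diag([lam_n, lam_t]) @ E.T   # modulation matrix
    return M @ f                            # modulated velocity
```

Because the normal eigenvalue is exactly zero when $\Gamma = 1$, the modulated flow is tangent to the obstacle surface there, so trajectories slide around the obstacle rather than penetrate it; far from the obstacle the modulation matrix approaches the identity and the nominal dynamics are recovered.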

#### Author Information

##### Animesh Garg (University of Toronto, Nvidia, Vector Institute)

I am a CIFAR AI Chair Assistant Professor of Computer Science at the University of Toronto, a Faculty Member at the Vector Institute, and a Senior Researcher at Nvidia. My current research focuses on machine learning for perception and control in robotics.

##### Somil Bansal (University of Southern California)

Somil Bansal is an Assistant Professor in the Department of Electrical Engineering at the University of Southern California, Los Angeles. He received a Ph.D. in Electrical Engineering and Computer Sciences (EECS) from the University of California, Berkeley in 2020. Before that, he obtained a B.Tech. in Electrical Engineering from the Indian Institute of Technology, Kanpur, and an M.S. in Electrical Engineering and Computer Sciences from UC Berkeley, in 2012 and 2014, respectively. Between August 2020 and August 2021, he spent a year as a Research Scientist at Waymo (formerly the Google Self-Driving Car project). He has also collaborated closely with companies such as Skydio, Google, Waymo, and Boeing, as well as with NASA Ames. Somil is broadly interested in developing mathematical tools and algorithms for the control and analysis of autonomous systems, with a focus on bridging learning and control-theoretic approaches for safety-critical autonomous systems. Somil has received several awards, most notably the Eli Jury Award at UC Berkeley for his doctoral research, the Outstanding Graduate Student Instructor Award at UC Berkeley, and the Academic Excellence Award at IIT Kanpur.