Timezone: »
We tackle the Multi-task Batch Reinforcement Learning problem. Given multiple datasets collected from different tasks, we train a multi-task policy to perform well in unseen tasks sampled from the same distribution. The task identities of the unseen tasks are not provided. To perform well, the policy must infer the task identity from collected transitions by modelling its dependency on states, actions and rewards. Because the different datasets may have state-action distributions with large divergence, the task inference module can learn to ignore the rewards and spuriously correlate \textit{only} state-action pairs to the task identity, leading to poor test time performance. To robustify task inference, we propose a novel application of the triplet loss. To mine hard negative examples, we relabel the transitions from the training tasks by approximating their reward functions. When we allow further training on the unseen tasks, using the trained policy as an initialization leads to significantly faster convergence compared to randomly initialized policies (up to 80% improvement and across 5 different Mujoco task distributions). We name our method \textbf{MBML} (\textbf{M}ulti-task \textbf{B}atch RL with \textbf{M}etric \textbf{L}earning).
Author Information
Jiachen Li (University of California, San Diego)
Quan Vuong (University of California San Diego)
Shuang Liu (University of California, San Diego)
Minghua Liu (UCSD)
Kamil Ciosek (Microsoft)
Henrik Christensen (UC San Diego)
Hao Su (UCSD)
More from the Same Authors
-
2021 : ManiSkill: Generalizable Manipulation Skill Benchmark with Large-Scale Demonstrations »
Tongzhou Mu · Zhan Ling · Fanbo Xiang · Derek Yang · Xuanlin Li · Stone Tao · Zhiao Huang · Zhiwei Jia · Hao Su -
2021 : From One Hand to Multiple Hands: Imitation Learning for Dexterous Manipulation from Single-Camera Teleoperation »
Yuzhe Qin · Hao Su · Xiaolong Wang -
2022 : Offline Reinforcement Learning from Heteroskedastic Data Via Support Constraints »
Anikait Singh · Aviral Kumar · Quan Vuong · Yevgen Chebotar · Sergey Levine -
2022 : On the Feasibility of Cross-Task Transfer with Model-Based Reinforcement Learning »
yifan xu · Nicklas Hansen · Zirui Wang · Yung-Chieh Chan · Hao Su · Zhuowen Tu -
2022 : Generalizable Point Cloud Reinforcement Learning for Sim-to-Real Dexterous Manipulation »
Yuzhe Qin · Binghao Huang · Zhao-Heng Yin · Hao Su · Xiaolong Wang -
2022 : Offline Reinforcement Learning from Heteroskedastic Data Via Support Constraints »
Anikait Singh · Aviral Kumar · Quan Vuong · Yevgen Chebotar · Sergey Levine -
2022 : Abstract-to-Executable Trajectory Translation for One-Shot Task Generalization »
Stone Tao · Xiaochen Li · Tongzhou Mu · Zhiao Huang · Yuzhe Qin · Hao Su -
2022 : VARIATIONAL REPARAMETRIZED POLICY LEARNING WITH DIFFERENTIABLE PHYSICS »
Zhiao Huang · Litian Liang · Zhan Ling · Xuanlin Li · Chuang Gan · Hao Su -
2022 : Multi-skill Mobile Manipulation for Object Rearrangement »
Jiayuan Gu · Devendra Singh Chaplot · Hao Su · Jitendra Malik -
2022 : MoDem: Accelerating Visual Model-Based Reinforcement Learning with Demonstrations »
Nicklas Hansen · Yixin Lin · Hao Su · Xiaolong Wang · Vikash Kumar · Aravind Rajeswaran -
2022 : On the Feasibility of Cross-Task Transfer with Model-Based Reinforcement Learning »
yifan xu · Nicklas Hansen · Zirui Wang · Yung-Chieh Chan · Hao Su · Zhuowen Tu -
2022 Poster: DASCO: Dual-Generator Adversarial Support Constrained Offline Reinforcement Learning »
Quan Vuong · Aviral Kumar · Sergey Levine · Yevgen Chebotar -
2021 Poster: Stabilizing Deep Q-Learning with ConvNets and Vision Transformers under Data Augmentation »
Nicklas Hansen · Hao Su · Xiaolong Wang -
2021 Poster: Particle Cloud Generation with Message Passing Generative Adversarial Networks »
Raghav Kansal · Javier Duarte · Hao Su · Breno Orzari · Thiago Tomei · Maurizio Pierini · Mary Touranakou · jean-roch vlimant · Dimitrios Gunopulos -
2020 Poster: Towards Scale-Invariant Graph-related Problem Solving by Iterative Homogeneous GNNs »
Hao Tang · Zhiao Huang · Jiayuan Gu · Bao-Liang Lu · Hao Su -
2020 Poster: Refactoring Policy for Compositional Generalizability using Self-Supervised Object Proposals »
Tongzhou Mu · Jiayuan Gu · Zhiwei Jia · Hao Tang · Hao Su -
2019 Poster: Generalization in Reinforcement Learning with Selective Noise Injection and Information Bottleneck »
Maximilian Igl · Kamil Ciosek · Yingzhen Li · Sebastian Tschiatschek · Cheng Zhang · Sam Devlin · Katja Hofmann -
2019 Poster: Better Exploration with Optimistic Actor Critic »
Kamil Ciosek · Quan Vuong · Robert Loftin · Katja Hofmann -
2019 Spotlight: Better Exploration with Optimistic Actor Critic »
Kamil Ciosek · Quan Vuong · Robert Loftin · Katja Hofmann -
2018 : Poster Session 1 + Coffee »
Tom Van de Wiele · Rui Zhao · J. Fernando Hernandez-Garcia · Fabio Pardo · Xian Yeow Lee · Xiaolin Andy Li · Marcin Andrychowicz · Jie Tang · Suraj Nair · Juhyeon Lee · Cédric Colas · S. M. Ali Eslami · Yen-Chen Wu · Stephen McAleer · Ryan Julian · Yang Xue · Matthia Sabatelli · Pranav Shyam · Alexandros Kalousis · Giovanni Montana · Emanuele Pesce · Felix Leibfried · Zhanpeng He · Chunxiao Liu · Yanjun Li · Yoshihide Sawada · Alexander Pashevich · Tejas Kulkarni · Keiran Paster · Luca Rigazio · Quan Vuong · Hyunggon Park · Minhae Kwon · Rivindu Weerasekera · Shamane Siriwardhanaa · Rui Wang · Ozsel Kilinc · Keith Ross · Yizhou Wang · Simon Schmitt · Thomas Anthony · Evan Cater · Forest Agostinelli · Tegg Sung · Shirou Maruyama · Alexander Shmakov · Devin Schwab · Mohammad Firouzi · Glen Berseth · Denis Osipychev · Jesse Farebrother · Jianlan Luo · William Agnew · Peter Vrancx · Jonathan Heek · Catalin Ionescu · Haiyan Yin · Megumi Miyashita · Nathan Jay · Noga H. Rotman · Sam Leroux · Shaileshh Bojja Venkatakrishnan · Henri Schmidt · Jack Terwilliger · Ishan Durugkar · Jonathan Sauder · David Kas · Arash Tavakoli · Alain-Sam Cohen · Philip Bontrager · Adam Lerer · Thomas Paine · Ahmed Khalifa · Ruben Rodriguez · Avi Singh · Yiming Zhang -
2018 Poster: Deep Functional Dictionaries: Learning Consistent Semantic Structures on 3D Models from Functions »
Minhyuk Sung · Hao Su · Ronald Yu · Leonidas Guibas -
2017 Poster: Approximation and Convergence Properties of Generative Adversarial Learning »
Shuang Liu · Olivier Bousquet · Kamalika Chaudhuri -
2017 Spotlight: Approximation and Convergence Properties of Generative Adversarial Learning »
Shuang Liu · Olivier Bousquet · Kamalika Chaudhuri -
2017 Poster: PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space »
Charles Ruizhongtai Qi · Li Yi · Hao Su · Leonidas Guibas -
2016 Poster: FPNN: Field Probing Neural Networks for 3D Data »
Yangyan Li · Soeren Pirk · Hao Su · Charles R Qi · Leonidas Guibas -
2010 Poster: Object Bank: A High-Level Image Representation for Scene Classification & Semantic Feature Sparsification »
Li-Jia Li · Hao Su · Eric Xing · Li Fei-Fei