Learning new task-specific skills from a few trials is a fundamental challenge for artificial intelligence. Meta reinforcement learning (meta-RL) tackles this problem by learning transferable policies that support few-shot adaptation to unseen tasks. Despite recent advances in meta-RL, most existing methods require access to the environmental reward function of new tasks to infer the task objective, which is unrealistic in many practical applications. To bridge this gap, we study the problem of few-shot adaptation in the context of human-in-the-loop reinforcement learning. We develop a meta-RL algorithm that enables fast policy adaptation with preference-based feedback. The agent can adapt to new tasks by querying a human's preferences between behavior trajectories instead of using per-step numeric rewards. By extending techniques from information theory, our approach designs query sequences that maximize the information gained from human interactions while tolerating the inherent errors of a non-expert human oracle. In experiments, we extensively evaluate our method, Adaptation with Noisy OracLE (ANOLE), on a variety of meta-RL benchmark tasks and demonstrate substantial improvements over baseline algorithms in both feedback efficiency and error tolerance.
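The loop the abstract describes (ask a preference query, update the belief about which task is being solved, tolerate wrong answers) can be illustrated with a minimal Python sketch. It assumes a finite set of candidate tasks with known per-trajectory returns and an oracle that flips its answer with a fixed probability `eps`, and it scores queries with the standard Bayesian expected-information-gain criterion. This is a generic stand-in, not necessarily ANOLE's actual query-design procedure; all function names and the example numbers are hypothetical.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

# Hypothetical layout: returns[k][t] is the return of trajectory t under candidate task k.
Returns = Sequence[Sequence[float]]


def binary_entropy(p: float) -> float:
    """Entropy in bits of a Bernoulli(p) variable."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def task_answer_prob(r_i: float, r_j: float, eps: float) -> float:
    """P(oracle says trajectory i beats j | task), assuming symmetric flip noise eps."""
    if r_i > r_j:
        return 1.0 - eps  # true preference reported with prob 1 - eps
    if r_i < r_j:
        return eps        # the oracle errs with prob eps
    return 0.5            # tie: assume the oracle answers at random


def info_gain(returns: Returns, belief: Sequence[float], i: int, j: int, eps: float) -> float:
    """Mutual information (bits) between the oracle's answer to (i, j) and the task."""
    qs = [task_answer_prob(r[i], r[j], eps) for r in returns]
    p = sum(w * q for w, q in zip(belief, qs))  # marginal P(answer = "i wins")
    return binary_entropy(p) - sum(w * binary_entropy(q) for w, q in zip(belief, qs))


def select_query(returns: Returns, belief: Sequence[float], eps: float) -> Tuple[int, int]:
    """Pick the trajectory pair whose answer is expected to be most informative."""
    n_traj = len(returns[0])
    return max(combinations(range(n_traj), 2),
               key=lambda pair: info_gain(returns, belief, pair[0], pair[1], eps))


def update_belief(returns: Returns, belief: Sequence[float], i: int, j: int,
                  i_wins: bool, eps: float) -> List[float]:
    """Bayes update of the task belief after observing a (possibly wrong) answer."""
    post = []
    for w, r in zip(belief, returns):
        q = task_answer_prob(r[i], r[j], eps)
        post.append(w * (q if i_wins else 1.0 - q))
    z = sum(post)
    return [x / z for x in post]


if __name__ == "__main__":
    returns = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5], [0.5, 0.0, 1.0]]
    belief = [0.5, 0.25, 0.25]
    i, j = select_query(returns, belief, eps=0.1)
    print((i, j))  # (0, 2)
    # Suppose the oracle answers that trajectory 2 beats trajectory 0.
    print(update_belief(returns, belief, i, j, i_wins=False, eps=0.1))
```

Because a wrong answer has likelihood `eps` rather than zero, a single contradictory answer only down-weights a candidate task instead of eliminating it, which is the kind of error tolerance the abstract refers to.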
Author Information
Zhizhou Ren (University of Illinois at Urbana-Champaign)
Anji Liu (University of California, Los Angeles)
Yitao Liang (Peking University)
Jian Peng (University of Illinois at Urbana-Champaign)
Jianzhu Ma (Peking University)
More from the Same Authors
2021 Spotlight: Tractable Regularization of Probabilistic Circuits
Anji Liu · Guy Van den Broeck
2021 : Imitation Learning from Observations under Transition Model Disparity
Tanmay Gangwani · Yuan Zhou · Jian Peng
2021 : Hindsight Foresight Relabeling for Meta-Reinforcement Learning
Michael Wan · Jian Peng · Tanmay Gangwani
2022 Poster: Antigen-Specific Antibody Design and Optimization with Diffusion-Based Generative Models for Protein Structures
Shitong Luo · Yufeng Su · Xingang Peng · Sheng Wang · Jian Peng · Jianzhu Ma
2022 : Neural-Symbolic Recursive Machine for Systematic Generalization
Qing Li · Yixin Zhu · Yitao Liang · Ying Nian Wu · Song-Chun Zhu · Siyuan Huang
2023 : Transformer-Based Large Language Models Are Not General Learners: A Universal Circuit Perspective
Yang Chen · Yitao Liang · Zhouchen Lin
2023 : GROOT: Learning to Follow Instructions by Watching Gameplay Videos
Shaofei Cai · Bowei Zhang · Zihao Wang · Xiaojian (Shawn) Ma · Anji Liu · Yitao Liang
2023 : JARVIS-1: Open-Ended Multi-task Agents with Memory-Augmented Multimodal Language Models
Zihao Wang · Shaofei Cai · Anji Liu · Xiaojian (Shawn) Ma · Yitao Liang
2023 : MCU: A Task-centric Framework for Open-ended Agent Evaluation in Minecraft
Haowei Lin · Zihao Wang · Jianzhu Ma · Yitao Liang
2023 Poster: Describe, Explain, Plan and Select: Interactive Planning with LLMs Enables Open-World Multi-Task Agents
Zihao Wang · Shaofei Cai · Guanzhou Chen · Anji Liu · Xiaojian (Shawn) Ma · Yitao Liang
2023 Poster: Equivariant Neural Operator Learning with Graphon Convolution
Chaoran Cheng · Jian Peng
2023 Poster: LinkerNet: Fragment Poses and Linker Co-Design with 3D Equivariant Diffusion
Jiaqi Guan · Xingang Peng · PeiQi Jiang · Yunan Luo · Jian Peng · Jianzhu Ma
2022 Poster: Sparse Probabilistic Circuits via Pruning and Growing
Meihua Dang · Anji Liu · Guy Van den Broeck
2021 Poster: Fast Projection onto the Capped Simplex with Applications to Sparse Regression in Bioinformatics
Man Shun Ang · Jianzhu Ma · Nianjun Liu · Kun Huang · Yijie Wang
2021 Poster: A Compositional Atlas of Tractable Circuit Operations for Probabilistic Inference
Antonio Vergari · YooJung Choi · Anji Liu · Stefano Teso · Guy Van den Broeck
2021 Oral: A Compositional Atlas of Tractable Circuit Operations for Probabilistic Inference
Antonio Vergari · YooJung Choi · Anji Liu · Stefano Teso · Guy Van den Broeck
2021 Poster: A 3D Generative Model for Structure-Based Drug Design
Shitong Luo · Jiaqi Guan · Jianzhu Ma · Jian Peng
2021 Poster: Tractable Regularization of Probabilistic Circuits
Anji Liu · Guy Van den Broeck
2020 Poster: Learning Guidance Rewards with Trajectory-space Smoothing
Tanmay Gangwani · Yuan Zhou · Jian Peng
2020 Poster: Off-Policy Interval Estimation with Lipschitz Value Iteration
Ziyang Tang · Yihao Feng · Na Zhang · Jian Peng · Qiang Liu
2019 Poster: Thresholding Bandit with Optimal Aggregate Regret
Chao Tao · Saúl Blanco · Jian Peng · Yuan Zhou
2019 Poster: Exploration via Hindsight Goal Generation
Zhizhou Ren · Kefan Dong · Yuan Zhou · Qiang Liu · Jian Peng