In problem-solving, we humans tend to come up with different novel solutions to the same problem. However, conventional reinforcement learning algorithms ignore this ability and aim only at producing a set of monotonous policies that maximize the cumulative reward. The resulting policies usually lack diversity and novelty. In this work, we aim to equip learning algorithms with the capacity to solve a task in multiple ways, through a practical novel-policy generation workflow that can generate a set of diverse and well-performing policies. Specifically, we begin by introducing a new metric to evaluate the difference between policies. On top of this well-defined novelty metric, we propose to rethink the novelty-seeking problem through the lens of constrained optimization, to address the dilemma between task performance and behavioral novelty in existing multi-objective optimization approaches. We then propose a practical novel policy-seeking algorithm, Interior Policy Differentiation (IPD), which is derived from the interior point method commonly known in the constrained optimization literature. Experimental comparisons on benchmark environments show that IPD achieves a substantial improvement over previous novelty-seeking methods in terms of both the novelty of the generated policies and their performance on the primal task.
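The constrained-optimization view in the abstract can be illustrated with a toy log-barrier (interior point) sketch. This is not the IPD algorithm itself: the one-dimensional "policy" parameter theta, the quadratic reward, the absolute-difference novelty measure, the threshold, and every function name below are assumptions made purely for illustration. The idea shown is only the generic one: maximize reward while a barrier term keeps the iterate strictly inside the region where novelty exceeds a threshold.

```python
import math

# Hypothetical toy setup (not from the paper): a scalar "policy" theta,
# a reward that peaks at theta = 0, and an existing policy at theta_ref = 0.

def reward(theta):
    """Toy task reward, maximized at theta = 0."""
    return -theta ** 2

def novelty(theta, theta_ref=0.0):
    """Toy novelty measure: distance to the reference policy."""
    return abs(theta - theta_ref)

def barrier_objective(theta, mu, threshold, theta_ref=0.0):
    """Reward plus a log barrier enforcing novelty(theta) > threshold."""
    slack = novelty(theta, theta_ref) - threshold
    if slack <= 0:
        return -math.inf  # infeasible points are never accepted
    return reward(theta) + mu * math.log(slack)

def barrier_gradient(theta, mu, threshold, theta_ref=0.0):
    """Analytic derivative of barrier_objective at a feasible theta."""
    slack = novelty(theta, theta_ref) - threshold
    sign = math.copysign(1.0, theta - theta_ref)
    return -2.0 * theta + mu * sign / slack

def solve(theta=2.0, threshold=1.0, mu=1.0, shrink=0.5, outer=20, inner=200):
    """Gradient ascent on the barrier objective while shrinking mu."""
    for _ in range(outer):
        for _ in range(inner):
            current = barrier_objective(theta, mu, threshold)
            grad = barrier_gradient(theta, mu, threshold)
            step = 0.1
            # Backtrack until the step stays feasible and improves the objective.
            while step > 1e-12 and (
                barrier_objective(theta + step * grad, mu, threshold) <= current
            ):
                step *= 0.5
            if step <= 1e-12:
                break  # no improving step found for this mu
            theta += step * grad
        mu *= shrink  # weaken the barrier to approach the constraint boundary
    return theta

if __name__ == "__main__":
    print(solve())
```

In this toy problem the unconstrained reward optimum (theta = 0) coincides with the reference policy, so the novelty constraint is active and the iterates approach the boundary theta = 1 from the feasible side without ever crossing it.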
Author Information
Hao Sun (University of Cambridge)
Zhenghao Peng (University of California, Los Angeles)
Bolei Zhou (UCLA)

Assistant professor at UCLA's computer science department
More from the Same Authors
-
2022 : ChemSpacE: Interpretable and Interactive Chemical Space Exploration »
Yuanqi Du · Xian Liu · Nilay Shah · Shengchao Liu · Jieyu Zhang · Bolei Zhou -
2022 : Constrained MDPs can be Solved by Early-Termination with Recurrent Models »
Hao Sun · Ziping Xu · Meng Fang · Zhenghao Peng · Taiyi Wang · Bolei Zhou -
2022 : Supervised Q-Learning can be a Strong Baseline for Continuous Control »
Hao Sun · Ziping Xu · Taiyi Wang · Meng Fang · Bolei Zhou -
2022 : GraphCG: Unsupervised Discovery of Steerable Factors in Graphs »
Shengchao Liu · Chengpeng Wang · Weili Nie · Hanchen Wang · Jiarui Lu · Bolei Zhou · Jian Tang -
2022 : Supervised Q-Learning for Continuous Control »
Hao Sun · Ziping Xu · Taiyi Wang · Meng Fang · Bolei Zhou -
2022 : MOPA: a Minimalist Off-Policy Approach to Safe-RL »
Hao Sun · Ziping Xu · Zhenghao Peng · Meng Fang · Bo Dai · Bolei Zhou -
2022 : Toward Causal-Aware RL: State-Wise Action-Refined Temporal Difference »
Hao Sun · Taiyi Wang -
2023 Poster: Learning from Active Human Involvement through Proxy Value Propagation »
Zhenghao Peng · Wenjie Mo · Chenda Duan · Quanyi Li · Bolei Zhou -
2022 : Toward Generalizable Embodied AI for Autonomous Driving »
Bolei Zhou -
2022 Poster: Human-AI Shared Control via Policy Dissection »
Quanyi Li · Zhenghao Peng · Haibin Wu · Lan Feng · Bolei Zhou -
2022 Poster: Exploit Reward Shifting in Value-Based Deep-RL: Optimistic Curiosity-Based Exploration and Conservative Exploitation via Linear Reward Shaping »
Hao Sun · Lei Han · Rui Yang · Xiaoteng Ma · Jian Guo · Bolei Zhou -
2022 Poster: Improving GANs with A Dynamic Discriminator »
Ceyuan Yang · Yujun Shen · Yinghao Xu · Deli Zhao · Bo Dai · Bolei Zhou -
2019 : Poster and Coffee Break 1 »
Aaron Sidford · Aditya Mahajan · Alejandro Ribeiro · Alex Lewandowski · Ali H Sayed · Ambuj Tewari · Angelika Steger · Anima Anandkumar · Asier Mujika · Hilbert J Kappen · Bolei Zhou · Byron Boots · Chelsea Finn · Chen-Yu Wei · Chi Jin · Ching-An Cheng · Christina Yu · Clement Gehring · Craig Boutilier · Dahua Lin · Daniel McNamee · Daniel Russo · David Brandfonbrener · Denny Zhou · Devesh Jha · Diego Romeres · Doina Precup · Dominik Thalmeier · Eduard Gorbunov · Elad Hazan · Elena Smirnova · Elvis Dohmatob · Emma Brunskill · Enrique Munoz de Cote · Ethan Waldie · Florian Meier · Florian Schaefer · Ge Liu · Gergely Neu · Haim Kaplan · Hao Sun · Hengshuai Yao · Jalaj Bhandari · James A Preiss · Jayakumar Subramanian · Jiajin Li · Jieping Ye · Jimmy Smith · Joan Bas Serrano · Joan Bruna · John Langford · Jonathan Lee · Jose A. Arjona-Medina · Kaiqing Zhang · Karan Singh · Yuping Luo · Zafarali Ahmed · Zaiwei Chen · Zhaoran Wang · Zhizhong Li · Zhuoran Yang · Ziping Xu · Ziyang Tang · Yi Mao · David Brandfonbrener · Shirli Di-Castro · Riashat Islam · Zuyue Fu · Abhishek Naik · Saurabh Kumar · Benjamin Petit · Angeliki Kamoutsi · Simone Totaro · Arvind Raghunathan · Rui Wu · Donghwan Lee · Dongsheng Ding · Alec Koppel · Hao Sun · Christian Tjandraatmadja · Mahdi Karami · Jincheng Mei · Chenjun Xiao · Junfeng Wen · Zichen Zhang · Ross Goroshin · Mohammad Pezeshki · Jiaqi Zhai · Philip Amortila · Shuo Huang · Mariya Vasileva · El houcine Bergou · Adel Ahmadyan · Haoran Sun · Sheng Zhang · Lukas Gruber · Yuanhao Wang · Tetiana Parshakova -
2019 Poster: Policy Continuation with Hindsight Inverse Dynamics »
Hao Sun · Zhizhong Li · Xiaotong Liu · Bolei Zhou · Dahua Lin -
2019 Spotlight: Policy Continuation with Hindsight Inverse Dynamics »
Hao Sun · Zhizhong Li · Xiaotong Liu · Bolei Zhou · Dahua Lin