NeurIPS Poster Causality-driven Hierarchical Structure Discovery for Reinforcement Learning

Poster

Causality-driven Hierarchical Structure Discovery for Reinforcement Learning

shaohui peng · Xing Hu · Rui Zhang · Ke Tang · Jiaming Guo · Qi Yi · Ruizhi Chen · xishan zhang · Zidong Du · Ling Li · Qi Guo · Yunji Chen

Hall J (level 1) #1042

Keywords: [ causal discovery ] [ subgoal ] [ causalty ] [ hierarchical reinforcement learning ]

[ Abstract ]

[ Paper] [ Poster] [ OpenReview]

Abstract:

Hierarchical reinforcement learning (HRL) has been proven to be effective for tasks with sparse rewards, for it can improve the agent's exploration efficiency by discovering high-quality hierarchical structures (e.g., subgoals or options). However, automatically discovering high-quality hierarchical structures is still a great challenge. Previous HRL methods can only find the hierarchical structures in simple environments, as they are mainly achieved through the randomness of agent's policies during exploration. In complicated environments, such a randomness-driven exploration paradigm can hardly discover high-quality hierarchical structures because of the low exploration efficiency. In this paper, we propose CDHRL, a causality-driven hierarchical reinforcement learning framework, to build high-quality hierarchical structures efficiently in complicated environments. The key insight is that the causalities among environment variables are naturally fit for modeling reachable subgoals and their dependencies; thus, the causality is suitable to be the guidance in building high-quality hierarchical structures. Roughly, we build the hierarchy of subgoals based on causality autonomously, and utilize the subgoal-based policies to unfold further causality efficiently. Therefore, CDHRL leverages a causality-driven discovery instead of a randomness-driven exploration for high-quality hierarchical structure construction. The results in two complex environments, 2D-Minecraft and Eden, show that CDHRL can discover high-quality hierarchical structures and significantly enhance exploration efficiency.

Chat is not available.