
Dynamic Bottleneck for Robust Self-Supervised Exploration
Chenjia Bai · Lingxiao Wang · Lei Han · Animesh Garg · Jianye Hao · Peng Liu · Zhaoran Wang

Tue Dec 07 08:30 AM -- 10:00 AM (PST) @ Virtual

Exploration methods based on pseudo-counts of transitions or curiosity about dynamics have achieved promising results in solving reinforcement learning tasks with sparse rewards. However, such methods are usually sensitive to information that is irrelevant to the environment dynamics, e.g., white noise. To handle such dynamics-irrelevant information, we propose a Dynamic Bottleneck (DB) model, which attains a dynamics-relevant representation based on the information-bottleneck principle. Based on the DB model, we further propose DB-bonus, which encourages the agent to explore state-action pairs with high information gain. We establish theoretical connections between the proposed DB-bonus, the upper confidence bound (UCB) in the linear case, and the visitation count in the tabular case. We evaluate the proposed method on the Atari suite with dynamics-irrelevant noise injected. Our experiments show that exploration with DB-bonus outperforms several state-of-the-art exploration methods in noisy environments.
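To illustrate the general idea of an information-gain exploration bonus, the sketch below computes a closed-form Gaussian KL divergence between a latent posterior and a standard-normal prior and uses it as an intrinsic reward. This is only a hand-rolled stand-in for the paper's DB-bonus, not the authors' actual model: the function name `db_style_bonus` and the choice of a unit-Gaussian prior are assumptions made here for illustration.

```python
import numpy as np

def db_style_bonus(mu, sigma):
    """Illustrative exploration bonus: KL( N(mu, diag(sigma^2)) || N(0, I) )
    summed over latent dimensions. A transition whose encoded posterior sits
    far from the prior (high information gain) receives a larger bonus."""
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    # Per-dimension closed-form KL for diagonal Gaussians against N(0, 1).
    kl = 0.5 * (mu ** 2 + sigma ** 2 - 2.0 * np.log(sigma) - 1.0)
    return float(np.sum(kl))

# A posterior matching the prior yields zero bonus (nothing new learned)...
low = db_style_bonus(np.zeros(4), np.ones(4))
# ...while a posterior far from the prior yields a large bonus.
high = db_style_bonus(np.full(4, 2.0), np.ones(4))
```

In an actual agent, such a bonus would be added to the environment reward at each step; the paper's DB-bonus is instead derived from the learned Dynamic Bottleneck representation, which is what filters out dynamics-irrelevant noise.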

Author Information

Chenjia Bai (Harbin Institute of Technology)
Lingxiao Wang (Northwestern University)
Lei Han (Tencent AI Lab)
Animesh Garg (University of Toronto, Vector Institute)

I am an Assistant Professor of Computer Science at the University of Toronto and a Faculty Member at the Vector Institute. I work on machine learning for perception and control in robotics.

Jianye Hao (Tianjin University)
Peng Liu (Harbin Institute of Technology)
Zhaoran Wang (Princeton University)
