
Hierarchical Reinforcement Learning
Andrew G Barto · Doina Precup · Shie Mannor · Tom Schaul · Roy Fox · Carlos Florensa

Sat Dec 09 08:00 AM -- 06:30 PM (PST) @ Grand Ballroom A
Event URL: https://sites.google.com/view/hrlnips2017

Reinforcement Learning (RL) has become a powerful tool for tackling complex sequential decision-making problems. It has trained agents to superhuman performance in game-playing domains such as Go and Atari, and it can learn advanced control policies for high-dimensional robotic systems. Nevertheless, current RL agents struggle with sparse rewards, long planning horizons, and, more generally, a scarcity of useful supervision signals. Unfortunately, the most valuable control tasks are specified in terms of high-level instructions, which imply sparse rewards when formulated as RL problems. Internal spatio-temporal abstractions and memory structures can constrain the decision space and improve data efficiency in the face of this scarcity, but they are likewise challenging for a supervisor to teach.

Hierarchical Reinforcement Learning (HRL) is emerging as a key component for finding spatio-temporal abstractions and behavioral patterns that can guide the discovery of useful large-scale control architectures, both for deep-network representations and for analytic and optimal-control methods. HRL has the potential to accelerate planning and exploration by identifying skills that can reliably reach desirable future states. It can abstract away the details of low-level controllers to facilitate long-horizon planning and meta-learning in a high-level feature space. Hierarchical structures are modular and amenable to separation of training efforts, reuse, and transfer. By imitating a core principle of human cognition, hierarchies hold promise for interpretability and explainability.

There is a growing interest in HRL methods for structure discovery, planning, and learning, as well as HRL systems for shared learning and policy deployment. The goal of this workshop is to improve cohesion and synergy among the research community and increase its impact by promoting better understanding of the challenges and potential of HRL. This workshop further aims to bring together researchers studying both theoretical and practical aspects of HRL, for a joint presentation, discussion, and evaluation of some of the numerous novel approaches to HRL developed in recent years.
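The temporal abstraction discussed above is commonly formalized via the options framework, in which a high-level policy selects temporally extended "skills" rather than primitive actions, and each skill runs until its termination condition fires. The minimal sketch below illustrates the idea; the `Option` class, the toy corridor environment, and all names are illustrative assumptions, not material from the workshop itself.

```python
# Minimal sketch of the options framework often used as the formal
# backbone for HRL. Everything here (class names, the toy corridor
# environment) is an illustrative assumption, not workshop material.

class Option:
    """A temporally extended action: an intra-option policy plus a
    termination condition (the initiation set is omitted for brevity)."""
    def __init__(self, name, policy, terminate):
        self.name = name
        self.policy = policy          # state -> primitive action
        self.terminate = terminate    # state -> bool

def run_option(env_step, state, option, max_steps=50):
    """Execute an option until it terminates or the episode ends,
    returning the final state and the cumulative reward collected."""
    total_reward = 0.0
    for _ in range(max_steps):
        action = option.policy(state)
        state, reward, done = env_step(state, action)
        total_reward += reward
        if done or option.terminate(state):
            break
    return state, total_reward

# Toy 1-D corridor: states 0..10, sparse reward only at the goal state 10.
def corridor_step(state, action):
    state = max(0, min(10, state + action))
    done = (state == 10)
    return state, (1.0 if done else 0.0), done

# A "skill" that reliably reaches a desirable future state: always move
# right, terminating at the goal. A high-level policy choosing among such
# skills plans over far fewer decisions than one acting in primitive steps.
go_right = Option("go-right", policy=lambda s: +1, terminate=lambda s: s == 10)

state, reward = run_option(corridor_step, 0, go_right)
print(state, reward)  # the single option carries the agent to the goal
```

With skills like this, the sparse goal reward becomes reachable in one high-level decision, which is the sense in which hierarchy accelerates exploration and long-horizon planning.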

Sat 9:00 a.m. - 9:10 a.m.
Opening Remarks
Roy Fox
Sat 9:10 a.m. - 9:40 a.m.
Deep Reinforcement Learning with Subgoals (David Silver) (Invited Talk)
David Silver
Sat 9:40 a.m. - 9:50 a.m.
Landmark Options Via Reflection (LOVR) in Multi-task Lifelong Reinforcement Learning (Nicholas Denis) (Contributed Talk)
Nicholas Denis
Sat 9:50 a.m. - 10:00 a.m.
Crossmodal Attentive Skill Learner (Shayegan Omidshafiei) (Contributed Talk)
Shayegan Omidshafiei
Sat 10:00 a.m. - 10:30 a.m.
HRL with gradient-based subgoal generators, asymptotically optimal incremental problem solvers, various meta-learners, and PowerPlay (Jürgen Schmidhuber) (Invited Talk)
Jürgen Schmidhuber
Sat 11:00 a.m. - 11:30 a.m.
Meta-Learning Shared Hierarchies (Pieter Abbeel) (Invited Talk)
Pieter Abbeel
Sat 11:30 a.m. - 11:55 a.m.
Best Paper Award and Talk — Learning with options that terminate off-policy (Anna Harutyunyan) (Contributed Talk)
Anna Harutyunyan
Sat 11:55 a.m. - 12:30 p.m.
Spotlights & Poster Session (Poster Session)
Dave Abel · Nicholas Denis · Maria Eckstein · Ronan Fruit · Karan Goel · Joshua Gruenstein · Anna Harutyunyan · Martin Klissarov · Xiangyu Kong · Aviral Kumar · Saurabh Kumar · Miao Liu · Daniel McNamee · Shayegan Omidshafiei · Silviu Pitis · Paulo Rauber · Melrose Roderick · Tianmin Shu · Yizhou Wang · Shangtong Zhang
Sat 12:30 p.m. - 1:30 p.m.
Lunch Break (Break)
Sat 1:30 p.m. - 2:00 p.m.
Hierarchical Imitation and Reinforcement Learning for Robotics (Jan Peters) (Invited Talk)
Jan Peters
Sat 2:00 p.m. - 2:10 p.m.
Deep Abstract Q-Networks (Melrose Roderick) (Contributed Talk)
Melrose Roderick
Sat 2:10 p.m. - 2:20 p.m.
Federated Control with Hierarchical Multi-Agent Deep Reinforcement Learning (Saurabh Kumar) (Contributed Talk)
Saurabh Kumar
Sat 2:20 p.m. - 2:30 p.m.
Effective Master-Slave Communication On A Multi-Agent Deep Reinforcement Learning System (Xiangyu Kong) (Contributed Talk)
Xiangyu Kong
Sat 2:30 p.m. - 3:00 p.m.
Sample efficiency and off policy hierarchical RL (Emma Brunskill) (Invited Talk)
Emma Brunskill
Sat 3:00 p.m. - 3:30 p.m.
Coffee Break (Break)
Sat 3:30 p.m. - 4:00 p.m.
Applying variational information bottleneck in hierarchical domains (Matt Botvinick) (Invited Talk)
Matt Botvinick
Sat 4:00 p.m. - 4:30 p.m.
Progress on Deep Reinforcement Learning with Temporal Abstraction (Doina Precup) (Invited Talk)
Doina Precup
Sat 4:30 p.m. - 5:30 p.m.
Panel Discussion
Matt Botvinick · Emma Brunskill · Marcos Campos · Jan Peters · Doina Precup · David Silver · Josh Tenenbaum · Roy Fox
Sat 5:30 p.m. - 6:30 p.m.
Poster Session
Dave Abel · Nicholas Denis · Maria Eckstein · Ronan Fruit · Karan Goel · Joshua Gruenstein · Anna Harutyunyan · Martin Klissarov · Xiangyu Kong · Aviral Kumar · Saurabh Kumar · Miao Liu · Daniel McNamee · Shayegan Omidshafiei · Silviu Pitis · Paulo Rauber · Melrose Roderick · Tianmin Shu · Yizhou Wang · Shangtong Zhang

Author Information

Andrew G Barto (University of Massachusetts)
Doina Precup (McGill University / DeepMind Montreal)
Shie Mannor (Technion)
Tom Schaul (DeepMind)
Roy Fox (UC Berkeley)

[Roy Fox](http://roydfox.com/) is a postdoc at UC Berkeley working with [Ion Stoica](http://people.eecs.berkeley.edu/~istoica/) in the Real-Time Intelligent Secure Explainable lab ([RISELab](https://rise.cs.berkeley.edu/)), and with [Ken Goldberg](http://goldberg.berkeley.edu/) in the Laboratory for Automation Science and Engineering ([AUTOLAB](http://autolab.berkeley.edu/)). His research interests include reinforcement learning, dynamical systems, information theory, automation, and the connections between these fields. His current research focuses on automatic discovery of hierarchical control structures in deep reinforcement learning and in imitation learning of robotic tasks. Roy holds an MSc in Computer Science from the [Technion](http://www.cs.technion.ac.il/), under the supervision of [Moshe Tennenholtz](http://iew3.technion.ac.il/Home/Users/Moshet.phtml), and a PhD in Computer Science from the [Hebrew University](http://www.cs.huji.ac.il/), under the supervision of [Naftali Tishby](http://www.cs.huji.ac.il/~tishby/). He was an exchange PhD student with Larry Abbott and [Liam Paninski](http://www.stat.columbia.edu/~liam/) at the [Center for Theoretical Neuroscience](http://www.neurotheory.columbia.edu/) at Columbia University, and a research intern at Microsoft Research.

Carlos Florensa (UC Berkeley)
