
Reinforcement Learning with Feedback Graphs
Christoph Dann · Yishay Mansour · Mehryar Mohri · Ayush Sekhari · Karthik Sridharan

Tue Dec 08 09:00 AM -- 11:00 AM (PST) @ Poster Session 1 #496

We study RL in the tabular MDP setting where the agent receives additional observations per step in the form of transition samples. Such additional observations can be provided in many tasks by auxiliary sensors or by leveraging prior knowledge about the environment (e.g., when certain actions yield similar outcomes). We formalize this setting using a feedback graph over state-action pairs and show that model-based algorithms can incorporate additional observations for more sample-efficient learning. We give a regret bound that predominantly depends on the size of the maximum acyclic subgraph of the feedback graph, in contrast with a polynomial dependency on the number of states and actions in the absence of side observations. Finally, we highlight fundamental challenges for leveraging a small dominating set of the feedback graph, as compared to the well-studied bandit setting, and propose a new algorithm that can use such a dominating set to learn a near-optimal policy faster.
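As a rough illustration of the setting described above, the sketch below shows one way an empirical tabular model could absorb side observations along a feedback graph. It is not the paper's algorithm: the class `FeedbackGraphModel`, its methods, and the encoding of the graph as a dictionary from a state-action pair to the pairs it also reveals are all assumptions made for this example.

```python
from collections import defaultdict


class FeedbackGraphModel:
    """Empirical tabular model updated with side observations (illustrative only)."""

    def __init__(self, feedback_graph):
        # Assumed encoding: (state, action) -> iterable of (state, action) pairs
        # whose transition samples are also revealed when the key pair is played.
        self.graph = feedback_graph
        self.counts = defaultdict(lambda: defaultdict(int))  # pair -> next_state -> count
        self.reward_sums = defaultdict(float)                # pair -> summed reward

    def update(self, played, observations):
        # observations: (state, action) -> (reward, next_state).
        # Only the played pair and its out-neighbors in the graph are accepted.
        allowed = set(self.graph.get(played, ())) | {played}
        for pair, (reward, next_state) in observations.items():
            if pair not in allowed:
                continue
            self.counts[pair][next_state] += 1
            self.reward_sums[pair] += reward

    def num_samples(self, pair):
        return sum(self.counts[pair].values())

    def transition_estimate(self, pair):
        # Empirical next-state distribution; empty if the pair was never observed.
        n = self.num_samples(pair)
        if n == 0:
            return {}
        return {s: c / n for s, c in self.counts[pair].items()}


# Hypothetical graph: playing ("s0", "a0") also reveals a sample for ("s0", "a1").
graph = {("s0", "a0"): [("s0", "a1")]}
model = FeedbackGraphModel(graph)
model.update(
    ("s0", "a0"),
    {
        ("s0", "a0"): (1.0, "s1"),
        ("s0", "a1"): (0.0, "s1"),
        ("s1", "a0"): (0.5, "s0"),  # not a neighbor of the played pair, so ignored
    },
)
```

In this toy run a single step produces samples for two state-action pairs, which is the source of the sample-efficiency gain the abstract refers to; a model-based learner would then plan with these richer counts.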

Author Information

Christoph Dann (Google Research)
Yishay Mansour (Google)
Mehryar Mohri (Google Research & Courant Institute of Mathematical Sciences)
Ayush Sekhari (Cornell University)
Karthik Sridharan (Cornell University)
