NeurIPS 2020: Towards Minimax Optimal Reinforcement Learning in Factored Markov Decision Processes



Towards Minimax Optimal Reinforcement Learning in Factored Markov Decision Processes

Yi Tian, Jian Qian, Suvrit Sra

Spotlight presentation: Orals & Spotlights Track 09: Reinforcement Learning
on Tue, Dec 8th, 2020 @ 15:30 – 15:40 GMT

Poster Session 2
on Tue, Dec 8th, 2020 @ 17:00 – 19:00 GMT

Abstract: We study minimax optimal reinforcement learning in episodic factored Markov decision processes (FMDPs), which are MDPs with conditionally independent transition components. Assuming the factorization is known, we propose two model-based algorithms. The first one achieves minimax optimal regret guarantees for a rich class of factored structures, while the second one enjoys better computational complexity with a slightly worse regret. A key new ingredient of our algorithms is the design of a bonus term to guide exploration. We complement our algorithms by presenting several structure-dependent lower bounds on regret for FMDPs that reveal the difficulty hiding in the intricacy of the structures.
