
Adaptive Discretization for Model-Based Reinforcement Learning
Sean Sinclair · Tianyu Wang · Gauri Jain · Siddhartha Banerjee · Christina Yu

Tue Dec 08 09:00 AM -- 11:00 AM (PST) @ Poster Session 1 #357

We introduce the technique of adaptive discretization to design an efficient model-based episodic reinforcement learning algorithm for large (potentially continuous) state-action spaces. Our algorithm is based on optimistic one-step value iteration, extended to maintain an adaptive discretization of the space. From a theoretical perspective, we provide worst-case regret bounds for our algorithm that are competitive with state-of-the-art model-based algorithms. Moreover, our bounds are obtained via a modular proof technique that can potentially be extended to incorporate additional structure on the problem.
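To make the idea of adaptive discretization concrete, here is a minimal sketch for a 1-D state space [0, 1]: a tree-based partition that refines a cell once it has been visited often enough, so frequently visited regions get a finer discretization while rarely visited ones stay coarse. The split rule used below (split a cell once its visit count reaches 1/diameter^2) is one illustrative choice, not necessarily the paper's exact rule, and the optimistic value-iteration updates that the algorithm runs on top of the partition are omitted.

```python
# Sketch of an adaptive partition over the 1-D state space [0, 1].
# Assumption: split a leaf cell once visits >= 1 / diameter^2; the
# paper's precise split rule and value estimates are not reproduced here.

class Cell:
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi
        self.count = 0          # number of visits to this cell
        self.children = None    # None while the cell is a leaf

    @property
    def diameter(self):
        return self.hi - self.lo

    def leaf_for(self, x):
        """Descend the tree to the leaf cell containing x."""
        node = self
        while node.children is not None:
            left, right = node.children
            node = left if x < left.hi else right
        return node


class AdaptivePartition:
    def __init__(self):
        self.root = Cell(0.0, 1.0)

    def update(self, x):
        """Record a visit at x; refine the leaf once it is visited enough."""
        leaf = self.root.leaf_for(x)
        leaf.count += 1
        # Split rule (illustrative): refine once visits >= 1 / diameter^2,
        # capped so the partition never becomes arbitrarily fine.
        if leaf.count >= 1.0 / leaf.diameter ** 2 and leaf.diameter > 1e-3:
            mid = (leaf.lo + leaf.hi) / 2
            leaf.children = (Cell(leaf.lo, mid), Cell(mid, leaf.hi))
        return leaf

    def leaves(self):
        stack, out = [self.root], []
        while stack:
            node = stack.pop()
            if node.children is None:
                out.append(node)
            else:
                stack.extend(node.children)
        return out
```

Feeding samples concentrated near one region produces small cells there and leaves the rest of the space coarse, which is the source of the storage savings over a uniform (fixed) discretization.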

From an implementation standpoint, our algorithm has much lower storage and computational requirements because it maintains a more efficient partition of the state and action spaces. We illustrate this via experiments on several canonical control problems, which show that our algorithm empirically performs significantly better than fixed discretization, with both faster convergence and lower memory usage. Interestingly, we observe empirically that while fixed-discretization model-based algorithms vastly outperform their model-free counterparts, the two achieve comparable performance under adaptive discretization.

Author Information

Sean Sinclair (Cornell University)

I am a second-year PhD student in Operations Research and Information Engineering at Cornell University. I completed a BSc in Mathematics and Computer Science at McGill University, where I worked on a project with Tony Humphries. Before returning to graduate school, I spent two and a half years teaching mathematics, science, and English in a small community in rural Ghana with the Peace Corps, and afterwards worked at National Life as a financial analyst. In general, I am interested in machine learning, statistics, and differential equations. My current work is on the theoretical underpinnings of reinforcement learning (RL) in metric spaces. These are natural models for systems involving real-time sequential decision making over continuous spaces. Using RL on memory-constrained devices raises several challenges. The first is learning an "optimal" discretization, trading off memory requirements against algorithmic performance. The second is learning the metric itself when it is not clear which metric is best suited to the problem. This balances the two fundamental requirements of implementable RL: approximating the optimal policy well, and keeping the statistical complexity (the number of samples required to learn a near-optimal policy) manageable.

Tianyu Wang (Duke University)
Gauri Jain (Cornell University)
Siddhartha Banerjee (Cornell University)
Christina Yu (Cornell University)
