Much of the recent success of deep learning can be attributed to scaling up networks to the point where they are often vastly overparameterized. Understanding the role of overparameterization is therefore of increasing importance. While predictive theories have been developed for supervised learning, little is known in the reinforcement learning case. In this work, we take a theoretical approach and study the role of overparameterization for off-policy Temporal Difference (TD) learning in the linear setting. We leverage tools from random matrix theory and random graph theory to obtain a characterization of the spectrum of the TD operator. We use this result to study the stability and optimization dynamics of TD learning as a function of the number of parameters.
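The abstract refers to off-policy TD learning with linear function approximation and to the spectrum of the TD operator. The sketch below is purely illustrative and is not the paper's code: the random MDP, the feature matrix Phi, the behaviour distribution d, and the expected-update matrix A = Phi^T D (I - gamma P) Phi are all assumptions, chosen only to show how the spectrum of that matrix governs the stability of the expected TD dynamics in an overparameterized setting.

```python
# Minimal, illustrative sketch of off-policy linear TD(0) in expectation.
# All quantities below are assumptions for illustration, not the paper's setup.
import numpy as np

rng = np.random.default_rng(0)

n_states, n_features, gamma = 20, 50, 0.95   # overparameterized: n_features > n_states

# Random target-policy transition matrix P and reward vector r.
P = rng.random((n_states, n_states))
P /= P.sum(axis=1, keepdims=True)
r = rng.standard_normal(n_states)

# Off-policy behaviour distribution d (not the stationary distribution of P).
d = rng.random(n_states)
d /= d.sum()
D = np.diag(d)

# Random linear features, one row per state.
Phi = rng.standard_normal((n_states, n_features)) / np.sqrt(n_features)

# Expected semi-gradient TD(0) update: theta <- theta - eta * (A @ theta - b).
A = Phi.T @ D @ (np.eye(n_states) - gamma * P) @ Phi
b = Phi.T @ D @ r

# The expected dynamics contract (up to the null space of A, which is large
# when n_features > n_states) when the nonzero eigenvalues of A have positive
# real parts; off-policy, some can be negative and the iteration then diverges.
eigs = np.linalg.eigvals(A)
print("min real part of spec(A):", eigs.real.min())

theta = np.zeros(n_features)
eta = 0.1
for _ in range(2000):
    theta -= eta * (A @ theta - b)
print("residual ||A theta - b||:", np.linalg.norm(A @ theta - b))
```

Re-running the sketch with d set to the stationary distribution of P (the on-policy case) is a quick way to see the contrast the abstract alludes to: on-policy, the nonzero spectrum stays in the right half-plane and the residual shrinks, whereas an off-policy d can push eigenvalues across the imaginary axis.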
Author Information
Valentin Thomas (Mila)