An ODE method approach for proving convergence and stability of Q-learning in Hierarchical Reinforcement Learning
Massimiliano Manenti · Andrea Iannelli
Abstract
Hierarchical Reinforcement Learning promises, among other benefits, better long-term credit assignment and continual-learning capabilities, but its theoretical guarantees lag behind practice. In this paper, we propose a Feudal Q-learning scheme and ask whether its coupled updates converge and are stable. Leveraging the theory of Stochastic Approximation and the ODE method, we establish a theorem characterizing the convergence and stability properties of Feudal Q-learning. The result provides a principled analysis tailored to Feudal RL, using mathematical tools closely related to the theory of dynamical systems.
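To give a rough sense of what coupled Feudal Q-learning updates can look like, below is a minimal sketch of a tabular manager/worker scheme: a manager Q-function selects subgoals, a worker Q-function selects primitive actions toward the current subgoal, and the two are updated with separate step sizes. The toy chain environment, intrinsic reward, step sizes, and horizon are illustrative assumptions and not the scheme analyzed in the paper.

```python
# Illustrative (hypothetical) Feudal Q-learning sketch: a manager picks a
# subgoal, a worker picks primitive actions to reach it, and the two tabular
# Q-functions are updated with separate step sizes.  All details below
# (environment, rewards, step sizes) are assumptions for illustration only.
import numpy as np

rng = np.random.default_rng(0)

n_states, n_subgoals, n_actions = 6, 6, 2   # toy chain; subgoals are target states
Q_manager = np.zeros((n_states, n_subgoals))            # Q_M(s, g)
Q_worker = np.zeros((n_states, n_subgoals, n_actions))  # Q_W(s, g, a)

gamma = 0.95
horizon = 5            # max primitive steps per subgoal (assumed)
eps = 0.1

def step(s, a):
    """Deterministic chain: action 0 moves left, action 1 moves right.
    External reward 1 only at the rightmost state (assumed toy dynamics)."""
    s_next = min(max(s + (1 if a == 1 else -1), 0), n_states - 1)
    r = 1.0 if s_next == n_states - 1 else 0.0
    return s_next, r

def eps_greedy(q_row):
    return int(rng.integers(len(q_row))) if rng.random() < eps else int(np.argmax(q_row))

for episode in range(2000):
    s = 0
    for _ in range(10):                       # manager decisions per episode
        g = eps_greedy(Q_manager[s])          # manager picks a subgoal state
        s0, ext_return, disc = s, 0.0, 1.0
        for t in range(horizon):
            a = eps_greedy(Q_worker[s, g])
            s_next, r_ext = step(s, a)
            r_int = 1.0 if s_next == g else 0.0   # intrinsic reward: subgoal reached
            # Worker update (larger step size; the coupling of the two updates
            # is what a Stochastic Approximation / ODE analysis must handle)
            alpha_w = 0.5
            td_w = (r_int + gamma * np.max(Q_worker[s_next, g]) * (s_next != g)
                    - Q_worker[s, g, a])
            Q_worker[s, g, a] += alpha_w * td_w
            ext_return += disc * r_ext
            disc *= gamma
            s = s_next
            if s == g:
                break
        # Manager update (smaller step size) on the accumulated external return
        alpha_m = 0.05
        td_m = ext_return + disc * np.max(Q_manager[s]) - Q_manager[s0, g]
        Q_manager[s0, g] += alpha_m * td_m

print("Greedy manager subgoals per state:", np.argmax(Q_manager, axis=1))
```

The separation into a slowly updated manager and a quickly updated worker is one common way such coupled updates are structured; whether and how the paper exploits a timescale separation is not stated in the abstract.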