NeurIPS Poster Value Function in Frequency Domain and the Characteristic Value Iteration Algorithm

Amir-massoud Farahmand

East Exhibition Hall B, C #207

Keywords: [ Reinforcement Learning and Planning ] [ Algorithms -> Uncertainty Estimation; Reinforcement Learning and Planning ] [ Markov Decision Processes ]

Abstract:

This paper considers the problem of estimating the distribution of returns in reinforcement learning, i.e., the distributional RL problem. It presents a new representational framework to maintain the uncertainty of returns and provides mathematical tools to compute it. We show that instead of representing the probability distribution function of returns, one can represent their characteristic function, the Fourier transform of their distribution. We call the new representation the Characteristic Value Function (CVF). The CVF satisfies a Bellman-like equation, and its corresponding Bellman operator is a contraction with respect to certain metrics. The contraction property allows us to devise an iterative procedure to compute the CVF, which we call Characteristic Value Iteration (CVI). We analyze CVI and its approximate variant and show how approximation errors affect the quality of the computed CVF.
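
To make the frequency-domain recursion concrete, here is a minimal sketch on a toy problem. It is not the paper's implementation. It assumes a finite Markov reward process (a fixed policy) with transition matrix P, Gaussian rewards that are independent of the next state given the current state, and a discount factor gamma; all names (reward_cf, cvi, P, mu, sigma) are hypothetical. Under these assumptions, the return G(x) = R(x) + gamma G(X') has characteristic function c(omega; x) = c_R(omega; x) * sum_x' P(x'|x) c(gamma * omega; x'), which the sketch unrolls for a fixed number of iterations starting from the characteristic function of a zero return.

```python
import numpy as np

def reward_cf(omega, mu, sigma):
    """Characteristic function of a Gaussian reward N(mu[x], sigma[x]^2), per state.
    The Gaussian reward model is an assumption made for this illustration."""
    return np.exp(1j * omega * mu - 0.5 * (sigma * omega) ** 2)

def cvi(omega, P, mu, sigma, gamma, num_iters=300):
    """Apply the Bellman-like recursion num_iters times at frequency omega.

    Starts from c_0 = 1 (the characteristic function of a return that is
    identically zero) and returns one complex value per state.
    Iteration k of the unrolled recursion needs the reward characteristic
    function at the scaled frequency gamma**k * omega.
    """
    c = np.ones(len(mu), dtype=complex)
    for k in reversed(range(num_iters)):
        c = reward_cf(gamma**k * omega, mu, sigma) * (P @ c)
    return c

# Hypothetical two-state example.
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
mu = np.array([1.0, -0.5])     # mean reward per state
sigma = np.array([0.5, 1.0])   # reward standard deviation per state
gamma = 0.9

# The mean return is recoverable from the slope of the characteristic
# function at zero: E[G] = Im(c(h)) / h for small h (finite difference).
h = 1e-4
mean_from_cf = cvi(h, P, mu, sigma, gamma).imag / h

# Conventional value function for comparison: V = (I - gamma P)^{-1} mu.
mean_direct = np.linalg.solve(np.eye(2) - gamma * P, mu)

print(mean_from_cf)
print(mean_direct)
```

The two printed vectors should agree closely, which illustrates that the characteristic function carries the usual expected value along with the rest of the return distribution. Evaluating cvi on a range of frequencies gives a sampled version of the whole function; how to represent and approximate it in general is the subject of the paper and is not addressed by this sketch.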