NeurIPS Exponential Family Model-Based Reinforcement Learning via Score Matching

Poster in Workshop: Deep Reinforcement Learning

Exponential Family Model-Based Reinforcement Learning via Score Matching

Gene Li · Junbo Li · Nathan Srebro · Zhaoran Wang · Zhuoran Yang


Abstract: We propose an optimistic model-based algorithm, dubbed SMRL, for finite-horizon episodic reinforcement learning (RL) when the transition model is specified by exponential family distributions with $d$ parameters and the reward is bounded and known. SMRL uses score matching, an unnormalized density estimation technique that enables efficient estimation of the model parameter by ridge regression. SMRL achieves $\tilde O(d\sqrt{H^3T})$ regret, where $H$ is the length of each episode and $T$ is the total number of interactions.
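To illustrate why score matching on an exponential family reduces to a ridge-regularized linear solve, here is a minimal sketch on a toy problem. It is not the SMRL estimator: it assumes an unconditional one-dimensional density $p_\theta(x) \propto \exp(\theta_1 x + \theta_2 x^2)$ (a Gaussian) with a constant base measure, whereas the abstract concerns conditional transition models. The function name `score_matching_ridge` and the regularizer `lam` are hypothetical choices made for this example. Under these assumptions, the score matching objective is quadratic in $\theta$, so its regularized minimizer is $\hat\theta = -(V + \lambda I)^{-1} b$, where $V$ averages outer products of the first derivatives of the sufficient statistics and $b$ averages their second derivatives.

```python
import numpy as np


def score_matching_ridge(x, lam=1e-6):
    """Toy score matching for p(x) ∝ exp(theta[0]*x + theta[1]*x**2).

    Hypothetical helper for illustration only; assumes a constant base measure.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Sufficient statistics psi(x) = (x, x^2).
    # First derivatives w.r.t. x: (1, 2x); second derivatives: (0, 2).
    dpsi = np.stack([np.ones_like(x), 2.0 * x], axis=1)          # shape (n, 2)
    d2psi = np.stack([np.zeros_like(x), np.full_like(x, 2.0)], axis=1)
    # Quadratic objective: 0.5 * theta^T V theta + theta^T b (+ const).
    V = dpsi.T @ dpsi / n
    b = d2psi.mean(axis=0)
    # Ridge-regularized minimizer, obtained by a single linear solve.
    return -np.linalg.solve(V + lam * np.eye(2), b)


rng = np.random.default_rng(0)
samples = rng.normal(loc=1.5, scale=0.7, size=50_000)
theta = score_matching_ridge(samples)

# Recover mean and variance from the natural parameters:
# theta[0] = mu / sigma^2, theta[1] = -1 / (2 sigma^2).
sigma2_hat = -1.0 / (2.0 * theta[1])
mu_hat = theta[0] * sigma2_hat
```

No normalizing constant is ever computed, which is the property that makes score matching an unnormalized density estimation technique. In this toy case the recovered `mu_hat` and `sigma2_hat` should be close to the sampling mean 1.5 and variance 0.49.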
