Policy Evaluation Using the Ω-Return
Philip Thomas · Scott Niekum · Georgios Theocharous · George Konidaris

Tue Dec 8th 07:00 -- 11:59 PM @ 210 C #51

We propose the Ω-return as an alternative to the λ-return currently used by the TD(λ) family of algorithms. The benefit of the Ω-return is that it accounts for the correlation between returns of different lengths. Because the Ω-return is difficult to compute exactly, we suggest one way of approximating it. We provide empirical studies suggesting that it is superior to the λ-return and γ-return for a variety of problems.
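For readers unfamiliar with the baseline, the following is a minimal sketch of the standard episodic λ-return, written as a weighted average of n-step returns. It is not taken from the paper: the function names, the toy rewards, and the value estimates are all hypothetical and chosen only for illustration. The Ω-return's own weighting is not specified in this abstract, so it is not sketched here. The only point is that the λ-return fixes the weights from λ alone, without looking at how the different-length returns are correlated.

```python
def n_step_returns(rewards, values, gamma):
    """Return [G^(1), ..., G^(T)] measured from the first state of an episode.

    rewards[k] is the reward on step k+1; values[k] is the estimated value
    of the state after k steps, with values[T] == 0 for the terminal state.
    (Hypothetical helper, not from the paper.)
    """
    returns = []
    discounted_sum = 0.0
    for n in range(1, len(rewards) + 1):
        discounted_sum += gamma ** (n - 1) * rewards[n - 1]
        # n-step return: n discounted rewards plus a bootstrapped value estimate.
        returns.append(discounted_sum + gamma ** n * values[n])
    return returns


def lambda_weights(num_returns, lam):
    """Episodic lambda-return weights: (1 - lam) * lam^(n-1), with the
    remaining mass lam^(T-1) placed on the full (Monte Carlo) return."""
    weights = [(1.0 - lam) * lam ** (n - 1) for n in range(1, num_returns)]
    weights.append(lam ** (num_returns - 1))
    return weights


def weighted_return(returns, weights):
    """Combine different-length returns with a given weight vector."""
    return sum(w * g for w, g in zip(weights, returns))


# Toy three-step episode (made-up numbers).
rewards = [1.0, 1.0, 1.0]
values = [0.0, 0.5, 0.5, 0.0]
returns = n_step_returns(rewards, values, gamma=1.0)
lambda_return = weighted_return(returns, lambda_weights(len(returns), lam=0.5))
```

With λ = 0 the weights collapse onto the one-step return, and with λ = 1 onto the full Monte Carlo return. Any other weight vector that sums to one can be passed to `weighted_return` in the same way.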

Author Information

Philip Thomas (University of Massachusetts Amherst, Carnegie Mellon University)
Scott Niekum (UT Austin)
Georgios Theocharous (Adobe)
George Konidaris (Duke)

