

Planning with General Objective Functions: Going Beyond Total Rewards

Ruosong Wang · Peilin Zhong · Simon Du · Russ Salakhutdinov · Lin Yang

Poster Session 2 #600

Keywords: [ Learning Theory ] [ Theory ] [ Algorithms -> Classification; Theory ] [ Frequentist Statistics ]

Abstract: Standard sequential decision-making paradigms aim to maximize the cumulative reward when interacting with an unknown environment, i.e., maximize $\sum_{h = 1}^H r_h$, where $H$ is the planning horizon. However, this paradigm fails to model important practical applications, e.g., safe control, which aims to maximize the lowest reward, i.e., maximize $\min_{h = 1}^H r_h$. In this paper, based on techniques from sketching algorithms, we propose a novel planning algorithm for deterministic systems that handles a large class of objective functions of the form $f(r_1, r_2, \ldots, r_H)$ that are of interest in practical applications. We show that efficient planning is possible if $f$ is symmetric under permutation of coordinates and satisfies certain technical conditions. Complementing our algorithm, we further prove that removing any of these conditions makes the problem intractable in the worst case, thus demonstrating the necessity of our conditions.
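
To make the difference between the two objectives concrete, the following is a minimal sketch on a hypothetical three-state deterministic system. It is not the sketching-based algorithm from the paper: it simply enumerates all action sequences (exponential in $H$) and scores each reward sequence with a symmetric objective $f$. The transition table `TRANSITIONS` and the helpers `rollout` and `brute_force_plan` are illustrative assumptions, not names from the paper.

```python
from itertools import product

# Hypothetical toy deterministic system (an assumption for illustration only).
# TRANSITIONS[state][action] = (next_state, reward)
TRANSITIONS = {
    0: {0: (1, 5.0), 1: (2, 2.0)},
    1: {0: (1, 0.0), 1: (0, 0.0)},
    2: {0: (2, 2.0), 1: (0, 2.0)},
}
ACTIONS = (0, 1)


def rollout(start, actions):
    """Return the reward sequence (r_1, ..., r_H) obtained by following `actions`."""
    state, rewards = start, []
    for action in actions:
        state, reward = TRANSITIONS[state][action]
        rewards.append(reward)
    return rewards


def brute_force_plan(start, horizon, f):
    """Exhaustively search all |ACTIONS|**horizon plans and maximize f(r_1, ..., r_H).

    This is a baseline for tiny instances, not an efficient planner.
    """
    best = max(
        product(ACTIONS, repeat=horizon),
        key=lambda actions: f(rollout(start, actions)),
    )
    return best, f(rollout(start, best))


if __name__ == "__main__":
    H = 3
    # Both objectives are symmetric under permutation of the rewards.
    sum_plan, sum_value = brute_force_plan(0, H, sum)
    min_plan, min_value = brute_force_plan(0, H, min)
    print("total reward:", sum_plan, sum_value)
    print("lowest reward:", min_plan, min_value)
```

In this toy system the plan that maximizes the total reward passes through a zero-reward step, while the plan that maximizes the lowest reward avoids it at the cost of a smaller total, which is the kind of mismatch the abstract uses to motivate general objectives.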
