NeurIPS Poster Geometric Exploration for Online Control

Orestis Plevrakis · Elad Hazan

Poster Session 5 #1557

Abstract: We study the control of an unknown linear dynamical system under general convex costs. The objective is minimizing regret vs. the class of strongly-stable linear policies. In this work, we first consider the case of known cost functions, for which we design the first polynomial-time algorithm with $n^3\sqrt{T}$-regret, where $n$ is the dimension of the state plus the dimension of control input. The $\sqrt{T}$-horizon dependence is optimal, and improves upon the previous best known bound of $T^{2/3}$. The main component of our algorithm is a novel geometric exploration strategy: we adaptively construct a sequence of barycentric spanners in an over-parameterized policy space. Second, we consider the case of bandit feedback, for which we give the first polynomial-time algorithm with $\mathrm{poly}(n)\sqrt{T}$-regret, building on Stochastic Bandit Convex Optimization.
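
For readers unfamiliar with barycentric spanners, the following is a minimal sketch of the standard greedy determinant-swapping construction (due to Awerbuch and Kleinberg) on a finite point set. It is not the adaptive exploration algorithm of this paper, which works in an over-parameterized policy space. The function names `det`, `barycentric_spanner`, and `coefficients`, and the choice `C = 2`, are illustrative assumptions. A C-approximate barycentric spanner is a subset of $d$ points such that every point in the set is a linear combination of them with coefficients in $[-C, C]$.

```python
import random

def det(mat):
    """Determinant via Gaussian elimination with partial pivoting."""
    a = [row[:] for row in mat]
    n = len(a)
    result = 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if abs(a[p][c]) < 1e-12:
            return 0.0
        if p != c:
            a[c], a[p] = a[p], a[c]
            result = -result
        result *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return result

def barycentric_spanner(points, C=2.0):
    """Return (indices, basis) of a C-approximate barycentric spanner.

    Assumes `points` is a finite list of d-dimensional vectors spanning R^d.
    """
    d = len(points[0])
    # Start from the standard basis; rows are the current basis vectors.
    basis = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
    idx = [-1] * d

    def swapped_det(i, x):
        trial = [row[:] for row in basis]
        trial[i] = list(x)
        return abs(det(trial))

    # Phase 1: fill each slot with the point maximizing |det|.
    for i in range(d):
        best_j, best_val = -1, 1e-12
        for j, x in enumerate(points):
            val = swapped_det(i, x)
            if val > best_val:
                best_j, best_val = j, val
        if best_j < 0:
            raise ValueError("points do not span the full space")
        basis[i] = list(points[best_j])
        idx[i] = best_j

    # Phase 2: swap while some replacement grows |det| by more than a factor C.
    improved = True
    while improved:
        improved = False
        base = abs(det(basis))
        for i in range(d):
            for j, x in enumerate(points):
                if swapped_det(i, x) > C * base:
                    basis[i] = list(x)
                    idx[i] = j
                    improved = True
                    break
            if improved:
                break
    return idx, basis

def coefficients(basis, x):
    """Coefficients of x in the basis, by Cramer's rule."""
    base = det(basis)
    out = []
    for i in range(len(basis)):
        trial = [row[:] for row in basis]
        trial[i] = list(x)
        out.append(det(trial) / base)
    return out

if __name__ == "__main__":
    random.seed(0)
    pts = [[random.gauss(0, 1) for _ in range(3)] for _ in range(40)]
    idx, basis = barycentric_spanner(pts, C=2.0)
    worst = max(abs(c) for x in pts for c in coefficients(basis, x))
    print("spanner indices:", idx, "largest |coefficient|:", worst)
```

Each swap multiplies the absolute determinant by more than $C > 1$, so the loop terminates on a finite set. On exit, Cramer's rule bounds every coefficient by $C$ in absolute value.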
