Poster
Reinforcement Learning with Convex Constraints
Sobhan Miryoosefi · Kianté Brantley · Hal Daumé III · Miro Dudik · Robert Schapire

Thu Dec 12th 10:45 AM -- 12:45 PM @ East Exhibition Hall B + C #232

In standard reinforcement learning (RL), a learning agent seeks to optimize the overall reward. However, many key aspects of a desired behavior are more naturally expressed as constraints. For instance, the designer may want to limit the use of unsafe actions, increase the diversity of trajectories to enable exploration, or approximate expert trajectories when rewards are sparse. In this paper, we propose an algorithmic scheme that can handle a wide class of constraints in RL tasks: specifically, any constraints that require expected values of some vector measurements (such as the use of an action) to lie in a convex set. This captures previously studied constraints (such as safety and proximity to an expert), but also enables new classes of constraints (such as diversity). Our approach comes with rigorous theoretical guarantees and only relies on the ability to approximately solve standard RL tasks. As a result, it can be easily adapted to work with any model-free or model-based RL. In our experiments, we show that it matches previous algorithms that enforce safety via constraints, but can also enforce new properties that these algorithms do not incorporate, such as diversity.

Author Information

Sobhan Miryoosefi (Princeton University)
Kianté Brantley (The University of Maryland College Park)
Hal Daumé III (Microsoft Research & University of Maryland)

Hal Daumé III wields a professor appointment in Computer Science and Language Science at the University of Maryland, and spends time as a principal researcher in the machine learning group and fairness group at Microsoft Research in New York City. He and his wonderful advisees study questions related to how to get machines to become more adept at human language, by developing models and algorithms that allow them to learn from data. The two major questions that really drive their research these days are: (1) how can we get computers to learn language through natural interaction with people/users? and (2) how can we do this in a way that promotes fairness, transparency and explainability in the learned models?

Miro Dudik (Microsoft Research)
Robert Schapire (MIcrosoft Research)

More from the Same Authors