Execute Order 66: Targeted Data Poisoning for Reinforcement Learning via Minuscule Perturbations

Harrison Foley · Liam Fowl · Tom Goldstein · Gavin Taylor

NeurIPS Poster in Workshop: Safe and Robust Control of Uncertain Systems

Abstract:

Data poisoning for reinforcement learning has historically focused on general performance degradation, while successful targeted attacks have relied on perturbations that involve control of the victim's policy and rewards. We introduce an insidious poisoning attack for reinforcement learning that causes agent misbehavior only at specific target states. The attack minimally modifies a small fraction of training observations and assumes no control over policy or reward. We accomplish this by adapting a recent technique, gradient alignment, to reinforcement learning. We test our method and demonstrate success in two Atari games of varying difficulty.
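As a rough sketch of the underlying idea (all symbols below are introduced here for illustration and are not taken from the paper), gradient alignment, as originally proposed for poisoning image classifiers, crafts small bounded perturbations $\delta_i$ for a set $P$ of training observations so that the training gradient on the poisoned samples points in the same direction as the gradient of an adversarial loss at the target state $s^\ast$. Assuming the victim is trained with some per-sample loss $\mathcal{L}$ on an observation $s_i$, action $a_i$, and reward $r_i$, with actions and rewards left untouched, one plausible form of the objective is:

```latex
\min_{\{\delta_i\}_{i \in P}} \; 1 - \frac{\left\langle \nabla_\theta \mathcal{L}_{\mathrm{adv}}(s^\ast, a^{\mathrm{adv}}; \theta),\; \sum_{i \in P} \nabla_\theta \mathcal{L}(s_i + \delta_i, a_i, r_i; \theta) \right\rangle}{\left\| \nabla_\theta \mathcal{L}_{\mathrm{adv}}(s^\ast, a^{\mathrm{adv}}; \theta) \right\| \, \left\| \sum_{i \in P} \nabla_\theta \mathcal{L}(s_i + \delta_i, a_i, r_i; \theta) \right\|}
\quad \text{subject to} \quad \|\delta_i\|_\infty \le \varepsilon \quad \forall i \in P
```

Here $a^{\mathrm{adv}}$ stands for the hypothetical action the attacker wants at $s^\ast$, and $\varepsilon$ is the perturbation budget. The intuition is that if the two gradients are aligned, each training step on the poisoned observations also tends to lower the adversarial loss at $s^\ast$, nudging the agent toward $a^{\mathrm{adv}}$ there while its behavior elsewhere is meant to stay largely unchanged. The exact per-sample loss used for the reinforcement learning victim is not specified in this abstract.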