NeurIPS Reward Poisoning in Reinforcement Learning: Attacks Against Unknown Learners in Unknown Environments

Poster in Workshop: Learning and Decision-Making with Strategic Feedback (StratML)

Reward Poisoning in Reinforcement Learning: Attacks Against Unknown Learners in Unknown Environments

Amin Rakhsha · Xuezhou Zhang · Jerry Zhu · Adish Singla

Abstract:

We study black-box reward poisoning attacks against reinforcement learning (RL), in which an adversary manipulates the rewards to mislead a sequence of RL agents with unknown algorithms into learning a nefarious policy in an environment unknown to the adversary a priori. That is, our attack makes minimal assumptions about the adversary's prior knowledge: it has no initial knowledge of the environment or the learner, nor does it observe the learner's internal mechanism, only the actions the learner performs. We design a novel black-box attack, U2, that provably achieves performance nearly matching that of the state-of-the-art white-box attack, demonstrating the feasibility of reward poisoning even in the most challenging black-box setting.
