Poster
Provable Defense against Backdoor Policies in Reinforcement Learning
Shubham Bharti · Xuezhou Zhang · Adish Singla · Jerry Zhu
We propose a provable defense mechanism against backdoor policies in reinforcement learning under a subspace trigger assumption. A backdoor policy is a security threat in which an adversary publishes a seemingly well-behaved policy that in fact contains hidden triggers. During deployment, the adversary can modify observed states in a particular way to trigger unexpected actions and harm the agent. We assume the agent does not have the resources to re-train a good policy. Instead, our defense mechanism sanitizes the backdoor policy by projecting observed states onto a "safe subspace", estimated from a small number of interactions with a clean (non-triggered) environment. Our sanitized policy achieves $\epsilon$-approximate optimality in the presence of triggers, provided the number of clean interactions is $O\left(\frac{D}{(1-\gamma)^4 \epsilon^2}\right)$, where $\gamma$ is the discount factor and $D$ is the dimension of the state space. Empirically, we show that our sanitization defense performs well on two Atari game environments.
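The projection idea in the abstract can be illustrated with a small sketch. The abstract does not say how the safe subspace is estimated, so the code below makes a simplifying assumption: the safe subspace is taken to be the span of the observed clean states, with an orthonormal basis built by Gram-Schmidt. All names (`estimate_safe_basis`, `project`, `sanitize`, `backdoor_policy`) and the toy 3-dimensional states are hypothetical and are not taken from the paper. With noisy, high-dimensional observations one would more plausibly threshold the spectrum of the empirical covariance (for example via an SVD) instead of using an exact span.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean inner product."""
    return sum(a * b for a, b in zip(u, v))


def estimate_safe_basis(clean_states: Sequence[Sequence[float]],
                        tol: float = 1e-9) -> List[Vector]:
    """Orthonormal basis for the span of the clean states (Gram-Schmidt).

    Assumption for this sketch: clean states are noise-free, so their
    exact span is a reasonable stand-in for the safe subspace.
    """
    basis: List[Vector] = []
    for state in clean_states:
        residual = list(state)
        # Remove the components already explained by the current basis.
        for b in basis:
            c = dot(residual, b)
            residual = [r - c * bi for r, bi in zip(residual, b)]
        norm = dot(residual, residual) ** 0.5
        if norm > tol:  # keep only genuinely new directions
            basis.append([r / norm for r in residual])
    return basis


def project(state: Sequence[float], basis: Sequence[Sequence[float]]) -> Vector:
    """Orthogonal projection of a state onto the span of the basis."""
    out = [0.0] * len(state)
    for b in basis:
        c = dot(state, b)
        out = [o + c * bi for o, bi in zip(out, b)]
    return out


def sanitize(policy: Callable[[Sequence[float]], str],
             basis: Sequence[Sequence[float]]) -> Callable[[Sequence[float]], str]:
    """Wrap a policy so that it only ever sees projected states."""
    return lambda state: policy(project(state, basis))


# Toy example: clean states live in the x-y plane of a 3-D state space.
clean_states = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 2.0, 0.0]]
basis = estimate_safe_basis(clean_states)


def backdoor_policy(state: Sequence[float]) -> str:
    """Hypothetical backdoor: misbehaves when the third coordinate is large."""
    return "bad" if state[2] > 0.5 else "good"


sanitized = sanitize(backdoor_policy, basis)
triggered_state = [1.0, 1.0, 1.0]  # clean state [1, 1, 0] plus a trigger

print(backdoor_policy(triggered_state))  # the raw policy reacts to the trigger
print(sanitized(triggered_state))        # the projection removes the trigger
```

In this toy setting the trigger lies entirely outside the estimated subspace, so the projection removes it exactly. The sketch says nothing about the approximation error or the sample-complexity bound stated in the abstract.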
Author Information
Shubham Bharti (UW Madison)
Xuezhou Zhang (Princeton University)
Adish Singla (MPI-SWS)
Jerry Zhu (University of Wisconsin-Madison)
More from the Same Authors
-
2021 : Game Redesign in No-regret Game Playing »
Yuzhe Ma · Young Wu · Jerry Zhu -
2021 : Reward Poisoning in Reinforcement Learning: Attacks Against Unknown Learners in Unknown Environments »
Amin Rakhsha · Xuezhou Zhang · Jerry Zhu · Adish Singla -
2021 : Poster: Fair Clustering Using Antidote Data »
Anshuman Chhabra · Adish Singla · Prasant Mohapatra -
2021 : Reinforcement Learning Under Algorithmic Triage »
Eleni Straitouri · Adish Singla · Vahid Balazadeh Meresht · Manuel Rodriguez -
2022 Poster: On Batch Teaching with Sample Complexity Bounded by VCD »
Farnam Mansouri · Hans Simon · Adish Singla · Sandra Zilles -
2022 : Provable Benefits of Representational Transfer in Reinforcement Learning »
Alekh Agarwal · Yuda Song · Kaiwen Wang · Mengdi Wang · Wen Sun · Xuezhou Zhang -
2023 Poster: Mechanism Design for Collaborative Normal Mean Estimation »
Yiding Chen · Jerry Zhu · Kirthevasan Kandasamy -
2023 Poster: Dream the Impossible: Outlier Imagination with Diffusion Models »
Xuefeng Du · Yiyou Sun · Jerry Zhu · Yixuan Li -
2023 Poster: Learning Adversarial Low-rank Markov Decision Processes with Unknown Transition and Full-information Feedback »
Canzhe Zhao · Ruofeng Yang · Baoxiang Wang · Xuezhou Zhang · Shuai Li -
2023 Workshop: Generative AI for Education (GAIED): Advances, Opportunities, and Challenges »
Paul Denny · Sumit Gulwani · Neil Heffernan · Tanja Käser · Steven Moore · Anna Rafferty · Adish Singla -
2022 Spotlight: On Batch Teaching with Sample Complexity Bounded by VCD »
Farnam Mansouri · Hans Simon · Adish Singla · Sandra Zilles -
2022 Poster: Envy-free Policy Teaching to Multiple Agents »
Jiarui Gan · R Majumdar · Adish Singla · Goran Radanovic -
2022 Poster: Exploration-Guided Reward Shaping for Reinforcement Learning under Sparse Rewards »
Rati Devidze · Parameswaran Kamalaruban · Adish Singla -
2022 Poster: Decentralized Gossip-Based Stochastic Bilevel Optimization over Communication Networks »
Shuoguang Yang · Xuezhou Zhang · Mengdi Wang -
2022 Poster: Bandit Theory and Thompson Sampling-Guided Directed Evolution for Sequence Optimization »
Hui Yuan · Chengzhuo Ni · Huazheng Wang · Xuezhou Zhang · Le Cong · Csaba Szepesvari · Mengdi Wang -
2021 : Fair Clustering Using Antidote Data »
Anshuman Chhabra · Adish Singla · Prasant Mohapatra -
2021 : Fairness Degrading Adversarial Attacks Against Clustering Algorithms »
Anshuman Chhabra · Adish Singla · Prasant Mohapatra -
2021 Poster: Curriculum Design for Teaching via Demonstrations: Theory and Applications »
Gaurav Yengera · Rati Devidze · Parameswaran Kamalaruban · Adish Singla -
2021 Poster: Explicable Reward Design for Reinforcement Learning Agents »
Rati Devidze · Goran Radanovic · Parameswaran Kamalaruban · Adish Singla -
2021 Poster: On Blame Attribution for Accountable Multi-Agent Sequential Decision Making »
Stelios Triantafyllou · Adish Singla · Goran Radanovic -
2021 Poster: Teaching an Active Learner with Contrastive Examples »
Chaoqi Wang · Adish Singla · Yuxin Chen -
2021 Poster: Teaching via Best-Case Counterexamples in the Learning-with-Equivalence-Queries Paradigm »
Akash Kumar · Yuxin Chen · Adish Singla -
2020 Poster: Synthesizing Tasks for Block-based Programming »
Umair Ahmed · Maria Christakis · Aleksandr Efremov · Nigel Fernandez · Ahana Ghosh · Abhik Roychoudhury · Adish Singla -
2020 Poster: Task-agnostic Exploration in Reinforcement Learning »
Xuezhou Zhang · Yuzhe Ma · Adish Singla -
2019 Poster: Policy Poisoning in Batch Reinforcement Learning and Control »
Yuzhe Ma · Xuezhou Zhang · Wen Sun · Jerry Zhu -
2019 Poster: Teaching Multiple Concepts to a Forgetful Learner »
Anette Hunziker · Yuxin Chen · Oisin Mac Aodha · Manuel Gomez Rodriguez · Andreas Krause · Pietro Perona · Yisong Yue · Adish Singla -
2019 Poster: Preference-Based Batch and Sequential Teaching: Towards a Unified View of Models »
Farnam Mansouri · Yuxin Chen · Ara Vartanian · Jerry Zhu · Adish Singla -
2019 Poster: A Unified Framework for Data Poisoning Attack to Graph-based Semi-supervised Learning »
Xuanqing Liu · Si Si · Jerry Zhu · Yang Li · Cho-Jui Hsieh -
2019 Poster: Learner-aware Teaching: Inverse Reinforcement Learning with Preferences and Constraints »
Sebastian Tschiatschek · Ahana Ghosh · Luis Haug · Rati Devidze · Adish Singla -
2018 : Assisted Inverse Reinforcement Learning »
Adish Singla · Rati Devidze -
2018 Poster: Understanding the Role of Adaptivity in Machine Teaching: The Case of Version Space Learners »
Yuxin Chen · Adish Singla · Oisin Mac Aodha · Pietro Perona · Yisong Yue -
2018 Poster: Teaching Inverse Reinforcement Learners via Features and Demonstrations »
Luis Haug · Sebastian Tschiatschek · Adish Singla -
2018 Poster: Enhancing the Accuracy and Fairness of Human Decision Making »
Isabel Valera · Adish Singla · Manuel Gomez Rodriguez -
2018 Poster: Adversarial Attacks on Stochastic Bandits »
Kwang-Sung Jun · Lihong Li · Yuzhe Ma · Jerry Zhu -
2017 Workshop: Teaching Machines, Robots, and Humans »
Maya Cakmak · Anna Rafferty · Adish Singla · Jerry Zhu · Sandra Zilles -
2016 : Optimal Teaching for Online Perceptrons »
Xuezhou Zhang · Jerry Zhu -
2016 Workshop: The Future of Interactive Machine Learning »
Kory Mathewson · Kaushik Subramanian · Mark Ho · Robert Loftin · Joseph L Austerweil · Anna Harutyunyan · Doina Precup · Layla El Asri · Matthew Gombolay · Jerry Zhu · Sonia Chernova · Charles Isbell · Patrick M Pilarski · Weng-Keen Wong · Manuela Veloso · Julie A Shah · Matthew Taylor · Brenna Argall · Michael Littman -
2016 Poster: Active Learning with Oracle Epiphany »
Tzu-Kuo Huang · Lihong Li · Ara Vartanian · Saleema Amershi · Jerry Zhu -
2015 Poster: Human Memory Search as Initial-Visit Emitting Random Walk »
Kwang-Sung Jun · Jerry Zhu · Timothy T Rogers · Zhuoran Yang · Ming Yuan -
2014 Poster: Optimal Teaching for Limited-Capacity Human Learners »
Kaustubh R Patil · Jerry Zhu · Łukasz Kopeć · Bradley C Love -
2014 Spotlight: Optimal Teaching for Limited-Capacity Human Learners »
Kaustubh R Patil · Jerry Zhu · Łukasz Kopeć · Bradley C Love -
2013 Poster: Machine Teaching for Bayesian Learners in the Exponential Family »
Jerry Zhu -
2011 Poster: How Do Humans Teach: On Curriculum Learning and Teaching Dimension »
Faisal Khan · Jerry Zhu · Bilge Mutlu -
2011 Poster: Learning Higher-Order Graph Structure with Features by Structure Penalty »
Shilin Ding · Grace Wahba · Jerry Zhu -
2010 Oral: Humans Learn Using Manifolds, Reluctantly »
Bryan R Gibson · Jerry Zhu · Timothy T Rogers · Chuck Kalish · Joseph Harrison -
2010 Poster: Humans Learn Using Manifolds, Reluctantly »
Bryan R Gibson · Jerry Zhu · Timothy T Rogers · Chuck Kalish · Joseph Harrison -
2010 Poster: Transduction with Matrix Completion: Three Birds with One Stone »
Andrew B Goldberg · Jerry Zhu · Benjamin Recht · Junming Sui · Rob Nowak -
2010 Session: Spotlights Session 1 »
Jerry Zhu -
2009 Poster: Human Rademacher Complexity »
Jerry Zhu · Timothy T Rogers · Bryan R Gibson -
2008 Workshop: Machine learning meets human learning »
Nathaniel D Daw · Tom Griffiths · Josh Tenenbaum · Jerry Zhu -
2008 Poster: Human Active Learning »
Jerry Zhu · Rui M Castro · Timothy T Rogers · Rob Nowak · Ruichen Qian · Chuck Kalish -
2008 Poster: Unlabeled data: Now it helps, now it doesn't »
Aarti Singh · Rob Nowak · Jerry Zhu -
2008 Oral: Unlabeled data: Now it helps, now it doesn't »
Aarti Singh · Rob Nowak · Jerry Zhu