Backdoors in Deep Learning: The Good, the Bad, and the Ugly

Workshop

Backdoors in Deep Learning: The Good, the Bad, and the Ugly

Khoa D Doan · Aniruddha Saha · Anh Tran · Yingjie Lao · Kok-Seng Wong · Ang Li · HARIPRIYA HARIKUMAR · Eugene Bagdasaryan · Micah Goldblum · Tom Goldstein

Room 203 - 205

Fri 15 Dec, 7 a.m. PST

[ Abstract ] Workshop Website

Deep neural networks (DNNs) are revolutionizing almost all AI domains and have become the core of many modern AI systems. While having superior performance compared to classical methods, DNNs are also facing new security problems, such as adversarial and backdoor attacks, that are hard to discover and resolve due to their black-box-like property. Backdoor attacks, particularly, are a brand-new threat that was only discovered in 2017 but has gained attention quickly in the research community. The number of backdoor-related papers grew from 21 to around 110 after only one year (2019-2020). In 2022 alone, there were more than 200 papers on backdoor learning, showing a high research interest in this domain.Backdoor attacks are possible because of insecure model pretraining and outsourcing practices. Due to the complexity and the tremendous cost of collecting data and training models, many individuals/companies just employ models or training data from third parties. Malicious third parties can add backdoors into their models or poison their released data before delivering it to the victims to gain illegal benefits. This threat seriously damages the safety and trustworthiness of AI development. Lately, many studies on backdoor attacks and defenses have been conducted to prevent this critical vulnerability.While most works consider backdoor evil'', some studies exploit it for good purposes. A popular approach is to use the backdoor as a watermark to detect illegal use of commercialized data/models. A few works employ the backdoor as a trapdoor for adversarial defense. Learning the working mechanism of backdoor also elevates a deeper understanding of how deep learning models work.This workshop is designed to provide a comprehensive understanding of the current state of backdoor research. We also want to raise awareness of the AI community on this important security problem, and motivate researchers to build safe and trustful AI systems.

Chat is not available.

Timezone: America/Los_Angeles

Schedule

Fri 7:00 a.m. - 7:30 a.m.	A Blessing in Disguise: Backdoor Attacks as Watermarks for Dataset Copyright Protection ( Invited Talk ) > SlidesLive Video	Yiming Li 🔗
Fri 7:30 a.m. - 8:00 a.m.	Recent Advances in Backdoor Defense and Benchmark ( Invited Talk ) > SlidesLive Video	Baoyuan Wu 🔗
Fri 8:00 a.m. - 8:30 a.m.	COFFEE BREAK ( COFFEE BREAK ) >	🔗
Fri 8:30 a.m. - 9:00 a.m.	Invited Talk ( Invited Talk ) > SlidesLive Video	Jonas Geiping 🔗
Fri 9:00 a.m. - 9:15 a.m.	Effective Backdoor Mitigation Depends on the Pre-training Objective ( Oral ) > link SlidesLive Video Link	Sahil Verma · Gantavya Bhatt · Soumye Singhal · Arnav Das · Chirag Shah · John Dickerson · Jeff A Bilmes 🔗
Fri 9:15 a.m. - 9:45 a.m.	Universal jailbreak backdoors from poisoned human feedback ( Invited Talk ) > SlidesLive Video	Florian Tramer 🔗
Fri 9:45 a.m. - 11:00 a.m.	LUNCH BREAK ( LUNCH BREAK ) >	🔗
Fri 11:00 a.m. - 11:15 a.m.	VillanDiffusion: A Unified Backdoor Attack Framework for Diffusion Models ( Oral ) > link SlidesLive Video Link	Sheng-Yen Chou · Pin-Yu Chen · Tsung-Yi Ho 🔗
Fri 11:15 a.m. - 11:30 a.m.	The Stronger the Diffusion Model, the Easier the Backdoor: Data Poisoning to Induce Copyright Breaches Without Adjusting Finetuning Pipeline ( Oral ) > link Link	Haonan Wang · Qianli Shen · Yao Tong · Yang Zhang · Kenji Kawaguchi 🔗
Fri 11:30 a.m. - 12:00 p.m.	Is this model mine? On stealing and defending machine learning models. ( Invited Talk ) > SlidesLive Video	Adam Dziedzic 🔗
Fri 12:00 p.m. - 12:30 p.m.	Invited Talk ( Invited Talk ) > SlidesLive Video	Ruoxi Jia 🔗
Fri 12:30 p.m. - 1:00 p.m.	COFFEE BREAK ( COFFEE BREAK ) >	🔗
Fri 1:00 p.m. - 1:45 p.m.	On the Limitation of Backdoor Detection Methods ( Poster ) > link Link	Georg Pichler · Marco Romanelli · Divya Prakash Manivannan · Prashanth Krishnamurthy · Farshad Khorrami · Siddharth Garg 🔗
Fri 1:00 p.m. - 1:45 p.m.	How to remove backdoors in diffusion models? ( Poster ) > link Link	11 presenters Shengwei An · Sheng-Yen Chou · Kaiyuan Zhang · Qiuling Xu · Guanhong Tao · Guangyu Shen · Siyuan Cheng · Shiqing Ma · Pin-Yu Chen · Tsung-Yi Ho · Xiangyu Zhang 🔗
Fri 1:00 p.m. - 1:45 p.m.	Adversarial Robustness Unhardening via Backdoor Attacks in Federated Learning ( Poster ) > link Link	Taejin Kim · Jiarui Li · Nikhil Madaan · Shubhranshu Singh · Carlee Joe-Wong 🔗
Fri 1:00 p.m. - 1:45 p.m.	How to Backdoor HyperNetwork in Personalized Federated Learning? ( Poster ) > link Link	Phung Lai · Hai Phan · Issa Khalil · Abdallah Khreishah · Xintao Wu 🔗
Fri 1:00 p.m. - 1:45 p.m.	Universal Trojan Signatures in Reinforcement Learning ( Poster ) > link Link	Manoj Acharya · Weichao Zhou · Anirban Roy · Xiao Lin · Wenchao Li · Susmit Jha 🔗
Fri 1:00 p.m. - 1:45 p.m.	Analyzing And Editing Inner Mechanisms of Backdoored Language Models ( Poster ) > link Link	Max Lamparth · Ann-Katrin Reuel 🔗
Fri 1:00 p.m. - 1:45 p.m.	Detecting Backdoors with Meta-Models ( Poster ) > link Link	Lauro Langosco · Neel Alex · William Baker · David Quarel · Herbie Bradley · David Krueger 🔗
Fri 1:00 p.m. - 1:45 p.m.	Benchmark Probing: Investigating Data Leakage in Large Language Models ( Poster ) > link Link	Chunyuan Deng · Yilun Zhao · Xiangru Tang · Mark Gerstein · Arman Cohan 🔗
Fri 1:00 p.m. - 1:45 p.m.	Leveraging Diffusion-Based Image Variations for Robust Training on Poisoned Data ( Poster ) > link Link	Lukas Struppek · Martin Bernhard Hentschel · Clifton Poth · Dominik Hintersdorf · Kristian Kersting 🔗
Fri 1:00 p.m. - 1:45 p.m.	$D^3$ : Detoxing Deep Learning Dataset ( Poster ) > link Link	Lu Yan · Siyuan Cheng · Guangyu Shen · Guanhong Tao · Xuan Chen · Kaiyuan Zhang · Yunshu Mao · Xiangyu Zhang 🔗
Fri 1:00 p.m. - 1:45 p.m.	Defending Our Privacy With Backdoors ( Poster ) > link Link	Dominik Hintersdorf · Lukas Struppek · Daniel Neider · Kristian Kersting 🔗
Fri 1:00 p.m. - 1:45 p.m.	Clean-label Backdoor Attacks by Selectively Poisoning with Limited Information from Target Class ( Poster ) > link Link	Nguyen Hung-Quang · Ngoc-Hieu Nguyen · The Anh Ta · Thanh Nguyen-Tang · Hoang Thanh-Tung · Khoa D Doan 🔗
Fri 1:00 p.m. - 1:45 p.m.	BadFusion: 2D-Oriented Backdoor Attacks against 3D Object Detection ( Poster ) > link Link	Saket Sanjeev Chaturvedi · Lan Zhang · Wenbin Zhang · Pan He · Xiaoyong Yuan 🔗
Fri 1:00 p.m. - 1:45 p.m.	Forcing Generative Models to Degenerate Ones: The Power of Data Poisoning Attacks ( Poster ) > link Link	Shuli Jiang · Swanand Kadhe · Yi Zhou · Ling Cai · Nathalie Baracaldo 🔗
Fri 1:00 p.m. - 1:45 p.m.	From Trojan Horses to Castle Walls: Unveiling Bilateral Backdoor Effects in Diffusion Models ( Poster ) > link Link	Zhuoshi Pan · Yuguang Yao · Gaowen Liu · Bingquan Shen · H. Vicky Zhao · Ramana Kompella · Sijia Liu 🔗
Fri 1:45 p.m. - 2:00 p.m.	Backdooring Instruction-Tuned Large Language Models with Virtual Prompt Injection ( Oral ) > link SlidesLive Video Link	Jun Yan · Vikas Yadav · Shiyang Li · Lichang Chen · Zheng Tang · Hai Wang · Vijay Srinivasan · Xiang Ren · Hongxia Jin 🔗
Fri 2:00 p.m. - 2:15 p.m.	BadChain: Backdoor Chain-of-Thought Prompting for Large Language Models ( Oral ) > link SlidesLive Video Link	Zhen Xiang · Fengqing Jiang · Zidi Xiong · Bhaskar Ramasubramanian · Radha Poovendran · Bo Li 🔗
Fri 2:15 p.m. - 2:45 p.m.	Decoding Backdoors in LLMs and Their Implications ( Invited Talk ) > SlidesLive Video	Bo Li 🔗
Fri 2:45 p.m. -	PANEL DISCUSSION ( PANEL DISCUSSION ) > SlidesLive Video	🔗