Timezone: »
The Trojan Detection Challenge (LLM Edition) aims to advance the understanding and development of methods for detecting hidden functionality in large language models (LLMs). The competition features two main tracks: the Trojan Detection Track and the Red Teaming Track. In the Trojan Detection Track, participants are given a large language model containing thousands of trojans and tasked with discovering the triggers for these trojans. In the Red Teaming Track, participants are challenged to elicit specific undesirable behaviors from a large language model fine-tuned to avoid those behaviors. TDC 2023 will include Base Model and Large Model subtracks to enable broader participation, and established trojan detection and red teaming baselines will be provided as a starting point. By uniting trojan detection and red teaming, TDC 2023 aims to foster collaboration between these communities to promote research on hidden functionality in LLMs and enhance the robustness and security of AI systems.
Sat 11:30 a.m. - 12:00 p.m.
|
Introduction, competition summary, and what we learned
(
Intro / Talk
)
|
🔗 |
Sat 12:00 p.m. - 12:30 p.m.
|
Invited talk 1
(
Talk
)
|
🔗 |
Sat 12:30 p.m. - 12:40 p.m.
|
Break
|
🔗 |
Sat 12:40 p.m. - 12:45 p.m.
|
Announcing winners
(
Announcement
)
|
🔗 |
Sat 12:45 p.m. - 1:45 p.m.
|
Description of winning methods
(
Multiple Talks
)
|
🔗 |
Sat 1:45 p.m. - 1:55 p.m.
|
Break
|
🔗 |
Sat 1:55 p.m. - 2:25 p.m.
|
Invited talk 2
(
Talk
)
|
🔗 |
Sat 2:25 p.m. - 2:30 p.m.
|
Wrapping up
(
Outro
)
|
🔗 |
Author Information
Mantas Mazeika (University of Illinois Urbana-Champaign)
Andy Zou (CMU)
Norman Mu (UC Berkeley)
Long Phan (Center for AI Safety)
Zifan Wang (Center for AI Safety)
Chunru Yu (UIUC)
Adam Khoja (UC Berkeley)
Fengqing Jiang (University of Washington)
Aidan O'Gara (USC)
Zhen Xiang (University of Illinois Urbana-Champaign)
Arezoo Rajabi (UW)
Dan Hendrycks (Center for AI Safety)
Radha Poovendran (University of Washington, Seattle)
Bo Li (UChicago/UIUC)
David Forsyth (University of Illinois at Urbana-Champaign)
More from the Same Authors
-
2021 : CUAD: An Expert-Annotated NLP Dataset for Legal Contract Review »
Dan Hendrycks · Collin Burns · Anya Chen · Spencer Ball -
2021 : Adversarial GLUE: A Multi-Task Benchmark for Robustness Evaluation of Language Models »
Boxin Wang · Chejian Xu · Shuohang Wang · Zhe Gan · Yu Cheng · Jianfeng Gao · Ahmed Awadallah · Bo Li -
2021 : Measuring Coding Challenge Competence With APPS »
Dan Hendrycks · Steven Basart · Saurav Kadavath · Mantas Mazeika · Akul Arora · Ethan Guo · Collin Burns · Samir Puranik · Horace He · Dawn Song · Jacob Steinhardt -
2021 : Certified Robustness for Free in Differentially Private Federated Learning »
Chulin Xie · Yunhui Long · Pin-Yu Chen · Krishnaram Kenthapadi · Bo Li -
2021 : RVFR: Robust Vertical Federated Learning via Feature Subspace Recovery »
Jing Liu · Chulin Xie · Krishnaram Kenthapadi · Sanmi Koyejo · Bo Li -
2021 : PixMix: Dreamlike Pictures Comprehensively Improve Safety Measures »
Dan Hendrycks · Andy Zou · Mantas Mazeika · Leonard Tang · Dawn Song · Jacob Steinhardt -
2021 : What Would Jiminy Cricket Do? Towards Agents That Behave Morally »
Dan Hendrycks · Mantas Mazeika · Andy Zou · Sahil Patel · Christine Zhu · Jesus Navarro · Dawn Song · Bo Li · Jacob Steinhardt -
2021 : Measuring Mathematical Problem Solving With the MATH Dataset »
Dan Hendrycks · Collin Burns · Saurav Kadavath · Akul Arora · Steven Basart · Eric Tang · Dawn Song · Jacob Steinhardt -
2022 Poster: VF-PS: How to Select Important Participants in Vertical Federated Learning, Efficiently and Securely? »
Jiawei Jiang · Lukas Burkhalter · Fangcheng Fu · Bolin Ding · Bo Du · Anwar Hithnawi · Bo Li · Ce Zhang -
2022 : Improving Vertical Federated Learning by Efficient Communication with ADMM »
Chulin Xie · Pin-Yu Chen · Ce Zhang · Bo Li -
2022 : Benchmarking Robustness under Distribution Shift of Multimodal Image-Text Models »
Jielin Qiu · Yi Zhu · Xingjian Shi · Zhiqiang Tang · DING ZHAO · Bo Li · Mu Li -
2022 : DensePure: Understanding Diffusion Models towards Adversarial Robustness »
Zhongzhu Chen · Kun Jin · Jiongxiao Wang · Weili Nie · Mingyan Liu · Anima Anandkumar · Bo Li · Dawn Song -
2022 : Fifteen-minute Competition Overview Video »
Nathan Drenkow · Raman Arora · Gino Perrotta · Todd Neller · Ryan Gardner · Mykel J Kochenderfer · Jared Markowitz · Corey Lowman · Casey Richardson · Bo Li · Bart Paulhamus · Ashley J Llorens · Andrew Newman -
2022 : On the Robustness of Safe Reinforcement Learning under Observational Perturbations »
ZUXIN LIU · Zijian Guo · Zhepeng Cen · Huan Zhang · Jie Tan · Bo Li · DING ZHAO -
2023 : FOCUS: Fairness via Agent-Awareness for Federated Learning on Heterogeneous Data »
Wenda Chu · Chulin Xie · Boxin Wang · Linyi Li · Lang Yin · Arash Nourian · Han Zhao · Bo Li -
2023 : Identifying and Mitigating Vulnerabilities in LLM-Integrated Applications »
Fengqing Jiang · Zhangchen Xu · Luyao Niu · Boxin Wang · Jinyuan Jia · Bo Li · Radha Poovendran -
2023 : Decoding Backdoors in LLMs and Their Implications »
Bo Li -
2023 : BadChain: Backdoor Chain-of-Thought Prompting for Large Language Models »
Zhen Xiang · Fengqing Jiang · Zidi Xiong · Bhaskar Ramasubramanian · Radha Poovendran · Bo Li -
2023 Poster: FedGame: A Game-Theoretic Defense against Backdoor Attacks in Federated Learning »
Jinyuan Jia · Zhuowen Yuan · Dinuka Sahabandu · Luyao Niu · Arezoo Rajabi · Bhaskar Ramasubramanian · Bo Li · Radha Poovendran -
2023 Social: ML Safety Social »
Mantas Mazeika -
2023 Poster: IMPRESS: Evaluating the Resilience of Imperceptible Perturbations Against Unauthorized Data Usage in Diffusion-Based Generative AI »
Bochuan Cao · Changjiang Li · Ting Wang · Jinyuan Jia · Bo Li · Jinghui Chen -
2023 Poster: DiffAttack: Evasion Attacks Against Diffusion-Based Adversarial Purification »
Mintong Kang · Dawn Song · Bo Li -
2023 Poster: Domain Watermark: Effective and Harmless Dataset Copyright Protection is Closed at Hand »
Junfeng Guo · Yiming Li · Lixu Wang · Shu-Tao Xia · Heng Huang · Cong Liu · Bo Li -
2023 Poster: CBD: A Certified Backdoor Detector Based on Local Dominant Probability »
Zhen Xiang · Zidi Xiong · Bo Li -
2023 Poster: StyleGAN knows Normal, Depth, Albedo, and More »
Anand Bhattad · Daniel McKee · Derek Hoiem · David Forsyth -
2023 Poster: Incentives in Federated Learning: Equilibria, Dynamics, and Mechanisms for Welfare Maximization »
Aniket Murhekar · Zhuowen Yuan · Bhaskar Ray Chaudhury · Bo Li · Ruta Mehta -
2023 Poster: WordScape: a Pipeline to extract multilingual, visually rich Documents with Layout Annotations from Web Crawl Data »
Maurice Weber · Carlo Siebenschuh · Rory Butler · Anton Alexandrov · Valdemar Thanner · Georgios Tsolakis · Haris Jabbar · Ian Foster · Bo Li · Rick Stevens · Ce Zhang -
2023 Poster: DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models »
Boxin Wang · Weixin Chen · Hengzhi Pei · Chulin Xie · Mintong Kang · Chenhui Zhang · Chejian Xu · Zidi Xiong · Ritik Dutta · Rylan Schaeffer · Sang Truong · Simran Arora · Mantas Mazeika · Dan Hendrycks · Zinan Lin · Yu Cheng · Sanmi Koyejo · Dawn Song · Bo Li -
2023 Oral: DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models »
Boxin Wang · Weixin Chen · Hengzhi Pei · Chulin Xie · Mintong Kang · Chenhui Zhang · Chejian Xu · Zidi Xiong · Ritik Dutta · Rylan Schaeffer · Sang Truong · Simran Arora · Mantas Mazeika · Dan Hendrycks · Zinan Lin · Yu Cheng · Sanmi Koyejo · Dawn Song · Bo Li -
2022 : Contributed Talk: DensePure: Understanding Diffusion Models towards Adversarial Robustness »
Zhongzhu Chen · Kun Jin · Jiongxiao Wang · Weili Nie · Mingyan Liu · Anima Anandkumar · Bo Li · Dawn Song -
2022 Workshop: Workshop on Machine Learning Safety »
Dan Hendrycks · Victoria Krakovna · Dawn Song · Jacob Steinhardt · Nicholas Carlini -
2022 Workshop: Trustworthy and Socially Responsible Machine Learning »
Huan Zhang · Linyi Li · Chaowei Xiao · J. Zico Kolter · Anima Anandkumar · Bo Li -
2022 Spotlight: Fairness in Federated Learning via Core-Stability »
Bhaskar Ray Chaudhury · Linyi Li · Mintong Kang · Bo Li · Ruta Mehta -
2022 Competition: The Trojan Detection Challenge »
Mantas Mazeika · Dan Hendrycks · Huichen Li · Xiaojun Xu · Andy Zou · Sidney Hough · Arezoo Rajabi · Dawn Song · Radha Poovendran · Bo Li · David Forsyth -
2022 Spotlight: LOT: Layer-wise Orthogonal Training on Improving l2 Certified Robustness »
Xiaojun Xu · Linyi Li · Bo Li -
2022 Spotlight: Lightning Talks 5B-1 »
Devansh Arpit · Xiaojun Xu · Zifan Shi · Ivan Skorokhodov · Shayan Shekarforoush · Zhan Tong · Yiqun Wang · Shichong Peng · Linyi Li · Ivan Skorokhodov · Huan Wang · Yibing Song · David Lindell · Yinghao Xu · Seyed Alireza Moazenipourasil · Sergey Tulyakov · Peter Wonka · Yiqun Wang · Ke Li · David Fleet · Yujun Shen · Yingbo Zhou · Bo Li · Jue Wang · Peter Wonka · Marcus Brubaker · Caiming Xiong · Limin Wang · Deli Zhao · Qifeng Chen · Dit-Yan Yeung -
2022 Panel: Panel 4C-1: How Would The… & SCAMPS: Synthetics for… »
Mantas Mazeika · Daniel McDuff -
2022 Competition: Reconnaissance Blind Chess: An Unsolved Challenge for Multi-Agent Decision Making Under Uncertainty »
Ryan Gardner · Gino Perrotta · Corey Lowman · Casey Richardson · Andrew Newman · Jared Markowitz · Nathan Drenkow · Bart Paulhamus · Ashley J Llorens · Todd Neller · Raman Arora · Bo Li · Mykel J Kochenderfer -
2022 Spotlight: Certifying Some Distributional Fairness with Subpopulation Decomposition »
Mintong Kang · Linyi Li · Maurice Weber · Yang Liu · Ce Zhang · Bo Li -
2022 Spotlight: Lightning Talks 1A-4 »
Siwei Wang · Jing Liu · Nianqiao Ju · Shiqian Li · Eloïse Berthier · Muhammad Faaiz Taufiq · Arsene Fansi Tchango · Chen Liang · Chulin Xie · Jordan Awan · Jean-Francois Ton · Ziad Kobeissi · Wenguan Wang · Xinwang Liu · Kewen Wu · Rishab Goel · Jiaxu Miao · Suyuan Liu · Julien Martel · Ruobin Gong · Francis Bach · Chi Zhang · Rob Cornish · Sanmi Koyejo · Zhi Wen · Yee Whye Teh · Yi Yang · Jiaqi Jin · Bo Li · Yixin Zhu · Vinayak Rao · Wenxuan Tu · Gaetan Marceau Caron · Arnaud Doucet · Xinzhong Zhu · Joumana Ghosn · En Zhu -
2022 Spotlight: Lightning Talks 1A-3 »
Kimia Noorbakhsh · Ronan Perry · Qi Lyu · Jiawei Jiang · Christian Toth · Olivier Jeunen · Xin Liu · Yuan Cheng · Lei Li · Manuel Rodriguez · Julius von Kügelgen · Lars Lorch · Nicolas Donati · Lukas Burkhalter · Xiao Fu · Zhongdao Wang · Songtao Feng · Ciarán Gilligan-Lee · Rishabh Mehrotra · Fangcheng Fu · Jing Yang · Bernhard Schölkopf · Ya-Li Li · Christian Knoll · Maks Ovsjanikov · Andreas Krause · Shengjin Wang · Hong Zhang · Mounia Lalmas · Bolin Ding · Bo Du · Yingbin Liang · Franz Pernkopf · Robert Peharz · Anwar Hithnawi · Julius von Kügelgen · Bo Li · Ce Zhang -
2022 Spotlight: VF-PS: How to Select Important Participants in Vertical Federated Learning, Efficiently and Securely? »
Jiawei Jiang · Lukas Burkhalter · Fangcheng Fu · Bolin Ding · Bo Du · Anwar Hithnawi · Bo Li · Ce Zhang -
2022 Spotlight: CoPur: Certifiably Robust Collaborative Inference via Feature Purification »
Jing Liu · Chulin Xie · Sanmi Koyejo · Bo Li -
2022 : Panel »
Pin-Yu Chen · Alex Gittens · Bo Li · Celia Cintas · Hilde Kuehne · Payel Das -
2022 : Trustworthy Machine Learning in Autonomous Driving »
Bo Li -
2022 Workshop: Decentralization and Trustworthy Machine Learning in Web3: Methodologies, Platforms, and Applications »
Jian Lou · Zhiguang Wang · Chejian Xu · Bo Li · Dawn Song -
2022 : Invited Talk #5, Privacy-Preserving Data Synthesis for General Purposes, Bo Li »
Bo Li -
2022 : Fairness Panel »
Freedom Gumedze · Rachel Cummings · Bo Li · Robert Tillman · Edward Choi -
2022 : Trustworthy Federated Learning »
Bo Li -
2022 Poster: Improving Certified Robustness via Statistical Learning with Logical Reasoning »
Zhuolin Yang · Zhikuan Zhao · Boxin Wang · Jiawei Zhang · Linyi Li · Hengzhi Pei · Bojan Karlaš · Ji Liu · Heng Guo · Ce Zhang · Bo Li -
2022 Poster: Untargeted Backdoor Watermark: Towards Harmless and Stealthy Dataset Copyright Protection »
Yiming Li · Yang Bai · Yong Jiang · Yong Yang · Shu-Tao Xia · Bo Li -
2022 Poster: Fairness in Federated Learning via Core-Stability »
Bhaskar Ray Chaudhury · Linyi Li · Mintong Kang · Bo Li · Ruta Mehta -
2022 Poster: Generalizing Goal-Conditioned Reinforcement Learning with Variational Causal Reasoning »
Wenhao Ding · Haohong Lin · Bo Li · DING ZHAO -
2022 Poster: Certifying Some Distributional Fairness with Subpopulation Decomposition »
Mintong Kang · Linyi Li · Maurice Weber · Yang Liu · Ce Zhang · Bo Li -
2022 Poster: How Would The Viewer Feel? Estimating Wellbeing From Video Scenarios »
Mantas Mazeika · Eric Tang · Andy Zou · Steven Basart · Jun Shern Chan · Dawn Song · David Forsyth · Jacob Steinhardt · Dan Hendrycks -
2022 Poster: LOT: Layer-wise Orthogonal Training on Improving l2 Certified Robustness »
Xiaojun Xu · Linyi Li · Bo Li -
2022 Poster: CoPur: Certifiably Robust Collaborative Inference via Feature Purification »
Jing Liu · Chulin Xie · Sanmi Koyejo · Bo Li -
2022 Poster: Forecasting Future World Events With Neural Networks »
Andy Zou · Tristan Xiao · Ryan Jia · Joe Kwon · Mantas Mazeika · Richard Li · Dawn Song · Jacob Steinhardt · Owain Evans · Dan Hendrycks -
2022 Poster: Exploring the Limits of Domain-Adaptive Training for Detoxifying Large-Scale Language Models »
Boxin Wang · Wei Ping · Chaowei Xiao · Peng Xu · Mostofa Patwary · Mohammad Shoeybi · Bo Li · Anima Anandkumar · Bryan Catanzaro -
2022 Poster: SafeBench: A Benchmarking Platform for Safety Evaluation of Autonomous Vehicles »
Chejian Xu · Wenhao Ding · Weijie Lyu · ZUXIN LIU · Shuai Wang · Yihan He · Hanjiang Hu · DING ZHAO · Bo Li -
2022 Poster: General Cutting Planes for Bound-Propagation-Based Neural Network Verification »
Huan Zhang · Shiqi Wang · Kaidi Xu · Linyi Li · Bo Li · Suman Jana · Cho-Jui Hsieh · J. Zico Kolter -
2022 Poster: OpenOOD: Benchmarking Generalized Out-of-Distribution Detection »
Jingkang Yang · Pengyun Wang · Dejian Zou · Zitang Zhou · Kunyuan Ding · WENXUAN PENG · Haoqi Wang · Guangyao Chen · Bo Li · Yiyou Sun · Xuefeng Du · Kaiyang Zhou · Wayne Zhang · Dan Hendrycks · Yixuan Li · Ziwei Liu -
2021 : Live panel: Perspectives on ImageNet. »
Dawn Song · Ross Wightman · Dan Hendrycks -
2021 : Using ImageNet to Measure Robustness and Uncertainty »
Dawn Song · Dan Hendrycks -
2021 : Career and Life: Panel Discussion - Bo Li, Adriana Romero-Soriano, Devi Parikh, and Emily Denton »
Remi Denton · Devi Parikh · Bo Li · Adriana Romero -
2021 : Live Q&A with Bo Li »
Bo Li -
2021 : Invited talk – Trustworthy Machine Learning via Logic Inference, Bo Li »
Bo Li -
2021 : Adversarial GLUE: A Multi-Task Benchmark for Robustness Evaluation of Language Models »
Boxin Wang · Chejian Xu · Shuohang Wang · Zhe Gan · Yu Cheng · Jianfeng Gao · Ahmed Awadallah · Bo Li -
2021 Poster: G-PATE: Scalable Differentially Private Data Generator via Private Aggregation of Teacher Discriminators »
Yunhui Long · Boxin Wang · Zhuolin Yang · Bhavya Kailkhura · Aston Zhang · Carl Gunter · Bo Li -
2021 Poster: Anti-Backdoor Learning: Training Clean Models on Poisoned Data »
Yige Li · Xixiang Lyu · Nodens Koren · Lingjuan Lyu · Bo Li · Xingjun Ma -
2021 Poster: Adversarial Attack Generation Empowered by Min-Max Optimization »
Jingkang Wang · Tianyun Zhang · Sijia Liu · Pin-Yu Chen · Jiacen Xu · Makan Fardad · Bo Li -
2021 : Reconnaissance Blind Chess + Q&A »
Ryan Gardner · Gino Perrotta · Corey Lowman · Casey Richardson · Andrew Newman · Jared Markowitz · Nathan Drenkow · Bart Paulhamus · Ashley J Llorens · Todd Neller · Raman Arora · Bo Li · Mykel J Kochenderfer -
2021 : VisDA21: Visual Domain Adaptation + Q&A »
Kate Saenko · Kuniaki Saito · Donghyun Kim · Samarth Mishra · Ben Usman · Piotr Teterwak · Dina Bashkirova · Dan Hendrycks -
2021 Poster: TRS: Transferability Reduced Ensemble via Promoting Gradient Diversity and Model Smoothness »
Zhuolin Yang · Linyi Li · Xiaojun Xu · Shiliang Zuo · Qian Chen · Pan Zhou · Benjamin Rubinstein · Ce Zhang · Bo Li -
2020 Workshop: Workshop on Dataset Curation and Security »
Nathalie Baracaldo · Yonatan Bisk · Avrim Blum · Michael Curry · John Dickerson · Micah Goldblum · Tom Goldstein · Bo Li · Avi Schwarzschild -
2020 Poster: Robust Deep Reinforcement Learning against Adversarial Perturbations on State Observations »
Huan Zhang · Hongge Chen · Chaowei Xiao · Bo Li · Mingyan Liu · Duane Boning · Cho-Jui Hsieh -
2020 Spotlight: Robust Deep Reinforcement Learning against Adversarial Perturbations on State Observations »
Huan Zhang · Hongge Chen · Chaowei Xiao · Bo Li · Mingyan Liu · Duane Boning · Cho-Jui Hsieh -
2020 Poster: On Convergence of Nearest Neighbor Classifiers over Feature Transformations »
Luka Rimanic · Cedric Renggli · Bo Li · Ce Zhang -
2019 Poster: Using Self-Supervised Learning Can Improve Model Robustness and Uncertainty »
Dan Hendrycks · Mantas Mazeika · Saurav Kadavath · Dawn Song -
2018 : Posters (all accepted papers) + Break »
Jianyu Wang · Denis Gudovskiy · Ziheng Jiang · Michael Kaufmann · Andreea Anghel · James Bradbury · Nikolas Ioannou · Nitin Agrawal · Emma Tosch · Gyeongin Yu · Keno Fischer · Jarrett Revels · Giuseppe Siracusano · Yaoqing Yang · Jeff Johnson · Yang You · Hector Yuen · Chris Ying · Honglei Liu · Nikoli Dryden · Xiangxi Mo · Yangzihao Wang · Amit Juneja · Micah Smith · Qian Yu · pramod gupta · Deepak Narayanan · Keshav Santhanam · Tim Capes · Abdul Dakkak · Norman Mu · Ke Deng · Liam Li · Joao Carreira · Luis Remis · Deepti Raghavan · Una-May O'Reilly · Amanpreet Singh · Mahmoud (Mido) Assran · Eugene Wu · Eytan Bakshy · Jinliang Wei · Michael Innes · Viral Shah · Haibin Lin · Conrad Sanderson · Ryan Curtin · Marcus Edel -
2018 Poster: Using Trusted Data to Train Deep Networks on Labels Corrupted by Severe Noise »
Dan Hendrycks · Mantas Mazeika · Duncan Wilson · Kevin Gimpel -
2016 Poster: Swapout: Learning an ensemble of deep architectures »
Saurabh Singh · Derek Hoiem · David Forsyth -
2013 Poster: Fast Template Evaluation with Vector Quantization »
Mohammad Amin Sadeghi · David Forsyth