
Search All 2024 Events
 

73 Results

Workshop
LLM Defenses Are Not Robust to Multi-Turn Human Jailbreaks Yet
Nathaniel Li · Ziwen Han · Ian Steneker · Willow Primack · Riley Goodside · Hugh Zhang · Zifan Wang · Cristina Menghini · Summer Yue
Workshop
GPAI Evaluations Standards Taskforce: towards effective AI governance
Patricia Paskov · Lukas Berglund · Everett Smith · Lisa Soder
Workshop
Auto-Enhance: Towards a Meta-Benchmark to Evaluate AI Agents' Ability to Improve Other Agents
Samuel Brown · Basil Labib · Codruta Lugoj · Sai Sasank Y
Poster
Wed 16:30 What Makes and Breaks Safety Fine-tuning? A Mechanistic Study
Samyak Jain · Ekdeep S Lubana · Kemal Oksuz · Tom Joy · Philip Torr · Amartya Sanyal · Puneet Dokania
Competition
Sun 9:00 The NeurIPS 2024 LLM Privacy Challenge
Qinbin Li · Junyuan Hong · Chulin Xie · Junyi Hou · Yiqun Diao · Zhun Wang · Dan Hendrycks · Zhangyang "Atlas" Wang · Bo Li · Bingsheng He · Dawn Song
Workshop
Sun 11:05 Contributed Talk 3: LLM Defenses Are Not Robust to Multi-Turn Human Jailbreaks Yet
Nathaniel Li · Ziwen Han · Ian Steneker · Willow Primack · Riley Goodside · Hugh Zhang · Zifan Wang · Cristina Menghini · Summer Yue
Workshop
Sandbag Detection through Model Impairment
Cameron Tice · Philipp Kreer · Nathan Helm-Burger · Prithviraj Singh Shahani · Fedor Ryzhenkov · Teun van der Weij · Felix Hofstätter · Jacob Haimes
Competition
Sun 13:30 CLAS 2024: The Competition for LLM and Agent Safety
Zhen Xiang · Yi Zeng · Mintong Kang · Chejian Xu · Jiawei Zhang · Zhuowen Yuan · Zhaorun Chen · Chulin Xie · Fengqing Jiang · Minzhou Pan · Francesco Pinto · Junyuan Hong · Ruoxi Jia · Radha Poovendran · Bo Li
Workshop
A Formal Framework for Assessing and Mitigating Emergent Security Risks in Generative AI Models: Bridging Theory and Dynamic Risk Mitigation
Aviral Srivastava · Sourav Panda
Workshop
Jailbreak Defense in a Narrow Domain: Failures of existing methods and Improving Transcript-Based Classifiers
Tony Wang · John Hughes · Henry Sleight · Rylan Schaeffer · Rajashree Agrawal · Fazl Barez · Mrinank Sharma · Jesse Mu · Nir Shavit · Ethan Perez
Workshop
Mitigating Downstream Model Risks via Model Provenance
Keyu Wang · Scott Schaffter · Abdullah Norozi Iranzad · Doina Precup · Jonathan Lebensold · Megan Risdal