Skip to yearly menu bar Skip to main content


Search All 2024 Events
 

22 Results

<<   <   Page 1 of 2   >   >>
Workshop
How Does LLM Compression Affect Weight Exfiltration Attacks?
Davis Brown · Mantas Mazeika
Workshop
SCAR: Sparse Conditioned Autoencoders for Concept Detection and Steering in LLMs
Ruben Härle · Felix Friedrich · Manuel Brack · Björn Deiseroth · Patrick Schramowski · Kristian Kersting
Poster
SG-Bench: Evaluating LLM Safety Generalization Across Diverse Tasks and Prompt Types
Yutao Mou · Shikun Zhang · Wei Ye
Workshop
Sun 16:50 Contributed Talk 6: Infecting LLM Agents via Generalizable Adversarial Attack
Weichen Yu · Kai Hu · Tianyu Pang · Chao Du · Min Lin · Matt Fredrikson
Workshop
Infecting LLM Agents via Generalizable Adversarial Attack
Weichen Yu · Kai Hu · Tianyu Pang · Chao Du · Min Lin · Matt Fredrikson
Workshop
A Cautionary Tale on the Evaluation of Differentially Private In-Context Learning
Anjun Hu · Jiyang Guan · Philip Torr · Francesco Pinto
Workshop
Instructional Segment Embedding: Improving LLM Safety with Instruction Hierarchy
Tong Wu · Shujian Zhang · Kaiqiang Song · Silei Xu · Sanqiang Zhao · Ravi Agrawal · Sathish Indurthi · Chong Xiang · Prateek Mittal · Wenxuan Zhou
Competition
Sun 9:00 The NeurIPS 2024 LLM Privacy Challenge
Qinbin Li · Junyuan Hong · Chulin Xie · Junyi Hou · Yiqun Diao · Zhun Wang · Dan Hendrycks · Zhangyang &quot;Atlas&quot; Wang · Bo Li · Bingsheng He · Dawn Song
Workshop
Auto-Enhance: Towards a Meta-Benchmark to Evaluate AI Agents' Ability to Improve Other Agents
Samuel Brown · Basil Labib · Codruta Lugoj · Sai Sasank Y
Workshop
SAGE-RT: Synthetic Alignment data Generation for Safety Evaluation and Red Teaming
Anurakt Kumar · Divyanshu Kumar · Jatan Loya · Nitin Aravind Birur · Tanay Baswa · Sahil Agarwal · Prashanth Harshangi
Workshop
Code-Switching Red-Teaming: LLM Evaluation for Safety and Multilingual Understanding
Haneul Yoo · Yongjin Yang · Hwaran Lee
Competition
Sun 13:30 CLAS 2024: The Competition for LLM and Agent Safety
Zhen Xiang · Yi Zeng · Mintong Kang · Chejian Xu · Jiawei Zhang · Zhuowen Yuan · Zhaorun Chen · Chulin Xie · Fengqing Jiang · Minzhou Pan · Francesco Pinto · Junyuan Hong · Ruoxi Jia · Radha Poovendran · Bo Li