22 Results
| Type | Time | Title | Authors |
|---|---|---|---|
| Workshop | | How Does LLM Compression Affect Weight Exfiltration Attacks? | Davis Brown · Mantas Mazeika |
| Workshop | | SCAR: Sparse Conditioned Autoencoders for Concept Detection and Steering in LLMs | Ruben Härle · Felix Friedrich · Manuel Brack · Björn Deiseroth · Patrick Schramowski · Kristian Kersting |
| Poster | | SG-Bench: Evaluating LLM Safety Generalization Across Diverse Tasks and Prompt Types | Yutao Mou · Shikun Zhang · Wei Ye |
| Workshop | Sun 16:50 | Contributed Talk 6: Infecting LLM Agents via Generalizable Adversarial Attack | Weichen Yu · Kai Hu · Tianyu Pang · Chao Du · Min Lin · Matt Fredrikson |
| Workshop | | Infecting LLM Agents via Generalizable Adversarial Attack | Weichen Yu · Kai Hu · Tianyu Pang · Chao Du · Min Lin · Matt Fredrikson |
| Workshop | | A Cautionary Tale on the Evaluation of Differentially Private In-Context Learning | Anjun Hu · Jiyang Guan · Philip Torr · Francesco Pinto |
| Workshop | | Instructional Segment Embedding: Improving LLM Safety with Instruction Hierarchy | Tong Wu · Shujian Zhang · Kaiqiang Song · Silei Xu · Sanqiang Zhao · Ravi Agrawal · Sathish Indurthi · Chong Xiang · Prateek Mittal · Wenxuan Zhou |
| Competition | Sun 9:00 | The NeurIPS 2024 LLM Privacy Challenge | Qinbin Li · Junyuan Hong · Chulin Xie · Junyi Hou · Yiqun Diao · Zhun Wang · Dan Hendrycks · Zhangyang "Atlas" Wang · Bo Li · Bingsheng He · Dawn Song |
| Workshop | | Auto-Enhance: Towards a Meta-Benchmark to Evaluate AI Agents' Ability to Improve Other Agents | Samuel Brown · Basil Labib · Codruta Lugoj · Sai Sasank Y |
| Workshop | | SAGE-RT: Synthetic Alignment data Generation for Safety Evaluation and Red Teaming | Anurakt Kumar · Divyanshu Kumar · Jatan Loya · Nitin Aravind Birur · Tanay Baswa · Sahil Agarwal · Prashanth Harshangi |
| Workshop | | Code-Switching Red-Teaming: LLM Evaluation for Safety and Multilingual Understanding | Haneul Yoo · Yongjin Yang · Hwaran Lee |
| Competition | Sun 13:30 | CLAS 2024: The Competition for LLM and Agent Safety | Zhen Xiang · Yi Zeng · Mintong Kang · Chejian Xu · Jiawei Zhang · Zhuowen Yuan · Zhaorun Chen · Chulin Xie · Fengqing Jiang · Minzhou Pan · Francesco Pinto · Junyuan Hong · Ruoxi Jia · Radha Poovendran · Bo Li |