126 Results
Poster | Fri 16:30
Lisa: Lazy Safety Alignment for Large Language Models against Harmful Fine-tuning Attack
Tiansheng Huang · Sihao Hu · Fatih Ilhan · Selim Tekin · Ling Liu

Poster | Fri 11:00
MoGU: A Framework for Enhancing Safety of LLMs While Preserving Their Usability
YANRUI DU · Sendong Zhao · Danyang Zhao · Ming Ma · Yuhan Chen · Liangyu Huo · Qing Yang · Dongliang Xu · Bing Qin

Poster | Wed 16:30
Uncovering Safety Risks of Large Language Models through Concept Activation Vector
Zhihao Xu · Ruixuan HUANG · Changyu Chen · Xiting Wang

Poster | Thu 16:30
Safety through feedback in Constrained RL
Shashank Reddy Chirra · Pradeep Varakantham · Praveen Paruchuri

Poster | Fri 16:30
One-Shot Safety Alignment for Large Language Models via Optimal Dualization
Xinmeng Huang · Shuo Li · Edgar Dobriban · Osbert Bastani · Hamed Hassani · Dongsheng Ding

Affinity Event
Beyond the Safety Bundle: Auditing the Helpful and Harmless Dataset
Khaoula Chehbouni · Jonathan Colaço Carr · Yash More · Jackie CK Cheung · Golnoosh Farnadi

Workshop
SkewAct: Red Teaming Large Language Models via Activation-Skewed Adversarial Prompt Optimization
Hanxi Guo · Siyuan Cheng · Guanhong Tao · Guangyu Shen · Zhuo Zhang · Shengwei An · Kaiyuan Zhang · Xiangyu Zhang

Workshop
Position: Addressing Ethical Challenges and Safety Risks in GenAI-Powered Brain-Computer Interfaces
Konstantinos Barmpas · Georgios Zoumpourlis · Yannis Panagakis · Dimitrios Adamos · N Laskaris · Stefanos Zafeiriou

Poster | Thu 11:00
Uncovering, Explaining, and Mitigating the Superficial Safety of Backdoor Defense
Rui Min · Zeyu Qin · Nevin L. Zhang · Li Shen · Minhao Cheng

Workshop | Sun 11:05
Contributed Talk 3: LLM Defenses Are Not Robust to Multi-Turn Human Jailbreaks Yet
Nathaniel Li · Ziwen Han · Ian Steneker · Willow Primack · Riley Goodside · Hugh Zhang · Zifan Wang · Cristina Menghini · Summer Yue

Poster | Thu 16:30
Soft Prompt Threats: Attacking Safety Alignment and Unlearning in Open-Source LLMs through the Embedding Space
Leo Schwinn · David Dobre · Sophie Xhonneux · Gauthier Gidel · Stephan Günnemann

Poster | Thu 11:00
Transcendence: Generative Models Can Outperform The Experts That Train Them
Edwin Zhang · Vincent Zhu · Naomi Saphra · Anat Kleiman · Benjamin Edelman · Milind Tambe · Sham Kakade · Eran Malach