Poster
Fri 16:30 Lisa: Lazy Safety Alignment for Large Language Models against Harmful Fine-tuning Attack
Tiansheng Huang · Sihao Hu · Fatih Ilhan · Selim Tekin · Ling Liu
Poster
Fri 11:00 MoGU: A Framework for Enhancing Safety of LLMs While Preserving Their Usability
Yanrui Du · Sendong Zhao · Danyang Zhao · Ming Ma · Yuhan Chen · Liangyu Huo · Qing Yang · Dongliang Xu · Bing Qin
Poster
Wed 16:30 Uncovering Safety Risks of Large Language Models through Concept Activation Vector
Zhihao Xu · Ruixuan Huang · Changyu Chen · Xiting Wang
Poster
Thu 16:30 Safety through feedback in Constrained RL
Shashank Reddy Chirra · Pradeep Varakantham · Praveen Paruchuri
Poster
Fri 16:30 One-Shot Safety Alignment for Large Language Models via Optimal Dualization
Xinmeng Huang · Shuo Li · Edgar Dobriban · Osbert Bastani · Hamed Hassani · Dongsheng Ding
Affinity Event
Beyond the Safety Bundle: Auditing the Helpful and Harmless Dataset
Khaoula Chehbouni · Jonathan Colaço Carr · Yash More · Jackie CK Cheung · Golnoosh Farnadi
Workshop
SkewAct: Red Teaming Large Language Models via Activation-Skewed Adversarial Prompt Optimization
Hanxi Guo · Siyuan Cheng · Guanhong Tao · Guangyu Shen · Zhuo Zhang · Shengwei An · Kaiyuan Zhang · Xiangyu Zhang
Workshop
Position: Addressing Ethical Challenges and Safety Risks in GenAI-Powered Brain-Computer Interfaces
Konstantinos Barmpas · Georgios Zoumpourlis · Yannis Panagakis · Dimitrios Adamos · N Laskaris · Stefanos Zafeiriou
Poster
Thu 11:00 Uncovering, Explaining, and Mitigating the Superficial Safety of Backdoor Defense
Rui Min · Zeyu Qin · Nevin L. Zhang · Li Shen · Minhao Cheng
Workshop
Sun 11:05 Contributed Talk 3: LLM Defenses Are Not Robust to Multi-Turn Human Jailbreaks Yet
Nathaniel Li · Ziwen Han · Ian Steneker · Willow Primack · Riley Goodside · Hugh Zhang · Zifan Wang · Cristina Menghini · Summer Yue
Poster
Thu 16:30 Soft Prompt Threats: Attacking Safety Alignment and Unlearning in Open-Source LLMs through the Embedding Space
Leo Schwinn · David Dobre · Sophie Xhonneux · Gauthier Gidel · Stephan Günnemann
Poster
Thu 11:00 Transcendence: Generative Models Can Outperform The Experts That Train Them
Edwin Zhang · Vincent Zhu · Naomi Saphra · Anat Kleiman · Benjamin Edelman · Milind Tambe · Sham Kakade · Eran Malach