

Search All 2024 Events
49 Results

Page 3 of 5
Workshop
Aligning to What? Limits to RLHF Based Alignment
Logan Barnhart · Reza Akbarian Bafghi · Maziar Raissi · Stephen Becker
Poster
Wed 16:30 Uncovering Safety Risks of Large Language Models through Concept Activation Vector
Zhihao Xu · Ruixuan Huang · Changyu Chen · Xiting Wang
Poster
Wed 16:30 MLLMGuard: A Multi-dimensional Safety Evaluation Suite for Multimodal Large Language Models
Tianle Gu · Zeyang Zhou · Kexin Huang · Liang Dandan · Yixu Wang · Haiquan Zhao · Yuanqi Yao · Xingge Qiao · Keqing Wang · Yujiu Yang · Yan Teng · Yu Qiao · Yingchun Wang
Workshop
Language Models Resist Alignment
Jiaming Ji · Kaile Wang · Tianyi (Alex) Qiu · Boyuan Chen · Changye Li · Hantao Lou · Jiayi Zhou · Juntao Dai · Yaodong Yang
Workshop
Steering Without Side Effects: Improving Post-Deployment Control of Language Models
Asa Cooper Stickland · Aleksandr Lyzhov · Jacob Pfau · Salsabila Mahdi · Samuel Bowman
Workshop
Sandbag Detection through Model Impairment
Cameron Tice · Philipp Kreer · Nathan Helm-Burger · Prithviraj Singh Shahani · Fedor Ryzhenkov · Teun van der Weij · Felix Hofstätter · Jacob Haimes
Workshop
GPAI Evaluations Standards Taskforce: towards effective AI governance
Patricia Paskov · Lukas Berglund · Everett Smith · Lisa Soder
Workshop
Position: Addressing Ethical Challenges and Safety Risks in GenAI-Powered Brain-Computer Interfaces
Konstantinos Barmpas · Georgios Zoumpourlis · Yannis Panagakis · Dimitrios Adamos · N Laskaris · Stefanos Zafeiriou
Workshop
LLM Defenses Are Not Robust to Multi-Turn Human Jailbreaks Yet
Nathaniel Li · Ziwen Han · Ian Steneker · Willow Primack · Riley Goodside · Hugh Zhang · Zifan Wang · Cristina Menghini · Summer Yue
Workshop
Sun 11:05 Contributed Talk 3: LLM Defenses Are Not Robust to Multi-Turn Human Jailbreaks Yet
Nathaniel Li · Ziwen Han · Ian Steneker · Willow Primack · Riley Goodside · Hugh Zhang · Zifan Wang · Cristina Menghini · Summer Yue
Workshop
SkewAct: Red Teaming Large Language Models via Activation-Skewed Adversarial Prompt Optimization
Hanxi Guo · Siyuan Cheng · Guanhong Tao · Guangyu Shen · Zhuo Zhang · Shengwei An · Kaiyuan Zhang · Xiangyu Zhang
Poster
Wed 11:00 Navigating the Safety Landscape: Measuring Risks in Finetuning Large Language Models
ShengYun Peng · Pin-Yu Chen · Matthew Hull · Duen Horng Chau