Search All 2024 Events
Workshop
Inducing Human-like Biases in Moral Reasoning Language Models
Austin Meek · Artem Karpov · Seong Cho · Raymond Koopmanschap · Lucy Farnik · Bogdan-Ionut Cirstea
Poster
Wed 16:30 Keeping LLMs Aligned After Fine-tuning: The Crucial Role of Prompt Templates
Kaifeng Lyu · Haoyu Zhao · Xinran Gu · Dingli Yu · Anirudh Goyal · Sanjeev Arora
Poster
Wed 11:00 Evaluating alignment between humans and neural network representations in image-based learning tasks
Can Demircan · Tankred Saanum · Leonardo Pettini · Marcel Binz · Blazej Baczkowski · Christian Doeller · Mona Garvert · Eric Schulz
Poster
Wed 11:00 Distributional Preference Alignment of LLMs via Optimal Transport
Igor Melnyk · Youssef Mroueh · Brian Belgodere · Mattia Rigotti · Apoorva Nitsure · Mikhail Yurochkin · Kristjan Greenewald · Jiri Navratil · Jarret Ross
Poster
Thu 11:00 BackdoorAlign: Mitigating Fine-tuning based Jailbreak Attack with Backdoor Enhanced Safety Alignment
Jiongxiao Wang · Jiazhao LI · Yiquan Li · Xiangyu Qi · Junjie Hu · Sharon Li · Patrick McDaniel · Muhao Chen · Bo Li · Chaowei Xiao
Poster
Fri 16:30 Lisa: Lazy Safety Alignment for Large Language Models against Harmful Fine-tuning Attack
Tiansheng Huang · Sihao Hu · Fatih Ilhan · Selim Tekin · Ling Liu
Poster
Fri 16:30 Improving Alignment and Robustness with Circuit Breakers
Andy Zou · Long Phan · Justin Wang · Derek Duenas · Maxwell Lin · Maksym Andriushchenko · J. Zico Kolter · Matt Fredrikson · Dan Hendrycks
Poster
Thu 11:00 Aligning Large Language Models with Representation Editing: A Control Perspective
Lingkai Kong · Haorui Wang · Wenhao Mu · Yuanqi Du · Yuchen Zhuang · Yifei Zhou · Yue Song · Rongzhi Zhang · Kai Wang · Chao Zhang