

Search All 2024 Events

24 Results

Page 1 of 2
Workshop
Linear Probe Penalties Reduce LLM Sycophancy
Henry Papadatos · Rachel Freedman
Workshop
AEGIS2.0: A Diverse AI Safety Dataset and Risks Taxonomy for Alignment of LLM Guardrails
Shaona Ghosh · Prasoon Varshney · Makesh Narsimhan Sreedhar · Aishwarya Padmakumar · Traian Rebedea · Jibin Varghese · Christopher Parisien
Workshop
Superficial Alignment, Subtle Divergence, and Nudge Sensitivity in LLM Decision-Making
Manuel Cherep · Nikhil Singh · Patricia Maes
Workshop
LLM Alignment Using Soft Prompt Tuning: The Case of Cultural Alignment
Reem Masoud · Martin Ferianc · Philip Treleaven · Miguel Rodrigues
Workshop
LLM Alignment Through Successive Policy Re-weighting (SPR)
Xinnan Zhang · Siliang Zeng · Jiaxiang Li · Kaixiang Lin · Mingyi Hong
Poster
Thu 16:30 ReMoDetect: Reward Models Recognize Aligned LLM's Generations
Hyunseok Lee · Jihoon Tack · Jinwoo Shin
Poster
Thu 16:30 Aligning LLM Agents by Learning Latent Preference from User Edits
Ge Gao · Alexey Taymanov · Eduardo Salinas · Paul Mineiro · Dipendra Misra
Workshop
Moral Persuasion in Large Language Models: Evaluating Susceptibility and Ethical Alignment
Allison Huang · Carlos Mougan · Yulu Pi
Workshop
Declarative characterizations of direct preference alignment algorithms
Kyle Richardson · Vivek Srikumar · Ashish Sabharwal
Workshop
AdvBDGen: Adversarially Fortified Prompt-Specific Fuzzy Backdoor Generator Against LLM Alignment
Pankayaraj Pathmanathan · Udari Sehwag · Michael-Andrei Panaitescu-Liess · Furong Huang
Workshop
Sat 12:00 A Statistical Approach to Quantifying LLM Human Alignment
Harbin Hong · Liu Leqi · Sebastian Caldas
Poster
Thu 11:00 Transfer Q-star: Principled Decoding for LLM Alignment
Souradip Chakraborty · Soumya Suvra Ghosal · Ming Yin · Dinesh Manocha · Mengdi Wang · Amrit Singh Bedi · Furong Huang