NeurIPS 2024

Workshop

Sun 11:15

Invited Talk 3: (Been Kim, Senior Staff Research Scientist, Google DeepMind)
Zaina Shaik

Workshop

Representation Tuning
Christopher Ackerman

Workshop

Sat 15:45

Formal Analysis and Unification of Generalization in Deep Reinforcement Learning
Ezgi Korkmaz

Affinity Event

Tue 14:00

Workshop

Sun 17:00

Invited Talk 7: Max Kaufmann on Red-teaming AI systems in government
Max Kaufmann

Affinity Event

Position Paper: The Urgent Need for Advancements in Machine Unlearning Algorithms to Ensure AI Safety
Yashaswini Viswanath · Vishwanath Hulipalled · Kaustubha Vecham

Workshop

Plentiful Jailbreaks with String Compositions
Brian Huang

Workshop

Does Refusal Training in LLMs Generalize to the Past Tense?
Maksym Andriushchenko · Nicolas Flammarion

Workshop

Sat 12:00

Weak-to-Strong Confidence Prediction
Yukai Yang · Tracy Zhu · Marco Morucci · Tim G. J. Rudner

Workshop

Combining Domain and Alignment Vectors to Achieve Better Knowledge-Safety Trade-offs in LLMs
Megh Thakkar · Yash More · Quentin Fournier · Matthew Riemer · Pin-Yu Chen · Amal Zouaq · Payel Das · Sarath Chandar

Workshop

AIR-Bench 2024: Safety Evaluation Based on Risk Categories from Regulations and Policies
Kevin Klyman
