Workshop
|
Sun 11:15
|
Invited Talk 3: (Been Kim, Senior Staff Research Scientist, Google DeepMind)
Zaina Shaik
|
|
Workshop
|
|
Representation Tuning
Christopher Ackerman
|
|
Workshop
|
Sat 15:45
|
Formal Analysis and Unification of Generalization in Deep Reinforcement Learning
Ezgi Korkmaz
|
|
Affinity Event
|
Tue 14:00
|
Invited Talk 2 by Lama Ahmad (Technical Program Manager, Trustworthy AI at OpenAI): Human and AI Evaluations for Safety and Robustness Testing
Lama Ahmad
|
|
Workshop
|
Sun 17:00
|
Invited Talk 7: Max Kaufmann on Red-teaming AI systems in government
Max Kaufmann
|
|
Affinity Event
|
|
Position Paper: The Urgent Need for Advancements in Machine Unlearning Algorithms to Ensure AI Safety
Yashaswini Viswanath · Vishwanath Hulipalled · Kaustubha Vecham
|
|
Workshop
|
|
Plentiful Jailbreaks with String Compositions
Brian Huang
|
|
Workshop
|
|
Does Refusal Training in LLMs Generalize to the Past Tense?
Maksym Andriushchenko · Nicolas Flammarion
|
|
Workshop
|
Sat 12:00
|
Weak-to-Strong Confidence Prediction
Yukai Yang · Tracy Zhu · Marco Morucci · Tim G. J. Rudner
|
|
Workshop
|
|
Combining Domain and Alignment Vectors to Achieve Better Knowledge-Safety Trade-offs in LLMs
Megh Thakkar · Yash More · Quentin Fournier · Matthew Riemer · Pin-Yu Chen · Amal Zouaq · Payel Das · Sarath Chandar
|
|
Workshop
|
|
AIR-Bench 2024: Safety Evaluation Based on Risk Categories from Regulations and Policies
Kevin Klyman
|
|