Workshop
Safe Generative AI
Dianbo Liu · Ling Pan · Tailin Wu · Bonaventure F. P. Dossou · Emmanuel Bengio · Yilun Du · Dinghuai Zhang · Yoshua Bengio
East Exhibition Hall A
Sun 15 Dec, 9 a.m. PST
In the past two years, generative AI has been the major driving force behind the development of advanced AI products such as ChatGPT-4, AlphaFold, and Stable Diffusion. While these technologies have substantially improved productivity for many, they have also raised serious safety concerns, and no workshop in the past two years has focused on this topic. This workshop, which addresses AI safety concerns related to the use of generative AI, is therefore much needed by the community. Generative AI, including large language models, vision-language models, diffusion models, and many more, has significantly aided various aspects of both academia and industry. In scientific discovery, these contributions encompass experimental design, hypothesis formulation, theoretical reasoning, and observation organization. In commercial applications, generative models such as large language models and diffusion algorithms have changed the lifestyles and workflows of billions of people around the world. This workshop aims to convene experts from various fields to address these challenges and explore potential solutions.
Schedule
Sun 9:00 a.m. - 9:40 a.m. | Opening remarks by Prof. Yoshua Bengio (talk)
Sun 9:40 a.m. - 10:20 a.m. | Talk by Prof. Max Tegmark (talk)
Sun 10:20 a.m. - 11:00 a.m. | Talk by Prof. Chelsea Finn (talk)
Sun 11:00 a.m. - 11:40 a.m. | Talk by Prof. Dawn Song (talk)
Sun 11:40 a.m. - 1:30 p.m. | Lunch break
Sun 1:30 p.m. - 1:40 p.m. | Model Pairing Using Embedding Translation for Backdoor Attack Detection on Open-Set Classification Tasks
Sun 1:40 p.m. - 1:50 p.m. | On Calibration of LLM-based Guard Models for Reliable Content Moderation
Sun 1:50 p.m. - 2:00 p.m. | Controllable Generation via Locally Constrained Resampling
Sun 2:00 p.m. - 2:10 p.m. | Who Speaks Matters: Analysing the Influence of the Speaker's Ethnicity on Hate Classification
Sun 2:10 p.m. - 2:20 p.m. | The effect of fine-tuning on language model toxicity
Sun 2:20 p.m. - 2:30 p.m. | GuardFormer: Guardrail Instruction Pretraining for Efficient SafeGuarding
Sun 2:30 p.m. - 2:40 p.m. | Towards Safe and Honest AI Agents with Neural Self-Other Overlap
Sun 2:40 p.m. - 2:50 p.m. | Adversarial Prompt Evaluation: Systematic Benchmarking of Guardrails Against Prompt Input Attacks on LLMs
Sun 2:50 p.m. - 3:00 p.m. | Does Refusal Training in LLMs Generalize to the Past Tense?
Sun 3:00 p.m. - 5:00 p.m. | Poster session
- HSpace Sparse Autoencoders (Poster) | Ayodeji Ijishakin · Ming Ang · Levente Baljer · Daniel Tan · Hugo Fry · Ahmed Abdulaal · Aengus Lynch
- Measuring Steerability in Large Language Models (Poster) | Trenton Chang · Jenna Wiens · Tobias Schnabel · Adith Swaminathan
- Justice or Prejudice? Quantifying Biases in LLM-as-a-Judge (Poster) | Jiayi Ye · Yanbo Wang · Yue Huang · Dongping Chen · Qihui Zhang · Nuno Moniz · Tian Gao · Werner Geyer · Chao Huang · Pin-Yu Chen · Nitesh Chawla · Xiangliang Zhang
- Towards Safe and Honest AI Agents with Neural Self-Other Overlap (Poster) | Marc Carauleanu · Michael Vaiana · Diogo de Lucena · Judd Rosenblatt · Cameron Berg
- Towards Safe and Honest AI Agents with Neural Self-Other Overlap (Oral) | Marc Carauleanu · Michael Vaiana · Diogo de Lucena · Judd Rosenblatt · Cameron Berg
- Model Pairing Using Embedding Translation for Backdoor Attack Detection on Open-Set Classification Tasks (Poster) | Alex Unnervik · Hatef Otroshi Shahreza · Anjith George · Sébastien Marcel
- Model Pairing Using Embedding Translation for Backdoor Attack Detection on Open-Set Classification Tasks (Oral) | Alex Unnervik · Hatef Otroshi Shahreza · Anjith George · Sébastien Marcel
- Refusal Tokens: A Simple Way to Calibrate Refusals in Large Language Models (Poster) | Neel Jain · Aditya Shrivastava · Chenyang Zhu · Daben Liu · Alfy Samuel · Ashwinee Panda · Anoop Kumar · Micah Goldblum · Tom Goldstein
- GuardFormer: Guardrail Instruction Pretraining for Efficient SafeGuarding (Poster) | James O' Neill · Santhosh Subramanian · Eric Lin · Abishek Satish · Vaikkunth Mugunthan
- GuardFormer: Guardrail Instruction Pretraining for Efficient SafeGuarding (Oral) | James O' Neill · Santhosh Subramanian · Eric Lin · Abishek Satish · Vaikkunth Mugunthan
- Hidden in the Noise: Two-Stage Robust Watermarking for Images (Poster) | Kasra Arabi · Benjamin Feuer · R. Teal Witter · Chinmay Hegde · Niv Cohen
- Auditing Empirical Privacy Protection of Private LLM Adaptations (Poster) | Bartłomiej Marek · Vincent Hanke · Xun Wang · Michael Backes · Adam Dziedzic · Franziska Boenisch
- Integrating Object Detection Modality into Visual Language Model for Enhanced Autonomous Driving Agent (Poster) | Linfeng He · Yiming Sun · Sihao Wu · Jiaxu Liu · Xiaowei Huang
- Controllable Generation via Locally Constrained Resampling (Poster) | Kareem Ahmed · Kai-Wei Chang · Guy Van den Broeck
- Controllable Generation via Locally Constrained Resampling (Oral) | Kareem Ahmed · Kai-Wei Chang · Guy Van den Broeck
- Retention Score: Quantifying Jailbreak Risks for Vision Language Models (Poster) | ZAITANG LI · Pin-Yu Chen · Tsung-Yi Ho
- The Impact of Inference Acceleration Strategies on Bias of Large Language Models (Poster) | Elisabeth Kirsten · Ivan Habernal · Vedant Nanda · Muhammad Bilal Zafar
- AnyPrefer: An Automatic Framework for Preference Data Synthesis (Poster) | Yiyang Zhou · Zhaoyang Wang · Tianle Wang · Shangyu Xing · Peng Xia · Bo Li · Kaiyuan Zheng · Zijian Zhang · Zhaorun Chen · Wenhao Zheng · Xuchao Zhang · Chetan Bansal · Weitong Zhang · Ying Wei · Mohit Bansal · Huaxiu Yao
- Steering Without Side Effects: Improving Post-Deployment Control of Language Models (Poster) | Asa Cooper Stickland · Aleksandr Lyzhov · Jacob Pfau · Salsabila Mahdi · Samuel Bowman
- Safe and Sound: Evaluating Language Models for Bias Mitigation and Understanding (Poster) | Shaina Raza · Deval Pandya · Shardul ghuge · Nifemi
- Investigating Implicit Bias in Large Language Models: A Large-Scale Study of Over 50 LLMs (Poster) | Divyanshu Kumar · Umang Jain · Sahil Agarwal · Prashanth Harshangi
- Self-Preference Bias in LLM-as-a-Judge (Poster) | Koki Wataoka · Tsubasa Takahashi · Ryokan Ri
- Zer0-Jack: A memory-efficient gradient-based jailbreaking method for black box Multi-modal Large Language Models (Poster) | Tiejin Chen · Kaishen Wang · Hua Wei
- The Probe Paradigm: A Theoretical Foundation for Explaining Generative Models (Poster) | Amit Rege
- LLM Improvement for Jailbreak Defense: Analysis Through the Lens of Over-Refusal (Poster) | Swetasudha Panda · Naveen Jafer Nizar · Michael Wick
- Network Inversion for Training-Like Data Reconstruction (Poster) | Pirzada Suhail · Amit Sethi
- Lexically-constrained automated prompt augmentation: A case study using adversarial T2I data (Poster) | Jessica Quaye · Alicia Parrish · Oana Inel · Minsuk Kahng · Charvi Rastogi · Hannah Rose Kirk · Jess Tsang · Nathan Clement · Rafael Mosquera-Gomez · Juan Ciro · Vijay Janapa Reddi · Lora Aroyo
- Detecting Origin Attribution for Text-to-Image Diffusion Models in RGB and Beyond (Poster) | Katherine Xu · Lingzhi Zhang · Jianbo Shi
- GenAudit: Fixing Factual Errors in Language Model Outputs with Evidence (Poster) | Kundan Krishna · Sanjana Ramprasad · Prakhar Gupta · Byron Wallace · Zachary Lipton · Jeffrey Bigham
- The Structural Safety Generalization Problem (Poster) | Tom Gibbs · Julius Broomfield · George Ingebretsen · Ethan Kosak-Hine · Tia Nasir · Jason Zhang · Reihaneh Iranmanesh · Sara Pieri · Reihaneh Rabbany · Kellin Pelrine
- Simplicity Prevails: Rethinking Negative Preference Optimization for LLM Unlearning (Poster) | Chongyu Fan · Jiancheng Liu · Licong Lin · Jinghan Jia · Ruiqi Zhang · Song Mei · Sijia Liu
- Keep on Swimming: Real Attackers Only Need Partial Knowledge of a Multi-Model System (Poster) | Julian Collado · Kevin Stangl
- Debiasing Large Vision-Language Models by Ablating Protected Attribute Representations (Poster) | neale ratzlaff · Matthew Olson · Musashi Hinck · Shao-Yen Tseng · VASUDEV LAL · Phillip Howard
- GRE Score: Generative Risk Evaluation for Large Language Models (Poster) | ZAITANG LI · Mohamed Mouhajir · Pin-Yu Chen · Tsung-Yi Ho
- Identifying and Addressing Delusions for Target-Directed Decision Making (Poster) | Mingde Zhao · Tristan Sylvain · Doina Precup · Yoshua Bengio
- Cream: Consistency Regularized Self-Rewarding Language Models (Poster) | Zhaoyang Wang · Weilei He · Zhiyuan Liang · Xuchao Zhang · Chetan Bansal · Ying Wei · Weitong Zhang · Huaxiu Yao
- Fine-Tuning Large Language Models to Appropriately Abstain with Semantic Entropy (Poster) | Benedict Aaron Tjandra · Muhammed Razzak · Jannik Kossen · Yarin Gal
- Epistemic Integrity in Large Language Models (Poster) | Bijean Ghafouri · Shahrad Mohammadzadeh · James Zhou · Pratheeksha Nair · Jacob-Junqi Tian · Mayank Goel · Reihaneh Rabbany · Jean-François Godbout · Kellin Pelrine
- An Adversarial Behavior Model for Contextual Ethical Alignment in Large Language Models (Poster) | Edward Chang
- Differentially Private Sequential Data Synthesis with Structured State Space Models and Diffusion Models (Poster) | Tomoya Matsumoto · Takayuki Miura · Toshiki Shibahara · Masanobu Kii · Kazuki Iwahana · Osamu Saisho · Shingo OKAMURA
- Do LLMs estimate uncertainty well in instruction-following? (Poster) | Juyeon Heo · Miao Xiong · Christina Heinze-Deml · Jaya Narain
- Concept Unlearning for Large Language Models (Poster) | Tomoya Yamashita · Takayuki Miura · Yuuki Yamanaka · Toshiki Shibahara · Masanori Yamada
- Mitigating Hallucinations in LVLMs via Summary-Guided Decoding (Poster) | Kyungmin Min · Minbeom Kim · Kang-il Lee · Dongryeol Lee · Kyomin Jung
- HyperFace: Generating Synthetic Face Recognition Datasets by Exploring Face Embedding Hypersphere (Poster) | Hatef Otroshi Shahreza · Sébastien Marcel
- Permute-and-Flip: An optimally stable and watermarkable decoder for LLMs (Poster) | Xuandong Zhao · Lei Li · Yu-Xiang Wang
- Investigating LLM Memorization: Bridging Trojan Detection and Training Data Extraction (Poster) | Manoj Acharya · Xiao Lin · Susmit Jha
- DiffTextPure: Defending Large Language Models with Diffusion Purifiers (Poster) | Huanran Chen · Ziruo Wang · Yihan Yang · Shuo Zhang · Zeming Wei · Fusheng Jin · Yinpeng Dong
- Which LLMs are Difficult to Detect? A Detailed Analysis of Potential Factors Contributing to Difficulties in LLM Text Detection (Poster) | Shantanu Thorat · Tianbao Yang
- Can Generative AI Solve Your In-Context Learning Problem? A Martingale Perspective (Poster) | Andrew Jesson · Nicolas Beltran Velez · David Blei
- On the Protocol for Evaluating Uncertainty in Generative Question-Answering Tasks (Poster) | Andrea Santilli · Miao Xiong · Michael Kirchhof · Pau Rodriguez · Federico Danieli · Xavier Suau · Luca Zappella · Sinead Williamson · Adam Golinski
- Pruning for Robust Concept Erasing in Diffusion Models (Poster) | Tianyun Yang · Ziniu Li · Juan Cao · Chang Xu
- Concept Denoising Score Matching for Responsible Text-to-Image Generation (Poster) | Silpa Vadakkeeveetil Sreelatha · Sauradip Nag · Serge Belongie · Muhammad Awais · Anjan Dutta
- Applying Sparse Autoencoders to Unlearn Knowledge in Language Models (Poster) | Eoin Farrell · Yeu-Tong Lau · Arthur Conmy
- Can Knowledge Editing Really Correct Hallucinations? (Poster) | Baixiang Huang · Canyu Chen · Xiongxiao Xu · Ali Payani · Kai Shu
- Imitation guided Automated Red Teaming (Poster) | Desik Rengarajan · Sajad Mousavi · Ashwin Ramesh Babu · Vineet Gundecha · Avisek Naug · Sahand Ghorbanpour · Antonio Guillen-Perez · Ricardo Luna Gutierrez · Soumyendu Sarkar
- Improving LLM Group Fairness on Tabular Data via In-Context Learning (Poster) | Valeriia Cherepanova · Chia-Jung Lee · Nil-Jana Akpinar · Riccardo Fogliato · Martin Bertran · Michael Kearns · James Zou
- Is Your Paper Being Reviewed by an LLM? Investigating AI Text Detectability in Peer Review (Poster) | Sungduk Yu · Man Luo · Avinash Madasu · VASUDEV LAL · Phillip Howard
- Can Editing LLMs Inject Harm? (Poster) | Canyu Chen · Baixiang Huang · Zekun Li · Zhaorun Chen · Shiyang Lai · Xiongxiao Xu · Jia-Chen Gu · Jindong Gu · Huaxiu Yao · Chaowei Xiao · Xifeng Yan · William Yang Wang · Philip Torr · Dawn Song · Kai Shu
- Targeted Unlearning with Single Layer Unlearning Gradient (Poster) | Zikui Cai · Yaoteng Tan · M. Salman Asif
- Stronger Universal and Transfer Attacks by Suppressing Refusals (Poster) | David Huang · Avidan Shah · Alexandre Araujo · David Wagner · Chawin Sitawarin
- Weak-to-Strong Confidence Prediction (Poster) | Yukai Yang · Tracy Zhu · Marco Morucci · Tim G. J. Rudner
- Fair Image Generation from Pre-trained Models by Probabilistic Modeling (Poster) | Mahdi Ahmadi · John Leland · Agneet Chatterjee · YooJung Choi
- Differentially Private Attention Computation (Poster) | Yeqi Gao · Zhao Song · Xin Yang · Yufa Zhou
- Has My System Prompt Been Used? Large Language Model Prompt Membership Inference (Poster) | Roman Levin · Valeriia Cherepanova · Abhimanyu Hans · Avi Schwarzschild · Tom Goldstein
- Red Teaming Language-Conditioned Robot Models via Vision Language Models (Poster) | Sathwik Karnik · Zhang-Wei Hong · NISHANT ABHANGI · Yen-Chen Lin · Tsun-Hsuan Johnson Wang · Pulkit Agrawal
- Pre-Training Multimodal Hallucination Detectors with Corrupted Grounding Data (Poster) | Spencer Whitehead · Jacob Phillips · Sean Hendryx
- Privacy Protection in Personalized Diffusion Models via Targeted Cross-Attention Adversarial Attack (Poster) | Xide Xu · Muhammad Atif Butt · Sandesh Kamath · Bogdan Raducanu
- DeepInception: Hypnotize Large Language Model to Be Jailbreaker (Poster) | Xuan Li · Zhanke Zhou · Jianing Zhu · Jiangchao Yao · Tongliang Liu · Bo Han
- HEARTS: A Holistic Framework for Explainable, Sustainable and Robust Text Stereotype Detection (Poster) | Theo King · Zekun Wu · Adriano Koshiyama · Emre Kazim · Philip Treleaven
- Testing the Limits of Jailbreaking with the Purple Problem (Poster) | Taeyoun Kim · Suhas Kotha · Aditi Raghunathan
- Token Highlighter: Inspecting and Mitigating Jailbreak Prompts for Large Language Models (Poster) | Xiaomeng Hu · Pin-Yu Chen · Tsung-Yi Ho
- How to Determine the Preferred Image Distribution of a Black-Box Vision-Language Model? (Poster) | Saeid Asgari · Joseph G Lambourne · Alana Mongkhounsavath
- Adversarial Prompt Evaluation: Systematic Benchmarking of Guardrails Against Prompt Input Attacks on LLMs (Poster) | Giulio Zizzo · Giandomenico Cornacchia · Kieran Fraser · Muhammad Zaid Hameed · Ambrish Rawat · Beat Buesser · Mark Purcell · Pin-Yu Chen · Prasanna Sattigeri · Kush Varshney
- Adversarial Prompt Evaluation: Systematic Benchmarking of Guardrails Against Prompt Input Attacks on LLMs (Oral) | Giulio Zizzo · Giandomenico Cornacchia · Kieran Fraser · Muhammad Zaid Hameed · Ambrish Rawat · Beat Buesser · Mark Purcell · Pin-Yu Chen · Prasanna Sattigeri · Kush Varshney
- PoisonedParrot: Subtle Data Poisoning Attacks to Elicit Copyright-Infringing Content from Large Language Models (Poster) | Michael-Andrei Panaitescu-Liess · Pankayaraj Pathmanathan · Yigitcan Kaya · Zora Che · Bang An · Sicheng Zhu · Aakriti Agrawal · Furong Huang
- Hallucination Detox: Sensitive Neuron Dropout (SeND) for Large Language Model Training (Poster) | Shahrad Mohammadzadeh · Juan D. Guerra · Marco Bonizzato · Reihaneh Rabbany · Golnoosh Farnadi
- Addressing Uncertainty in LLMs to Enhance Reliability in Generative AI (Poster) | Ramneet Kaur · Colin Samplawski · Adam Cobb · Anirban Roy · Brian Matejek · Manoj Acharya · Daniel Elenius · Alexander Berenbeim · John Pavlik · Nathaniel Bastian · Susmit Jha
- Jogging the Memory of Unlearned LLMs Through Targeted Relearning Attacks (Poster) | Shengyuan Hu · Yiwei Fu · Steven Wu · Virginia Smith
- A Closer Look at System Message Robustness (Poster) | Norman Mu · Jonathan Lu · Michael Lavery · David Wagner
- The effect of fine-tuning on language model toxicity (Poster) | Will Hawkins · Brent Mittelstadt · Chris Russell
- The effect of fine-tuning on language model toxicity (Oral) | Will Hawkins · Brent Mittelstadt · Chris Russell
- Universal Jailbreak Backdoors in Large Language Model Alignment (Poster) | Thomas Baumann
- Auto-Enhance: Towards a Meta-Benchmark to Evaluate AI Agents' Ability to Improve Other Agents (Poster) | Samuel Brown · Basil Labib · Codruta Lugoj · Sai Sasank Y
- Waste not, want not; Recycled Gumbel noise improves consistency in natural language generation (Poster) | Damien de Mijolla · Hannan Saddiq · Kim Moore
- Model Manipulation Attacks Enable More Rigorous Evaluations of LLM Unlearning (Poster) | Zora Che · Stephen Casper · Anirudh Satheesh · Rohit Gandikota · Domenic Rosati · Stewart Slocum · Lev McKinney · Zichu Wu · Zikui Cai · Bilal Chughtai · Furong Huang · Dylan Hadfield-Menell
- Large Language Model Benchmarks Do Not Test Reliability (Poster) | Joshua Vendrow · Edward Vendrow · Sara Beery · Aleksander Madry
- EchoQA: A Large Collection of Instruction Tuning Data for Echocardiogram Reports (Poster) | Lama Moukheiber · Mira Moukheiber · Dana Moukheiber · Jae-Woo Ju · Hyung-Chul Lee
- On Calibration of LLM-based Guard Models for Reliable Content Moderation (Poster) | Hongfu Liu · Hengguan Huang · Hao Wang · Xiangming Gu · Ye Wang
- On Calibration of LLM-based Guard Models for Reliable Content Moderation (Oral) | Hongfu Liu · Hengguan Huang · Hao Wang · Xiangming Gu · Ye Wang
- AutoDefense: Multi-Agent LLM Defense against Jailbreak Attacks (Poster) | Yifan Zeng · Yiran Wu · Xiao Zhang · Huazheng Wang · Qingyun Wu
- How Many Van Goghs Does It Take to Van Gogh? Finding the Imitation Threshold (Poster) | Sahil Verma · Royi Rassin · Arnav Das · Gantavya Bhatt · Preethi Seshadri · Chirag Shah · Jeff A Bilmes · Hannaneh Hajishirzi · Yanai Elazar
- Applying Refusal-Vector Ablation to Llama 3.1 70B Agents (Poster) | Simon Lermen · Mateusz Dziemian · Govind Pimpale
- Language Models Can Articulate Their Implicit Goals (Poster) | Jan Betley · Xuchan Bao · Martín Soto · Anna Sztyber-Betley · James Chua · Owain Evans
- Energy-Based Conceptual Diffusion Model (Poster) | Yi Qin · Xinyue Xu · Hao Wang · Xiaomeng Li
- MultiVerse: Exposing Large Language Model Alignment Problems in Diverse Worlds (Poster) | Xiaolong Jin · Zhuo Zhang · Guangyu Shen · Hanxi Guo · Kaiyuan Zhang · Siyuan Cheng · Xiangyu Zhang
- Who Speaks Matters: Analysing the Influence of the Speaker's Ethnicity on Hate Classification (Poster) | Ananya Malik · Kartik Sharma · Lynnette Hui Xian Ng · Shaily Bhatt
- Who Speaks Matters: Analysing the Influence of the Speaker's Ethnicity on Hate Classification (Oral) | Ananya Malik · Kartik Sharma · Lynnette Hui Xian Ng · Shaily Bhatt
- HalLoc: Token-level Localization of Hallucinations for Large Vision Language Models (Poster) | Eunkyu Park · Minyeong Kim · Gunhee Kim
- Safety-Aware Fine-Tuning of Large Language Models (Poster) | Hyeong Kyu Choi · Xuefeng Du · Sharon Li
- Buffer Overflow in Mixture of Experts (Poster) | Jamie Hayes · I Shumailov · Itay Yona
- Preserving Safety in Fine-Tuned Large Language Models: A Systematic Evaluation and Mitigation Strategy (Poster) | Tsung-Huan Yang · Ko-Wei Huang · Yung-Hui Li · Lun-Wei Ku
- Extracting Unlearned Information from LLMs with Activation Steering (Poster) | Atakan Seyitoğlu · Aleksei Kuvshinov · Leo Schwinn · Stephan Günnemann
- Privacy-Preserving Large Language Model Inference via GPU-Accelerated Fully Homomorphic Encryption (Poster) | Leo de Castro · Antigoni Polychroniadou · Daniel Escudero
- Datasets for Navigating Sensitive Topics in Preference Data and Recommendations (Poster) | Amelia Kovacs · Jerry Chee · Sarah Dean
- Can Safety Fine-Tuning Be More Principled? Lessons Learned from Cybersecurity (Poster) | David Williams-King · Linh Le · Adam Oberman · Yoshua Bengio
- Efficient and Effective Uncertainty Quantification for LLMs (Poster) | Miao Xiong · Andrea Santilli · Michael Kirchhof · Adam Golinski · Sinead Williamson
- EnsemW2S: Can an Ensemble of LLMs be Leveraged to Obtain a Stronger LLM? (Poster) | Aakriti Agrawal · Mucong Ding · Zora Che · Chenghao Deng · Anirudh Satheesh · John Langford · Furong Huang
- MED: Exploring LLM Memorization on Encrypted Data (Poster) | Panagiotis christodoulou · Giulio Zizzo · Sergio Maffeis
- An Examination of AI-Generated Text Detectors Across Multiple Domains and Models (Poster) | Brian Tufts · Xuandong Zhao · Lei Li
- Towards Resource Efficient and Interpretable Bias Mitigation in Natural Language Generation (Poster) | Schrasing Tong · Eliott Zemour · Rawisara Lohanimit · Lalana Kagal
- NMT-Obfuscator Attack: Ignore a sentence in translation with only one word (Poster) | Sahar Sadrizadeh · César Descalzo · Ljiljana Dolamic · Pascal Frossard
- A Probabilistic Generative Method for Safe Physical System Control Problems (Poster) | Peiyan Hu · Xiaowei Qian · Wenhao Deng · Rui Wang · Haodong Feng · Ruiqi Feng · Tao Zhang · Long Wei · Yue Wang · Zhi-Ming Ma · Tailin Wu
- MMed-RAG: Versatile Multimodal RAG System for Medical Vision Language Models (Poster) | Peng Xia · Kangyu Zhu · Haoran Li · Tianze Wang · Weijia Shi · Sheng Wang · Linjun Zhang · James Zou · Huaxiu Yao
- PopAlign: Population-Level Alignment for Fair Text-to-Image Generation (Poster) | Shufan Li · Harkanwar Singh · Aditya Grover
- Honesty to Subterfuge: In-Context Reinforcement Learning Can Make Honest Models Reward Hack (Poster) | Leo McKee-Reid · Christoph Sträter · Maria Martinez · Joe Needham · Mikita Balesni
- SCIURus: Shared Circuits for Interpretable Uncertainty Representations in Language Models (Poster) | Carter Teplica · Yixin Liu · Arman Cohan · Tim G. J. Rudner
- CoS: Enhancing Personalization and Mitigating Bias with Context Steering (Poster) | Sashrika Pandey · Jerry He · Mariah Schrum · Anca Dragan
- What You See Is What You Get: Entity-Aware Summarization for Reliable Sponsored Search (Poster) | Xiao Liang · Xinyu Hu · Simiao Zuo · Jimi He · Yu Wang · Victor Dong · Yeyun Gong · Kushal Dave · Yi Liu · Qiang Lou · Shao-Lun Huang · Jian Jiao
- How new data pollutes LLM knowledge and how to dilute it (Poster) | Chen Sun · Renat Aksitov · Andrey Zhmoginov · Nolan Miller · Max Vladymyrov · Ulrich Rueckert · Been Kim · Mark Sandler
- Mix Data or Merge Models? Optimizing for Performance and Safety in Multilingual Contexts (Poster) | Aakanksha · Arash Ahmadian · Seraphina Goldfarb-Tarrant · Beyza Ermis · Marzieh Fadaee · Sara Hooker
- Simulation System Towards Solving Societal-Scale Manipulation (Poster) | Maximilian Puelma Touzel · Sneheel Sarangi · Austin Welch · Gayatri K · Dan Zhao · Zachary Yang · Hao Yu · Tom Gibbs · Ethan Kosak-Hine · Andreea Musulan · Camille Thibault · Busra Gurbuz · Reihaneh Rabbany · Jean-François Godbout · Kellin Pelrine
- Red Teaming: Everything Everywhere All at Once (Poster) | Alexandra Chouldechova · A. Feder Cooper · Abhinav Palia · Dan Vann · Chad Atalla · Hannah Washington · Emily Sheng · Hanna Wallach
- Inference, Fast and Slow: Reinterpreting VAEs for OOD Detection (Poster) | Sicong (Sheldon) Huang · Jiawei He · Kry Yik Chau Lui
- The Empirical Impact of Data Sanitization on Language Models (Poster) | Anwesan Pal · Radhika Bhargava · Kyle Hinsz · Jacques Esterhuizen · Sudipta Bhattacharya
- HarmLevelBench: Evaluating Harm-Level Compliance and the Impact of Quantization on Model Alignment (Poster) | Yannis Belkhiter · Giulio Zizzo · Sergio Maffeis
- IncogniText: Privacy-enhancing Conditional Text Anonymization via LLM-based Private Attribute Randomization (Poster) | Ahmed Frikha · Nassim Walha · Krishna Nakka · Ricardo Mendes · Xue Jiang · Xuebing Zhou
- Quantifying Likeness: A Simple Machine Learning Approach to Identifying Copyright Infringement in (AI-Generated) Artwork (Poster) | Michaela Drouillard · Ryan Spencer · Nikée Nantambu-Allen · Tegan Maharaj
- An Undetectable Watermark for Generative Image Models (Poster) | Sam Gunn · Xuandong Zhao · Dawn Song
- RLHS: Mitigating Misalignment in RLHF with Hindsight Simulation (Poster) | Kaiqu Liang · Haimin Hu · Ryan Liu · Tom Griffiths · Jaime Fisac
- Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates (Poster) | Xiaosen Zheng · Tianyu Pang · Chao Du · Qian Liu · Jing Jiang · Min Lin
- Semantic Membership Inference Attack against Large Language Models (Poster) | Hamid Mozaffari · Virendra Marathe
- Shallow Diffuse: Robust and Invisible Watermarking through Low-Dimensional Subspaces in Diffusion Models (Poster) | Wenda Li · Huijie Zhang · Qing Qu
- Just rephrase it! Uncertainty estimation in closed-source language models via multiple rephrased queries (Poster) | Adam Yang · CHEN CHEN · Konstantinos Pitas
- Differential Privacy of Cross-Attention with Provable Guarantee (Poster) | Yingyu Liang · Zhenmei Shi · Zhao Song · Yufa Zhou
- Smoothed Embeddings for Robust Language Models (Poster) | Ryo Hase · Rafi Rashid · Ashley Lewis · Jing Liu · Toshiaki Koike-Akino · Kieran Parsons · Ye Wang
- What do we learn from inverting CLIP models? (Poster) | Hamid Kazemi · Atoosa Chegini · Jonas Geiping · Soheil Feizi · Tom Goldstein
- Does Refusal Training in LLMs Generalize to the Past Tense? (Poster) | Maksym Andriushchenko · Nicolas Flammarion
- Does Refusal Training in LLMs Generalize to the Past Tense? (Oral) | Maksym Andriushchenko · Nicolas Flammarion
-
|
LLM Targeted Underperformance Disproportionately Impacts Vulnerable Users ( Poster ) > link | Elinor Poole-Dayan 路 Deb Roy 路 Jad Kabbara 馃敆 |
-
|
AdvBDGen: Adversarially Fortified Prompt-Specific Fuzzy Backdoor Generator Against LLM Alignment ( Poster ) > link | Pankayaraj Pathmanathan 路 Udari Sehwag 路 Michael-Andrei Panaitescu-Liess 路 Furong Huang 馃敆 |
-
|
Safe Decision Transformer with Learning-based Constraints ( Poster ) > link | Ruhan Wang 路 Dongruo Zhou 馃敆 |
-
|
MU-Bench: A Multitask Multimodal Benchmark for Machine Unlearning ( Poster ) > link | Jiali Cheng 路 Hadi Amiri 馃敆 |
-
|
Instructional Segment Embedding: Improving LLM Safety with Instruction Hierarchy ( Poster ) > link | Tong Wu 路 Shujian Zhang 路 Kaiqiang Song 路 Silei Xu 路 Sanqiang Zhao 路 Ravi Agrawal 路 Sathish Indurthi 路 Chong Xiang 路 Prateek Mittal 路 Wenxuan Zhou 馃敆 |
-
|
INVESTIGATING ANNOTATOR BIAS IN LARGE LANGUAGE MODELS FOR HATE SPEECH DETECTION ( Poster ) > link |
15 presentersAmit Das 路 Zheng Zhang 路 Md. Najib Hasan 路 Souvika Sarkar 路 Fatemeh Jamshidi 路 Tathagata Bhattacharya 路 Mostafa Rahgouy 路 Nilanjana Raychawdhary 路 Dongji Feng 路 Vinija Jain 路 Aman Chadha 路 Mary Sandage 路 Lauramarie Pope 路 Gerry Dozier 路 Cheryl Seals |
-
|
Towards Inference-time Category-wise Safety Steering for Large Language Models ( Poster ) > link | Amrita Bhattacharjee 路 Shaona Ghosh 路 Traian Rebedea 路 Christopher Parisien 馃敆 |
-
|
Hidden in Plain Text: Emergence & Mitigation of Steganographic Collusion in LLMs ( Poster ) > link | Yohan Mathew 路 Ollie Matthews 路 Robert McCarthy 路 Joan Velja 路 Christian Schroeder de Witt 路 Dylan Cope 路 Nandi Schoots 馃敆 |
-
|
Decoding Diffusion: A Scalable Framework for Unsupervised Analysis of Latent Space Biases and Representations Using Natural Language Prompts ( Poster ) > link | Emily Zhixuan Zeng 路 Yuhao Chen 路 Alexander Wong 馃敆 |
-
|
Designing Physical-World Universal Attacks on Vision Transformers ( Poster ) > link | Mingzhen Shao 馃敆 |
-
|
Rethinking Adversarial Attacks as Protection Against Diffusion-based Mimicry ( Poster ) > link | Haotian Xue 路 Yongxin Chen 馃敆 |
-
|
INTERPRETABILITY OF LLM DECEPTION: UNIVERSAL MOTIF ( Poster ) > link | Wannan Yang 路 Chen Sun 路 Gyorgy Buzsaki 馃敆 |
-
|
Towards a Theory of AI Personhood ( Poster ) > link | Francis Ward 馃敆 |
-
|
Towards Scalable Exact Machine Unlearning Using Parameter-Efficient Fine-Tuning ( Poster ) > link | Somnath Basu Roy Chowdhury 路 Krzysztof M Choromanski 路 Arijit Sehanobish 路 Kumar Avinava Dubey 路 Snigdha Chaturvedi 馃敆 |
-
|
Exploring Memorization and Copyright Violation in Frontier LLMs: A Study of the New York Times v. OpenAI 2023 Lawsuit ( Poster ) > link | Joshua Freeman 路 Chloe Rippe 路 Edoardo Debenedetti 路 Maksym Andriushchenko 馃敆 |
-
|
Is What You Ask For What You Get? Investigating Concept Associations in Text-to-Image Models ( Poster ) > link | Salma Abdel Magid 路 Weiwei Pan 路 Simon Warchol 路 Grace Guo 路 Junsik Kim 路 Wanhua Li 路 Mahia Rahman 路 Hanspeter Pfister 馃敆 |
-
|
How Easy is It to Fool Your Multimodal LLMs? An Empirical Analysis on Deceptive Prompt ( Poster ) > link | Yusu Qian 路 Haotian Zhang 路 Yinfei Yang 路 Zhe Gan 馃敆 |
-
|
Variational Diffusion Unlearning: a variational inference framework for unlearning in diffusion models ( Poster ) > link | Subhodip Panda 路 M S Varun 路 Shreyans Jain 路 Sarthak Kumar Maharana 路 Prathosh AP 馃敆 |
-
|
Memorization Detection Benchmark for Generative Image models ( Poster ) > link | Marc Molina 路 Felice Burn 馃敆 |
-
|
Dynamic Negative Guidance of Diffusion Models: Towards Immediate Content Removal ( Poster ) > link | Felix Koulischer 路 Johannes Deleu 路 Gabriel Raya 路 Thomas Demeester 路 Luca Ambrogioni 馃敆 |
-
|
Gaussian Splatting Under Attack: Investigating Adversarial Noise in 3D Objects ( Poster ) > link | Abdurrahman Zeybey 路 Mehmet Ergezer 路 Tommy Nguyen 馃敆 |
-
|
Choose Your Anchor Wisely: Effective Unlearning Diffusion Models via Concept Reconditioning ( Poster ) > link | Jingyu Zhu · Ruiqi Zhang · Licong Lin · Song Mei 🔗 |
Insights on Disagreement Patterns in Multimodal Safety Perception across Diverse Rater Groups ( Poster ) > link | Charvi Rastogi · Tian Huey Teh · Pushkar Mishra · Roma Patel · Zoe Ashwood · Aida Mostafazadeh Davani · Mark Díaz · Michela Paganini · Alicia Parrish · Ding Wang · Vinodkumar Prabhakaran · Lora Aroyo · Verena Rieser 🔗 |
Rule-Guided Language Model Alignment for Text Generation Management in Industrial Use Cases ( Poster ) > link | Shunichi Akatsuka · Aman Kumar · Xian Yeow Lee · Lasitha Vidyaratne · Dipanjan Ghosh · Ahmed Farahat 🔗 |
ChatBug: A Common Vulnerability of Aligned LLMs Induced by Chat Templates ( Poster ) > link | Fengqing Jiang · Zhangchen Xu · Luyao Niu · Bill Yuchen Lin · Radha Poovendran 🔗 |
Efficiently Identifying Watermarked Segments in Mixed-Source Texts ( Poster ) > link | Xuandong Zhao · Chenwen Liao · Yu-Xiang Wang · Lei Li 🔗 |
miniCodeProps: a Minimal Benchmark for Proving Code Properties ( Poster ) > link | Evan Lohn · Sean Welleck 🔗 |
Adversarial Vulnerabilities in Large Language Models for Time Series Forecasting ( Poster ) > link | Fuqiang Liu · Sicong Jiang · Luis Miranda-Moreno · Seongjin Choi · Lijun Sun 🔗 |
SolidMark: Evaluating Image Memorization in Generative Models ( Poster ) > link | Nicky Kriplani · Minh Pham · Gowthami Somepalli · Chinmay Hegde · Niv Cohen 🔗 |
Self-Supervised Bisimulation Action Chunk Representation for Efficient RL ( Poster ) > link | Lei Shi · Jianye Hao · Hongyao Tang · Zibin Dong · Yan Zheng 🔗 |
Anchored Optimization and Contrastive Revisions: Addressing Reward Hacking in Alignment ( Poster ) > link | Karel D'Oosterlinck · Winnie Xu · Chris Develder · Thomas Demeester · Amanpreet Singh · Christopher Potts · Douwe Kiela · Shikib Mehri 🔗 |
Interactive Semantic Interventions for VLMs: A Human-in-the-Loop Investigation of VLM Failure ( Poster ) > link | Lukas Klein · Kenza Amara · Carsten Lüth · Hendrik Strobelt · Mennatallah El-Assady · Paul Jaeger 🔗 |
Can LLMs Verify Arabic Claims? Evaluating the Arabic Fact-Checking Abilities of Multilingual LLMs ( Poster ) > link | Aryan Singhal · Ayushman Gupta · Ryan L Li · Evan Duan · Thomas Law · Veekshith Rao 🔗 |
MMLU-Pro+: Evaluating Higher-Order Reasoning and Shortcut Learning in LLMs ( Poster ) > link | Saeid Asgari · Aliasghar Khani · Amir Khasahmadi 🔗 |
Unlearning in- vs. out-of-distribution data in LLMs under gradient-based methods ( Poster ) > link | Teodora Baluta · Gintare Karolina Dziugaite · Pascal Lamblin · Fabian Pedregosa · Danny Tarlow 🔗 |
A False Sense of Privacy: Evaluating Textual Data Sanitization Beyond Surface-level Privacy Leakage ( Poster ) > link | Rui Xin · Niloofar Mireshghallah · Stella Li · Michael Duan · Hyunwoo Kim · Yejin Choi · Yulia Tsvetkov · Sewoong Oh · Pang Wei Koh 🔗 |
Model Editing as a Robust and Denoised variant of DPO: A Case Study on Toxicity ( Poster ) > link | Rheeya Uppaal · Apratim Dey · Yiting He · Yiqiao Zhong · Junjie Hu 🔗 |
LoReUn: Data Itself Implicitly Provides Cues to Improve Machine Unlearning ( Poster ) > link | Xiang Li · Qianli Shen · Haonan Wang · Kenji Kawaguchi 🔗 |
CPSample: Classifier Protected Sampling for Guarding Training Data During Diffusion ( Poster ) > link | Joshua Kazdan · Hao Sun · Jiaqi Han · Felix Petersen · Frederick Vu · Stefano Ermon 🔗 |
Mitigating Object Hallucination in Large Vision-Language Models via Image-Grounded Guidance ( Poster ) > link | Linxi Zhao · Yihe Deng · Weitong Zhang · Quanquan Gu 🔗 |
Does Safety Training of LLMs Generalize to Semantically Related Natural Prompts? ( Poster ) > link | Sravanti Addepalli · Yerram Varun · Arun Suggala · Karthikeyan Shanmugam · Prateek Jain 🔗 |
AEGIS2.0: A Diverse AI Safety Dataset and Risks Taxonomy for Alignment of LLM Guardrails ( Poster ) > link | Shaona Ghosh · Prasoon Varshney · Makesh Narsimhan Sreedhar · Aishwarya Padmakumar · Traian Rebedea · Jibin Varghese · Christopher Parisien 🔗 |