Affinity Event | Tue 14:00 | Invited Talk 2 by Lama Ahmad (Technical Program Manager, Trustworthy AI at OpenAI): Human and AI Evaluations for Safety and Robustness Testing | Lama Ahmad
Poster | Fri 11:00 | SETLEXSEM CHALLENGE: Using Set Operations to Evaluate the Lexical and Semantic Robustness of Language Models | Nicholas Dronen · Bardiya Akhbari · Manish Digambar Gawali
Workshop | | Worse than Random? An Embarrassingly Simple Probing Evaluation of Large Multimodal Models in Medical VQA | Qianqi Yan · Xuehai He · Xiang Yue · Xin Eric Wang
Poster | Fri 11:00 | Adaptive Labeling for Efficient Out-of-distribution Model Evaluation | Daksh Mittal · Yuanzhe Ma · Shalmali Joshi · Hongseok Namkoong
Poster | Wed 16:30 | Elo Uncovered: Robustness and Best Practices in Language Model Evaluation | Meriem Boubdir · Edward Kim · Beyza Ermis · Sara Hooker · Marzieh Fadaee
Poster | Wed 16:30 | Efficient and Sharp Off-Policy Evaluation in Robust Markov Decision Processes | Andrew Bennett · Nathan Kallus · Miruna Oprescu · Wen Sun · Kaiwen Wang
Workshop | | SAGE-RT: Synthetic Alignment data Generation for Safety Evaluation and Red Teaming | Anurakt Kumar · Divyanshu Kumar · Jatan Loya · Nitin Aravind Birur · Tanay Baswa · Sahil Agarwal · Prashanth Harshangi
Poster | Thu 11:00 | TARP-VP: Towards Evaluation of Transferred Adversarial Robustness and Privacy on Label Mapping Visual Prompting Models | Zhen Chen · Yi Zhang · Fu Wang · Xingyu Zhao · Xiaowei Huang · Wenjie Ruan
Workshop | | GenAI Evaluation Maturity Framework (GEMF) to assess and improve GenAI Evaluations | Yilin Zhang · Frank J. Kanayet
Poster | Fri 11:00 | GREAT Score: Global Robustness Evaluation of Adversarial Perturbation using Generative Models | ZAITANG LI · Pin-Yu Chen · Tsung-Yi Ho
Workshop | Sun 12:25 | When All Options Are Wrong: Evaluating Large Language Model Robustness with Incorrect Multiple-Choice Options | Gracjan Góral · Emilia Wiśnios
Poster | Wed 11:00 | Semi-Truths: A Large-Scale Dataset of AI-Augmented Images for Evaluating Robustness of AI-Generated Image detectors | Anisha Pal · Julia Kruk · Mansi Phute · Manognya Bhattaram · Diyi Yang · Duen Horng Chau · Judy Hoffman