

Search All 2024 Events
 

14 Results

Page 1 of 2
Affinity Event
Tue 14:00 Invited Talk 2 by Lama Ahmad (Technical Program Manager, Trustworthy AI at OpenAI): Human and AI Evaluations for Safety and Robustness Testing
Lama Ahmad
Poster
Fri 11:00 SETLEXSEM CHALLENGE: Using Set Operations to Evaluate the Lexical and Semantic Robustness of Language Models
Nicholas Dronen · Bardiya Akhbari · Manish Digambar Gawali
Workshop
Worse than Random? An Embarrassingly Simple Probing Evaluation of Large Multimodal Models in Medical VQA
Qianqi Yan · Xuehai He · Xiang Yue · Xin Eric Wang
Poster
Fri 11:00 Adaptive Labeling for Efficient Out-of-distribution Model Evaluation
Daksh Mittal · Yuanzhe Ma · Shalmali Joshi · Hongseok Namkoong
Poster
Wed 16:30 Elo Uncovered: Robustness and Best Practices in Language Model Evaluation
Meriem Boubdir · Edward Kim · Beyza Ermis · Sara Hooker · Marzieh Fadaee
Poster
Wed 16:30 Efficient and Sharp Off-Policy Evaluation in Robust Markov Decision Processes
Andrew Bennett · Nathan Kallus · Miruna Oprescu · Wen Sun · Kaiwen Wang
Workshop
SAGE-RT: Synthetic Alignment data Generation for Safety Evaluation and Red Teaming
Anurakt Kumar · Divyanshu Kumar · Jatan Loya · Nitin Aravind Birur · Tanay Baswa · Sahil Agarwal · Prashanth Harshangi
Poster
Thu 11:00 TARP-VP: Towards Evaluation of Transferred Adversarial Robustness and Privacy on Label Mapping Visual Prompting Models
Zhen Chen · Yi Zhang · Fu Wang · Xingyu Zhao · Xiaowei Huang · Wenjie Ruan
Workshop
GenAI Evaluation Maturity Framework (GEMF) to assess and improve GenAI Evaluations
Yilin Zhang · Frank J. Kanayet
Poster
Fri 11:00 GREAT Score: Global Robustness Evaluation of Adversarial Perturbation using Generative Models
ZAITANG LI · Pin-Yu Chen · Tsung-Yi Ho
Workshop
Sun 12:25 When All Options Are Wrong: Evaluating Large Language Model Robustness with Incorrect Multiple-Choice Options
Gracjan Góral · Emilia Wiśnios
Poster
Wed 11:00 Semi-Truths: A Large-Scale Dataset of AI-Augmented Images for Evaluating Robustness of AI-Generated Image detectors
Anisha Pal · Julia Kruk · Mansi Phute · Manognya Bhattaram · Diyi Yang · Duen Horng Chau · Judy Hoffman