

Search All 2024 Events
 

35 Results

Workshop
DrawEduMath: Evaluating Vision Language Models with Expert-Annotated Students’ Hand-Drawn Math Images
Sami Baral · Li Lucy · Ryan Knight · Alice Ng · Luca Soldaini · Neil Heffernan · Kyle Lo
Poster
Fri 11:00
InfiBench: Evaluating the Question-Answering Capabilities of Code Large Language Models
Linyi Li · Shijie Geng · Zhenwen Li · Yibo He · Hao Yu · Ziyue Hua · Guanghan Ning · Siwei Wang · Tao Xie · Hongxia Yang
Poster
Wed 16:30
RepLiQA: A Question-Answering Dataset for Benchmarking LLMs on Unseen Reference Content
Joao Monteiro · Pierre-André Noël · Étienne Marcotte · Sai Rajeswar Mudumba · Valentina Zantedeschi · David Vazquez · Nicolas Chapados · Chris Pal · Perouz Taslakian
Workshop
Best of Both Worlds: Harmonizing LLM Capabilities in Decision-Making and Question-Answering for Treatment Regimes
Hongxuan Liu · Zhiyao Luo · Tingting Zhu
Workshop
On the Protocol for Evaluating Uncertainty in Generative Question-Answering Tasks
Andrea Santilli · Miao Xiong · Michael Kirchhof · Pau Rodriguez · Federico Danieli · Xavier Suau · Luca Zappella · Sinead Williamson · Adam Golinski
Workshop
Sat 15:45
Conversational Question-Answering for process task guidance in manufacturing
Ramesh Manuvinakurike · Elizabeth Watkins · Celal Savur · Anthony Rhodes · Sovan Biswas · Richard Beckwith · Gesem Mejia · Saurav Sahay · Giuseppe Raffa · Lama Nachman
Workshop
Benchmarking table comprehension in the wild
Yikang Pan · Yi Zhu · Rand Xie · Yizhi Liu
Workshop
A Benchmark for Long-Form Medical Question Answering
Pedram Hosseini · Jessica Sin · Bing Ren · Bryceton Thomas · Elnaz Nouri · Ali Farahanchi · Saeed Hassanpour
Workshop
TARGET: Benchmarking Table Retrieval for Generative Tasks
Xingyu Ji · Aditya Parameswaran · Madelon Hulsebos
Workshop
CinePile: A Long Video Question Answering Dataset and Benchmark
Ruchit Rawal · Khalid Saifullah · Ronen Basri · David Jacobs · Gowthami Somepalli · Tom Goldstein
Poster
Thu 16:30
MEQA: A Benchmark for Multi-hop Event-centric Question Answering with Explanations
Ruosen Li · Zimu Wang · Son Tran · Lei Xia · Xinya Du
Poster
Thu 11:00
AMBROSIA: A Benchmark for Parsing Ambiguous Questions into Database Queries
Irina Saparina · Mirella Lapata