35 Results
Workshop
DrawEduMath: Evaluating Vision Language Models with Expert-Annotated Students’ Hand-Drawn Math Images
Sami Baral · Li Lucy · Ryan Knight · Alice Ng · Luca Soldaini · Neil Heffernan · Kyle Lo

Poster | Fri 11:00
InfiBench: Evaluating the Question-Answering Capabilities of Code Large Language Models
Linyi Li · Shijie Geng · Zhenwen Li · Yibo He · Hao Yu · Ziyue Hua · Guanghan Ning · Siwei Wang · Tao Xie · Hongxia Yang

Poster | Wed 16:30
RepLiQA: A Question-Answering Dataset for Benchmarking LLMs on Unseen Reference Content
Joao Monteiro · Pierre-André Noël · Étienne Marcotte · Sai Rajeswar Mudumba · Valentina Zantedeschi · David Vazquez · Nicolas Chapados · Chris Pal · Perouz Taslakian

Workshop
Best of Both Worlds: Harmonizing LLM Capabilities in Decision-Making and Question-Answering for Treatment Regimes
Hongxuan Liu · Zhiyao Luo · Tingting Zhu

Workshop
On the Protocol for Evaluating Uncertainty in Generative Question-Answering Tasks
Andrea Santilli · Miao Xiong · Michael Kirchhof · Pau Rodriguez · Federico Danieli · Xavier Suau · Luca Zappella · Sinead Williamson · Adam Golinski

Workshop | Sat 15:45
Conversational Question-Answering for process task guidance in manufacturing
Ramesh Manuvinakurike · Elizabeth Watkins · Celal Savur · Anthony Rhodes · Sovan Biswas · Richard Beckwith · Gesem Mejia · Saurav Sahay · Giuseppe Raffa · Lama Nachman

Workshop
Benchmarking table comprehension in the wild
Yikang Pan · Yi Zhu · Rand Xie · Yizhi Liu

Workshop
A Benchmark for Long-Form Medical Question Answering
Pedram Hosseini · Jessica Sin · Bing Ren · Bryceton Thomas · Elnaz Nouri · Ali Farahanchi · Saeed Hassanpour

Workshop
TARGET: Benchmarking Table Retrieval for Generative Tasks
Xingyu Ji · Aditya Parameswaran · Madelon Hulsebos

Workshop
CinePile: A Long Video Question Answering Dataset and Benchmark
Ruchit Rawal · Khalid Saifullah · Ronen Basri · David Jacobs · Gowthami Somepalli · Tom Goldstein

Poster | Thu 16:30
MEQA: A Benchmark for Multi-hop Event-centric Question Answering with Explanations
Ruosen Li · Zimu Wang · Son Tran · Lei Xia · Xinya Du

Poster | Thu 11:00
AMBROSIA: A Benchmark for Parsing Ambiguous Questions into Database Queries
Irina Saparina · Mirella Lapata