69 Results
Poster
|
Thu 11:00 |
WikiContradict: A Benchmark for Evaluating LLMs on Real-World Knowledge Conflicts from Wikipedia Yufang Hou · Alessandra Pascale · Javier Carnerero-Cano · Tigran Tchrakian · Radu Marinescu · Elizabeth Daly · Inkit Padhi · Prasanna Sattigeri |
|
Poster
|
Wed 11:00 |
RedPajama: an Open Dataset for Training Large Language Models Maurice Weber · Dan Fu · Quentin Anthony · Yonatan Oren · Shane Adams · Anton Alexandrov · Xiaozhong Lyu · Huu Nguyen · Xiaozhe Yao · Virginia Adams · Ben Athiwaratkun · Rahul Chalamala · Kezhen Chen · Max Ryabinin · Tri Dao · Percy Liang · Christopher Ré · Irina Rish · Ce Zhang |
|
Poster
|
Thu 16:30 |
NYU CTF Bench: A Scalable Open-Source Benchmark Dataset for Evaluating LLMs in Offensive Security Minghao Shao · Sofija Jancheska · Meet Udeshi · Brendan Dolan-Gavitt · Haoran Xi · Kimberly Milner · Boyuan Chen · Max Yin · Siddharth Garg · Prashanth Krishnamurthy · Farshad Khorrami · Ramesh Karri · Muhammad Shafique |
|
Poster
|
Thu 16:30 |
EHRNoteQA: An LLM Benchmark for Real-World Clinical Practice Using Discharge Summaries Sunjun Kweon · Jiyoun Kim · Heeyoung Kwak · Dongchul Cha · Hangyul Yoon · Kwang Kim · Jeewon Yang · Seunghyun Won · Edward Choi |
|
Poster
|
Wed 16:30 |
SpreadsheetBench: Towards Challenging Real World Spreadsheet Manipulation Zeyao Ma · Bohan Zhang · Jing Zhang · Jifan Yu · Xiaokang Zhang · Xiaohan Zhang · Sijia Luo · Xi Wang · Jie Tang |
|
Poster
|
Wed 11:00 |
DrivingDojo Dataset: Advancing Interactive and Knowledge-Enriched Driving World Model Yuqi Wang · Ke Cheng · Jiawei He · Qitai Wang · Hengchen Dai · Yuntao Chen · Fei Xia · Zhao-Xiang Zhang |
|
Poster
|
Fri 11:00 |
INQUIRE: A Natural World Text-to-Image Retrieval Benchmark Edward Vendrow · Omiros Pantazis · Alexander Shepard · Gabriel Brostow · Kate Jones · Oisin Mac Aodha · Sara Beery · Grant Van Horn |
|
Poster
|
Thu 16:30 |
A Retrospective on the Robot Air Hockey Challenge: Benchmarking Robust, Reliable, and Safe Learning Techniques for Real-world Robotics Puze Liu · Jonas Günster · Niklas Funk · Simon Gröger · Dong Chen · Haitham Bou Ammar · Julius Jankowski · Ante Marić · Sylvain Calinon · Andrej Orsula · Miguel Olivares · Hongyi Zhou · Rudolf Lioutikov · Gerhard Neumann · Amarildo Likmeta · Amirhossein Zhalehmehrabi · Thomas Bonenfant · Marcello Restelli · Davide Tateo · Ziyuan Liu · Jan Peters |
|
Poster
|
Fri 16:30 |
Re-assembling the past: The RePAIR dataset and benchmark for real world 2D and 3D puzzle solving Theodore Tsesmelis · Luca Palmieri · Marina Khoroshiltseva · Adeela Islam · Gur Elkin · Ofir I Shahar · Gianluca Scarpellini · Stefano Fiorini · Yaniv Ohayon · Nadav Alali · Sinem Aslan · Pietro Morerio · Sebastiano Vascon · Elena Gravina · Maria Napolitano · Giuseppe Scarpati · Gabriel Zuchtriegel · Alexandra Spühler · Michel Fuchs · Stuart James · Ohad Ben-Shahar · Marcello Pelillo · Alessio Del Bue |