Search All 2024 Events
 

69 Results

Poster
Thu 11:00 WikiContradict: A Benchmark for Evaluating LLMs on Real-World Knowledge Conflicts from Wikipedia
Yufang Hou · Alessandra Pascale · Javier Carnerero-Cano · Tigran Tchrakian · Radu Marinescu · Elizabeth Daly · Inkit Padhi · Prasanna Sattigeri
Poster
Wed 11:00 RedPajama: an Open Dataset for Training Large Language Models
Maurice Weber · Dan Fu · Quentin Anthony · Yonatan Oren · Shane Adams · Anton Alexandrov · Xiaozhong Lyu · Huu Nguyen · Xiaozhe Yao · Virginia Adams · Ben Athiwaratkun · Rahul Chalamala · Kezhen Chen · Max Ryabinin · Tri Dao · Percy Liang · Christopher Ré · Irina Rish · Ce Zhang
Poster
Thu 16:30 NYU CTF Bench: A Scalable Open-Source Benchmark Dataset for Evaluating LLMs in Offensive Security
Minghao Shao · Sofija Jancheska · Meet Udeshi · Brendan Dolan-Gavitt · Haoran Xi · Kimberly Milner · Boyuan Chen · Max Yin · Siddharth Garg · Prashanth Krishnamurthy · Farshad Khorrami · Ramesh Karri · Muhammad Shafique
Poster
Thu 16:30 EHRNoteQA: An LLM Benchmark for Real-World Clinical Practice Using Discharge Summaries
Sunjun Kweon · Jiyoun Kim · Heeyoung Kwak · Dongchul Cha · Hangyul Yoon · Kwang Kim · Jeewon Yang · Seunghyun Won · Edward Choi
Poster
Wed 16:30 SpreadsheetBench: Towards Challenging Real World Spreadsheet Manipulation
Zeyao Ma · Bohan Zhang · Jing Zhang · Jifan Yu · Xiaokang Zhang · Xiaohan Zhang · Sijia Luo · Xi Wang · Jie Tang
Poster
Wed 11:00 DrivingDojo Dataset: Advancing Interactive and Knowledge-Enriched Driving World Model
Yuqi Wang · Ke Cheng · Jiawei He · Qitai Wang · Hengchen Dai · Yuntao Chen · Fei Xia · Zhao-Xiang Zhang
Poster
Fri 11:00 INQUIRE: A Natural World Text-to-Image Retrieval Benchmark
Edward Vendrow · Omiros Pantazis · Alexander Shepard · Gabriel Brostow · Kate Jones · Oisin Mac Aodha · Sara Beery · Grant Van Horn
Poster
Thu 16:30 A Retrospective on the Robot Air Hockey Challenge: Benchmarking Robust, Reliable, and Safe Learning Techniques for Real-world Robotics
Puze Liu · Jonas Günster · Niklas Funk · Simon Gröger · Dong Chen · Haitham Bou Ammar · Julius Jankowski · Ante Marić · Sylvain Calinon · Andrej Orsula · Miguel Olivares · Hongyi Zhou · Rudolf Lioutikov · Gerhard Neumann · Amarildo Likmeta · Amirhossein Zhalehmehrabi · Thomas Bonenfant · Marcello Restelli · Davide Tateo · Ziyuan Liu · Jan Peters
Poster
Fri 16:30 Re-assembling the past: The RePAIR dataset and benchmark for real world 2D and 3D puzzle solving
Theodore Tsesmelis · Luca Palmieri · Marina Khoroshiltseva · Adeela Islam · Gur Elkin · Ofir I Shahar · Gianluca Scarpellini · Stefano Fiorini · Yaniv Ohayon · Nadav Alali · Sinem Aslan · Pietro Morerio · Sebastiano Vascon · Elena Gravina · Maria Napolitano · Giuseppe Scarpati · Gabriel Zuchtriegel · Alexandra Spühler · Michel Fuchs · Stuart James · Ohad Ben-Shahar · Marcello Pelillo · Alessio Del Bue