Contributed Talks (Oral Papers)
What Limits Virtual Agent Application? OmniBench: A Scalable Multi-Dimensional Benchmark for Essential Virtual Agent Capabilities (Wendong Bu, Yang Wu, Qifan Yu, Minghe Gao, Bingchen Miao, Zhenkui Zhang, Kaihang Pan, Yunfei Li, Mengze Li, Wei Ji, Juncheng Li, Siliang Tang, Yueting Zhuang) https://openreview.net/forum?id=ZI5UuM3CPo
The Personality Illusion: Revealing Dissociation Between Self-Reports & Behavior in LLMs (Pengrui Han, Rafal Dariusz Kocielnik, Peiyang Song, Ramit Debnath, Dean Mobbs, Anima Anandkumar, R. Michael Alvarez) https://openreview.net/forum?id=IC6kVa5eci
Modeling Others' Minds as Code (Kunal Jha, Aydan Yuenan Huang, Eric Ye, Natasha Jaques, Max Kleiman-Weiner) https://openreview.net/forum?id=YxmxGP89eX
Adapting Vision-Language Models for Evaluating World Models (Mariya Hendriksen, Tabish Rashid, David Bignell, Raluca Stevenson, Abdelhak Lemkhenter, Katja Hofmann, Sam Devlin, Sarah Parisot) https://openreview.net/forum?id=h4lBkLlYHg
Spatial Mental Modeling from Limited Views (Baiqiao Yin, Qineng Wang, Pingyue Zhang, Jianshu Zhang, Kangrui Wang, Zihan Wang, Jieyu Zhang, Keshigeyan Chandrasegaran, Han Liu, Ranjay Krishna, Saining Xie, Manling Li, Jiajun Wu, Li Fei-Fei) https://openreview.net/forum?id=HBllcu0rmi
Assessing Adaptive World Models in Machines with Novel Games (Lance Ying, Katherine M. Collins, Prafull Sharma, Cédric Colas, Kaiya Ivy Zhao, Adrian Weller, Zenna Tavares, Phillip Isola, Samuel J. Gershman, Jacob Andreas, Thomas L. Griffiths, Francois Chollet, Kelsey R Allen, Joshua B. Tenenbaum) https://openreview.net/forum?id=2EdUKZaq6z