190 Results
Workshop
|
GameBench: Evaluating Strategic Reasoning Abilities of LLM Agents Anthony Costarelli · Mat Allen · Roman Hauksson · Grace Sodunke · Suhas Hariharan · Carlson Cheng · Wenjie Li · Joshua Clymer · Arjun Yadav |
||
Poster
|
Thu 16:30 |
AgentDojo: A Dynamic Environment to Evaluate Prompt Injection Attacks and Defenses for LLM Agents Edoardo Debenedetti · Jie Zhang · Mislav Balunovic · Luca Beurer-Kellner · Marc Fischer · Florian Tramer |
|
Workshop
|
Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale Rogerio Bonatti · Dan Zhao · Sara Abdali · Yinheng Li · Yadong Lu · Justin Wagle · Kazuhito Koishida · Arthur Bucker · Lawrence Jang · Dillon Dupont · Zheng Hui |
||
Poster
|
Thu 11:00 |
MO-DDN: A Coarse-to-Fine Attribute-based Exploration Agent for Multi-Object Demand-driven Navigation Hongcheng Wang · Peiqi Liu · Wenzhe Cai · Mingdong Wu · Zhengyu Qian · Hao Dong |
|
Workshop
|
Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale Rogerio Bonatti · Dan Zhao · Dillon Dupont · Sara Abdali · Yinheng Li · Yadong Lu · Justin Wagle · Kazuhito Koishida · Arthur Bucker · Lawrence Jang · Zheng Hui |
||
Workshop
|
Simulation System Towards Solving Societal-Scale Manipulation Maximilian Puelma Touzel · Sneheel Sarangi · Austin Welch · Gayatri K · Dan Zhao · Zachary Yang · Hao Yu · Tom Gibbs · Ethan Kosak-Hine · Andreea Musulan · Camille Thibault · Reihaneh Rabbany · Jean-François Godbout · Kellin Pelrine |
||
Workshop
|
From Context to Action: Analysis of the Impact of State Representation and Context on the Generalization of Multi-Turn Web Navigation Agents Nalin Tiwary · Vardhan Dongre · Sanil Chawla · Ashwin Lamani · Dilek Tur |
||
Workshop
|
Do LLM Personas Dream of Bull Markets? Comparing Human and AI Investment Strategies Through the Lens of the Five-Factor Model Harris Borman · Anna Leontjeva · Luiz Pizzato · Max Kun Jiang · Dan Jermyn |
||
Workshop
|
GUI-WORLD: A GUI-oriented Video Dataset for Multimodal LLM-based Agents Dongping Chen · Yue Huang · Siyuan Wu · Jingyu Tang · Huichi Zhou · Qihui Zhang · Zhigang He · Yilin Bai · Gao Chujie · Liuyi Chen · Yiqiang Li · Chenlong Wang · Yue Yu · Tianshuo Zhou · Zhen Li · Yi Gui · Yao Wan · Pan Zhou · Jianfeng Gao · Lichao Sun |
||
Workshop
|
CRAB: Cross-platform agent benchmark for multi-modal embodied language model agents Tianqi Xu · Linyao Chen · Dai-Jie Wu · Yanjun Chen · Zecheng Zhang · Xiang Yao · Zhiqiang Xie · Yongchao Chen · Shilong Liu · Bochen Qian · Philip Torr · Bernard Ghanem · Guohao Li |