Poster in Workshop: Evaluating the Evolving LLM Lifecycle: Benchmarks, Emergent Abilities, and Scaling

ChinaTravel: An Open-Ended Benchmark for Language Agents in Chinese Travel Planning

Jie-Jing Shao · Bowen Zhang · Xiao-Wen Yang · Baizhi Chen · Siyu Han · Wen-Da Wei · Guohao Cai · Zhenhua Dong · Lan-Zhe Guo · Yu-Feng Li

Abstract

Recent advances in LLMs have spurred the development of Language Agents for real-world applications such as travel planning, which involves complex multi-constraint challenges. Existing benchmarks, however, often oversimplify reality with synthetic queries and limited constraints. To bridge this gap, we introduce ChinaTravel, the first open-ended benchmark based on authentic travel needs. We develop a domain-specific language (DSL) for compositional evaluation covering feasibility, constraints, and preferences. Experiments show neuro-symbolic agents achieve a 37.0% constraint satisfaction rate on human queries, a 10× improvement over neural models, demonstrating their potential in complex planning scenarios.
