The PokéAgent Challenge: Competitive and Long-Context Learning at Scale
Abstract
While frontier AI models excel at language understanding, mathematical reasoning, and code generation, they underperform in out-of-distribution generalization, adaptation to strategic opponents, game-theoretic decision-making, and long-context reasoning and planning. To address these gaps, we introduce the PokéAgent Challenge, which leverages Pokémon's rich multi-agent battle system and expansive role-playing game (RPG) environment. The competition features two complementary tracks: the Battling Track evaluates generalization and strategic reasoning under uncertainty in the two-player game of Competitive Pokémon, while the Speedrunning Track targets long-horizon planning and decision-making in the Pokémon RPG. Together, the two tracks unify recent interest in reinforcement learning (RL) and large language model (LLM) research, encouraging collaboration across communities. Pokémon's popularity and internet presence are a key strength of our competition: participants will have access to a large dataset of over 3.5 million battles and a knowledge base of reference materials and baseline methods. Recent work led by our competition's organizers will provide varied baselines, including rule-based, RL, and LLM-based agents. Our resources will make the PokéAgent Challenge accessible while maintaining the complexity needed to drive fundamental advances in decision-making systems.
Schedule
8:30 AM
9:00 AM
10:30 AM