Poster
Diversity from the Void: Training Adaptive Agents in Open-Ended Simulators
Robby Costales · Stefanos Nikolaidis
West Ballroom A-D #6400
The application of end-to-end learning methods to embodied decision-making domains is bottlenecked by their reliance on a superabundance of training data representative of the target domain. By abandoning the elusive objective of zero-shot generalization in favor of few-shot adaptation, meta-reinforcement learning (meta-RL) approaches hold promise for bridging larger generalization gaps than their zero-shot counterparts. While learning this meta-level adaptive behavior still requires substantial data, efficient environment simulators approaching real-world complexity are growing in prevalence. Even so, hand-crafting a sufficiently diverse set of simulated training tasks to bridge any significant sim-to-real gap is prohibitively labor-intensive. Domain randomization (DR) and procedural generation (PG), offered as solutions to this problem, require simulators to possess carefully defined parameters which are assumed to translate directly to meaningful task diversity. In this work, we present DIVA, an evolutionary approach for generating diverse training tasks in the absence of well-behaved simulator parameterizations. DIVA strikes a balance between the unscalable flexibility of unsupervised environment design (UED) approaches and the intensely supervised structure of well-defined simulators exploited by DR and PG. Our empirical results demonstrate DIVA's unique ability to leverage ill-parameterized simulators to train adaptive behavior in meta-RL agents, far outperforming competitive baselines. These findings highlight the potential of approaches like DIVA to enable training in complex open-ended domains, and to produce more robust and adaptable agents.
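The evolutionary task-generation idea described above is in the spirit of quality-diversity optimization, e.g. a MAP-Elites-style archive over behavior descriptors. The following is a minimal illustrative sketch of that family of methods, not the paper's actual algorithm: the genome, descriptor, and fitness functions here are hypothetical toy stand-ins for simulator parameters and task features.

```python
import random

def behavior_descriptor(genome):
    # Hypothetical descriptor: coarse bins over the genome's mean and spread,
    # standing in for meaningful task features extracted from a simulator.
    mean = sum(genome) / len(genome)
    spread = max(genome) - min(genome)
    return (round(mean, 1), round(spread, 1))

def fitness(genome):
    # Hypothetical validity/quality score for the generated task.
    return -sum(x * x for x in genome)

def map_elites(iterations=2000, dim=4, seed=0):
    """MAP-Elites-style loop: maintain one elite per behavior bin."""
    rng = random.Random(seed)
    archive = {}  # descriptor bin -> (fitness, genome)
    for _ in range(iterations):
        if archive:
            # Mutate an elite drawn from a random occupied bin.
            _, parent = archive[rng.choice(list(archive))]
            child = [x + rng.gauss(0, 0.1) for x in parent]
        else:
            child = [rng.uniform(-1, 1) for _ in range(dim)]
        bin_key = behavior_descriptor(child)
        f = fitness(child)
        # Keep the child only if its bin is empty or it beats the incumbent.
        if bin_key not in archive or f > archive[bin_key][0]:
            archive[bin_key] = (f, child)
    return archive

archive = map_elites()
print(f"filled {len(archive)} distinct behavior bins")
```

The archive's occupied bins play the role of a diverse training-task set: diversity is enforced by the descriptor binning rather than by a hand-tuned simulator parameterization.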