Intrinsically Motivated Open-ended Learning (IMOL) Workshop
Cédric Colas · Laetitia Teodorescu · Nadia Ady · Cansu Sancaktar · Junyi Chu
Room 260 - 262
How do humans develop broad and flexible repertoires of knowledge and skills? How can we design autonomous lifelong learning machines with the same abilities? The field of IMOL explores these questions by integrating research on the motivational forces, learning architectures, and developmental and environmental constraints that support the acquisition of open-ended repertoires of skills and knowledge.
At this full-day, in-person NeurIPS workshop, we will gather speakers from a wide diversity of scientific traditions, showcase ongoing research via contributed talks and poster sessions, and provide networking opportunities for research and mentorship discussions.
Schedule
Sat 6:15 a.m. - 6:25 a.m. | Opening Remarks (Introduction) | Cédric Colas
Sat 6:25 a.m. - 7:05 a.m. | Invited Talk | Georg Martius - Intrinsic Motivations for Efficient Exploration in Reinforcement Learning
I will summarize research in the area of intrinsic motivation in the context of learning and exploration, and touch upon open-ended learning in the IMOL community. I will then present our recent work on combining different intrinsic motivation signals, such as learning progress, causal influence, and information gain, with reinforcement learning. A particularly exciting direction is to employ model-based reinforcement learning to let robots learn, through free play, how to interact effectively, driven by information gain and other generic drives. We find that this leads to strong zero-shot generalization to new tasks.
Sat 7:05 a.m. - 7:45 a.m. | Invited Talk | Doina Precup - Towards a General Blueprint for Continual Reinforcement Learning
Intelligent agents must be able to learn by interacting with their environment and to adapt to changes. Continual reinforcement learning provides a natural way to model this process. In this talk, I will discuss an approach for tackling this problem by constructing abstractions, such as intents, options, affordances, and partial models, that allow an agent to generalize its knowledge quickly to new circumstances.
Sat 7:45 a.m. - 8:00 a.m. | Contributed Talk | Surprise-Adaptive Intrinsic Motivation for Unsupervised Reinforcement Learning
Sat 8:00 a.m. - 9:00 a.m. | Break and Posters
Sat 9:00 a.m. - 9:15 a.m. | Contributed Talk | What can AI Learn from Human Exploration?
Sat 9:15 a.m. - 9:55 a.m. | Invited Talk | Michael Tomasello - Agency and Cognitive Development
Modern theories explain children's cognitive development mainly in terms of Bayesian learning (with some innate priors in infancy). But learning cannot be the whole story, or else children could learn anything at any age - which they cannot. They cannot because their capacities to experience and cognitively represent the world are structured by the human species' evolved psychological architecture - inherited from ancient animal ancestors - and this architecture changes in significant ways over the first years of life. The main organizing principle is agency, including shared agency. The developmental proposal is that young infants (below 9 months) are goal-directed agents who cognitively represent and learn about actualities; toddlers are intentional agents who executively represent and learn also about causal, intentional, and logical possibilities; and preschoolers (over 3 years) are metacognitive agents who metacognitively represent and learn also about normative necessities. This agency-based model of cognitive development recognizes the important role of learning, but at the same time places it in the context of the overall agentive organization of children at particular developmental periods.
Sat 9:55 a.m. - 11:30 a.m. | Lunch & Mentoring Session
Sat 11:30 a.m. - 11:45 a.m. | Contributed Talk | Voyager: An Open-Ended Embodied Agent with Large Language Models
Sat 11:45 a.m. - 12:25 p.m. | Invited Talk | Yannick Schroecker - Human-Timescale Adaptation in an Open-Ended Task Space
Foundation models, when trained at scale, have shown impressive capabilities to adapt to new tasks from a few examples provided in context; however, there remains a gap between the abilities of these models and the requirements for acting successfully in embodied domains. To close this gap with reinforcement learning, our agents have to be trained at scale as well. In this talk, I will present recipes towards this end and dive into the details of how we trained AdA, utilizing a vast open-ended task space, to achieve human-timescale adaptation in a 3D embodied domain. The trained agent displays on-the-fly hypothesis-driven exploration and efficient exploitation of acquired knowledge, and can successfully be prompted with first-person demonstrations.
Sat 12:25 p.m. - 1:05 p.m. | Invited Talk | Daniel Polani - Information and its Flow: From Dynamics to Agency and Back
In the last few years, various forms of information flow have been found to be useful quantities for characterizing the decision-making of agents, whether natural or artificial. We here especially consider one particular type of information flow, empowerment, which can be used as an intrinsic motivation derived from the dynamical properties of the external perception-action loop. The present talk will discuss empowerment in the context of its evolutionary motivation and questions of agency, as well as some insightful new links to dynamical systems theory.
Sat 1:05 p.m. - 2:05 p.m. | Break and Posters
Sat 2:05 p.m. - 2:45 p.m. | Invited Talk | Dani Bassett - Agents of Curiosity: Testing Network Theories in Human and Non-Human Inquirers
What is curiosity? Across disciplines, some scholars offer a range of definitions while others eschew definitions altogether. Is the field of curiosity studies simply too young? Should we, as has been argued in neuroscience, press forward in definition-less characterization? At this juncture in the field's history, we turn to an examination of curiosity styles, and ask: How has curiosity been practiced over the last two millennia and how is it practiced today? We exercise a recent historico-philosophical account to catalogue common styles of curiosity and test for their presence as humans browse Wikipedia. Next we consider leading theories from psychology and mechanics that could explain curiosity styles, and formalize those theories in the mathematics of network science. Such a formalization allows theories of curiosity to be explicitly tested in human behavioral data and for their relative mental affordances to be investigated. Moreover, the formalization allows us to train artificial agents to build in human-like curiosity styles through reinforcement learning. Finally, with styles and theories in hand, we expand out to a study of several million users of Wikipedia to understand how curiosity styles might or might not differ around the world and track factors of social inequality. Collectively, our findings support the notion that curiosity is practiced, differently across people, as unique styles of network building, thereby providing a connective counterpoint to the common acquisitional account of curiosity in humans.
Sat 2:45 p.m. - 3:25 p.m. | Invited Talk | Natalia Vélez - Studying Large-Scale Collaborations in Open-Ended Games
Humans have developed technological repertoires that have enabled us to survive in virtually every habitat on Earth. However, it can be difficult to trace how these technologies came to be: folk histories of technological achievement often highlight a few brilliant individuals, while losing sight of the rest of the community's contributions. In this talk, I will present work analyzing player behavior in One Hour One Life, a multiplayer online game where players can build technologically complex communities over many generations (N = 22,011 players, 2,700 communities, 428,255 lives lived, 127,768,267 social interactions detected). This dataset provides a unique opportunity to test how community dynamics shape technological development in an open-ended world: Players can form communities that endure for many generations, and they can combine thousands of unique materials to build vast technological repertoires. At a macroscopic level, we find that community characteristics, such as population size, interconnectedness, and specialization, predict the size and stability of a community's technological repertoire. Zooming in, we find that individual players contribute their own expertise to technological development: participants consistently perform similar jobs in the different communities they are placed in, and acquire expertise in these jobs through social interaction. Our work tests theories of cultural evolution and economic complexity at scale and provides a methodological basis to study the interplay between individual expertise and community structures.
Sat 3:25 p.m. - 3:30 p.m. | Closing Remarks
Poster | Reinforcement Learning of Diverse Skills using Mixture of Deep Experts
Agents that can acquire diverse skills to solve the same task have an advantage over other agents if, for example, unexpected environmental changes occur. However, Reinforcement Learning (RL) policies mainly rely on Gaussian parameterization, preventing them from learning multi-modal, diverse skills. In this work, we propose a novel RL approach for training policies that exhibit diverse behavior. To this end, we propose a highly non-linear Mixture of Experts (MoE) as the policy representation, where each expert formalizes a skill as a contextual motion primitive. The context defines the task, which can be, for instance, the goal-reaching position of the agent, or changing physical parameters like friction. Given a context, our trained policy first selects an expert out of the repertoire of skills and subsequently adapts the parameters of the contextual motion primitive. To incentivize our policy to learn diverse skills, we leverage a maximum entropy objective combined with a per-expert context distribution that we optimize alongside each expert. The per-expert context distribution allows each expert to focus on a context sub-space and boosts learning speed. However, these distributions need to be able to represent multi-modality and hard discontinuities in the environment's context probability space. We meet these requirements by leveraging energy-based models to represent the per-expert context distributions and show how we can efficiently train them using the standard policy gradient objective.
Onur Celik · Aleksandar Taranovic · Gerhard Neumann
Poster | XLand-MiniGrid: Scalable Meta-Reinforcement Learning Environments in JAX
We present XLand-MiniGrid, a suite of tools and grid-world environments for meta-reinforcement learning research inspired by the diversity and depth of XLand and the simplicity and minimalism of MiniGrid. XLand-MiniGrid is written in JAX, designed to be highly scalable, and can potentially run on GPU or TPU accelerators, democratizing large-scale experimentation with limited resources. To demonstrate the generality of our library, we have implemented some well-known single-task environments as well as new meta-learning environments capable of generating $10^8$ distinct tasks. We have empirically shown that the proposed environments can scale up to $2^{13}$ parallel instances on the GPU, reaching tens of millions of steps per second.
Alexander Nikulin · Vladislav Kurenkov · Ilya Zisman · Viacheslav Sinii · Artem Agarkov · Sergey Kolesnikov
Poster | Progressively Efficient Communication
The ability to rapidly acquire knowledge from humans is a fundamental skill for AI assistants. Traditional frameworks like imitation and reinforcement learning employ fixed, low-level communication protocols, making them inefficient for teaching complex tasks. In contrast, humans are capable of communicating nuanced ideas with progressive efficiency by establishing shared vocabularies with others and expanding those vocabularies with increasingly abstract words. Mimicking this phenomenon in human communication, we introduce a novel learning framework named Communication-Efficient Interactive Learning (CEIL). By equipping a learning agent with a rich, dynamic language and an intrinsic motivation to communicate with minimal effort, CEIL leads to the emergence of a human-like pattern where the learner and the teacher communicate more efficiently over time by exchanging increasingly abstract intentions. CEIL demonstrates impressive learning efficiency on a 2D MineCraft domain featuring long-horizon decision-making tasks. In particular, it performs robustly with teachers modeled after human pragmatic communication behavior.
Khanh Nguyen · Ruijie Zheng · Hal Daumé III · Furong Huang · Karthik Narasimhan
Poster | Towards a General Framework for Continual Learning with Pre-training
In this work, we present a general framework for continual learning of sequentially arrived tasks with the use of pre-training, which has emerged as a promising direction for artificial intelligence systems to accommodate real-world dynamics. From a theoretical perspective, we decompose its objective into three hierarchical components, including within-task prediction, task-identity inference, and task-adaptive prediction. Then we propose an innovative approach to explicitly optimize these components with parameter-efficient fine-tuning (PEFT) techniques and representation statistics. We empirically demonstrate the superiority and generality of our approach in downstream continual learning, and further explore the applicability of PEFT techniques in upstream continual learning. We also discuss the biological basis of the proposed framework with recent advances in neuroscience.
Liyuan Wang · Jingyi Xie · Xingxing Zhang · Hang Su · Jun Zhu
Poster | Intrinsically Motivated Social Play in Virtual Infants
Infants explore their complex physical and social environment in an organized way. To gain insight into what intrinsic motivations may help structure this exploration, we create a virtual infant agent and place it in a developmentally-inspired 3D environment with no external rewards. The environment has a virtual caregiver agent with the capability to interact contingently with the infant agent in ways that resemble play. We test intrinsic reward functions that are similar to motivations that have been proposed to drive exploration in humans: surprise, uncertainty, novelty, and learning progress. The reward functions that are proxies for novelty and uncertainty are the most successful in generating diverse experiences and activating the environment contingencies. We also find that learning a world model in the presence of an attentive caregiver helps the infant agent learn how to predict scenarios with challenging social and physical dynamics. Our findings provide insight into how curiosity-like intrinsic rewards and contingent social interaction lead to social behavior and the creation of a robust predictive world model.
Chris Doyle · Sarah Shader · Michelle Lau · Megumi Sano · Dan Yamins · Nick Haber
Poster | Why Open-Ended Agency Should be Formalized on Hierarchical Empowerment-Gain Maximization
We argue that reward-maximization is insufficient as an objective for open-ended agency due to the complexity of the control problems. Instead, we argue that the intrinsic motivation metric of hierarchical empowerment might be particularly powerful for generating goals for life-long agents.
Thomas Ringstrom
Poster | Voyager: An Open-Ended Embodied Agent with Large Language Models
We introduce Voyager, the first LLM-powered embodied lifelong learning agent in an open-ended world that continuously explores, acquires diverse skills, and makes novel discoveries without human intervention in Minecraft. Voyager consists of three key components: 1) an automatic curriculum that maximizes exploration, 2) an ever-growing skill library of executable code for storing and retrieving complex behaviors, and 3) a new iterative prompting mechanism that incorporates environment feedback, execution errors, and self-verification for program improvement. Voyager interacts with GPT-4 via blackbox queries, which bypasses the need for model parameter fine-tuning. The skills developed by Voyager are temporally extended, interpretable, and compositional, which compounds the agent's capability rapidly and alleviates catastrophic forgetting. Empirically, Voyager demonstrates strong in-context lifelong learning capabilities. It outperforms prior SOTA by obtaining 3.1x more unique items, unlocking tech tree milestones up to 15.3x faster, and traveling 2.3x longer distances. Voyager is able to utilize the learned skill library in a new Minecraft world to solve novel tasks from scratch, while other techniques struggle to generalize.
Guanzhi Wang · Yuqi Xie · Yunfan Jiang · Ajay Mandlekar · Chaowei Xiao · Yuke Zhu · Linxi Fan · Animashree Anandkumar
Poster | High-fidelity social learning via shared episodic memories improves collaborative foraging
Social learning, a cornerstone of cultural evolution, allows individuals to acquire knowledge by observing and imitating others. Central to its efficacy is episodic memory, which records specific behavioral sequences to facilitate learning. This study examines their interrelation in the context of collaborative foraging. Specifically, we examine how variations in the frequency and fidelity of social learning impact collaborative foraging, and how the length of behavioral sequences preserved in agents' episodic memory modulates these factors. To this end, we deploy Sequential Episodic Control agents capable of sharing behavioral sequences stored in their episodic memories with one another. Our findings indicate that high-frequency, high-fidelity social learning promotes more distributed and efficient resource collection, a benefit that remains consistent regardless of the length of the shared episodic memories. In contrast, low-fidelity social learning shows no advantages over non-social learning in terms of resource acquisition. In addition, storing and disseminating longer episodic memories enhances performance up to a certain threshold, beyond which increased memory capacity does not yield further benefits. Our findings emphasize the crucial role of high-fidelity social learning in collaborative foraging, and illuminate the intricate relationship between episodic memory capacity and the quality and frequency of social learning. This work aims to highlight the potential of neuro-computational models like episodic control algorithms in understanding social learning and offers a new perspective for investigating the cognitive mechanisms underlying open-ended cultural evolution.
Ismael T. Freire · Paul Verschure
Poster | Imprinting in autonomous artificial agents using deep reinforcement learning
Imprinting is a common survival strategy in which an animal learns a lasting preference for its parents and siblings early in life. To date, however, the origins and computational foundations of imprinting have not been formally established. What learning mechanisms generate imprinting behavior in newborn animals? Here, we used deep reinforcement learning and intrinsic motivation (curiosity), two learning mechanisms deeply rooted in psychology and neuroscience, to build autonomous artificial agents that imprint. When we raised our artificial agents together in the same environment, akin to the early social experiences of newborn animals, the agents spontaneously developed imprinting behavior. Our results provide a pixels-to-actions computational model of animal imprinting. We show that domain-general learning mechanisms, deep reinforcement learning and intrinsic motivation, are sufficient for embodied agents to rapidly learn core social behaviors from unsupervised natural experience.
Donsuk Lee · Samantha Wood · Justin Wood
Poster | Neurobehavior of exploring AI agents
We study intrinsically motivated exploration by artificially intelligent (AI) agents in animal-inspired settings. We construct virtual environments that are 3D, vision-based, physics-simulated, and based on two established animal assays: labyrinth exploration, and novel object interaction. We assess Plan2Explore (P2E), a leading model-based, intrinsically motivated deep reinforcement learning agent, in these environments. We characterize and compare the behavior of the AI agents to animal behavior, using measures devised for animal neuroethology. P2E exhibits some similarities to animal behavior, but is dramatically less efficient than mice at labyrinth exploration. We further characterize the neural dynamics associated with world modeling in the novel-object assay. We identify latent neural population activity axes linearly associated with representing object color and proximity. These results identify areas of improvement for existing AI agents, and make strides toward understanding the learned neural dynamics that guide their behavior.
Isaac Kauvar · Chris Doyle · Nick Haber
Poster | Reconciling Spatial and Temporal Abstractions for Goal Representation
Goal representation affects the performance of Hierarchical Reinforcement Learning (HRL) algorithms by decomposing complex problems into easier subtasks. Recent studies show that representations that preserve temporally abstract environment dynamics are successful in solving difficult problems with theoretical guarantees for optimality. These methods, however, cannot scale to tasks where environment dynamics increase in complexity. On the other hand, other efforts have tried to use spatial abstraction to mitigate these issues, but their limitations include poor scalability to high-dimensional environments and dependency on prior knowledge. In this work, we propose a novel three-layer HRL algorithm that introduces, at different levels of the hierarchy, both a spatial and a temporal goal abstraction. We provide a theoretical study of the regret bounds of the learned policies. We evaluate the approach on complex continuous control tasks, demonstrating the effectiveness of the spatial and temporal abstractions learned by this approach.
Mehdi Zadem · Sergio Mover · Sao Mai Nguyen
Poster | What can AI Learn from Human Exploration? Intrinsically-Motivated Humans and Agents in Open-World Exploration
What drives exploration? Understanding intrinsic motivation is a long-standing question in both cognitive science and artificial intelligence (AI); numerous exploration objectives have been proposed and tested in human experiments and used to train reinforcement learning (RL) agents. However, experiments in the former are often conducted in simplistic environments that do not capture the complexity of real-world exploration. On the other hand, experiments in the latter use more complex environments, yet the trained RL agents fail to come close to human exploration efficiency. To study this gap, we propose a framework for directly comparing human and agent exploration in an open-ended environment, Crafter. We study how well commonly-proposed information-theoretic objectives for intrinsic motivation relate to actual human and agent behaviour, finding that human exploration consistently shows a significant positive correlation with Entropy, Information Gain, and Empowerment. Surprisingly, we find that intrinsically-motivated RL agent exploration does not show the same significant correlation consistently, despite being designed to optimize objectives that approximate Entropy or Information Gain. In a preliminary analysis of verbalizations, we find that children's verbalizations of goals correlate strongly and positively with Empowerment, suggesting that goal-setting may be an important aspect of efficient exploration.
Yuqing Du · Eliza Kosoy · Alyssa L Dayan · Maria Rufova · Pieter Abbeel · Alison Gopnik
Poster | Neuro-Inspired Fragmentation and Recall to Overcome Catastrophic Forgetting in Curiosity
Intrinsic reward functions are widely used to improve exploration in reinforcement learning. We first examine the conditions and causes of catastrophic forgetting of the intrinsic reward function, and propose a new method, FarCuriosity, inspired by how humans and non-human animals learn. The method depends on fragmentation and recall: an agent fragments an environment based on surprisal signals, and uses different local curiosity modules (prediction-based intrinsic reward functions) for each division so that modules are not trained on the entire environment. At a fragmentation event, the agent stores the current module in long-term memory (LTM) and either initializes a new module or recalls a previously stored module based on its match with the current state. With fragmentation and recall, FarCuriosity achieves less forgetting and better overall performance in games with varied and heterogeneous environments in the Atari benchmark suite of tasks. Thus, this work highlights the problem of catastrophic forgetting in prediction-based curiosity methods and proposes a first solution.
Jaedong Hwang · Zhang-Wei Hong · Eric Chen · Akhilan Boopathy · Pulkit Agrawal · Ila Fiete
Poster | FOCUS: Object-Centric World Models for Robotic Manipulation
Understanding the world in terms of objects and the possible interactions with them is an important cognitive ability, especially in robotic manipulation. However, learning a structured world model that allows controlling the agent accurately remains a challenge. To address this, we propose FOCUS, a model-based agent that learns an object-centric world model. The learned representation makes it possible to provide the agent with an object-centric exploration mechanism, which encourages the agent to interact with objects and discover useful interactions. We apply FOCUS in several robotic manipulation settings, where we show how our method fosters interactions such as reaching, moving, and rotating the objects in the environment. We further show how this ability to autonomously interact with objects can be used to quickly solve a given task using reinforcement learning with sparse rewards.
Stefano Ferraro · Pietro Mazzaglia · Tim Verbelen · Bart Dhoedt
Poster | DeepThought: an architecture for autonomous self-motivated systems
The ability of large language models (LLMs) to engage in credible dialogues with humans, taking into account the training data and the context of the conversation, has raised discussions about their ability to exhibit intrinsic motivations, agency, or even some degree of consciousness. We argue that the internal architecture of LLMs and their finite and volatile state cannot support any of these properties. By combining insights from complementary learning systems and global neuronal workspace theories, we propose to integrate LLMs and other deep learning systems into a new architecture that is able to exhibit properties akin to agency and self-motivation, and even, more speculatively, some features of consciousness.
Arlindo L Oliveira · Tiago Domingos · Mario Figueiredo · Pedro Lima
Poster | Regularity as Intrinsic Reward for Free Play
We propose regularity as a novel reward signal for intrinsically-motivated reinforcement learning. Taking inspiration from child development, we postulate that striving for structure and order helps guide exploration towards a subspace of tasks that are not favored by naive uncertainty-based intrinsic rewards. Our generalized formulation of Regularity as Intrinsic Reward (RaIR) allows us to operationalize it within model-based reinforcement learning. In a synthetic environment, we showcase the plethora of structured patterns that can emerge from pursuing this regularity objective. We also demonstrate the strength of our method in a multi-object robotic manipulation environment. We incorporate RaIR into free play and use it to complement the model's epistemic uncertainty as an intrinsic reward. Doing so, we witness the autonomous construction of towers and other regular structures during free play, which leads to a substantial improvement in zero-shot downstream task performance on assembly tasks.
Cansu Sancaktar · Justus Piater · Georg Martius
Poster | Children prioritize purely exploratory actions in observe-vs.-bet tasks
In reinforcement learning, agents often need to make decisions between selecting actions that are familiar and have previously yielded positive results (exploitation), and seeking new information that could allow them to uncover more effective actions (exploration). Understanding how humans learn their sophisticated exploratory strategies over the course of their development remains an open question for both computer and cognitive science. Existing studies typically use classic bandit or gridworld tasks that confound the rewarding with the informative characteristics of an outcome. In this study, we adopt an observe-vs.-bet task that separates "pure exploration" from "pure exploitation" by giving participants the option to either observe an instance of an outcome and receive no reward, or to bet on one action that is eventually rewarding, but offers no immediate feedback. We collected data from 33 five-to-seven-year-old children who completed the task at one of three different bias levels. We compared how children performed with both approximate solutions to the partially-observable Markov decision process and meta-reinforcement learning models that were meta-trained on the same decision-making task across different probability levels. We found that the children observe significantly more than the two classes of algorithms and qualitatively more than adults in similar tasks. We then quantified how children's policies differ between the different efficacy levels by fitting probabilistic programming models and by calculating the likelihood of the children's actions under the task-driven model. The fitted parameters of the behavioral model, as well as the direction of the deviation from neural network policies, demonstrate that the primary way children adapt their behavior is by changing the amount of time that they bet on the most-recently-observed arm while maintaining a consistent frequency of observations across bias levels. This suggests both that children model the causal structure of the environment and a "hedging behavior" that would be impossible to detect in standard bandit tasks. The results shed light on how children reason about reward and information, providing an important developmental benchmark that can help shape our understanding of human behavior, which we hope to investigate further using recently-developed neural network reinforcement learning models of reasoning about information and reward.
Eunice Yiu · Kai Sandbrink · Alison Gopnik
-
|
Skill-Based Reinforcement Learning with Intrinsic Reward Matching
(
Poster
)
>
link
While unsupervised skill discovery has shown promise in autonomously acquiring behavioral primitives, there is still a large methodological disconnect between task-agnostic skill pretraining and downstream, task-aware finetuning. We present Intrinsic Reward Matching (IRM), which unifies these two phases of learning via the $\textit{skill discriminator}$, a pretraining model component often discarded during finetuning. Conventional approaches finetune pretrained agents directly at the policy level, often relying on expensive environment rollouts to empirically determine the optimal skill. However, often the most concise yet complete description of a task is the reward function itself, and skill learning methods learn an $\textit{intrinsic}$ reward function via the discriminator that corresponds to the skill policy. We propose to leverage the skill discriminator to $\textit{match}$ the intrinsic and downstream task rewards and determine the optimal skill for an unseen task without environment samples on a Fetch tabletop manipulation task suite.
|
Ademi Adeniji · Amber Xie · Pieter Abbeel 🔗 |
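As an illustrative sketch of the rollout-free selection step this entry describes (not the authors' exact matching procedure), one could score each pretrained skill by how well its intrinsic reward matches the downstream task reward over a batch of states; `intrinsic_reward`, `task_reward`, and the state set are hypothetical stand-ins:

```python
def select_skill(states, intrinsic_reward, task_reward, skills):
    """Reward-matching sketch: pick the skill whose discriminator-based
    intrinsic reward best matches the downstream task reward over a set
    of states, without any environment rollouts.
    intrinsic_reward(state, skill) and task_reward(state) are callables."""
    def mismatch(skill):
        return sum((intrinsic_reward(s, skill) - task_reward(s)) ** 2
                   for s in states) / len(states)
    return min(skills, key=mismatch)
```

The key design point is that skill selection becomes a comparison of reward functions rather than an empirical search over policy rollouts.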
-
|
From Child's Play to AI: Insights into Automated Causal Curriculum Learning
(
Poster
)
>
link
We study how reinforcement learning algorithms and children develop their causal curriculum to achieve a challenging goal that is not solvable at first. Adopting the Procgen environments, which comprise various tasks as challenging goals, we found that 5- to 7-year-old children actively used their current level progress to determine their next step in the curriculum and improved at solving the goal during this process. This suggests that children treat their level progress as an intrinsic reward and are motivated to master easier levels in order to do better at more difficult ones, even without explicit reward. To evaluate RL agents, we exposed them to the same demanding Procgen environments as children and employed several curriculum learning methodologies. Our results demonstrate that RL agents that emulate children by incorporating level progress as an intrinsic reward signal exhibit greater stability and are more likely to converge during training than RL agents reliant solely on extrinsic reward signals for game-solving. Curriculum learning may also offer a significant reduction in the number of frames needed to solve a target environment. Taken together, our human-inspired findings suggest a potential path forward for addressing catastrophic forgetting and domain shift during curriculum learning in RL agents. |
Annya Dahmani · Eunice Yiu · Tabitha Lee · Nan Rosemary Ke · Oliver Kroemer · Alison Gopnik 🔗 |
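A minimal sketch of "level progress as intrinsic reward," as described above, could look like the following; the function names and the mixing weight `beta` are illustrative assumptions, not the paper's implementation:

```python
def progress_bonus(prev_best, new_score, scale=1.0):
    """Hypothetical intrinsic reward: positive only when the agent beats
    its previous best score on the current curriculum level."""
    return scale * max(0.0, new_score - prev_best)

def combined_reward(extrinsic, prev_best, new_score, beta=0.5):
    """Total reward mixing the extrinsic game reward with a
    level-progress bonus, weighted by beta."""
    return extrinsic + beta * progress_bonus(prev_best, new_score)
```

The bonus vanishes once a level is mastered, nudging the agent toward levels where it can still improve.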
-
|
Stabilizing Contrastive RL: Techniques for Robotic Goal Reaching from Offline Data
(
Poster
)
>
link
Robotic systems that rely primarily on self-supervised learning have the potential to decrease the amount of human annotation and engineering effort required to learn control strategies. In the same way that prior robotic systems have leveraged self-supervised techniques from computer vision (CV) and natural language processing (NLP), our work builds on prior work showing that reinforcement learning (RL) itself can be cast as a self-supervised problem: learning to reach any goal without human-specified rewards or labels. Despite the seeming appeal, little (if any) prior work has demonstrated how self-supervised RL methods can be practically deployed on robotic systems. By first studying a challenging simulated version of this task, we discover design decisions about architectures and hyperparameters that increase the success rate by $2 \times$. These findings lay the groundwork for our main result: we demonstrate that a self-supervised RL algorithm based on contrastive learning can solve real-world, image-based robotic manipulation tasks, with tasks being specified by a single goal image provided after training.
|
Chongyi Zheng · Benjamin Eysenbach · Homer Walke · Patrick Yin · Kuan Fang · Russ Salakhutdinov · Sergey Levine 🔗 |
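A minimal sketch of the contrastive objective that goal-conditioned contrastive RL methods build on (an InfoNCE-style loss where a state-action embedding should score highest against the goal from its own trajectory) is shown below; it is a generic illustration, not this paper's architecture:

```python
import math

def info_nce_loss(sa_embeds, goal_embeds):
    """InfoNCE sketch: sa_embeds[i] and goal_embeds[i] are a positive
    pair (same trajectory); all other goals serve as negatives.
    Embeddings are lists of equal-length float vectors."""
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))
    loss = 0.0
    for i, sa in enumerate(sa_embeds):
        logits = [dot(sa, g) for g in goal_embeds]
        m = max(logits)  # log-sum-exp with max subtraction for stability
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        loss += log_z - logits[i]  # negative log-softmax of the positive pair
    return loss / len(sa_embeds)
```

The learned critic then doubles as a goal-reaching value estimate, which is what lets a single goal image specify the task at deployment time.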
-
|
Enhancing Understanding in Generative Agents through Active Inquiring
(
Poster
)
>
link
As artificial intelligence advances, Large Language Models (LLMs) have evolved beyond being just tools, becoming human-like agents that can converse, reflect, plan, and set goals. However, these models still struggle with open-ended question answering and often fail to understand unfamiliar scenarios quickly. To address this, we ask: how do humans manage strange situations so effectively? We believe it’s largely due to our natural instinct for curiosity and a built-in desire to predict the future and seek explanations when those predictions don’t align with reality. Unlike humans, LLMs typically accept information passively without an inherent desire to question or doubt, which could be why they struggle to understand new situations. Focusing on this, our study explores the possibility of equipping LLM-agents with human-like curiosity. Can these models move from being passive processors to active seekers of understanding, reflecting human behaviors? And can this adaptation benefit them as it does humans? To explore this, we introduce an innovative experimental framework where generative agents navigate through strange and unfamiliar situations, and their understanding is then assessed through interview questions about those situations. Initial results show notable improvements when models are equipped with traits of surprise and inquiry compared to those without. This research is a step towards creating more human-like agents and highlights the potential benefits of integrating human-like traits in models. |
Jiaxin Ge · Kaiya Zhao · Manuel Cortes · Jovana Kondic · Shuying Luo · Michelangelo Naim · Andrew Ahn · Guangyu Robert Yang 🔗 |
-
|
Codeplay: Autotelic Learning through Collaborative Self-Play in Programming Environments
(
Poster
)
>
link
Autotelic learning is the training setup where agents learn by setting their own goals and trying to achieve them. However, creatively generating freeform goals is challenging for autotelic agents. We present Codeplay, an algorithm casting autotelic learning as a game between a Setter agent and a Solver agent, where the Setter generates programming puzzles of appropriate difficulty and novelty for the Solver, and the Solver learns to achieve them. Early experiments with the Setter demonstrate that one can effectively control the tradeoff between the difficulty of a puzzle and its novelty by tuning the reward of the Setter, a code language model finetuned with deep reinforcement learning. |
Laetitia Teodorescu · Cédric Colas · Matthew Bowers · Thomas Carta · Pierre-Yves Oudeyer 🔗 |
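The difficulty/novelty tradeoff in the Setter's reward could be sketched as below. This is a hypothetical shaping, not the paper's actual reward: it favors puzzles of intermediate difficulty (the Solver succeeds sometimes, but not always) and adds a tunable novelty bonus:

```python
def setter_reward(solver_success_rate, novelty, alpha=1.0, beta=1.0):
    """Hypothetical Setter reward: the difficulty term p * (1 - p) peaks
    at a 50% Solver success rate and vanishes for trivial or impossible
    puzzles; alpha and beta tune the difficulty/novelty tradeoff."""
    difficulty_term = solver_success_rate * (1.0 - solver_success_rate)
    return alpha * difficulty_term + beta * novelty
```

Raising `beta` relative to `alpha` pushes the Setter toward novel puzzles even at the cost of less calibrated difficulty.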
-
|
Learning Diverse Skills for Local Navigation under Multi-constraint Optimality
(
Poster
)
>
link
Despite many successful applications of data-driven control in robotics, extracting meaningful diverse behaviors remains a challenge. Typically, task performance needs to be compromised in order to achieve diversity. In many scenarios, task requirements are specified as a multitude of reward terms, each requiring a different trade-off. In this work, we take a constrained optimization viewpoint on the quality-diversity trade-off and show that we can obtain diverse policies while imposing constraints on their value functions which are defined through distinct rewards. In line with previous work, further control of the diversity level can be achieved through an attract-repel reward term motivated by the Van der Waals force. We demonstrate the effectiveness of our method on a local navigation task where a quadruped robot needs to reach the target within a finite horizon. Finally, our trained policies transfer well to the real 12-DoF quadruped robot, Solo12, and exhibit diverse agile behaviors with successful obstacle traversal. |
Jin Cheng · Marin Vlastelica Pogančić · Pavel Kolev · Chenhao Li · Georg Martius 🔗 |
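The attract-repel term mentioned above can be illustrated with a Lennard-Jones-style function of the behavioral distance between two policies; the exact functional form here is an assumption for illustration, not the paper's formula:

```python
def attract_repel(distance, r0=1.0, eps=1e-6):
    """Van-der-Waals-style diversity term (illustrative): strongly repel
    policies whose behaviors are closer than a preferred distance r0,
    weakly attract those farther away. The reward peaks at distance == r0
    (Lennard-Jones 6-12 shape); eps avoids division by zero."""
    x = r0 / (distance + eps)
    return 2 * x ** 6 - x ** 12
```

Tuning `r0` then sets the target spacing between skills in behavior space, which is one way to control the diversity level.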
-
|
Surprise-Adaptive Intrinsic Motivation for Unsupervised Reinforcement Learning
(
Poster
)
>
link
Both surprise-minimizing and surprise-maximizing (curiosity) objectives for unsupervised reinforcement learning (RL) have been shown to be effective in different environments, depending on the environment's level of natural entropy. However, neither method can perform well across all entropy regimes. In an effort to find a single surprise-based method that will encourage emergent behaviors in any environment, we propose an agent that can adapt its objective depending on the entropy conditions it faces, by framing the choice as a multi-armed bandit problem. We devise a novel intrinsic feedback signal for the bandit which captures the ability of the agent to control the entropy in its environment. We demonstrate that such agents can learn to control entropy and exhibit emergent behaviors in both high- and low-entropy regimes. |
Adriana Hugessen · Roger Creus Castanyer · Glen Berseth 🔗 |
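The bandit framing above can be sketched as an epsilon-greedy choice between the two surprise objectives, updated with the entropy-control feedback signal; the update rule and parameter names are illustrative assumptions, not the paper's exact algorithm:

```python
import random

def choose_objective(q_values, epsilon, rng):
    """Epsilon-greedy bandit over the two arms:
    arm 0 = minimize surprise, arm 1 = maximize surprise."""
    if rng.random() < epsilon:
        return rng.randrange(2)
    return max(range(2), key=lambda a: q_values[a])

def update_bandit(q_values, arm, feedback, lr=0.1):
    """feedback stands in for the entropy-control signal: how much the
    chosen objective let the agent move the environment's entropy away
    from its uncontrolled baseline."""
    q_values[arm] += lr * (feedback - q_values[arm])
```

In a high-entropy environment the minimizing arm tends to earn more control feedback, and in a low-entropy one the maximizing arm does, so the bandit adapts the objective to the regime it faces.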
-
|
Modeling habituation in infants and adults using rational curiosity over perceptual embeddings
(
Poster
)
>
link
From birth, human infants engage in intrinsically motivated, open-ended learning, mainly by deciding what to attend to and for how long. Yet, existing formal models of the drivers of looking are very limited in scope. To address this, we present a new version of the Rational Action, Noisy Choice for Habituation (RANCH) model. This version of RANCH is a stimulus-computable, rational learning model that decides how long to look at sequences of stimuli based on expected information gain (EIG). The model captures key patterns of looking time documented in the literature: habituation and dishabituation. We evaluate RANCH quantitatively using large datasets from adult and infant looking time experiments. We argue that looking time in our experiments is well described by RANCH, and that RANCH is a general, interpretable and modifiable framework for the rational analysis of intrinsically motivated learning by looking. |
Gal Raz · Anjie Cao · Rebecca Saxe · Michael C Frank 🔗 |
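A simplified stand-in for an EIG-driven looking rule (not RANCH itself) can be written for a single Bernoulli stimulus feature under a Beta belief: keep looking while one more look is expected to reduce predictive uncertainty by more than the cost of looking. As looks accumulate, EIG shrinks and looking stops, which is the habituation pattern:

```python
import math

def entropy(p):
    """Binary entropy in nats."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log(p) - (1 - p) * math.log(1 - p)

def eig_next_look(a, b):
    """Expected reduction in predictive entropy from one more look at a
    Bernoulli-feature stimulus under a Beta(a, b) belief (a simplified
    stand-in for an EIG computation)."""
    p = a / (a + b)
    h_now = entropy(p)
    h_after = (p * entropy((a + 1) / (a + b + 1))
               + (1 - p) * entropy(a / (a + b + 1)))
    return h_now - h_after

def looking_time(a, b, cost, max_looks=1000):
    """Number of looks before EIG drops below the cost of looking; the
    belief is advanced by its expected pseudo-observation each look."""
    looks = 0
    while looks < max_looks and eig_next_look(a, b) > cost:
        p = a / (a + b)
        a, b = a + p, b + (1.0 - p)
        looks += 1
    return looks
```

A familiar stimulus (high prior counts) yields short looks; resetting to a vague belief after a novel stimulus yields long looks again, i.e. dishabituation.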
-
|
Generating Human-Like Goals by Synthesizing Reward-Producing Programs
(
Poster
)
>
link
Humans show a remarkable capacity to generate novel goals, for learning and play alike, and modeling this human capacity would be a valuable step toward more generally-capable artificial agents. We describe a computational model for generating novel human-like goals represented in a domain-specific language (DSL). We learn a ‘human-likeness’ fitness function over expressions in this DSL from a small (<100 games) human dataset collected in an online experiment. We then use a Quality-Diversity (QD) approach to generate a variety of human-like games with different characteristics and high fitness. We demonstrate that our method can generate synthetic games that are syntactically coherent under the DSL and semantically sensible with respect to environmental objects and their affordances, yet distinct from human games in the training set. We discuss key components of our model and its current shortcomings, in the hope that this work helps inspire progress toward self-directed agents with human-like goals. |
Guy Davidson · Graham Todd · Todd Gureckis · Julian Togelius · Brenden Lake 🔗 |
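The core loop of a Quality-Diversity approach like the one described above can be sketched as a MAP-Elites-style archive update: each behavioral-descriptor cell keeps only its fittest candidate, so the archive stays both diverse and high-fitness. This is a generic QD sketch, not this paper's implementation:

```python
def qd_insert(archive, descriptor, candidate, fitness):
    """MAP-Elites-style step: archive maps a discretized behavioral
    descriptor to (candidate, fitness); a candidate replaces the cell's
    occupant only if it is fitter."""
    cell = archive.get(descriptor)
    if cell is None or fitness > cell[1]:
        archive[descriptor] = (candidate, fitness)
```

Here a candidate would be a DSL expression for a game, the descriptor its game characteristics, and the fitness the learned human-likeness score.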
-
|
Generative Intrinsic Optimization: Intrinsic Control with Model Learning
(
Poster
)
>
link
A future sequence represents the outcome of executing actions in the environment. When driven by the information-theoretic concept of mutual information, an agent seeks maximally informative consequences. Explicit outcomes may vary across states, returns, or trajectories, serving different purposes such as credit assignment or imitation learning. However, how to combine intrinsic motivation with reward maximization is often neglected. In this work, we propose a variational approach to jointly learn the quantity needed to estimate the mutual information and the dynamics model, providing a general framework for incorporating different forms of outcomes of interest. Integrated into a policy iteration scheme, our approach guarantees convergence to the optimal policy. While we mainly focus on theoretical analysis, our approach opens the possibility of leveraging intrinsic control with model learning to enhance sample efficiency and incorporate uncertainty about the environment into decision-making. |
Jianfei Ma 🔗 |
-
|
Learning Interpretable Libraries by Compressing and Documenting Code
(
Poster
)
>
link
While large language models (LLMs) now excel at code generation, a key aspect of software development is the art of refactoring: consolidating code into libraries of reusable and readable programs. In this paper, we introduce LILO, a neurosymbolic framework that iteratively synthesizes, compresses, and documents code to build libraries tailored to particular problem domains. LILO combines LLM-guided program synthesis with recent algorithmic advances in automated refactoring from Stitch: a symbolic compression system that efficiently identifies optimal lambda abstractions across large code corpora. To make these abstractions interpretable, we introduce an auto-documentation (AutoDoc) procedure that infers natural language names and docstrings based on contextual examples of usage. In addition to improving human readability, we find that AutoDoc boosts performance by helping LILO's synthesizer to interpret and deploy learned abstractions. We evaluate LILO on three inductive program synthesis benchmarks for string editing, scene reasoning, and graphics composition. Compared to existing neural and symbolic methods—including the state-of-the-art library learning algorithm DreamCoder—LILO solves more complex tasks and learns richer libraries that are grounded in linguistic knowledge. |
Gabriel Grand · Catherine Wong · Matthew Bowers · Theo X. Olausson · Muxin Liu · Josh Tenenbaum · Jacob Andreas 🔗 |