Invited Talk
Workshop: Metacognition in the Age of AI: Challenges and Opportunities

Credit Assignment & Meta-Learning in a Single Lifelong Trial

Jürgen Schmidhuber


Most current artificial reinforcement learning (RL) agents are trained under the assumption of repeatable trials, and are reset at the beginning of each trial. Humans, however, are never reset. Instead, they are allowed to discover computable patterns across trials, e.g.: in every third trial, go left to obtain reward, otherwise go right. General RL (sometimes called AGI) must assume a single lifelong trial which may or may not include identifiable sub-trials. General RL must also explicitly take into account that policy changes in early life may affect properties of later sub-trials and policy changes. In particular, General RL must take into account recursively that early meta-meta-learning is setting the stage for later meta-learning which is setting the stage for later learning etc. Most popular RL mechanisms, however, ignore such lifelong credit assignment chains. Exceptions are the success story algorithm (1990s), AIXI (2000s), and the mathematically optimal Gödel Machine (2003).