Timezone: »

Invited Talk (Posner Lecture)
Learning About Sensorimotor Data
Richard Sutton

Tue Dec 13 12:30 AM -- 01:20 AM (PST) @ None

Temporal-difference (TD) learning of reward predictions underlies both reinforcement-learning algorithms and the standard dopamine model of reward-based learning in the brain. This confluence of computational and neuroscientific ideas is perhaps the most successful since the Hebb synapse. Can it be extended beyond reward? The brain certainly predicts many things other than reward---such as in a forward model of the consequences of various ways of behaving---and TD methods can be used to make these predictions. The idea and advantages of using TD methods to learn large numbers of predictions about many states and stimuli, in parallel, have been apparent since the 1990s, but technical issues have prevented this vision from being practically implemented...until now. A key breakthrough was the development of a new family of gradient-TD methods, introduced at NIPS in 2008 (by Maei, Szepesvari, and myself). Using these methods, and other ideas, we are now able to learn thousands of non-reward predictions in real-time at 10Hz from a single sensorimotor data stream from a physical robot. These predictions are temporally extended (ranging up to tens of seconds of anticipation), goal oriented, and policy contingent. The new algorithms enable learning to be off-policy and in parallel, resulting in dramatic increases in the amount that can be learned in a given amount of time. Our effective learning rate scales linearly with computational resources. On a consumer laptop we can learn thousands of predictions in real-time. On a larger computer, or on a comparable laptop in a few years, the same methods could learn millions of meaningful predictions about different alternate ways of behaving. These predictions in aggregate constitute a rich detailed model of the world that can support planning methods such as approximate dynamic programming.

Author Information

Rich Sutton (DeepMind, U Alberta)

Richard S. Sutton is a professor and iCORE chair in the department of computing science at the University of Alberta. He is a fellow of the Association for the Advancement of Artificial Intelligence and co-author of the textbook "Reinforcement Learning: An Introduction" from MIT Press. Before joining the University of Alberta in 2003, he worked in industry at AT&T and GTE Labs, and in academia at the University of Massachusetts. He received a PhD in computer science from the University of Massachusetts in 1984 and a BA in psychology from Stanford University in 1978. Rich's research interests center on the learning problems facing a decision-maker interacting with its environment, which he sees as central to artificial intelligence. He is also interested in animal learning psychology, in connectionist networks, and generally in systems that continually improve their representations and models of the world.

More from the Same Authors