Information Theory in Learning and Control
Naftali Tishby

Mon Dec 12th 04:00 -- 06:00 PM @ Manuel de Falla
Event URL: http://www.cs.huji.ac.il/~tishby/NIPS2011-Tutorial

The concept of information plays a fundamental role in modern theories of cognition, in particular as regards perception, learning and behavior. However, the formal links between information theory and cognitive neuroscience remain elusive, and information theoretic measures are all too frequently misapplied.

In this tutorial I present a principled overview of some recent links between Shannon's theory of communication and statistical learning theory, and then place them in a more general framework of an information theory of perception and control. I begin with the well-known links between statistical inference and information, from simple hypothesis testing and parameter estimation to the concept of sufficient statistics. An information theoretic generalization of minimal sufficient statistics leads to a natural optimization problem, the information bottleneck (IB) principle, which is directly connected to classical models of communication with side information. While the IB optimization problem is generally non-convex, it is efficiently solvable for multivariate Gaussian variables. This special case was recently generalized, using the kernel trick, to a wide class of dimensionality reduction problems, similar to kernel CCA (K-CCA). This version makes the information bottleneck method applicable to a wide range of practical problems. I will discuss the advantages of these algorithms over K-CCA and describe the importance of the information tradeoff (the information curve) for hierarchical data representation and feature extraction. I will also discuss some finite sample properties and generalization bounds for this method.
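
As a rough illustration of the optimization problem mentioned above, the sketch below evaluates the standard IB functional, I(X;T) - beta * I(T;Y), for a discrete joint distribution and a given stochastic encoder p(t|x). The abstract does not spell out the functional, so the formulation, the function names (`mutual_information`, `ib_objective`) and the toy distribution are assumptions made for illustration only, not the tutorial's own material.

```python
import math


def mutual_information(joint):
    """Mutual information I(A;B) in bits for a joint table joint[a][b]."""
    p_a = [sum(row) for row in joint]
    p_b = [sum(col) for col in zip(*joint)]
    total = 0.0
    for a, row in enumerate(joint):
        for b, p_ab in enumerate(row):
            if p_ab > 0.0:  # zero-probability cells contribute nothing
                total += p_ab * math.log2(p_ab / (p_a[a] * p_b[b]))
    return total


def ib_objective(p_xy, encoder, beta):
    """Evaluate I(X;T) - beta * I(T;Y) for encoder[x][t] = p(t|x).

    Returns the tuple (objective, I(X;T), I(T;Y)). This only scores a
    given encoder; it does not search for the optimal one.
    """
    n_x, n_y, n_t = len(p_xy), len(p_xy[0]), len(encoder[0])
    p_x = [sum(row) for row in p_xy]
    # p(x,t) = p(x) p(t|x)
    p_xt = [[p_x[x] * encoder[x][t] for t in range(n_t)] for x in range(n_x)]
    # p(t,y) = sum_x p(x,y) p(t|x), using the Markov chain T - X - Y
    p_ty = [
        [sum(p_xy[x][y] * encoder[x][t] for x in range(n_x)) for y in range(n_y)]
        for t in range(n_t)
    ]
    i_xt = mutual_information(p_xt)
    i_ty = mutual_information(p_ty)
    return i_xt - beta * i_ty, i_xt, i_ty


# Hypothetical toy joint distribution over binary X and Y.
p_xy = [[0.4, 0.1], [0.1, 0.4]]
identity = [[1.0, 0.0], [0.0, 1.0]]   # T copies X: no compression
constant = [[1.0, 0.0], [1.0, 0.0]]   # T ignores X: full compression
for name, enc in (("identity", identity), ("constant", constant)):
    print(name, ib_objective(p_xy, enc, beta=2.0))
```

The two encoders mark the ends of the information curve: the identity encoder keeps all of I(X;Y) at the cost of I(X;T) = H(X), while the constant encoder compresses everything and retains nothing about Y.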

In the second part of the tutorial I begin with the (Kelly) gambling problem and show that it can be extended to a general computational theory of the value-information tradeoff, for both MDP and POMDP settings. This will provide a unified theoretical framework for optimal control and information-seeking algorithms, one that has the potential to be a principled model of perception-action cycles. I will then discuss the concept of predictive information and show how it provides useful bounds on the information flow in perception and control, and how it can be applied in robotics and neuroscience.
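
For readers unfamiliar with the Kelly gambling problem, the sketch below works through its simplest textbook form: a repeated binary bet with known win probability. The setting, the function names (`kelly_fraction`, `growth_rate`, `binary_entropy`) and the numbers are illustrative assumptions and are not taken from the tutorial. With even odds, the optimal growth rate equals 1 - H(p) bits per bet, which is one concrete sense in which information translates into value.

```python
import math


def kelly_fraction(p_win, net_odds):
    """Fraction of wealth to stake on a bet paying net_odds-to-1.

    Standard Kelly formula f* = p - (1 - p) / b, clipped at zero
    (do not bet when the edge is negative).
    """
    return max(0.0, p_win - (1.0 - p_win) / net_odds)


def growth_rate(fraction, p_win, net_odds):
    """Expected log2 growth of wealth per bet; requires 0 <= fraction < 1."""
    return (p_win * math.log2(1.0 + net_odds * fraction)
            + (1.0 - p_win) * math.log2(1.0 - fraction))


def binary_entropy(p):
    """Entropy H(p) in bits of a Bernoulli(p) variable."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


# Hypothetical example: 60% win probability at even (1-to-1) odds.
p_win, net_odds = 0.6, 1.0
f_star = kelly_fraction(p_win, net_odds)
print("Kelly fraction:", f_star)
print("growth rate (bits/bet):", growth_rate(f_star, p_win, net_odds))
print("1 - H(p):", 1.0 - binary_entropy(p_win))
```

Staking more or less than the Kelly fraction lowers the expected log growth, which can be checked by evaluating `growth_rate` at nearby fractions.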

Author Information

Naftali Tishby (The Hebrew University Jerusalem)

Naftali Tishby is a professor of computer science and the director of the Interdisciplinary Center for Neural Computation (ICNC) at the Hebrew University of Jerusalem. He received his Ph.D. in theoretical physics from the Hebrew University and was a research staff member at MIT and Bell Labs from 1985 to 1991. He was also a visiting professor at Princeton NECI, the University of Pennsylvania and the University of California at Santa Barbara. Dr. Tishby is a leader in machine learning research and computational neuroscience. He was among the first to introduce methods from statistical physics into learning theory, and dynamical systems techniques into speech processing. His current research is at the interface between computer science, statistical physics and computational neuroscience, and concerns the foundations of biological information processing and the connections between dynamics and information.
