

Tutorial

Information Theory in Learning and Control

Naftali Tishby

Manuel de Falla

Abstract:

The concept of information plays a fundamental role in modern theories of cognition, in particular as regards perception, learning and behavior. However, the formal links between information theory and cognitive neuroscience remain elusive, and information theoretic measures are all too frequently misapplied.

In this tutorial I present a principled overview of some recent links between Shannon's theory of communication and statistical learning theory, and then place these in a more general framework for an information theory of perception and control. I begin with the well-known links between statistical inference and information, from simple hypothesis testing through parameter estimation to the concept of sufficient statistics. An information-theoretic generalization of minimal sufficient statistics leads to a natural optimization problem, the information bottleneck (IB) principle, which is directly connected to classical models of communication with side information. While the IB optimization problem is generally non-convex, it is efficiently solvable for multivariate Gaussian variables. This special case was recently generalized, using the kernel trick, to a wide class of dimensionality reduction problems, similar to kernel CCA (K-CCA). This kernelized version makes the information bottleneck method applicable to a wide range of practical problems. I will discuss the advantages of these algorithms over K-CCA and describe the importance of the information tradeoff (the information curve) for hierarchical data representation and feature extraction. I will also discuss some finite-sample properties and generalization bounds for this method.
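
For orientation, the IB optimization problem mentioned above is commonly written as the following variational problem. The symbols are assumptions introduced here and do not appear in the abstract: X is the observed variable, Y the relevant variable, T the compressed representation of X, and \beta > 0 a tradeoff parameter. This is the standard textbook form, not necessarily the exact formulation used in the tutorial.

```latex
\min_{p(t \mid x)} \; \mathcal{L}\bigl[p(t \mid x)\bigr]
  = I(X;T) - \beta \, I(T;Y),
\qquad \text{subject to the Markov chain } T \leftrightarrow X \leftrightarrow Y .
```

Sweeping \beta traces out the tradeoff between compression, I(X;T), and preserved relevant information, I(T;Y), which is one common way to read the "information curve" referred to in the abstract.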

In the second part of the tutorial I begin with the (Kelly) gambling problem and show that it can be extended to a general computational theory of value-information tradeoff, for both MDP and POMDP settings. This will provide a unified theoretical framework for optimal control and information seeking algorithms, and one that has the potential to be a principled model of perception-action cycles. I will then discuss the concept of predictive information and show how it provides useful bounds on the information flow in perception and control, and how it can be applied in robotics and neuroscience.
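
As a small illustration of the Kelly gambling problem that this part starts from, the sketch below uses the textbook horse-race formulation, which is an assumption here and may differ from the setting used in the tutorial. The function names, the win probabilities `p`, and the odds are hypothetical values chosen for the example. It computes the doubling rate W(b, p) = sum_i p_i log2(b_i o_i) and compares proportional (Kelly) betting, b = p, with uniform betting.

```python
import math


def doubling_rate(p, b, odds):
    """Expected log2 growth of wealth per race when a fraction b[i] of
    wealth is bet on outcome i, which wins with probability p[i] and
    pays odds[i]-for-1."""
    return sum(pi * math.log2(bi * oi)
               for pi, bi, oi in zip(p, b, odds) if pi > 0)


def entropy(p):
    """Shannon entropy of the distribution p, in bits."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)


# Hypothetical three-outcome race with uniform fair odds (3-for-1).
p = [0.5, 0.25, 0.25]
odds = [3.0, 3.0, 3.0]

# Kelly (proportional) betting: bet fractions equal to win probabilities.
w_kelly = doubling_rate(p, p, odds)

# Uniform betting ignores the probabilities and gains nothing on average.
w_uniform = doubling_rate(p, [1.0 / 3.0] * 3, odds)

print(f"Kelly doubling rate:   {w_kelly:.4f} bits per race")
print(f"Uniform doubling rate: {w_uniform:.4f} bits per race")
```

With uniform fair odds over m outcomes, the Kelly doubling rate equals log2(m) minus the entropy of p, so the growth rate of wealth is itself an information quantity. This identity is the usual starting point for relating value to information.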
