The concept of information plays a fundamental role in modern theories of cognition, particularly with regard to perception, learning, and behavior. However, the formal links between information theory and cognitive neuroscience remain elusive, and information-theoretic measures are all too frequently misapplied.
In this tutorial I present a principled overview of some recent links between Shannon's theory of communication and statistical learning theory, and then place these in a more general framework of an information theory of perception and control. I begin with the well-known links between statistical inference and information, from simple hypothesis testing and parameter estimation to the concept of sufficient statistics. An information-theoretic generalization of minimal sufficient statistics leads to a natural optimization problem, the information bottleneck (IB) principle, which is directly connected to classical models of communication with side information. While the IB optimization problem is generally non-convex, it is efficiently solvable for multivariate Gaussian variables. This special case was recently generalized, using the kernel trick, to a wide class of dimensionality reduction problems, similar to kernel CCA. This version makes the information bottleneck method applicable to a wide range of practical problems. I will discuss the advantages of these algorithms over KCCA and describe the importance of the information tradeoff (the information curve) for hierarchical data representation and feature extraction. I will also discuss some finite-sample properties and generalization bounds for this method.
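As a rough illustration (not part of the tutorial materials), the discrete IB optimization can be approached by iterating the self-consistent equations of Tishby, Pereira and Bialek: p(t|x) ∝ p(t) exp(−β KL[p(y|x) ‖ p(y|t)]), together with the induced marginals p(t) and p(y|t). The function name and toy joint distribution below are hypothetical choices for this sketch:

```python
import numpy as np

def information_bottleneck(p_xy, n_clusters, beta, n_iter=200, seed=0):
    """Sketch of iterative IB for a discrete joint p(x, y).

    Alternates the self-consistent updates for p(t|x), p(t), p(y|t).
    beta controls the compression/relevance tradeoff along the
    information curve (larger beta preserves more about Y)."""
    rng = np.random.default_rng(seed)
    eps = 1e-12
    p_x = p_xy.sum(axis=1)                       # marginal p(x)
    p_y_given_x = p_xy / p_x[:, None]            # rows: p(y|x)
    n_x = p_xy.shape[0]
    # random soft assignment p(t|x), rows sum to 1
    p_t_given_x = rng.random((n_x, n_clusters))
    p_t_given_x /= p_t_given_x.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        p_t = p_x @ p_t_given_x                  # p(t) = sum_x p(x) p(t|x)
        # p(y|t) = sum_x p(t|x) p(x) p(y|x) / p(t)
        p_y_given_t = (p_t_given_x * p_x[:, None]).T @ p_y_given_x
        p_y_given_t /= p_t[:, None] + eps
        # KL[p(y|x) || p(y|t)] for every (x, t) pair
        log_ratio = (np.log(p_y_given_x[:, None, :] + eps)
                     - np.log(p_y_given_t[None, :, :] + eps))
        kl = (p_y_given_x[:, None, :] * log_ratio).sum(axis=2)
        # p(t|x) proportional to p(t) * exp(-beta * KL)
        p_t_given_x = p_t[None, :] * np.exp(-beta * kl)
        p_t_given_x /= p_t_given_x.sum(axis=1, keepdims=True)
    return p_t_given_x

# toy joint distribution p(x, y) over 3 inputs and 2 labels
p_xy = np.array([[0.25, 0.05],
                 [0.20, 0.10],
                 [0.05, 0.35]])
q = information_bottleneck(p_xy, n_clusters=2, beta=10.0)
```

Sweeping beta and recording I(X;T) against I(T;Y) at each solution traces out the information curve mentioned above; the updates are only guaranteed to reach a local optimum, reflecting the non-convexity of the problem.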
In the second part of the tutorial I begin with the (Kelly) gambling problem and show that it can be extended to a general computational theory of the value-information tradeoff, in both MDP and POMDP settings. This provides a unified theoretical framework for optimal control and information-seeking algorithms, one with the potential to serve as a principled model of perception-action cycles. I will then discuss the concept of predictive information, show how it provides useful bounds on the information flow in perception and control, and how it can be applied in robotics and neuroscience.
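To make the gambling starting point concrete, here is a minimal sketch (my own illustration, not from the tutorial) of Kelly's result for a horse race: with odds o_i-for-1, betting fractions proportional to the true probabilities p_i are log-optimal, and the resulting doubling rate W = Σ p_i log2(o_i p_i) is an information quantity; with odds set from a bookmaker's estimate r, the edge reduces to the KL divergence between p and r. The function name and numbers are hypothetical:

```python
import numpy as np

def kelly_doubling_rate(p, odds):
    """Doubling rate of proportional (Kelly) betting on a horse race.

    p    : true outcome probabilities
    odds : payout o_i-for-1 on horse i
    Betting fraction b_i = p_i is log-optimal, giving
    W = sum_i p_i * log2(odds_i * p_i) bits per race."""
    p = np.asarray(p, dtype=float)
    odds = np.asarray(odds, dtype=float)
    b = p  # proportional betting
    return float(np.sum(p * np.log2(odds * b)))

# true probabilities vs. a bookmaker's estimate r; odds are fair
# with respect to r, so the edge is exactly KL(p || r) in bits
p = [0.5, 0.3, 0.2]
r = [0.4, 0.4, 0.2]
odds = [1.0 / ri for ri in r]
W = kelly_doubling_rate(p, odds)
```

When the odds are fair with respect to the true distribution (o_i = 1/p_i), the doubling rate is zero: the gambler's growth comes entirely from the information advantage over the odds-maker, which is the seed of the more general value-information tradeoff discussed in the tutorial.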
Author Information
Naftali Tishby (The Hebrew University Jerusalem)
Naftali Tishby is a professor of computer science and the director of the Interdisciplinary Center for Neural Computation (ICNC) at the Hebrew University of Jerusalem. He received his Ph.D. in theoretical physics from the Hebrew University and was a research staff member at MIT and Bell Labs from 1985 to 1991. He was also a visiting professor at Princeton NECI, the University of Pennsylvania, and the University of California at Santa Barbara. Dr. Tishby is a leader in machine learning research and computational neuroscience. He was among the first to introduce methods from statistical physics into learning theory, and dynamical systems techniques into speech processing. His current research is at the interface between computer science, statistical physics, and computational neuroscience, and concerns the foundations of biological information processing and the connections between dynamics and information.
More from the Same Authors

2017 : How do the Deep Learning layers converge to the Information Bottleneck limit by Stochastic Gradient Descent? »
Naftali Tishby 
2016 : Principles and Algorithms for Self-Motivated Behaviour »
Naftali Tishby 
2014 Workshop: Novel Trends and Applications in Reinforcement Learning »
Csaba Szepesvari · Marc Deisenroth · Sergey Levine · Pedro Ortega · Brian Ziebart · Emma Brunskill · Naftali Tishby · Gerhard Neumann · Daniel Lee · Sridhar Mahadevan · Pieter Abbeel · David Silver · Vicenç Gómez 
2013 Workshop: Planning with Information Constraints for Control, Reinforcement Learning, Computational Neuroscience, Robotics and Games. »
Hilbert J Kappen · Naftali Tishby · Jan Peters · Evangelos Theodorou · David H Wolpert · Pedro Ortega 
2012 Workshop: Information in Perception and Action »
Naftali Tishby · Daniel Polani · Tobias Jung 
2010 Poster: Tight Sample Complexity of Large-Margin Learning »
Sivan Sabato · Nati Srebro · Naftali Tishby 
2008 Workshop: Principled Theoretical Frameworks for the Perception-Action Cycle »
Daniel Polani · Naftali Tishby 
2008 Mini Symposium: Principled Theoretical Frameworks for the Perception-Action Cycle »
Daniel Polani · Naftali Tishby 
2008 Poster: On the Reliability of Clustering Stability in the Large Sample Regime »
Ohad Shamir · Naftali Tishby 
2008 Spotlight: On the Reliability of Clustering Stability in the Large Sample Regime »
Ohad Shamir · Naftali Tishby 
2007 Oral: Cluster Stability for Finite Samples »
Ohad Shamir · Naftali Tishby 
2007 Poster: Cluster Stability for Finite Samples »
Ohad Shamir · Naftali Tishby 
2006 Workshop: Revealing Hidden Elements of Dynamical Systems »
Naftali Tishby 
2006 Poster: Information Bottleneck for Non Co-Occurrence Data »
Yevgeny Seldin · Noam Slonim · Naftali Tishby