Many cognitive and neural systems can be described in terms of compression and transmission of information given bounded resources. While information theory, as a principled mathematical framework for characterizing such systems, has been widely applied in neuroscience and machine learning, its role in understanding cognition has traditionally been contested. This traditional view has been changing in recent years, with growing evidence that information-theoretic optimality principles underlie a wide range of cognitive functions, including perception, working memory, language, and decision making. In parallel, there has also been a surge of contemporary information-theoretic approaches in machine learning, enabling large-scale neural-network implementation of information-theoretic models.
These scientific and technological developments open up new avenues for progress toward an integrative computational theory of human and artificial cognition, by leveraging information-theoretic principles as bridges between various cognitive functions and neural representations. This workshop aims to explore these new research directions and bring together researchers from machine learning, cognitive science, neuroscience, linguistics, economics, and potentially other fields, who are interested in integrating information-theoretic approaches that have thus far been studied largely independently of each other. In particular, we aim to discuss questions and exchange ideas along the following directions:
- Understanding human cognition: To what extent can information-theoretic principles advance the understanding of human cognition and its emergence from neural systems? What are the key challenges for future research in information theory and cognition? How might tools from machine learning help overcome these challenges? Addressing such questions could lead to progress in computational models that integrate multiple cognitive functions and cross Marr’s levels of analysis.
- Improving AI agents and human-AI cooperation: Given empirical evidence that information-theoretic principles may underlie a range of human cognitive functions, how can such principles guide artificial agents toward human-like cognition? How might these principles facilitate human-AI communication and cooperation? Can this help agents learn faster with less data? Addressing such questions could lead to progress in developing better human-like AI systems.
Sat 6:30 a.m. - 6:40 a.m.
Opening Remarks
Noga Zaslavsky
Sat 6:40 a.m. - 7:10 a.m.
How behavioral and evolutionary constraints sculpt early visual processing (Invited talk)
Biological systems must selectively encode partial information about the environment, as dictated by the capacity constraints at work in all living organisms. For example, we cannot see every feature of the light field that reaches our eyes; temporal resolution is limited by transmission noise and delays, and spatial resolution is limited by the finite number of photoreceptors and output cells in the retina. Classical efficient coding theory describes how sensory systems can maximize information transmission given such capacity constraints, but it treats all input features equally. Not all inputs are, however, of equal value to the organism. Our work quantifies whether and how the brain selectively encodes stimulus features, specifically predictive features, that are most useful for fast and effective movements. We have shown that efficient predictive computation starts at the earliest stages of the visual system, in the retina. We borrow techniques from statistical physics and information theory to assess how we get terrific, predictive vision from these imperfect (lagged and noisy) component parts. In broader terms, we aim to build a more complete theory of efficient encoding in the brain, and along the way have found some intriguing connections between formal notions of coarse graining in biology and physics.
Stephanie Palmer
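As a minimal illustration of the quantity at the heart of efficient predictive coding, the sketch below (not from the talk; the binary stimulus, window length, and plug-in estimator are assumptions made for illustration) estimates the predictive information I(past; future) that a discrete stimulus sequence carries about its own future:

```python
import numpy as np
from collections import Counter

def mutual_information(pairs):
    """Plug-in mutual information (bits) estimated from a list of (x, y) samples."""
    n = len(pairs)
    joint = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    return sum((c / n) * np.log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in joint.items())

def predictive_information(seq, k=2):
    """I(past; future): bits that a length-k window of a discrete sequence carries
    about the next k symbols."""
    pairs = [(tuple(seq[i - k:i]), tuple(seq[i:i + k]))
             for i in range(k, len(seq) - k + 1)]
    return mutual_information(pairs)

# A sticky (temporally correlated) stimulus is predictable; an i.i.d. one is not.
rng = np.random.default_rng(0)
sticky = [0]
for _ in range(20000):
    sticky.append(sticky[-1] if rng.random() < 0.9 else 1 - sticky[-1])
iid = rng.integers(0, 2, size=20000).tolist()
print(predictive_information(sticky), predictive_information(iid))
```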
Sat 7:10 a.m. - 7:18 a.m.
Neural networks learn an environment's geometry in latent space by performing predictive coding on visual scenes (Oral)
Humans navigate complex environments using only visual cues and self-motion. Mapping an environment is an essential task for navigation within a physical space; neuroscientists and cognitive scientists also postulate that mapping algorithms underlie cognition by mapping concepts, memories, and other nonspatial variables. Despite the broad importance of mapping algorithms in neuroscience, it is not clear how neural networks can build spatial maps exclusively from sensor observations, without access to the environment’s coordinates through reinforcement learning or supervised learning. Path integration, for example, implicitly needs the environment’s coordinates to predict how past velocities translate into the current position. Here we show that predicting sensory observations—called predictive coding—extends path integration beyond its implicit requirement for the environment’s coordinates. Specifically, a neural network constructs an environmental map in its latent space by predicting visual input. As the network traverses complex environments in Minecraft, spatial proximity between object positions affects distances in the network's latent space. The relationship depends on the uniqueness of the environment’s visual scene, as measured by the mutual information between the images and spatial position. Predictive coding extends to any sequential dataset. Observations from paths traversing a manifold can generate such sequential data. We anticipate that neural networks performing predictive coding identify the underlying manifold without requiring the manifold’s coordinates.
James Gornet · Matt Thomson
Sat 7:20 a.m. - 7:50 a.m.
Information-based exploration under active inference (Invited talk)
Agents contend with conflicting objectives when interacting with their environment, e.g., exploratory drives when the environment is unknown, or exploitative drives to maximise some expected return. A widely studied proposition for understanding how to appropriately balance between these distinct imperatives is active inference. In this talk, I will introduce active inference – a neuroscience theory – which brings together perception and action under a single objective of minimising surprisal across time. Through T-maze simulations, I will illustrate how this single objective provides a way to balance information-based exploration and exploitation. Next, I will present our work on scaling up active inference to operate in complex, continuous state-spaces. For this, we propose using multiple forms of Monte-Carlo (MC) sampling to render (expected) surprisal computationally tractable. I will construct-validate this in a complex Animal-AI environment, where our agents can simulate the future, to evince reward-directed navigation – despite a temporary suspension of visual input. Lastly, I will extend this formulation to appropriately deal with volatile environments by introducing a preference-augmented (expected) surprisal objective. Using the FrozenLake environment, I will discuss different ways of encoding preferences and how they underwrite appropriate levels of arbitration between exploitation and exploration.
Noor Sajid
Sat 7:50 a.m. - 7:58 a.m.
Compression supports low-dimensional representations of behavior across neural circuits (Oral)
Dimensionality reduction, a form of compression, can simplify representations of information to increase efficiency and reveal general patterns. Yet, this simplification also forfeits information, thereby reducing representational capacity. Hence, the brain may benefit from generating both compressed and uncompressed activity, and may do so in a heterogeneous manner across diverse neural circuits that represent low-level (sensory) or high-level (cognitive) stimuli. However, precisely how compression and representational capacity differ across the cortex remains unknown. Here we predict different levels of compression across regional circuits by using random walks on networks to model activity flow, and then we formulate rate-distortion functions, which are the basis of lossy compression. Using a large sample of youth ($n=1,040$), we test predictions in two ways: by measuring the dimensionality of spontaneous activity from sensorimotor to association cortex, and by assessing the representational capacity for 24 behaviors in neural circuits and 20 cognitive variables in recurrent neural networks. Our network theory of compression predicts the dimensionality and representational capacity of biological and artificial networks, thereby advancing understanding of how connectivity supports computational functions that involve compression.
Dale Zhou · Jason Kim · Adam Pines · Valerie Sydnor · David Roalf · John Detre · Ruben Gur · Raquel Gur · Theodore Satterthwaite · Danielle S Bassett
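For readers less familiar with the rate-distortion functions invoked above, the following is a minimal Blahut-Arimoto sketch (not the authors' network-based derivation; the source distribution and Hamming distortion are placeholder assumptions) that traces points on R(D) for a discrete source at a given trade-off weight beta:

```python
import numpy as np

def rate_distortion_point(p_x, d, beta, n_iter=500):
    """One point on a discrete source's rate-distortion curve via Blahut-Arimoto.

    p_x: (n,) source distribution; d: (n, m) distortion matrix; beta: trade-off weight.
    Returns (rate in bits, expected distortion)."""
    q = np.full(d.shape[1], 1.0 / d.shape[1])        # marginal over reproductions x_hat
    for _ in range(n_iter):
        log_cond = np.log(q)[None, :] - beta * d     # q(x_hat|x) proportional to q(x_hat) exp(-beta d)
        cond = np.exp(log_cond - log_cond.max(axis=1, keepdims=True))
        cond /= cond.sum(axis=1, keepdims=True)
        q = p_x @ cond                               # re-estimate the reproduction marginal
    joint = p_x[:, None] * cond
    rate = np.sum(joint * (np.log2(cond + 1e-300) - np.log2(q + 1e-300)[None, :]))
    return rate, np.sum(joint * d)

# Sweep beta to trace R(D) for a uniform 4-symbol source under Hamming distortion.
p = np.full(4, 0.25)
d = 1.0 - np.eye(4)
print([rate_distortion_point(p, d, b) for b in (0.5, 1.0, 2.0, 4.0, 8.0)])
```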
Sat 8:00 a.m. - 8:30 a.m.
Coffee Break + Posters
Sat 8:30 a.m. - 9:00 a.m.
Information theory, learning, & hyperbolic geometry in neural representations in the hippocampus (Invited talk)
Tatyana Sharpee
Sat 9:00 a.m. - 9:08 a.m.
Information-theoretic Neural Decoding Reproduces Several Laws of Human Behavior (Oral)
Features of tasks and environments are often represented in the brain by neural firing rates. Representations must be decoded to enable downstream actions, and decoding takes time. We describe a toy model with a Poisson process encoder and an ideal-observer Bayesian decoder, and show that the decoding of rate-coded signals reproduces classic patterns of response time and accuracy observed in humans, including the Hick-Hyman Law, the Power Law of Learning, speed-accuracy trade-offs, and lognormally distributed response times. The decoder is equipped with a codebook, a prior distribution over signals, and an entropy stopping threshold. We argue that historical concerns about the applicability of such information-theoretic tools to neural and behavioral data arise from a confusion about the application of discrete-time coding techniques to continuous-time signals.
S. Thomas Christie · Paul R Schrater
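A hedged simulation in the spirit of the toy model described in the abstract (firing rates, bin size, and the entropy threshold below are placeholder assumptions, not the authors' parameters): an ideal observer accumulates Poisson spike counts, updates a posterior over candidate signals, and responds once posterior entropy drops below a threshold; sweeping the number of alternatives then yields Hick-Hyman-like growth of response time with set size.

```python
import numpy as np

rng = np.random.default_rng(1)

def decode_trial(rates, true_idx, entropy_thresh=0.25, dt=0.001, t_max=5.0):
    """Ideal-observer decoding of a Poisson rate code with an entropy stopping rule.

    rates: (n_signals, n_neurons) firing rates in Hz, one row per candidate signal.
    Returns (choice, response_time) for a trial in which signal `true_idx` is encoded."""
    n_signals = rates.shape[0]
    log_post = np.full(n_signals, -np.log(n_signals))   # uniform prior (the codebook)
    t = 0.0
    while t < t_max:
        spikes = rng.poisson(rates[true_idx] * dt)      # spikes observed in this time bin
        # Poisson log-likelihood of the counts under each hypothesis
        # (the log k! term is identical across hypotheses and drops out)
        log_post += np.sum(spikes * np.log(rates * dt) - rates * dt, axis=1)
        log_post -= log_post.max()
        post = np.exp(log_post); post /= post.sum()
        t += dt
        if -np.sum(post * np.log2(post + 1e-12)) < entropy_thresh:
            break
    return int(np.argmax(post)), t

# Hick-Hyman-style sweep: mean RT grows roughly with log2 of the number of signals.
for n in (2, 4, 8, 16):
    rates = rng.uniform(5.0, 50.0, size=(n, 20))        # 20 neurons, distinct rate vectors
    rts = [decode_trial(rates, rng.integers(n))[1] for _ in range(200)]
    print(n, np.mean(rts))
```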
Sat 9:10 a.m. - 9:40 a.m.
Information is not enough (Invited talk)
The publication of Shannon’s ‘A Mathematical Theory of Communication’ (1948) has been described as a “delayed-action bomb”. It reshaped psychology and neuroscience and has been credited as foundational to the field of cognitive science. Yet after the initial shockwave, the pace of new ideas in cognitive science emerging from the theory slowed dramatically. This trend has begun to reverse, as evidenced by this workshop. But what accounts for the stagnation, and what accounts for the recent change? I argue that information is not enough. Information is a resource or a constraint, but is not sufficient as a computational theory of intelligence. An important step forward for cognitive science came from the combination of information theory with expected utility theory (rate-distortion theory). More recent progress has been driven by the advent of principled approximation methods in computation. The combination of all of these ideas yields ‘information-theoretic computational rationality’, a powerful framework for understanding natural intelligence.
Chris Sims
Sat 9:40 a.m. - 9:43 a.m.
Generalization and Translatability in Emergent Communication via Informational Constraints (Spotlight)
Traditional emergent communication (EC) methods often fail to generalize to novel settings or align with representations of natural language. Here, we show how controlling the Information Bottleneck (IB) tradeoff between complexity and informativeness (a principle thought to guide human languages) helps to address both of these problems in EC. Using VQ-VIB, a recent method for training EC agents while controlling the IB tradeoff, we find that: (1) increasing pressure for informativeness, which encourages agents to develop a shared understanding beyond task-specific needs, leads to better generalization to more challenging tasks and novel inputs; (2) VQ-VIB agents develop an EC space that encodes some semantic similarities and facilitates open-domain communication, similar to word embeddings in natural language; and (3) when translating between English and EC, greater complexity leads to improved performance of teams of simulated English speakers and trained VQ-VIB listeners, but only up to a threshold corresponding to the English complexity. These results indicate the importance of informational constraints for improving self-play performance and human-agent interaction.
Mycal Tucker · Roger Levy · Julie A Shah · Noga Zaslavsky
Sat 9:43 a.m. - 9:46 a.m.
Generalizing with overly complex representations (Spotlight)
Representations enable cognitive systems to generalize from known experiences to new ones. The simplicity of a representation has been linked to its generalization ability. Conventionally, simple representations are associated with a capacity to capture the structure in the data and rule out the noise. Representations with more flexibility than required to accommodate the structure of the target phenomenon, on the contrary, risk catastrophically overfitting the observed samples and failing to generalize to new observations. Here, I computationally test this idea by using a simple task of learning a representation to predict unseen features based on the observed ones. I simulate the process of learning a representation that has a lower, matching, or higher dimensionality than the phenomenon it accounts for. The results suggest that the representations of the highest dimensionality consistently generate the best out-of-sample predictions despite perfectly memorizing the training observations. These findings are in line with the recently described “double descent” of generalization error, an observation that many learning systems generalize best when overparameterized (when their representational capacity far exceeds the task requirements).
Marina Dubova
Sat 9:46 a.m. - 9:49 a.m.
On the informativeness of supervision signals (Spotlight)
Learning transferable representations by training a classifier is a well-established technique in deep learning (e.g. ImageNet pretraining), but there is a lack of theory to explain why this kind of task-specific pre-training should result in 'good' representations. We conduct an information-theoretic analysis of several commonly-used supervision signals to determine how they contribute to representation learning performance and how the dynamics are affected by training parameters like the number of labels, classes, and dimensions in the training dataset. We confirm these results empirically in a series of simulations and conduct a cost-benefit analysis to establish a tradeoff curve allowing users to optimize the cost of supervising representation learning.
Ilia Sucholutsky · Raja Marjieh · Tom Griffiths
Sat 9:49 a.m. - 9:52 a.m.
A Theory of Unsupervised Translation for Understanding Animal Communication (Spotlight)
Unsupervised translation generally refers to the challenging task of translating between two languages without parallel translations, i.e., from two separate monolingual corpora. In this work, we propose an information-theoretic framework of unsupervised translation that can be well suited even for the case where the source language is that of highly intelligent animals, such as whales, and the target language is a human language, such as English. We identify two conditions that, combined, allow for unsupervised translation: (1) there is access to a prior distribution over the target language that estimates the likelihood that a sentence was translated from the source language; and (2) most alterations of translations are deemed implausible (i.e., unlikely) by the prior. We then give an (inefficient) algorithm which, given access to the prior and enough unlabeled source examples as input, outputs a provably accurate translation function. Surprisingly, our analysis suggests that the amount of source data required (information theoretically) for unsupervised translation is not significantly greater than that of supervised translation, i.e., the standard case where one has parallel translated data for training. To support the viability of our theory, we propose a simplified probabilistic model of language: the random sub-tree language model, in which sentences correspond to paths in a randomly-labeled tree. We prove that random sub-tree languages satisfy conditions (1-2) with high probability, and are therefore translatable by our algorithm. Our theory is motivated by a recent initiative to translate whale communication using modern machine translation techniques. The recordings of whale communications that are being collected have no parallel human-language data. We are further motivated by recent empirical work, reported in the machine learning literature, demonstrating that unsupervised translation is possible in certain settings.
Shafi Goldwasser · David Gruber · Adam Kalai · Orr Paradise
Sat 9:52 a.m. - 9:55 a.m.
Chunking Space and Time with Information Geometry (Spotlight)
Humans are exposed to a continuous stream of sensory data, yet understand the world in terms of discrete concepts. A large body of work has focused on chunking sensory data in time, i.e. finding event boundaries, typically identified by model prediction errors. Similarly, chunking sensory data in space is the problem at hand when building spatial maps for navigation. In this work, we argue that a single mechanism underlies both: building a hierarchical generative model of perception and action, where chunks at a higher level are formed by segments surpassing a certain information distance at the level below. We demonstrate how this can work in the case of robot navigation, and discuss how this could relate to human cognition in general.
Tim Verbelen · Daria de Tinguy · Pietro Mazzaglia · Ozan Catal · Adam Safron
Sat 9:55 a.m. - 9:58 a.m.
The more human-like the language model, the more surprisal is the best predictor of N400 amplitude (Spotlight)
Under information-theoretic accounts of language comprehension, the effort required to process a word is correlated with surprisal, the negative log-probability of that word given its context. This can (equivalently) be considered to reflect cognitive effort in proportion to the amount of information conveyed by a given word (Frank et al., 2015), or the amount of effort required to update our incremental predictions about upcoming words (Levy, 2008; Aurnhammer and Frank, 2019). In contrast, others (e.g. Brothers and Kuperberg, 2021) have argued that processing difficulty is proportional to the contextual probability of a word, thus positing a linear (rather than logarithmic) relationship between word probability and processing difficulty. We investigate which of these two accounts best explains the N400, a neural response that provides some of the best evidence for prediction in language comprehension (Kutas et al., 2011; Van Petten and Luka, 2012; Kuperberg et al., 2020). To do this, we expand upon previous work by comparing how well the probability and surprisal calculated by 43 transformer language models predict N400 amplitude. We thus investigate both which models’ predictions best predict the N400, and, for each model, whether surprisal or probability is more closely correlated with N400 amplitude. We find that of the models tested, OPT-6.7B and GPT-J are reliably the best at predicting N400 amplitude, and that for these transformers, surprisal is the better predictor. In fact, we find that the more highly correlated the predictions of a language model are with N400 amplitude, the greater the extent to which surprisal is a better predictor than probability. Since language models that more closely mirror human statistical knowledge are more likely to be informative about the human predictive system, these results support the information-theoretic account of language comprehension.
James Michaelov · Benjamin Bergen
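For concreteness, a rough sketch of the surprisal computation such studies rely on, using the Hugging Face transformers API (gpt2 is used here as a small stand-in for models like OPT-6.7B or GPT-J, and sub-word tokenization edge cases are ignored):

```python
import math
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tok = AutoTokenizer.from_pretrained("gpt2")          # small stand-in for OPT-6.7B / GPT-J
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def word_surprisal(context, word):
    """Surprisal (bits) of `word` given `context`, summed over the word's sub-tokens."""
    ctx_ids = tok(context, return_tensors="pt").input_ids
    word_ids = tok(" " + word, add_special_tokens=False, return_tensors="pt").input_ids
    ids = torch.cat([ctx_ids, word_ids], dim=1)
    with torch.no_grad():
        log_probs = torch.log_softmax(model(ids).logits, dim=-1)
    nats = 0.0
    for i in range(word_ids.shape[1]):
        # logits at position p predict the token at position p + 1
        nats -= log_probs[0, ctx_ids.shape[1] + i - 1, word_ids[0, i]].item()
    return nats / math.log(2)

# A predictable vs. an anomalous continuation of the same context:
print(word_surprisal("I take my coffee with cream and", "sugar"))
print(word_surprisal("I take my coffee with cream and", "socks"))
```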
Sat 10:00 a.m. - 11:30 a.m.
Lunch Break
Sat 11:30 a.m. - 12:00 p.m.
Information-Theoretic Methods in the Study of the Lexicon (Invited talk)
Since Shannon originally proposed his mathematical theory of communication in the middle of the 20th century, information theory has been an important way of viewing and investigating problems at the interfaces between linguistics, cognitive science, and computation. With the upsurge in applying machine learning approaches to linguistic questions, information-theoretic methods are becoming an ever more important tool in the linguist’s toolbox. This talk focuses on three concrete applications of information-theoretic techniques to the study of the lexicon. In the first part of the talk, I take a coding-theoretic view of the lexicon. Using a novel generative statistical model, I discuss how to estimate the compressibility of the lexicon under various linguistic constraints. In the second part of the talk, I will discuss a longstanding debate in semiotics: How arbitrary is the relationship between a word's form and its meaning? Using mutual information, I give the first holistic quantification of form-meaning arbitrariness, and, in a 106-language study, we do indeed find a statistically significant relationship between a word's form and its meaning in many languages. Finally, in the third part of the talk, I will focus on whether there exists a pressure for or against homophony in the lexicons of the world. On one hand, Piantadosi et al. (2012) argue that homophony enables the reuse of efficient word forms and is thus beneficial for languages. On the other hand, Trott and Bergen (2020) posit that good word forms are more often homophonous simply because they are more phonotactically probable. I will discuss a new information-theoretic quantification of a language’s homophony: the sample Rényi entropy. Then, I discuss how to use this quantification to study homophony and argue that there is no evidence for a pressure either towards or against homophony, a much more nuanced result than either Piantadosi et al.’s or Trott and Bergen’s findings.
Ryan Cotterell
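A minimal sketch of the sample Rényi entropy mentioned above, as a plug-in estimate over a sample of word forms (the talk's estimator and choice of order may differ):

```python
import numpy as np
from collections import Counter

def sample_renyi_entropy(wordforms, alpha=2.0):
    """Plug-in Rényi entropy (bits) of order alpha over a sample of word forms.
    Homophony concentrates probability mass on reused forms and lowers the value."""
    counts = np.array(list(Counter(wordforms).values()), dtype=float)
    p = counts / counts.sum()
    if alpha == 1.0:                                  # Shannon entropy as the limit
        return float(-(p * np.log2(p)).sum())
    return float(np.log2((p ** alpha).sum()) / (1.0 - alpha))

# A toy lexicon with repeated (homophonous) forms vs. one with all-distinct forms.
print(sample_renyi_entropy(["bow", "bow", "bank", "bank", "tree"]),
      sample_renyi_entropy(["bow", "bank", "tree", "dog", "cat"]))
```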
Sat 12:00 p.m. - 12:08 p.m.
A (dis-)information theory of revealed and unrevealed preferences (Oral)
In complex situations involving communication, agents might attempt to mask their intentions, essentially exploiting Shannon's theory of information as a theory of misinformation. Here, we introduce and analyze a simple multiagent reinforcement learning task where a buyer sends signals to a seller via its actions, and in which both agents are endowed with a recursive theory of mind. We show that this theory of mind, coupled with pure reward-maximization, gives rise to agents that selectively distort messages and become skeptical towards one another. Using information theory to analyze these interactions, we show how savvy buyers reduce mutual information between their preferences and actions, and how suspicious sellers learn to strategically reinterpret or discard buyers' signals.
Nitay Alon · Lion Schulz · Peter Dayan · Jeffrey S Rosenschein
Sat 12:10 p.m. - 12:40 p.m.
Information-Constrained Coordination of Economic Behavior (Invited talk)
In the economics literature, rate-distortion theory (under the name “rational inattention”) has been popular as a model of choice that depends only imprecisely on the characteristics of the options available to an individual decision maker (Sims, 2003; Woodford, 2009; Matejka and McKay, 2015; Mackowiak et al., forthcoming). In this theory, the distribution of actions taken in a given objective situation is assumed to be optimal (in the sense of maximizing expected reward), subject to a constraint on the mutual information between the objective state and the action choice. However, the assumption that a mutual-information cost is the only limit on the precision of choice has unappealing implications: for example, that conditional action probabilities should vary discontinuously with the (continuous) objective state if the rewards associated with given actions are a discontinuous function of the state. In the case of strategic interaction between multiple information-constrained decision makers, this can result in a prediction that equilibrium behavior (in which each agent’s behavior is optimally adapted to the others’ patterns of behavior) should vary discontinuously with changes in the objective state, with the discontinuous responses of each agent being justified by the discontinuous responses of the others. In the kind of example discussed, the location of the discontinuity is indeterminate, so that the assumption of mutually well-adapted behavior fails to yield definite predictions (Yang, 2015); moreover, the predicted discontinuity of equilibrium behavior does not seem to be observed in experiments (Heinemann et al., 2004, 2009; Frydman and Nunnari, 2022). We propose an alternative model of imprecise choice, in which each decision maker is modeled using a generalization of the “β-variational autoencoder” of Alemi et al. (2018), which nests the “rationally inattentive” model of choice as a limiting case. In our more general model, there are two distinct “rate-distortion” trade-offs: one between the rate of information transmission and a cross-entropy measure of distortion (as in the β-VAE of Alemi et al.), and another between the rate and the measure of distortion given by the negative of expected reward (as in rational inattention models). The generalization provides a model of how an imprecise classification of decision situations can be learned from a finite training data set, rather than assuming optimization relative to a precisely correct prior distribution; and it predicts only gradual changes in action probabilities in response to changes in the objective state, in line with experimental data.
Michael Woodford
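For readers unfamiliar with the "rationally inattentive" limiting case that this model generalizes, the sketch below (payoffs, prior, and information cost are placeholder assumptions, and corner solutions with unused actions are ignored) computes a Matejka-McKay style fixed point, in which choice probabilities take a logit form around an endogenous unconditional choice distribution:

```python
import numpy as np

def rational_inattention_choice(p_x, utility, lam, n_iter=1000, tol=1e-12):
    """Choice probabilities P(a|x) under a mutual-information cost (rational inattention).

    p_x: (n_states,) prior over states; utility: (n_states, n_actions) payoffs;
    lam: marginal cost of information in nats. Returns the conditional choice matrix."""
    p_a = np.full(utility.shape[1], 1.0 / utility.shape[1])
    for _ in range(n_iter):
        w = p_a[None, :] * np.exp(utility / lam)      # P(a|x) proportional to P(a) exp(U(x,a)/lam)
        p_ax = w / w.sum(axis=1, keepdims=True)
        p_a_new = p_x @ p_ax                          # implied unconditional choice probabilities
        if np.max(np.abs(p_a_new - p_a)) < tol:
            p_a = p_a_new
            break
        p_a = p_a_new
    return p_ax

# Two states, two actions: as lam grows, choices become less state-dependent.
p_x = np.array([0.5, 0.5])
U = np.array([[1.0, 0.0], [0.0, 1.0]])
for lam in (0.1, 0.5, 2.0):
    print(lam, rational_inattention_choice(p_x, U, lam))
```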
Sat 12:40 p.m. - 1:30 p.m.
Poster Session + Coffee Break
Sat 1:30 p.m. - 2:00 p.m.
Hourglass Emergence (Invited talk)
I will discuss how error, subjectivity, and the universal collective property of biological systems shape computation and micro-macro relationships in information processing systems. I will introduce three principles of collective computation: downward causation through coarse-graining, hourglass emergence, and a preliminary, information-theoretic concept called channel switching that my collaborators and I are developing to formalize the transition from micro-macro causality to macro-macro causality.
Jessica Flack
Sat 2:00 p.m. - 2:55 p.m.
Panel Discussion: The Bandwagon Revisited (Discussion Panel)
Michael Woodford · Noor Sajid · Chris Sims · Jessica Flack · Ryan Cotterell
Sat 2:55 p.m. - 3:00 p.m.
Closing Remarks
Samuel J Gershman
-
Successive Refinement and Coarsening of the Information Bottleneck (Poster)
We discuss two models for two central issues appearing in conjunction with cognitive processing. One is the ability to incorporate fresh information into already learnt models; the other is moving toward a better understanding of what happens when information “trickles” through many layers of a cognitive processing pipeline. We do so by investigating formal properties of the Information Bottleneck method: namely, how it relates to successive refinement and successive coarsening of information.
Hippolyte Charvin · Daniel Polani · Nicola Catenacci Volpi
-
Shannon Information of Synaptic Weights Post Induction of Long-Term Potentiation (Learning) is Nearly Maximized (Poster)
Exploring different aspects of synaptic plasticity processes in the hippocampus is crucial to understanding mechanisms of learning and memory, improving artificial intelligence algorithms, and building neuromorphic computers. Synapses from the same axon onto the same dendrite have a common history of coactivation and have similar spine head volumes, suggesting that synapse function precisely modulates structure. We have applied Shannon information theory to obtain a new analysis of synaptic information storage capacity (SISC), using non-overlapping dimensions of dendritic spine head volumes as a measure of synaptic weights with distinct states. Spine head volumes in the stratum radiatum of hippocampal area CA1 occupied 24 distinct states (4.1 bits). In contrast, spine head volumes in the middle molecular layer of control dentate gyrus occupied only 5 distinct states (2 bits). Thus, synapses in different hippocampal regions had different synaptic information storage capacities. Moreover, these were not fixed properties but increased during long-term potentiation, such that by 30 min following induction, spine head volumes in the middle molecular layer increased to occupy 10 distinct states (3 bits), and this increase lasted for at least 2 hours. Measurement of the Kullback-Leibler divergence revealed that synaptic states evolved closer to storing the maximum amount of information during long-term potentiation. These results show that our new SISC analysis provides an improved and reliable estimate of the information storage capacity of synapses. SISC revealed that the Shannon information after long-term potentiation is nearly maximized for the number of distinguishable states.
Mohammad Samavat · Tom Bartol · Cailey Bromer · Jared Bowden · Dusten Hubbard · Dakota Hanka · Masaaki Kuwajima · John Mendenhall · Patrick Parker · Wickliffe Abraham · Kristen Harris · Terrence Sejnowski
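A minimal sketch of the bookkeeping such bit counts suggest (the cluster sizes below are placeholders, and the exact SISC and KL computations in the poster may differ): N distinguishable states bound capacity at log2(N) bits, the entropy of the observed state occupancy gives the information actually stored, and their difference is the KL divergence from the maximum-entropy (uniform) occupancy.

```python
import numpy as np

def sisc_summary(cluster_sizes):
    """Capacity and occupancy statistics for N distinguishable synaptic states.

    cluster_sizes: number of spine heads assigned to each distinct state (cluster).
    Returns (log2 N, occupancy entropy in bits, KL from uniform in bits)."""
    counts = np.asarray(cluster_sizes, dtype=float)
    p = counts / counts.sum()
    max_bits = np.log2(counts.size)
    entropy = -(p * np.log2(p)).sum()
    return max_bits, entropy, max_bits - entropy      # KL(p || uniform) = log2 N - H(p)

# e.g. 24 unevenly occupied states store less than the log2(24) ~ 4.58-bit ceiling.
print(sisc_summary(np.random.default_rng(0).integers(1, 40, size=24)))
```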
-
Relaxing the Kolmogorov Structure Function for Realistic Computational Constraints (Poster)
The degree to which a task is learnable given different computational constraints shows the amount of usable information at different scales. An instantiation of this idea is the Kolmogorov Structure Function (KSF), which shows how the fit of an optimal $k$-bit description of a given string improves for increasing values of $k$. While conceptually appealing, computing the KSF is infeasible in practice due to the exponentially large search space of all descriptions of a given length, in addition to the unbounded time complexity. This paper proposes the Constrained Structure Function (CSF), a generalization of the KSF that can be computed efficiently by taking into account realistic computational constraints. In addition to being feasible to compute, the CSF of a dataset can be expressed as the sum of datapoint-wise functions which reflect the degree to which each datapoint is typical in the context of the dataset. Empirically, we demonstrate that the CSF can be used for detecting individual datapoints with characteristics such as being easy, mislabeled, or belonging to a hidden subgroup.
Yoonho Lee · Chelsea Finn · Stefano Ermon
-
Using Shannon Information to Probe the Precision of Synaptic Strengths (Poster)
Synapses between neurons control the strengths of neuronal communication in neural circuits, and their strengths are in turn dynamically regulated by experience. Because dendritic spine head volumes are highly correlated with synaptic strength [1], anatomical reconstructions can probe the distributions of synaptic strengths. Synapses from the same axon onto the same dendrite (SDSA pairs) have a common history of coactivation and have nearly the same spine head volumes, suggesting that synapse function precisely modulates structure. We have applied Shannon information theory to obtain a new analysis of synaptic information storage capacity (SISC), using non-overlapping clusters of dendritic spine head volumes as a measure of synaptic strengths with distinct states, based on the synaptic precision level calculated from 10 SDSA pairs. SISC analysis revealed that spine head volumes in the stratum radiatum of hippocampal area CA1 occupied 24 distinct states (4.1 bits). This finding indicates an unexpected degree of precision that has implications for learning algorithms in artificial neural network models.
Mohammad Samavat · Tom Bartol · Kristen Harris · Terrence Sejnowski
-
Learning the Feedback Connections from V1 to LGN via Information Maximization (Poster)
The lateral geniculate nucleus (LGN) relay cells act as a gateway for transmitting visual information from the retina to the primary visual cortex (V1). The activities of thalamic relay cells are modulated by feedback connections emanating from layer 6 of V1. While the receptive field (RF) properties of these early parts of the visual system are relatively well understood, the function, computational role, and details of the feedback network from V1 to LGN are not. Computational models of efficient coding have been successful in deriving RF properties of retinal ganglion and V1 simple cells by optimizing the Shannon information. Further, previous experimental results have suggested that the feedback increases the Shannon information. Motivated by this earlier work, we try to understand the function of the feedback as optimizing the feedforward information to cortex. We build a model that learns feedback weights by maximizing the feedforward Shannon information on naturalistic stimuli. Our model predicts the strength and sign of feedback from a V1 cell to all ON- and OFF-center LGN relay cells that are within or surrounding the V1 cell RF. We find a highly specific pattern of influence on ON- and OFF-center LGN cells overlapping the V1 RF, depending on whether they overlap the ON or OFF zone of the V1 RF. In addition, we find general inhibitory feedback in the further surround, which sharpens the RFs and increases surround suppression in LGN relay cells. This is consistent with the results of recent experiments exploring the impact of feedback on stimulus integration.
Reza Eghbali · Fritz Sommer · Murray Sherman
-
Explicitly Nonlinear Connectivity-Matrix Independent Component Analysis in Resting fMRI Data (Poster)
Connectivity-matrix independent component analysis (cmICA) is a data-driven method to calculate brain voxel maps of functional connectivity. It is a powerful approach, but one limitation is that it can only capture linear relationships. In this work, we focus on measuring the explicitly nonlinear relationships in voxel connectivity to identify brain spatial maps that demonstrate explicitly nonlinear dependencies. We expand cmICA using normalized mutual information (NMI) after removing the linear relationships, and find highly structured resting networks which would be completely missed by existing functional connectivity approaches.
Sara Motlaghian
-
Challenges and Approaches to an Information-Theoretic Framework for the Analysis of Embodied Cognitive Systems (Poster)
Information theory has a long track record as a widely adopted framework to study complex systems. This remains true within the context of cognitive systems, where it has been utilized in psychology, neuroscience, artificial intelligence, artificial life, etc. A crucial aspect of what makes a system cognitive is that it is in continuous closed-loop interaction with its environment. This brings certain challenges to utilizing information theory in this context: these systems are multivariate, information has temporal dynamics, information could be distributed across the system, and finally, cognition doesn't have to be limited to the brain. In this article, we provide perspectives on these challenges, explain their significance, provide examples of how they have been tackled in other work, and outline the open challenges.
Madhavun Candadai · Eduardo Izquierdo
-
Directed Information for Point Process Systems (Poster)
Owing to neurotechnological advances in electrode design, it is now possible to simultaneously record spiking activity from hundreds to thousands of neurons. Such extensive data provides an opportunity to study how groups of neurons coordinate to form functional ensembles that ultimately drive behavior. Since the spike train space is devoid of an algebraic structure, quantifying causal relations between the neuronal nodes poses a computational challenge. Here, we combine techniques from information theory and kernel-based spike train representations to construct an estimator of directed information for causal analysis between neural spike train data. Via projection of spiking data into a reproducing kernel Hilbert space, we avoid tedious evaluations of probability distributions while engaging computations in a non-linear space of (possibly) infinite dimensionality. Additionally, the estimator allows for conditioning on 'side' variables to eliminate indirect causal influences in a multi-neuron network. Extensive analyses on a simulated six-neuron network model comprising different neuron types and causal topologies show that the devised measure accurately identifies directional influences that would otherwise be inaccessible with traditional correlation measures. Finally, we apply the metric to identify direct causal interactions among neurons recorded from cortical columns of visual area 4 of monkeys performing a delayed match-to-sample task. Our results reveal an interesting reorganization of neuronal interaction patterns within a cortical column on visual stimulation.
Shailaja Akella · Andre Bastos · Jose C Principe
-
When to choose: The role of information seeking in the speed-accuracy tradeoff (Poster)
Normative accounts of decision-making predict that people attempt to balance the immediate reward associated with a correct response against the cost of deliberation. However, humans frequently deliberate longer than normative models say they should. We propose that people try to optimize not only their rate of material rewards, but also their rate of information gain. A computational model that combines this idea with a standard drift diffusion process reveals that an agent programmed to maximize a combination of reward and information rates acts like human decision makers, reproducing key patterns of behavior not predicted by existing models. Moreover, if we assume that skill level is sensitive to deliberation time, a novice agent who maximizes even a small amount of information rate will often earn more reward in the long run than one who only maximizes reward rate. Maximizing a combination of reward and information rate is a relatively simple and myopic strategy, but approximates optimal behavior over learning, making it a candidate heuristic for this difficult intertemporal choice problem.
Javier Alejandro Masís Obando · David Melnikoff · Lisa Feldman Barrett · Jonathan D Cohen
-
Behavioral Engagement and Manifold Representation in the Hippocampus: Evidence from the Mutual Information of Population Encoding and Location (Poster)
Although there is significant understanding of how individual neurons in the hippocampus represent spatial location, the temporal dependence of population coding remains poorly understood. Using a novel statistical estimator and theoretical modeling, both developed in the framework of maximum entropy models, we reveal temporal changes in the fidelity of the spatial map, consistent with observed gating due to behavioral engagement.
Shagesh Sridharan · Anirvan Sengupta
-
On Narrative Information and the Distillation of Stories (Poster)
The act of telling stories is a fundamental part of what it means to be human. This work introduces the concept of narrative information, which we define to be the overlap in information space between a story and the items that compose the story. Using contrastive learning methods, we show how modern artificial neural networks can be leveraged to distill stories and extract a representation of the narrative information. We then demonstrate how evolutionary algorithms can leverage this to extract a set of narrative templates, and how these templates, in tandem with a novel curve-fitting algorithm we introduce, can reorder music albums to automatically induce stories in them. In the process of doing so, we give strong statistical evidence that these templates under narrative information are present in existing albums. While we experiment only with music albums here, the premises of our work extend to any form of (largely) independent media.
Dylan Ashley · Vincent Herrmann · Zachary Friggstad · Jürgen Schmidhuber
-
Compressed information is all you need: unifying intrinsic motivations and representation learning (Poster)
Humans can recognize categories, shapes, and colors, grasp and manipulate objects, run, or take a plane. To reach this level of cognition, developmental psychology identifies two key elements: (1) children have a spontaneous drive to explore and learn open-ended skills, called intrinsic motivation; (2) perceiving and acting are deeply intertwined: a chair is a chair because I can sit on it. This supports the hypothesis that the development of perception and skills may be continually underpinned by one guiding principle. Here, we investigate the consequence of maximizing the multi-information of a simple cognitive architecture, modelled as a causal model. We show that it provides a coherent unifying view of numerous results in unsupervised learning of representations and intrinsic motivations. This positions our framework as a serious candidate for such a guiding, unifying principle.
Arthur Aubret · Mathieu Lefort · Céline Teulière · Laetitia Matignon · Salima Hassas · Jochen Triesch
-
Machine Learning Explainability from an Information-theoretic Perspective (Poster)
The primary challenge for practitioners with multiple post-hoc gradient-based interpretability methods is to benchmark them and select the best. Using information theory, we represent finding the optimal explainer as a rate-distortion optimization problem. Therefore: (1) we propose an information-theoretic test, InfoExplain, to resolve the benchmarking ambiguity in a model-agnostic manner without additional user data (apart from the input features, model, and explanations); (2) we show that InfoExplain is extendable to utilise human-interpretable concepts, deliver performance guarantees, and filter out erroneous explanations. The adjoining experiments, code and data will be released soon.
Debargha Ganguly · Debayan Gupta
-
How Predictive Minds Explain and Control Dynamical Systems (Poster)
We study the relationship between prediction, explanation, and control in artificial
Roman Tikhonov · Sarah Marzen · Simon DeDeo
-
A unified information-theoretic model of EEG signatures of human language processing (Poster)
We advance an information-theoretic model of human language processing in the brain, in which incoming linguistic input is processed at two levels: in terms of a heuristic interpretation and in terms of error correction. We propose that these two kinds of information processing have distinct electroencephalographic signatures, corresponding to the well-documented N400 and P600 components of language-related event-related potentials (ERPs). Formally, we show that the information content (surprisal) of a word in context can be decomposed into two quantities: (A) heuristic surprise, which signals the processing difficulty of the word given its inferred context, and corresponds to the N400 signal; and (B) a discrepancy signal, which reflects divergence between the true context and the inferred context, and corresponds to the P600 signal. Both of these quantities can be estimated using modern NLP techniques. We validate our theory by successfully simulating ERP patterns elicited by a variety of linguistic manipulations in previously-reported experimental data from Ryskin et al. (2021). Our theory is in principle compatible with traditional cognitive theories assuming a 'good-enough' heuristic interpretation stage, but with a precise information-theoretic formulation.
Jiaxuan Li · Richard Futrell
-
Efficient coding explains neural response homeostasis and stimulus-specific adaptation (Poster)
Empirical studies have demonstrated that, across changes in their sensory environment or input statistics, cortical neurons display a homeostasis or equalisation of firing rates. We present a normative explanation of such firing rate homeostasis grounded in efficient coding theory and the infomax principle. We further demonstrate how homeostatic coding, coupled with Bayesian theories of neural representation, can explain stimulus-specific adaptation effects, which are widely observed in the nervous system (e.g., in the visual cortex), and how it can be achieved by divisive normalisation with adaptive weights.
Edward Young · Yashar Ahmadian
-
An information-theoretic perspective on intrinsic motivation in reinforcement learning (Poster)
The standard reinforcement learning (RL) framework faces the problems of transfer learning and exploration under sparse rewards. To address these problems, a large number of heterogeneous intrinsic motivations have been proposed, such as reaching unpredictable states or unvisited states. Yet a coherent view of these intrinsic motivations is lacking, making it hard to understand their relations as well as their underlying assumptions. Here, we propose a new taxonomy of intrinsic motivations based on information theory: we computationally revisit the notions of surprise, novelty, and skill learning and identify their main implementations through a short review of intrinsic motivations in RL. Our information-theoretic analysis paves the way towards a unifying view of complex behaviors, thereby supporting the development of new objective functions.
Arthur Aubret · Laetitia Matignon · Salima Hassas
-
The Information Bottleneck Principle in Corporate Hierarchies (Poster)
The hierarchical nature of corporate information processing is a topic of great interest in the economic and management literature. Firms are characterised by a need to make complex decisions, often aggregating partial and uncertain information, which greatly exceeds the attention capacity of constituent individuals. However, the efficient transmission of these signals is still not fully understood. Recently, the information bottleneck principle has emerged as a powerful tool for understanding the transmission of relevant information through intermediate levels in a hierarchical structure. In this paper we note that the information bottleneck principle may similarly be applied directly to corporate hierarchies. In doing so we provide a bridge between organisation theory and the rapidly expanding work on deep neural networks (DNNs), including the use of skip connections as a means of more efficient transmission of information in hierarchical organisations.
Cameron Gordon
-
Information-theoretic analysis of disfluencies in speech (Poster)
This study proposes and examines an information-theoretic measure of planning in incremental speech production, and investigates the effects of planning, predictability, and interference-based measures on distractor selection and production in lexical substitution errors. We then present a rate-distortion theoretic model of speech production that explicates how these factors affect the production of lexical substitution errors.
Shiva Upadhye · Richard Futrell
-
Higher-order mutual information reveals synergistic sub-networks for multi-neuron importance (Poster)
Quantifying which neurons are important with respect to the classification decision of a trained neural network is essential for understanding their inner workings. Previous work primarily attributed importance to individual neurons. In this work, we study which groups of neurons contain synergistic or redundant information using a multivariate mutual information method called the O-information. We observe that the first layer is dominated by redundancy, suggesting general shared features (i.e. detecting edges), while the last layer is dominated by synergy, indicating local class-specific features (i.e. concepts). Finally, we show the O-information can be used for multi-neuron importance: re-training a synergistic sub-network results in a minimal change in performance. These results suggest our method can be used for pruning and unsupervised representation learning.
Kenzo Clauw · Daniele Marinazzo · Sebastiano Stramaglia
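As a sketch of the quantity used above, here is a plug-in O-information estimator for discretized activations (following the Rosas et al., 2019 definition; the discretization and toy data are assumptions for illustration); positive values indicate redundancy-dominated groups, negative values synergy-dominated ones:

```python
import numpy as np
from collections import Counter

def entropy_bits(samples):
    """Plug-in Shannon entropy (bits) of the rows of a 2-D array of discrete samples."""
    counts = np.array(list(Counter(map(tuple, samples)).values()), dtype=float)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def o_information(X):
    """O-information of the columns of X (n_samples, n_variables), discrete-valued.

    Omega = (n - 2) H(X_1..X_n) + sum_j [ H(X_j) - H(X_{-j}) ]  (Rosas et al., 2019)."""
    n = X.shape[1]
    omega = (n - 2) * entropy_bits(X)
    for j in range(n):
        omega += entropy_bits(X[:, [j]]) - entropy_bits(np.delete(X, j, axis=1))
    return omega

# A redundant triplet (three copies of one bit) vs. a synergistic one (XOR parity).
rng = np.random.default_rng(0)
z = rng.integers(0, 2, size=5000)
redundant = np.stack([z, z, z], axis=1)
a, b = rng.integers(0, 2, size=(2, 5000))
synergistic = np.stack([a, b, a ^ b], axis=1)
print(o_information(redundant), o_information(synergistic))   # positive, then negative
```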
-
Information Bottleneck for Multi-Task LSTMs (Poster)
Neural networks have had a profound effect on research, but they operate through a complex, nonlinear mathematical structure which can be difficult to interpret or understand. This is especially true for recurrent models, as their dynamic structure can be difficult to measure and analyze. However, interpretability is a key factor in understanding certain problems such as text and language analysis. In this paper, we present a novel introspection method for LSTMs trained to solve complex language problems, such as sentiment analysis. Inspired by Information Bottleneck theory, our method uses a state-of-the-art information-theoretic framework to visualize shared information around labels, features, and between layers. We apply our approach to simulated data and real sentiment analysis datasets, providing novel, information-theoretic insights into internal model dynamics.
Bradley Baker · Noah Lewis · Debbrata Kumar Saha · Md Abdur Rahaman · Sergey Plis · Vince Calhoun
-
Bayesian Oracle for bounding information gain in neural encoding models (Poster)
Many normative theories that link neural population activity to cognitive tasks, such as neural sampling and the Bayesian brain hypothesis, make predictions for single trial fluctuations. Linking information theoretic principles of cognition to neural activity thus requires models that accurately capture all moments of the response distribution. However, to measure the quality of such models, commonly used correlation-based metrics are not sufficient as they mainly care about the mean of the response distribution. An interpretable alternative evaluation metric for likelihood-based models is Information Gain (IG), which evaluates the likelihood of a model relative to a lower and upper bound. However, while a lower bound is usually easy to obtain and evaluate, constructing an upper bound turns out to be challenging for neural recordings with relatively low numbers of repeated trials, high (shared) variability and sparse responses. In this work, we generalize the jack-knife oracle estimator for the mean (commonly used for correlation metrics) to a flexible Bayesian oracle estimator for IG based on posterior predictive distributions. We describe and address the challenges that arise when estimating the lower and upper bounds from small datasets. We then show that our upper bound estimate is data-efficient and robust even in the case of sparse responses and low signal-to-noise ratio. Finally, we provide the derivation of the upper bound estimator for a variety of common distributions including the state-of-the-art zero-inflated mixture models.
Konstantin-Klemens Lurz · Mohammad Bashiri · Fabian Sinz
-
On Rate-Distortion Theory in Capacity-Limited Cognition & Reinforcement Learning (Poster)
Throughout the cognitive-science literature, there is widespread agreement that decision-making agents operating in the real world do so under limited information-processing capabilities and without access to unbounded cognitive or computational resources. Prior work has drawn inspiration from this fact and leveraged an information-theoretic model of such behaviors or policies as communication channels operating under a bounded rate constraint. Meanwhile, a parallel line of work also capitalizes on the same principles from rate-distortion theory to formalize capacity-limited decision making through the notion of a learning target, which facilitates Bayesian regret bounds for provably-efficient learning algorithms. In this paper, we aim to elucidate this latter perspective by presenting a brief survey of these information-theoretic models of capacity-limited decision making in biological and artificial agents. |
Dilip Arumugam · Mark Ho · Noah Goodman · Benjamin Van Roy
-
Similarity-preserving Neural Networks from GPLVM and Information Theory (Poster)
This work proposes a way of deriving the structure of plausible canonical microcircuit models, replete with feedforward, lateral, and feedback connections, out of information-theoretic considerations. The resulting circuits show biologically plausible features, such as being trainable online and having local synaptic update rules reminiscent of the Hebbian principle. Our work achieves these goals by rephrasing Gaussian Process Latent Variable Models as a special case of the more recently developed similarity matching framework. One remarkable aspect of the resulting network is the role of lateral interactions in preventing overfitting. Overall, our study emphasizes the importance of recurrent connections in neural networks, both for cognitive tasks in the brain and applications to artificial intelligence.
Yanis Bahroun · Atithi Acharya · Dmitri Chklovskii · Anirvan Sengupta
-
Learning in Factored Domains with Information-Constrained Visual Representations (Poster)
Humans learn quickly even in tasks that contain complex visual information. This is due in part to the efficient formation of compressed representations of visual information, allowing for better generalization and robustness. However, compressed representations alone are insufficient for explaining the high speed of human learning. Reinforcement learning (RL) models that seek to replicate this impressive efficiency may do so through the use of factored representations of tasks. These informationally simplistic representations of tasks are motivated similarly to the use of compressed representations of visual information. Recent studies have connected biological visual perception to disentangled and compressed representations. This raises the question of how humans learn to efficiently represent visual information in a manner useful for learning tasks. In this paper we present a model of human factored representation learning based on an altered form of a $\beta$-Variational Auto-encoder used in a visual learning task. Modelling results demonstrate a trade-off, governed by the informational complexity of the model's latent space, between the speed of learning and the accuracy of reconstructions.
Tyler Malloy · Chris Sims · Tim Klinger · Matthew Riemer · Miao Liu · Gerald Tesauro
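For reference, a sketch of the standard β-VAE objective that the model builds on (the paper uses an altered form; this is only the baseline loss, where larger β enforces a lower-rate, more compressed latent code):

```python
import torch
import torch.nn.functional as F

def beta_vae_loss(x, x_recon, mu, log_var, beta):
    """Reconstruction error plus beta-weighted KL divergence between the Gaussian
    posterior q(z|x) = N(mu, exp(log_var)) and the standard-normal prior p(z)."""
    recon = F.mse_loss(x_recon, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp())
    return recon + beta * kl
```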
-
There Are Fewer Facts Than Words: Communication With A Growing Complexity (Poster)
We present an impossibility result, called a theorem about facts and words, which pertains to a general communication system. The theorem states that the number of distinct words detectable in a finite text cannot be less than the number of independent elementary persistent facts described in the same text. In particular, this theorem can be related to Zipf's law, power-law scaling of mutual information, and power-law-tailed learning curves. The assumptions of the theorem are: a finite alphabet, linear sequence of symbols, complexity that does not decrease in time, entropy rate that can be estimated, and finiteness of the inverse complexity rate.
Lukasz Debowski
-
|
Generalizing with overly complex representations
(
Poster
)
link »
Representations enable cognitive systems to generalize from known experiences to the new ones. Simplicity of a representation has been linked to its generalization ability. Conventionally, simple representations are associated with a capacity to capture the structure in the data and rule out the noise. Representations with more flexibility than required to accommodate the structure of the target phenomenon, on the contrary, risk to catastrophically overfit the observed samples and fail to generalize to new observations. Here, I computationally test this idea by using a simple task of learning a representation to predict unseen features based on the observed ones. I simulate the process of learning a representation that has a lower, matching, or higher dimensionality than the world it intends to capture. The results suggest that the representations of the highest dimensionality consistently generate the best out-of-sample predictions despite perfectly memorizing the training observations. These findings are in line with the recently described ``double descent” of generalization error -- an observation that many learning systems generalize best when overparameterized (when their representational capacity far exceeds the task requirements). |
Marina Dubova 🔗 |
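A schematic illustration of this kind of simulation (not the author's code): minimum-norm least squares over random nonlinear features of varying dimensionality, so that representations at or above the training-set size memorize the data exactly while test error can still improve. The feature construction, noise level, and sample sizes are arbitrary placeholder choices, and the exact shape of the curve depends on them.

```python
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, d_world = 40, 500, 20

# "World": observed features x predict an unseen feature y through a linear map.
w_true = rng.normal(size=d_world)
X_tr = rng.normal(size=(n_train, d_world))
X_te = rng.normal(size=(n_test, d_world))
y_tr = X_tr @ w_true + 0.5 * rng.normal(size=n_train)
y_te = X_te @ w_true

for d_repr in (5, 20, 40, 160, 640):  # lower-, matching-, higher-dimensional representations
    P = rng.normal(size=(d_world, d_repr)) / np.sqrt(d_world)   # random projection
    Z_tr, Z_te = np.maximum(X_tr @ P, 0), np.maximum(X_te @ P, 0)  # random ReLU features
    w = np.linalg.pinv(Z_tr) @ y_tr        # minimum-norm least-squares fit
    tr = np.mean((Z_tr @ w - y_tr) ** 2)   # ~0 once d_repr >= n_train (memorization)
    te = np.mean((Z_te @ w - y_te) ** 2)
    print(f"repr dim {d_repr:4d}  train MSE {tr:8.4f}  test MSE {te:8.4f}")
```

Around d_repr = n_train the model interpolates the noisy training targets, and past that point the minimum-norm solution can generalize well despite perfect memorization, which is the pattern the abstract reports.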
-
|
Generalization and Translatability in Emergent Communication via Informational Constraints
(
Poster
)
link »
Traditional emergent communication (EC) methods often fail to generalize to novel settings or align with representations of natural language. Here, we show how controlling the Information Bottleneck (IB) tradeoff between complexity and informativeness (a principle thought to guide human languages) helps to address both of these problems in EC. Using VQ-VIB, a recent method for training EC agents while controlling the IB tradeoff, we find that: (1) increasing pressure for informativeness, which encourages agents to develop a shared understanding beyond task-specific needs, leads to better generalization to more challenging tasks and novel inputs; (2) VQ-VIB agents develop an EC space that encodes some semantic similarities and facilitates open-domain communication, similar to word embeddings in natural language; and (3) when translating between English and EC, greater complexity leads to improved performance of teams of simulated English speakers and trained VQ-VIB listeners, but only up to a threshold corresponding to the English complexity. These results indicate the importance of informational constraints for improving self-play performance and human-agent interaction. |
Mycal Tucker · Roger Levy · Julie A Shah · Noga Zaslavsky 🔗 |
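As background on the tradeoff being controlled, here is a generic variational-IB-style speaker objective. It is not the VQ-VIB implementation (which additionally uses vector quantization); the function name, arguments, and beta value are placeholders.

```python
import torch
import torch.nn.functional as F

def vib_speaker_loss(mu, logvar, listener_logits, targets, beta=0.01):
    """Informativeness vs. complexity tradeoff for a communicating speaker.

    mu, logvar:      parameters of the speaker's message distribution q(z|x)
    listener_logits: listener's guess about the referent given a sampled message
    targets:         ground-truth referents for the communication task
    beta:            weight on complexity (information rate of the message)
    """
    # Informativeness: how well the listener recovers the target from the message.
    informativeness = -F.cross_entropy(listener_logits, targets)
    # Complexity: KL(q(z|x) || N(0, I)), an upper bound on I(X; Z).
    complexity = -0.5 * torch.mean(
        torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=-1))
    # Higher beta -> simpler, lower-rate messages; lower beta -> more informative ones.
    return -(informativeness - beta * complexity)
```

Sweeping beta traces out the complexity–informativeness tradeoff that, per the abstract, governs generalization and translatability of the emergent communication.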
-
|
On the informativeness of supervision signals
(
Poster
)
link »
Learning transferable representations by training a classifier is a well-established technique in deep learning (e.g. ImageNet pretraining), but there is a lack of theory to explain why this kind of task-specific pre-training should result in 'good' representations. We conduct an information-theoretic analysis of several commonly-used supervision signals to determine how they contribute to representation learning performance and how the dynamics are affected by training parameters like the number of labels, classes, and dimensions in the training dataset. We confirm these results empirically in a series of simulations and conduct a cost-benefit analysis to establish a tradeoff curve allowing users to optimize the cost of supervising representation learning. |
Ilia Sucholutsky · Raja Marjieh · Tom Griffiths 🔗 |
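A toy illustration (not the paper's analysis) of measuring how many bits different supervision signals carry per example; the label sets and the soft-label distribution below are placeholders.

```python
import numpy as np

def entropy_bits(p):
    """Shannon entropy of a discrete distribution, in bits."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

# Hard labels drawn uniformly from C classes carry log2(C) bits per example.
for n_classes in (10, 100, 1000):
    print(f"{n_classes:5d} classes: {np.log2(n_classes):5.2f} bits per hard label")

# A soft label (e.g., a teacher's full predictive distribution) can convey
# information beyond the argmax class; its entropy is one crude proxy for this.
soft_label = np.array([0.70, 0.15, 0.10, 0.05])
print(f"entropy of one soft label: {entropy_bits(soft_label):.2f} bits")
```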
-
|
Neural networks learn an environment's geometry in latent space by performing predictive coding on visual scenes
(
Poster
)
link »
Humans navigate complex environments using only visual cues and self-motion. Mapping an environment is an essential task for navigation within a physical space; neuroscientists and cognitive scientists also postulate that mapping algorithms underlie cognition by mapping concepts, memories, and other nonspatial variables. Despite the broad importance of mapping algorithms in neuroscience, it is not clear how neural networks can build spatial maps exclusively from sensor observations, without access to the environment's coordinates through reinforcement learning or supervised learning. Path integration, for example, implicitly needs the environment's coordinates to predict how past velocities translate into the current position. Here we show that predicting sensory observations—called predictive coding—generalizes path integration, removing the implicit requirement for the environment's coordinates. Specifically, a neural network constructs an environmental map in its latent space by predicting visual input. As the network traverses complex environments in Minecraft, spatial proximity between object positions affects distances in the network's latent space. The relationship depends on the uniqueness of the environment's visual scene, as measured by the mutual information between the images and spatial position. Predictive coding extends to any sequential dataset, and observations from paths traversing a manifold can generate such sequential data. We anticipate that neural networks performing predictive coding can identify the underlying manifold without requiring the manifold's coordinates. |
James Gornet · Matt Thomson 🔗 |
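A compressed sketch of the kind of setup described: a network trained to predict the next visual observation from the current one, whose latent distances can then be compared with spatial distances. The architecture, the random placeholder frames, and the comparison step are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PredictiveCoder(nn.Module):
    """Predict the next visual frame from the current one; the latent bottleneck
    is where a map-like geometry could emerge."""
    def __init__(self, z_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Flatten(), nn.Linear(32 * 16 * 16, z_dim))
        self.decoder = nn.Sequential(
            nn.Linear(z_dim, 32 * 16 * 16), nn.ReLU(),
            nn.Unflatten(1, (32, 16, 16)),
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 3, 4, stride=2, padding=1))

    def forward(self, frame):
        z = self.encoder(frame)
        return self.decoder(z), z

model = PredictiveCoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Placeholder batch: consecutive 64x64 frames from an agent's trajectory.
frame_t, frame_t1 = torch.rand(8, 3, 64, 64), torch.rand(8, 3, 64, 64)

opt.zero_grad()
pred, z = model(frame_t)
loss = F.mse_loss(pred, frame_t1)   # predictive-coding objective: predict the next frame
loss.backward()
opt.step()

# After training, one would compare pairwise distances between latent codes z and
# the corresponding spatial positions (e.g., via correlation) to test whether the
# latent space reflects the environment's geometry.
```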
-
|
Chunking Space and Time with Information Geometry
(
Poster
)
link »
Humans are exposed to a continuous stream of sensory data, yet understand the world in terms of discrete concepts. A large body of work has focused on chunking sensory data in time, i.e. finding event boundaries, typically identified by model prediction errors. Similarly, chunking sensory data in space is the problem at hand when building spatial maps for navigation. In this work, we argue that a single mechanism underlies both: building a hierarchical generative model of perception and action, in which chunks at a higher level are formed by segments that surpass a certain information distance at the level below. We demonstrate how this can work in the case of robot navigation, and discuss how it could relate to human cognition in general. |
Tim Verbelen · Daria de Tinguy · Pietro Mazzaglia · Ozan Catal · Adam Safron 🔗 |
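A toy sketch of one way to operationalize the boundary rule described above, using a symmetric KL divergence between successive Gaussian latent beliefs as the information distance. The belief distributions, the threshold, and the choice of distance are placeholder assumptions, not the authors' model.

```python
import numpy as np

def gaussian_kl(mu_p, var_p, mu_q, var_q):
    """KL(N(mu_p, var_p) || N(mu_q, var_q)) for diagonal Gaussians."""
    return 0.5 * np.sum(np.log(var_q / var_p) + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0)

def chunk_boundaries(mus, variances, threshold=2.0):
    """Open a new chunk whenever the information distance from the last
    boundary's belief exceeds the threshold."""
    boundaries, last = [], 0
    for t in range(1, len(mus)):
        d = 0.5 * (gaussian_kl(mus[last], variances[last], mus[t], variances[t]) +
                   gaussian_kl(mus[t], variances[t], mus[last], variances[last]))
        if d > threshold:
            boundaries.append(t)
            last = t
    return boundaries

# Placeholder latent beliefs along a trajectory; the shift midway mimics entering a new room.
mus = np.vstack([np.zeros((50, 2)), np.full((50, 2), 3.0)]) + 0.1 * np.random.randn(100, 2)
variances = np.full((100, 2), 0.2)
print(chunk_boundaries(mus, variances))   # expected: a boundary near t = 50
```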
-
|
Information-theoretic Neural Decoding Reproduces Several Laws of Human Behavior
(
Poster
)
link »
Features of tasks and environments are often represented in the brain by neural firing rates. Representations must be decoded to enable downstream actions, and decoding takes time. We describe a toy model with a Poisson-process encoder and an ideal-observer Bayesian decoder, and show that the decoding of rate-coded signals reproduces classic patterns of response time and accuracy observed in humans, including the Hick-Hyman Law, the Power Law of Learning, speed-accuracy trade-offs, and lognormally distributed response times. The decoder is equipped with a codebook, a prior distribution over signals, and an entropy stopping threshold. We argue that historical concerns about the applicability of such information-theoretic tools to neural and behavioral data arise from a confusion about the application of discrete-time coding techniques to continuous-time signals. |
S. Thomas Christie · Paul R Schrater 🔗 |
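A minimal simulation in the spirit of the toy model described above (all rates, time step, and threshold values are placeholders): Poisson spike counts encode one of N signals, an ideal observer updates a posterior over the codebook each time step, and decoding stops when posterior entropy falls below a threshold. Mean decoding time should then grow roughly with log N, as in the Hick-Hyman law.

```python
import numpy as np

rng = np.random.default_rng(1)

def decode_time(n_signals, dt=0.01, r_hi=50.0, r_lo=5.0, entropy_thresh=0.1):
    """One trial: the true signal drives its channel at a high Poisson rate and all
    other channels at a low rate; an ideal observer updates a posterior from the
    spike counts and stops when its entropy (bits) drops below threshold.
    Returns (decoding time in seconds, correct?)."""
    true = rng.integers(n_signals)
    rates = np.full(n_signals, r_lo)
    rates[true] = r_hi
    cand_rates = np.full((n_signals, n_signals), r_lo)   # row i: rates if signal i were true
    np.fill_diagonal(cand_rates, r_hi)
    log_post = np.full(n_signals, -np.log(n_signals))    # uniform prior over the codebook
    t = 0.0
    while True:
        t += dt
        counts = rng.poisson(rates * dt)
        # Poisson log-likelihood of this time step under each candidate signal
        # (dropping the count-factorial term, which is hypothesis-independent).
        log_post += (counts * np.log(cand_rates * dt) - cand_rates * dt).sum(axis=1)
        log_post -= np.logaddexp.reduce(log_post)        # normalize
        p = np.exp(log_post)
        entropy_bits = -(p * np.log2(p, where=p > 0, out=np.zeros_like(p))).sum()
        if entropy_bits < entropy_thresh:
            return t, int(np.argmax(p) == true)

for n in (2, 4, 8, 16):
    times = [decode_time(n)[0] for _ in range(200)]
    print(f"N={n:2d}  mean RT ≈ {np.mean(times):.3f} s")  # RT should rise roughly with log2(N)
```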
-
|
Compression supports low-dimensional representations of behavior across neural circuits
(
Poster
)
link »
Dimensionality reduction, a form of compression, can simplify representations of information to increase efficiency and reveal general patterns. Yet, this simplification also forfeits information, thereby reducing representational capacity. Hence, the brain may benefit from generating both compressed and uncompressed activity, and may do so in a heterogeneous manner across diverse neural circuits that represent low-level (sensory) or high-level (cognitive) stimuli. However, precisely how compression and representational capacity differ across the cortex remains unknown. Here we predict different levels of compression across regional circuits by using random walks on networks to model activity flow and to formulate rate-distortion functions, which are the basis of lossy compression. Using a large sample of youth ($n=1,040$), we test predictions in two ways: by measuring the dimensionality of spontaneous activity from sensorimotor to association cortex, and by assessing the representational capacity for 24 behaviors in neural circuits and 20 cognitive variables in recurrent neural networks. Our network theory of compression predicts the dimensionality of activity ($t=12.13, p<0.001$) and the representational capacity of biological ($r=0.53, p=0.016$) and artificial ($r=0.61, p<0.001$) networks. The model suggests how a basic form of compression is an emergent property of activity flow between distributed circuits that communicate with the rest of the network.
|
Dale Zhou · Jason Kim · Adam Pines · Valerie Sydnor · David Roalf · John Detre · Ruben Gur · Raquel Gur · Theodore Satterthwaite · Danielle S Bassett 🔗 |
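The abstract formulates rate-distortion functions as the basis of lossy compression; as background, here is a generic Blahut-Arimoto sketch for computing points on a rate-distortion curve for a discrete source. This is not the authors' network-based formulation, and the source distribution and distortion matrix are placeholders.

```python
import numpy as np

def blahut_arimoto(p_x, distortion, beta, n_iters=200):
    """Compute one point on the rate-distortion curve of a discrete source.

    p_x:        source distribution over symbols, shape (n,)
    distortion: distortion[i, j] = cost of reproducing symbol i as j, shape (n, m)
    beta:       tradeoff parameter (larger beta -> lower distortion, higher rate)
    Returns (rate in bits, expected distortion).
    """
    n, m = distortion.shape
    q_y = np.full(m, 1.0 / m)                        # marginal over reproductions
    for _ in range(n_iters):
        # Optimal channel p(y|x) given the current reproduction marginal.
        p_y_given_x = q_y * np.exp(-beta * distortion)
        p_y_given_x /= p_y_given_x.sum(axis=1, keepdims=True)
        q_y = p_x @ p_y_given_x                      # update the marginal
    joint = p_x[:, None] * p_y_given_x
    rate = np.sum(joint * np.log2(p_y_given_x / q_y,
                                  where=joint > 0, out=np.zeros_like(joint)))
    return rate, float(np.sum(joint * distortion))

# Placeholder example: a 4-symbol source with Hamming distortion.
p_x = np.array([0.4, 0.3, 0.2, 0.1])
distortion = 1.0 - np.eye(4)
for beta in (0.5, 2.0, 8.0):
    rate, dist = blahut_arimoto(p_x, distortion, beta)
    print(f"beta={beta:4.1f}  rate={rate:.3f} bits  distortion={dist:.3f}")
```

Sweeping beta traces the tradeoff between representational dimensionality (rate) and fidelity (distortion) that the abstract maps onto cortical circuits.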
-
|
The more human-like the language model, the more surprisal is the best predictor of N400 amplitude
(
Poster
)
link »
Under information-theoretic accounts of language comprehension, the effort required to process a word is correlated with surprisal, the negative log-probability of that word given its context. This can (equivalently) be considered to reflect cognitive effort in proportion to the amount of information conveyed by a given word (Frank et al., 2015), or the amount of effort required to update our incremental predictions about upcoming words (Levy, 2008; Aurnhammer and Frank, 2019). In contrast, others (e.g. Brothers and Kuperberg, 2021) have argued that processing difficulty is proportional to the contextual probability of a word, thus positing a linear (rather than logarithmic) relationship between word probability and processing difficulty. We investigate which of these two accounts best explains the N400, a neural response that provides some of the best evidence for prediction in language comprehension (Kutas et al., 2011; Van Petten and Luka, 2012; Kuperberg et al., 2020). To do this, we expand upon previous work by comparing how well the probability and surprisal calculated by 43 transformer language models predict N400 amplitude. We thus investigate which models best predict the N400, and, for each model, whether surprisal or probability is more closely correlated with N400 amplitude. We find that of the models tested, OPT-6.7B and GPT-J are reliably the best at predicting N400 amplitude, and that for these transformers, surprisal is the better predictor. In fact, we find that the more highly correlated a language model's predictions are with N400 amplitude, the greater the extent to which surprisal is a better predictor than probability. Since language models that more closely mirror human statistical knowledge are more likely to be informative about the human predictive system, these results support the information-theoretic account of language comprehension. |
James Michaelov · Benjamin Bergen 🔗 |
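A minimal sketch of computing a word's surprisal with a causal language model via the HuggingFace transformers library. The paper compares 43 transformers (including OPT-6.7B and GPT-J); this example uses GPT-2 purely for illustration, and the sentence is a placeholder in the spirit of classic N400 stimuli.

```python
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

context = "He spread the warm bread with"
target = " socks"   # an unexpected continuation (placeholder example)

ctx_ids = tokenizer(context, return_tensors="pt").input_ids
tgt_ids = tokenizer(target, add_special_tokens=False).input_ids

surprisal = 0.0
with torch.no_grad():
    ids = ctx_ids
    for tid in tgt_ids:                              # sum over the word's subtokens
        logits = model(ids).logits[0, -1]            # next-token distribution
        log_probs = torch.log_softmax(logits, dim=-1)
        surprisal += -log_probs[tid].item() / math.log(2)   # convert nats to bits
        ids = torch.cat([ids, torch.tensor([[tid]])], dim=1)

probability = 2.0 ** (-surprisal)
print(f"surprisal = {surprisal:.2f} bits, probability = {probability:.2e}")
```

The two competing accounts then regress N400 amplitude on surprisal (logarithmic) or on probability (linear) computed this way for each critical word.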
-
|
A (dis-)information theory of revealed and unrevealed preferences
(
Poster
)
link »
In complex situations involving communication, agents might attempt to mask their intentions, essentially exploiting Shannon's theory of information as a theory of misinformation. Here, we introduce and analyze a simple multiagent reinforcement learning task where a buyer sends signals to a seller via its actions, and in which both agents are endowed with a recursive theory of mind. We show that this theory of mind, coupled with pure reward-maximization, gives rise to agents that selectively distort messages and become skeptical towards one another. Using information theory to analyze these interactions, we show how savvy buyers reduce mutual information between their preferences and actions, and how suspicious sellers learn to strategically reinterpret or discard buyers' signals. |
Nitay Alon · Lion Schulz · Peter Dayan · Jeffrey S Rosenschein 🔗 |
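A toy plug-in estimate (not the paper's analysis) of the mutual information between a buyer's preference and its observed action, the quantity the abstract reports savvy buyers learn to reduce. The preference and action sets and the episode counts below are placeholders.

```python
import numpy as np

def mutual_information_bits(joint_counts):
    """Plug-in estimate of I(preference; action) in bits from a joint count table."""
    joint = joint_counts / joint_counts.sum()
    p_pref = joint.sum(axis=1, keepdims=True)
    p_act = joint.sum(axis=0, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        ratio = np.where(joint > 0, joint / (p_pref * p_act), 1.0)
    return float(np.sum(joint * np.log2(ratio)))

# Placeholder episode counts: rows = buyer preferences, columns = buyer actions.
naive_buyer = np.array([[90,  5,  5],
                        [ 5, 90,  5],
                        [ 5,  5, 90]])   # actions largely reveal preferences
savvy_buyer = np.array([[40, 30, 30],
                        [30, 40, 30],
                        [30, 30, 40]])   # actions largely mask preferences
print(f"naive buyer:  I(pref; action) = {mutual_information_bits(naive_buyer):.2f} bits")
print(f"savvy buyer:  I(pref; action) = {mutual_information_bits(savvy_buyer):.2f} bits")
```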