Workshop
Symmetry and Geometry in Neural Representations
Sophia Sanborn · Christian A Shewmake · Simone Azeglio · Nina Miolane
La Nouvelle Orleans Ballroom A+B (level 2)
Sat 16 Dec, 7 a.m. PST
In recent years, there has been a growing appreciation for the importance of respecting the topological, algebraic, or geometric structure of data in machine learning models. In parallel, an emerging set of findings in computational neuroscience suggests that the preservation of this kind of mathematical structure may be a fundamental principle of neural coding in biology. The goal of this workshop is to bring together researchers from applied mathematics and deep learning with neuroscientists whose work reveals the elegant implementation of mathematical structure in biological neural circuitry. Group theory and differential geometry were instrumental in unifying the models of 20th-century physics. Likewise, they have the potential to unify our understanding of how neural systems form useful representations of the world.
Schedule
Sat 7:00 a.m. – 7:30 a.m.

Pre-structured low-dimensional manifolds for rapid and efficient learning, memory, and inference in the brain (Invited Talk)
The brain constructs and combines modular structures for flexible computation. I will describe recent progress in characterizing the rigid and low-dimensional nature of some of these representations, using theoretical approaches including fully unsupervised topological characterization of neural population codes. I will then discuss models of how these rigid and modular circuits can emerge, and how they can generate cognitive maps across different variables (e.g., spatial and non-spatial) and across varied input dimensions, with high capacity and high data efficiency and without rewiring recurrent circuitry.
Ila Fiete
Sat 7:30 a.m. – 7:40 a.m.

Expressive dynamics models with nonlinear injective readouts enable reliable recovery of latent features from neural activity (Contributed Talk)
An emerging framework in neuroscience uses the rules that govern how a neural circuit's state evolves over time to understand the circuit's underlying computation. While these neural dynamics cannot be directly measured, new techniques attempt to estimate them by modeling observed neural recordings as a low-dimensional latent dynamical system embedded in a higher-dimensional neural space. How these models represent the readout from latent space to neural space can affect the interpretability of the latent representation: for example, a linear readout can make simple, low-dimensional dynamics unfolding on a nonlinear neural manifold appear excessively complex and high-dimensional. Additionally, standard readouts (both linear and nonlinear) often lack injectivity, meaning that changes in latent state need not directly affect activity in the neural space. During training, non-injective readouts incentivize the model to invent dynamics that misrepresent the underlying system and computation. To address the challenges presented by nonlinearity and non-injectivity, we combined a custom readout with a previously developed low-dimensional latent dynamics model to create the Ordinary Differential equations autoencoder with Injective Nonlinear readout (ODIN). We generated a synthetic spiking dataset by nonlinearly embedding activity from a low-dimensional dynamical system into higher-dimensional neural activity. We show that, in contrast to alternative models, ODIN recovers ground-truth latent activity from these data even when the nature of the system and embedding are unknown. Additionally, we show that ODIN enables the unsupervised recovery of underlying dynamical features (e.g., fixed points) and embedding geometry (e.g., the neural manifold) better than alternative models. Overall, ODIN's ability to recover ground-truth latent features with low dimensionality makes it a promising method for distilling interpretable dynamics that can explain neural computation.
Christopher Versteeg
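The core difficulty this abstract describes, that a nonlinear embedding can make low-dimensional dynamics look high-dimensional to any linear readout, can be illustrated in a few lines. A minimal numpy sketch, not the authors' code: the 2-D oscillator and the random tanh embedding are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a 2-D latent trajectory (a simple harmonic oscillator).
t = np.linspace(0, 8 * np.pi, 2000)
latents = np.stack([np.sin(t), np.cos(t)], axis=1)   # shape (2000, 2)

# Nonlinearly embed the 2-D latents into a 10-D "neural" space.
# tanh of a full-rank affine map is injective, so no latent information is lost.
W = rng.normal(size=(2, 10))
b = rng.normal(size=10)
neural = np.tanh(latents @ W + b)                    # shape (2000, 10)

# A purely linear view (PCA) of the embedded data overstates its
# dimensionality: the variance spreads beyond the first two components.
X = neural - neural.mean(axis=0)
sv = np.linalg.svd(X, compute_uv=False)
var_explained = sv**2 / np.sum(sv**2)
print(np.sum(var_explained[:2]))  # strictly below 1: two components no longer suffice
```

The embedding is injective, yet the leading two principal components no longer capture all the variance; that residual is exactly what a linear readout misattributes to extra dynamical dimensions.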
Sat 7:40 a.m. – 7:50 a.m.

On Complex Network Dynamics of an In Vitro Neuronal System during Rest and Gameplay (Contributed Talk)
In this study, we focus on characterising the complex network dynamics of an in vitro neuronal system of live biological cells during two distinct activity states: a spontaneous rest state and engagement in a real-time (closed-loop) game environment. We use DishBrain, a system that embodies in vitro neural networks with in silico computation using a high-density multielectrode array. First, we embed the spiking activity of these channels in a lower-dimensional space using various representation learning methods. We then extract a subset of representative channels that are consistent across all of the neuronal preparations. Next, by analyzing these low-dimensional representations, we explore the patterns of macroscopic neuronal network dynamics during the learning process. Remarkably, our findings indicate that the low-dimensional embedding of representative channels alone is sufficient to differentiate the neuronal culture between the Rest and Gameplay conditions. Furthermore, we characterise the evolving neuronal connectivity patterns within the DishBrain system over time during Gameplay in comparison to the Rest condition. Notably, our investigation shows dynamic changes in the overall connectivity within the same region and across multiple regions of the multielectrode array only during Gameplay. These findings underscore the plasticity of these neuronal networks in response to external stimuli and highlight the potential for modulating connectivity in a controlled environment. The ability to distinguish between neuronal states using reduced-dimensional representations points to the presence of underlying patterns that could be pivotal for real-time monitoring and manipulation of neuronal cultures. Additionally, this provides insight into how biologically based information-processing systems rapidly adapt and learn, and may lead to new or improved algorithms.
Moein Khajehnejad
Sat 7:50 a.m. – 8:00 a.m.

Geometry of abstract learned knowledge in deep RL agents (Contributed Talk)
Data from neural recordings suggest that mammalian brains represent physical and abstract task-relevant variables through low-dimensional neural manifolds. In a recent electrophysiological study (Nieh et al., 2021), mice performed an evidence accumulation task while moving along a virtual track. Nonlinear dimensionality reduction of the population activity revealed that task-relevant variables were jointly mapped in an orderly manner in the low-dimensional space. Here we trained deep reinforcement learning (RL) agents on the same evidence accumulation task and found that their neural activity can be described with a low-dimensional manifold spanned by task-relevant variables. These results provide further insight into similarities and differences between neural dynamics in mammals and deep RL agents. Furthermore, we showed that manifold learning can be used to characterize the representational space of the RL agents, with the potential to improve the interpretability of decision-making in RL.
James Mochizuki-Freeman
Sat 8:00 a.m. – 8:20 a.m.

Coffee Break
Sat 8:20 a.m. – 8:50 a.m.

Topological Deep Learning: Going Beyond Graph Data (Invited Talk)
Mustafa Hajij
Sat 8:50 a.m. – 9:00 a.m.

Spectral Maps for Learning on Subgraphs (Contributed Talk)
In graph learning, maps between graphs and their subgraphs frequently arise. For instance, when coarsening or rewiring operations are present along the pipeline, one needs to keep track of the corresponding nodes between the original and modified graphs. Classically, these maps are represented as binary node-to-node correspondence matrices and used as-is to transfer node-wise features between the graphs. In this paper, we argue that simply changing this map representation can bring notable benefits to graph learning tasks. Drawing inspiration from recent progress in geometry processing, we introduce a spectral representation for maps that is easy to integrate into existing graph learning models. This spectral representation is a compact and straightforward plug-in replacement, and is robust to topological changes of the graphs. Remarkably, the representation exhibits structural properties that make it interpretable, drawing an analogy with recent results on smooth manifolds. We demonstrate the benefits of incorporating spectral maps in graph learning pipelines, addressing scenarios where a node-to-node map is not well defined or an exact isomorphism is absent. Our approach bears practical benefits in knowledge distillation and hierarchical learning, where we show comparable or improved performance at a fraction of the computational cost.
Marco Pegoraro
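The contrast the abstract draws, between a binary node-to-node matrix and its spectral (functional-map-style) representation, can be made concrete on a tiny example. A minimal numpy sketch, not the paper's implementation; the path graph, subgraph choice, and basis sizes are illustrative.

```python
import numpy as np

def laplacian_eigvecs(A):
    """Orthonormal eigenvectors of the combinatorial graph Laplacian."""
    L = np.diag(A.sum(axis=1)) - A
    _, vecs = np.linalg.eigh(L)
    return vecs

# Full graph: a path on 6 nodes; subgraph: its first 4 nodes.
n, k = 6, 4
A = np.zeros((n, n))
for i in range(n - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0
A_sub = A[:k, :k]

Phi = laplacian_eigvecs(A)       # (n, n) spectral basis on the full graph
Psi = laplacian_eigvecs(A_sub)   # (k, k) spectral basis on the subgraph

# Classical representation: binary node-to-node matrix restricting
# functions on the full graph to the subgraph.
P = np.zeros((k, n))
P[np.arange(k), np.arange(k)] = 1.0

# Spectral representation of the same map (a functional-map-style matrix).
C = Psi.T @ P @ Phi

# Transferring a node-wise feature through the spectral map matches the
# direct node-to-node transfer when full bases are used.
f = np.sin(np.arange(n, dtype=float))
transferred = Psi @ (C @ (Phi.T @ f))
print(np.allclose(transferred, P @ f))  # True
```

Truncating the bases to the first few eigenvectors is what makes the representation compact; with full bases, as here, the two representations agree exactly.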
Sat 9:00 a.m. – 9:10 a.m.

Data Augmentations in Deep Weight Spaces (Contributed Talk)
Learning in weight spaces, where neural networks process the weights of other deep neural networks, has emerged as a promising research direction with applications in various fields, from analyzing and editing neural fields and implicit neural representations to network pruning and quantization. Recent works have designed architectures for effective learning in this space that take into account its unique permutation-equivariant structure. Unfortunately, so far these architectures suffer from severe overfitting and were shown to benefit from large datasets. This poses a significant challenge because generating data for this learning setup is laborious and time-consuming, since each data sample is a full set of network weights that has to be trained. In this paper, we address this difficulty by investigating data augmentations for weight spaces: a set of techniques that enable generating new data examples on the fly without having to train additional input weight-space elements. We first review several recently proposed data augmentation schemes and divide them into categories. We then introduce a novel augmentation scheme based on the Mixup method. We evaluate the performance of these techniques on existing benchmarks as well as new benchmarks we generate, which can be valuable for future studies.
Aviv Shamsian
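The permutation symmetry the abstract mentions is itself the simplest source of free weight-space augmentations: permuting a hidden layer's units yields a new point in weight space that computes exactly the same function. A toy sketch for a one-hidden-layer MLP, with illustrative names and sizes, unrelated to the paper's benchmarks.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(x, W1, b1, W2, b2):
    """One-hidden-layer MLP: x -> W2 @ relu(W1 @ x + b1) + b2."""
    return W2 @ np.maximum(W1 @ x + b1, 0.0) + b2

def permute_hidden(W1, b1, W2, perm):
    """Permute hidden units consistently: rows of W1/b1, columns of W2."""
    return W1[perm], b1[perm], W2[:, perm]

# A random network and a cyclic-shift permutation of its 8 hidden units.
d_in, d_h, d_out = 3, 8, 2
W1, b1 = rng.normal(size=(d_h, d_in)), rng.normal(size=d_h)
W2, b2 = rng.normal(size=(d_out, d_h)), rng.normal(size=d_out)
perm = np.roll(np.arange(d_h), 1)

W1p, b1p, W2p = permute_hidden(W1, b1, W2, perm)

x = rng.normal(size=d_in)
# The augmented sample is a different point in weight space...
print(np.allclose(W1, W1p))                     # False: a genuinely new sample
# ...but represents exactly the same function.
print(np.allclose(mlp(x, W1, b1, W2, b2),
                  mlp(x, W1p, b1p, W2p, b2)))   # True
```

Sampling random permutations per layer generates such equivalent-but-distinct weight samples on the fly, which is the cheapest of the augmentation categories a survey like this covers.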
Sat 9:10 a.m. – 9:20 a.m.

From Bricks to Bridges: Product of Invariances to Enhance Latent Space Communication (Contributed Talk)
It has been observed that representations learned by distinct neural networks conceal structural similarities when the models are trained under similar inductive biases. From a geometric perspective, identifying the classes of transformations and the related invariances that connect these representations is fundamental to unlocking applications such as merging, stitching, and reusing different neural modules. However, estimating task-specific transformations a priori can be challenging and expensive due to several factors (e.g., weight initialization, training hyperparameters, or data modality). To this end, we introduce a versatile method to directly incorporate a set of invariances into the representations, constructing a product space of invariant components on top of the latent representations without requiring prior knowledge about the optimal invariance to infuse. We validate our solution on classification and reconstruction tasks, observing consistent latent similarity and downstream performance improvements in a zero-shot stitching setting. The experimental analysis comprises three modalities (vision, text, and graphs), twelve pretrained foundation models, eight benchmarks, and several architectures trained from scratch.
Irene Cannistraci
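One concrete instance of an invariant component like those composed here is a cosine-similarity representation relative to a set of anchor samples, which is invariant to orthogonal transformations of the latent space. This is a toy numpy sketch in the spirit of the abstract, not the paper's method; the anchor choice and dimensions are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def relative_rep(X, anchors):
    """Represent each sample by its cosine similarities to a set of anchors."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    An = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    return Xn @ An.T

# Two "latent spaces" that differ only by a random orthogonal transform,
# mimicking two networks trained with different initializations.
X = rng.normal(size=(20, 5))
anchors = X[:3]
Q, _ = np.linalg.qr(rng.normal(size=(5, 5)))   # random orthogonal matrix
X2, anchors2 = X @ Q, anchors @ Q

# The relative representation is identical in both spaces: the component
# is invariant to the transform, so the spaces can "communicate".
print(np.allclose(relative_rep(X, anchors), relative_rep(X2, anchors2)))  # True
```

Stacking several such components, each invariant to a different transformation class, gives a product space without committing in advance to which invariance the task needs.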
Sat 9:20 a.m. – 9:30 a.m.

Euclidean, Projective, Conformal: Choosing a Geometric Algebra for Equivariant Transformers (Contributed Talk)
The Geometric Algebra Transformer (GATr) is a versatile architecture for geometric deep learning based on projective geometric algebra. We generalize this architecture into a blueprint that allows one to construct a scalable transformer architecture given any geometric (or Clifford) algebra. We study versions of this architecture for Euclidean, projective, and conformal algebras, all of which are suited to represent 3D data, and evaluate them in theory and practice. The simplest Euclidean architecture is computationally cheap but has a smaller symmetry group and is not as sample-efficient, while the projective model is not sufficiently expressive. Both the conformal algebra and an improved version of the projective algebra define powerful, performant architectures.
Pim de Haan
Sat 9:30 a.m. – 10:00 a.m.

The Role of World Models in Intelligence (Discussion Panel)
Sat 10:00 a.m. – 11:20 a.m.

Lunch Break
Sat 11:20 a.m. – 11:50 a.m.

From Local Diffeomorphism Detection to Symbolic Representation (Invited Talk)
Doris Tsao
Sat 11:50 a.m. – 12:20 p.m.

Rotation-equivariant predictive modeling reveals the functional organization of primary visual cortex (Invited Talk)
More than a dozen excitatory cell types have been identified in the mouse primary visual cortex (V1) based on transcriptomic, morphological, and in vitro electrophysiological features. However, little is known about the functional organization of visual cortex neurons and their response properties beyond orientation selectivity. Here, we combined large-scale two-photon imaging and predictive modeling of neural responses to study the functional organization of mouse V1. We developed a rotation-equivariant model architecture, followed by a rotation-invariant clustering pipeline, to map the landscape of neural function in V1. Clustering neurons based on their stimulus response function revealed a continuum with around 30 modes. Each mode represented a group of neurons that exhibited a specific combination of stimulus selectivity and nonlinear response properties such as cross-orientation inhibition, size-contrast tuning, and surround suppression. Interestingly, these nonlinear properties were expressed independently, and all possible combinations were present in the population. Our study shows how building known symmetries into neural response models can reveal interesting insights about the organization of the visual system.
Alexander Ecker
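The idea of pairing a rotation-equivariant feature map with rotation-invariant downstream processing can be shown with the simplest possible construction: correlating an image with the four 90-degree rotations of a single filter and pooling over that orbit. A toy sketch of the general principle, not the paper's architecture; image and filter sizes are illustrative.

```python
import numpy as np

def corr2d(img, ker):
    """Valid 2-D cross-correlation."""
    H, W = img.shape
    h, w = ker.shape
    out = np.empty((H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + h, j:j + w] * ker)
    return out

rng = np.random.default_rng(1)
img = rng.normal(size=(9, 9))
base = rng.normal(size=(3, 3))
bank = [np.rot90(base, k) for k in range(4)]   # C4 orbit of one filter

def pooled_response(image):
    """Equivariant filter bank followed by invariant (max) pooling over the orbit."""
    return np.max([corr2d(image, f) for f in bank], axis=0)

a = pooled_response(img)
b = pooled_response(np.rot90(img))
# Rotating the input rotates the pooled map (equivariance)...
print(np.allclose(b, np.rot90(a)))   # True
# ...so orbit-pooled summaries, such as the peak response, are invariant.
print(np.isclose(a.max(), b.max()))  # True
```

A rotation-invariant clustering pipeline relies on the same logic at the level of tuning functions: features that transform predictably under rotation can be pooled so that neurons are grouped by what they compute, not by their preferred orientation.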
Sat 12:20 p.m. – 12:30 p.m.

Internal Representations of Vision Models Through the Lens of Frames on Data Manifolds (Contributed Talk)
While the last five years have seen considerable progress in understanding the internal representations of deep learning models, many questions remain. This is especially true when trying to understand the impact of model design choices, such as model architecture or training algorithm, on hidden representation geometry and dynamics. In this work we present a new approach to studying such representations, inspired by the idea of a frame on the tangent bundle of a manifold. Our construction, which we call a neural frame, is formed by assembling a set of vectors representing specific types of perturbations of a data point (for example, infinitesimal augmentations, noise perturbations, or perturbations produced by a generative model) and studying how these change as they pass through a network. Using neural frames, we make observations about the way that models process, layer by layer, specific modes of variation within a small neighborhood of a data point. Our results provide new perspectives on a number of phenomena, such as the manner in which training with augmentation produces model invariance, or the proposed trade-off between adversarial training and model generalization.
Henry Kvinge
Sat 12:30 p.m. – 1:00 p.m.

Physics Priors in Machine Learning (Invited Talk)
Good neural architectures are rooted in good inductive biases (a.k.a. priors). Equivariance under symmetries is a prime example of a successful physics-inspired prior which sometimes dramatically reduces the number of examples needed to learn predictive models. Diffusion-based models, among the most successful generative models, are rooted in non-equilibrium statistical mechanics. Conversely, ML methods have recently been used to solve PDEs, for example in weather prediction, and to accelerate MD simulations by learning the (quantum mechanical) interactions between atoms and electrons. In this work we will try to extend this thinking to more flexible priors in the hidden variables of a neural network. In particular, we will impose wave-like dynamics in hidden variables under transformations of the inputs, which relaxes the stricter notion of equivariance. We find that under certain conditions, wave-like dynamics naturally arises in these hidden representations. We formalize this idea in a VAE-over-time architecture where the hidden dynamics is described by a Fokker-Planck (a.k.a. drift-diffusion) equation. This in turn leads to a new definition of a disentangled hidden representation of input states that can easily be manipulated to undergo transformations.
Max Welling
Sat 1:00 p.m. – 1:30 p.m.

Coffee Break
Sat 1:30 p.m. – 1:40 p.m.

Symmetry Breaking and Equivariant Neural Networks (Contributed Talk)
Using symmetry as an inductive bias in deep learning has been proven to be a principled approach for sample-efficient model design. However, the relationship between symmetry and the imperative for equivariance in neural networks is not always obvious. Here, we analyze a key limitation that arises in equivariant functions: their incapacity to break symmetry at the level of individual data samples. In response, we introduce a novel notion of 'relaxed equivariance' that circumvents this limitation. We further demonstrate how to incorporate this relaxation into equivariant multilayer perceptrons (E-MLPs), offering an alternative to the noise-injection method. The relevance of symmetry breaking is then discussed in various application domains: physics, graph representation learning, combinatorial optimization, and equivariant decoding.
Oumar Kaba
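The limitation analyzed here, that a strictly equivariant function cannot break the symmetry of a symmetric input, is easy to see in a toy permutation-equivariant (DeepSets-style) layer. A sketch with illustrative parameter values, not the paper's construction.

```python
import numpy as np

def deepsets_layer(x, a=1.5, b=-0.7):
    """A permutation-equivariant map on sets: f(x)_i = a * x_i + b * mean(x)."""
    return a * x + b * np.mean(x)

# Equivariance: permuting the input permutes the output identically.
x = np.array([3.0, -1.0, 2.0, 5.0])
perm = np.array([2, 0, 3, 1])
print(np.allclose(deepsets_layer(x)[perm], deepsets_layer(x[perm])))  # True

# The limitation: a fully symmetric (permutation-invariant) input can only
# map to a fully symmetric output. An equivariant function cannot "choose"
# one element to treat differently, i.e. it cannot break the symmetry.
x_sym = np.full(4, 2.0)
y = deepsets_layer(x_sym)
print(np.allclose(y, y[0]))  # True: every output entry is identical
```

Tasks that require picking one of several symmetric options (e.g., selecting a node from a symmetric graph) therefore need something beyond strict equivariance, which is the gap a relaxed notion of equivariance is meant to fill.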
Sat 1:40 p.m. – 1:50 p.m.

Joint Group Invariant Functions on Data-Parameter Domain Induce Universal Neural Networks (Contributed Talk)
The symmetry and geometry of input data are considered to be encoded in the internal data representation inside a neural network, but the specific encoding rule has been less investigated. In this study, we present a systematic method to induce a generalized neural network and its right inverse operator, called the ridgelet transform, from a joint group invariant function on the data-parameter domain. Since the ridgelet transform is an inverse, (1) it can describe the arrangement of parameters for the network to represent a target function, which is understood as the encoding rule, and (2) it implies the universality of the network. Based on group representation theory, we present a new, simple proof of universality by using Schur's lemma in a unified manner covering a wide class of networks, for example the original ridgelet transform, formal deep networks, and the dual voice transform. Since traditional universality theorems were demonstrated based on functional analysis, this study sheds light on the group-theoretic aspect of approximation theory, connecting geometric deep learning to abstract harmonic analysis.
Sho Sonoda
Sat 1:50 p.m. – 2:00 p.m.

Towards Information Theory-Based Discovery of Equivariances (Contributed Talk)
The presence of symmetries imposes a stringent set of constraints on a system. This constrained structure allows intelligent agents interacting with such a system to drastically improve the efficiency of learning and generalization, through the internalisation of the system's symmetries into their information processing. In parallel, principled models of complexity-constrained learning and behaviour make increasing use of information-theoretic methods. Here, we wish to marry these two perspectives and understand whether and in which form the information-theoretic lens can "see" the effect of symmetries of a system. For this purpose, we propose a novel variant of the Information Bottleneck principle, which has served as a productive basis for many principled studies of learning and information-constrained adaptive behaviour. We show (in the discrete case) that our approach formalises a certain duality between symmetry and information parsimony: namely, channel equivariances can be characterised by the optimal mutual-information-preserving joint compression of the channel's input and output. This information-theoretic treatment furthermore suggests a principled notion of "soft" equivariance, whose "coarseness" is measured by the amount of input-output mutual information preserved by the corresponding optimal compression. This new notion offers a bridge between the field of bounded rationality and the study of symmetries in neural representations. The framework may also allow (exact and soft) equivariances to be discovered automatically.
Hippolyte Charvin
Sat 2:00 p.m. – 2:05 p.m.

Announcements & Closing Remarks
Sat 2:05 p.m. – 3:00 p.m.

Poster Session


Joint Group Invariant Functions on Data-Parameter Domain Induce Universal Neural Networks (Oral)
Sho Sonoda · Hideyuki Ishi · Isao Ishikawa · Masahiro Ikeda


Towards Information Theory-Based Discovery of Equivariances (Oral)
Hippolyte Charvin · Nicola Catenacci Volpi · Daniel Polani


Expressive dynamics models with nonlinear injective readouts enable reliable recovery of latent features from neural activity (Oral)
Christopher Versteeg · Andrew Sedler · Jonathan McCart · Chethan Pandarinath


Data Augmentations in Deep Weight Spaces (Oral)
Aviv Shamsian · David Zhang · Aviv Navon · Yan Zhang · Miltiadis (Miltos) Kofinas · Idan Achituve · Riccardo Valperga · Gertjan Burghouts · Efstratios Gavves · Cees Snoek · Ethan Fetaya · Gal Chechik · Haggai Maron



Internal Representations of Vision Models Through the Lens of Frames on Data Manifolds (Oral)
Henry Kvinge · Grayson Jorgenson · Davis Brown · Charles Godfrey · Tegan Emerson


Spectral Maps for Learning on Subgraphs (Oral)
Marco Pegoraro · Riccardo Marin · Arianna Rampini · Simone Melzi · Luca Cosmo · Emanuele Rodolà


Euclidean, Projective, Conformal: Choosing a Geometric Algebra for Equivariant Transformers (Oral)
Pim de Haan · Taco Cohen · Johann Brehmer


On Complex Network Dynamics of an In-Vitro Neuronal System during Rest and Gameplay
(Oral)
In this study, we focus on characterising the complex network dynamics of an in vitro neuronal system of live biological cells during two distinct activity states: a spontaneous rest state and engagement in a real-time (closed-loop) game environment. We use DishBrain, a system that embodies in vitro neural networks with in silico computation using a high-density multi-electrode array. First, we embed the spiking activity of these channels in a lower-dimensional space using various representation learning methods. We then extract a subset of representative channels that are consistent across all of the neuronal preparations. Next, by analyzing these low-dimensional representations, we explore the patterns of macroscopic neuronal network dynamics during the learning process. Remarkably, our findings indicate that the low-dimensional embedding of representative channels alone is sufficient to differentiate the neuronal culture during the Rest and Gameplay conditions. Furthermore, we characterise the evolving neuronal connectivity patterns within the DishBrain system over time during Gameplay in comparison to the Rest condition. Notably, our investigation shows dynamic changes in the overall connectivity within the same region and across multiple regions of the multi-electrode array only during Gameplay. These findings underscore the plasticity of these neuronal networks in response to external stimuli and highlight the potential for modulating connectivity in a controlled environment. The ability to distinguish between neuronal states using reduced-dimensional representations points to the presence of underlying patterns that could be pivotal for real-time monitoring and manipulation of neuronal cultures. Additionally, this provides insight into how biologically based information-processing systems rapidly adapt and learn, and may lead to new or improved algorithms.
Moein Khajehnejad · Forough Habibollahi · Alon Loeffler · Brett J. Kagan · Adeel Razi 🔗 


Symmetry Breaking and Equivariant Neural Networks
(Oral)
Using symmetry as an inductive bias in deep learning has been proven to be a principled approach for sample-efficient model design. However, the relationship between symmetry and the imperative for equivariance in neural networks is not always obvious. Here, we analyze a key limitation that arises in equivariant functions: their incapacity to break symmetry at the level of individual data samples. In response, we introduce a novel notion of 'relaxed equivariance' that circumvents this limitation. We further demonstrate how to incorporate this relaxation into equivariant multilayer perceptrons (E-MLPs), offering an alternative to the noise-injection method. The relevance of symmetry breaking is then discussed in various application domains: physics, graph representation learning, combinatorial optimization, and equivariant decoding.
Oumar Kaba · Siamak Ravanbakhsh 🔗 
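The limitation referenced above follows in one line from the definition of equivariance; a standard argument (reconstructed here, not quoted from the paper) reads:

```latex
g \cdot x = x \;\Longrightarrow\; f(x) = f(g \cdot x) = g \cdot f(x)
```

That is, whenever an input is fixed by a group element $g$, the output of an equivariant $f$ is fixed by $g$ as well, so the output can never be strictly less symmetric than the input; this is the constraint that relaxed equivariance loosens at the level of individual samples.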


From Bricks to Bridges: Product of Invariances to Enhance Latent Space Communication
(Oral)
It has been observed that representations learned by distinct neural networks conceal structural similarities when the models are trained under similar inductive biases. From a geometric perspective, identifying the classes of transformations and the related invariances that connect these representations is fundamental to unlocking applications such as merging, stitching, and reusing different neural modules. However, estimating task-specific transformations a priori can be challenging and expensive due to several factors (e.g., weight initialization, training hyperparameters, or data modality). To this end, we introduce a versatile method to directly incorporate a set of invariances into the representations, constructing a product space of invariant components on top of the latent representations without requiring prior knowledge about the optimal invariance to infuse. We validate our solution on classification and reconstruction tasks, observing consistent latent similarity and downstream performance improvements in a zero-shot stitching setting. The experimental analysis comprises three modalities (vision, text, and graphs), twelve pretrained foundational models, eight benchmarks, and several architectures trained from scratch.
Irene Cannistraci · Luca Moschella · Marco Fumero · Valentino Maiorca · Emanuele Rodolà 🔗 


Learning Useful Representations of Recurrent Neural Network Weight Matrices
(Poster)
Recurrent Neural Networks (RNNs) are general-purpose parallel-sequential computers. The program of an RNN is its weight matrix. Direct analysis of these weights, however, tends to be challenging. Is it possible to learn useful representations of RNN weights that facilitate downstream tasks? While the "Mechanistic Approach" directly 'looks inside' the RNN to predict its behavior, the "Functionalist Approach" analyzes its overall functionality: specifically, its input-output mapping. Our two novel Functionalist Approaches extract information from RNN weights by 'interrogating' the RNN through probing inputs. Our novel theoretical framework for the Functionalist Approach demonstrates conditions under which it can generate rich representations for determining the behavior of RNNs. We compare RNN weight representations generated by the Mechanistic and Functionalist approaches by evaluating them on two downstream tasks. Our results show the superiority of the Functionalist methods.
Vincent Herrmann · Francesco Faccio · Jürgen Schmidhuber 🔗 


Distance Learner: Incorporating Manifold Prior to Model Training
(Poster)
The manifold hypothesis (that real-world data concentrates near low-dimensional manifolds) is suggested as the principle behind the effectiveness of machine learning algorithms in very high-dimensional problems that are common in domains such as vision and speech. Multiple methods have been proposed to explicitly incorporate the manifold hypothesis as a prior in modern Deep Neural Networks (DNNs), with varying success. In this paper, we propose a new method, Distance Learner, to incorporate this prior for DNN-based classifiers. Distance Learner is trained to predict the distance of a point from the underlying manifold of each class, rather than the class label. For classification, Distance Learner then chooses the class corresponding to the closest predicted class manifold. Distance Learner can also identify points as being out of distribution (belonging to neither class) if the distance to the closest manifold is higher than a threshold. We evaluate our method on multiple synthetic datasets and show that Distance Learner learns much more meaningful classification boundaries compared to a standard classifier. We also evaluate our method on the task of adversarial robustness and find that it not only outperforms standard classifiers by a large margin but also performs on par with classifiers trained via well-accepted, standard adversarial training.
Aditya Chetan · Nipun Kwatra 🔗 
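The classification and out-of-distribution rules described above reduce to an argmin plus a threshold once per-class distances have been predicted; a minimal sketch (the function name, distances, and threshold are illustrative, not the authors' code):

```python
import numpy as np

def distance_learner_predict(class_distances, ood_threshold):
    """Pick the class whose predicted manifold is closest; flag the point as
    out of distribution (label -1) when even the closest predicted manifold
    is farther than the threshold.

    class_distances: (n_points, n_classes) array of predicted distances.
    """
    d = np.asarray(class_distances)
    labels = d.argmin(axis=1)
    labels[d.min(axis=1) > ood_threshold] = -1  # out of distribution
    return labels

# Two points near the class-0 manifold, one far from every class manifold.
dists = np.array([[0.1, 2.0],
                  [0.3, 1.5],
                  [4.0, 5.0]])
preds = distance_learner_predict(dists, ood_threshold=1.0)
# preds: classes [0, 0] plus one OOD flag (-1) for the distant point
```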


An Information-Theoretic Understanding of Maximum Manifold Capacity Representations
(Poster)
Maximum Manifold Capacity Representations (MMCR) is a recent multi-view self-supervised learning (MVSSL) method that matches or surpasses other leading MVSSL methods. MMCR is interesting for at least two reasons. Firstly, MMCR is an oddity in the zoo of MVSSL methods: it is not (explicitly) contrastive, applies no masking, performs no clustering, leverages no distillation, and does not (explicitly) reduce redundancy. Secondly, while many self-supervised learning (SSL) methods originate in information theory, MMCR distinguishes itself by claiming a different origin: a statistical mechanical characterization of the geometry of linear separability of data manifolds. However, given the rich connections between statistical mechanics and information theory, and given recent work showing how many SSL methods can be understood from an information-theoretic perspective, we conjecture that MMCR can be similarly understood from an information-theoretic perspective. In this paper, we leverage tools from high-dimensional probability and information theory to demonstrate that an optimal solution to MMCR's nuclear-norm-based objective function is the same optimal solution that maximizes a well-known lower bound on mutual information.
Victor Lecomte · Rylan Schaeffer · Berivan Isik · Mikail Khona · Yann LeCun · Sanmi Koyejo · Andrey Gromov · Ravid ShwartzZiv 🔗 
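As a rough illustration of the kind of objective discussed (a sketch in the spirit of MMCR, not the authors' exact implementation): normalize each view's embeddings onto the unit sphere, average them into per-sample centroids, and maximize the nuclear norm of the centroid matrix, i.e., minimize its negative.

```python
import numpy as np

def mmcr_style_loss(views):
    """Nuclear-norm objective sketch: L2-normalize each view's embeddings,
    average over views to get one centroid per sample, and return the
    *negative* nuclear norm of the centroid matrix (to be minimized).

    views: (n_views, n_samples, dim) array of embeddings.
    """
    z = views / np.linalg.norm(views, axis=-1, keepdims=True)  # unit sphere
    centroids = z.mean(axis=0)                                 # (n_samples, dim)
    return -np.linalg.norm(centroids, ord='nuc')               # negative nuclear norm

rng = np.random.default_rng(0)
views = rng.normal(size=(4, 8, 16))  # 4 augmented views, 8 samples, 16-dim
loss = mmcr_style_loss(views)
# loss is strictly negative; aligned, well-spread embeddings drive it lower
```

Intuitively, long centroids reward alignment across views, while a large nuclear norm rewards centroids that spread across many directions.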


Sample Efficient Modeling of Drag Coefficients for Satellites with Symmetry
(Poster)
Accurate knowledge of the atmospheric drag coefficient for a satellite in low Earth orbit is crucial to plan an orbit that avoids collisions with other spacecraft, but its calculation has high uncertainty and is very expensive to compute numerically for long-horizon predictions. Previous work has improved coefficient modeling speed with data-driven approaches, but these models do not utilize domain symmetry. This work investigates enforcing the invariance of atmospheric particle deflections off certain satellite geometries, resulting in higher sample efficiency and theoretically more robustness for data-driven methods. We train $G$-equivariant MLPs to predict the drag coefficient, where $G$ defines invariances of the coefficient across different orientations of the satellite. We experiment on a synthetic dataset computed using the numerical Test Particle Monte Carlo (TPMC) method, in which particles are fired at a satellite in the computational domain. We find that our method is more sample- and computationally efficient than unconstrained baselines, which is significant because TPMC simulations are extremely computationally expensive.

Neel Sortur · Linfeng Zhao · Robin Walters 🔗 
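One simple way to see the invariance constraint (a sketch, not the paper's equivariant-MLP construction): averaging any predictor over a finite symmetry group $G$ makes it exactly $G$-invariant. The group, network, and inputs below are illustrative assumptions.

```python
import numpy as np

def mlp(x, W1, W2):
    """Tiny unconstrained regressor (hypothetical stand-in for a drag model)."""
    return np.tanh(x @ W1) @ W2

def g_invariant_predict(x, group, W1, W2):
    """Make any predictor exactly G-invariant by averaging it over the group
    orbit of the input; this is the constraint that equivariant MLPs like
    those in the abstract instead build in architecturally."""
    return np.mean([mlp(g @ x, W1, W2) for g in group], axis=0)

# Example symmetry: a satellite geometry whose drag is invariant under
# reflecting the y-axis of its orientation vector.
group = [np.eye(3), np.diag([1.0, -1.0, 1.0])]
rng = np.random.default_rng(1)
W1, W2 = rng.normal(size=(3, 8)), rng.normal(size=(8, 1))
x = np.array([0.2, 0.7, -0.4])
# Predictions agree across the whole orbit: the output is invariant.
c1 = g_invariant_predict(x, group, W1, W2)
c2 = g_invariant_predict(group[1] @ x, group, W1, W2)
```

Group averaging costs one forward pass per group element at inference time, which is one reason baking the symmetry into the architecture is usually preferred.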


AMES: A Differentiable Embedding Space Selection Framework for Latent Graph Inference
(Poster)
In real-world scenarios, although data entities may possess inherent relationships, the specific graph illustrating their connections might not be directly accessible. Latent graph inference addresses this issue by enabling Graph Neural Networks (GNNs) to operate on point cloud data, dynamically learning the necessary graph structure. These graphs are often derived from a latent embedding space, which can be modeled using Euclidean, hyperbolic, spherical, or product spaces. However, there is currently no principled differentiable method for determining the optimal embedding space. In this work, we introduce the Attentional Multi-Embedding Selection (AMES) framework, a differentiable method for selecting the best embedding space for latent graph inference through backpropagation, considering a downstream task. Our framework consistently achieves comparable or superior results compared to previous methods for latent graph inference across five benchmark datasets. Importantly, our approach eliminates the need for conducting multiple experiments to identify the optimal embedding space. Furthermore, we explore interpretability techniques that track the gradient contributions of different latent graphs, shedding light on how our attention-based, fully differentiable approach learns to choose the appropriate latent space. In line with previous works, our experiments emphasize the advantages of hyperbolic spaces in enhancing performance. More importantly, our interpretability framework provides a general approach for quantitatively comparing embedding spaces across different tasks based on their contributions, a dimension that has been overlooked in previous literature on latent graph inference.
Yuan Lu · Haitz Sáez de Ocáriz Borde · Pietro Lió 🔗 


Optimal packing of attractor states in neural representations
(Poster)
Animals' internal states reflect variables like their position in space, orientation, decisions, and motor actions. But how should these internal states be arranged? Internal states which frequently transition between one another should be close enough that transitions can happen quickly, but not so close that neural noise significantly impacts the stability of those states and how reliably they can be encoded and decoded. In this paper, we study the problem of striking a balance between these two concerns, which we call an 'optimal packing' problem since it resembles mathematical problems like sphere packing. While this problem is generally extremely difficult, we show that symmetries in environmental transition statistics imply certain symmetries of the optimal neural representations, which allows us in some cases to exactly solve for the optimal state arrangement. We focus on two toy cases: uniform transition statistics, and cyclic transition statistics.
John Vastola 🔗 


Grokking in recurrent networks with attractive and oscillatory dynamics
(Poster)
Generalization is perhaps the most salient property of biological intelligence. In the context of artificial neural networks (ANNs), generalization has been studied through the recently discovered phenomenon of "grokking", whereby small transformers generalize on modular arithmetic tasks. We extend this line of work to continuous-time recurrent neural networks (CTRNNs) to investigate generalization in neural systems. Inspired by the card game SET, we reformulated previous modular arithmetic tasks as a binary classification task to elicit interpretable CTRNN dynamics. We found that CTRNNs learned one of two dynamical mechanisms, characterized by either attractive or oscillatory dynamics. Notably, both of these mechanisms displayed strong parallels to deterministic finite automata (DFA). In our grokking experiments, we found that attractive dynamics generalize more frequently in training regimes with few withheld data points, while oscillatory dynamics generalize more frequently in training regimes with many withheld data points.
Keith Murray 🔗 


Quantifying Lie Group Learning with Local Symmetry Error
(Poster)
Despite increasing interest in using machine learning to discover symmetries, no quantitative measure has been proposed for comparing the performance of different algorithms. Our proposal, both intuitively and theoretically grounded, is to compare Lie groups using a local symmetry error based on the difference between their infinitesimal actions at any possible data point. Namely, we use a well-studied metric to compare the induced tangent spaces. We obtain an upper bound on this metric which is uniform across data points, under some conditions. We show that when one of the groups is a circle group, this bound is furthermore both tight and easily computable, thus globally characterizing the local errors. We demonstrate our proposal by quantitatively evaluating an existing algorithm. We note that our proposed metric has deficiencies in comparing tangent spaces of different dimensions, as well as distinct groups whose local actions are similar.
Vasco Portilheiro 🔗 


How do language models bind entities in context?
(Poster)
To correctly use in-context information, language models (LMs) must bind entities to their attributes. For example, given a context describing a "green square" and a "blue circle", LMs must bind the shapes to their respective colors. We analyze LM representations and identify the binding ID mechanism: a general mechanism for solving the binding problem, which we observe in every sufficiently large model from the Pythia and LLaMA families. Using causal interventions, we show that LMs' internal activations represent binding information by attaching binding ID vectors to corresponding entities and attributes. We further show that binding ID vectors form a continuous subspace, in which distances between binding ID vectors reflect their discernibility. Overall, our results uncover interpretable strategies in LMs for representing symbolic knowledge in context, providing a step towards understanding general in-context reasoning in large-scale LMs.
Jiahai Feng · Jacob Steinhardt 🔗 


Improving Convergence and Generalization Using Parameter Symmetries
(Poster)
In overparametrized models, different parameter values may result in the same loss. Parameter space symmetries are loss-invariant transformations that change the model parameters. Teleportation applies such transformations to accelerate optimization. However, the exact mechanism behind this algorithm's success is not well understood. In this paper, we prove that teleportation gives an overall faster time to convergence. Additionally, teleporting to minima with different curvatures improves generalization, which suggests a connection between the curvature of a minimum and generalization ability. Finally, we show that integrating teleportation into optimization-based meta-learning improves convergence over traditional algorithms that perform only local updates. Our results showcase the versatility of teleportation and demonstrate the potential of incorporating symmetry into optimization.
Bo Zhao · Robert Gower · Robin Walters · Rose Yu 🔗 
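A minimal example of the kind of parameter-space symmetry teleportation exploits, shown here for a two-layer *linear* network (for ReLU networks the valid symmetries are more restricted, e.g. positive diagonal rescalings); the shapes and names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(5, 4)), rng.normal(size=(3, 5))
X = rng.normal(size=(4, 10))

# A loss-invariant "teleportation": for any invertible G, the reparameterized
# network (G @ W1, W2 @ inv(G)) computes exactly the same function, hence the
# same loss, but can sit at a point with very different local geometry.
G = rng.normal(size=(5, 5)) + 5 * np.eye(5)   # well-conditioned, invertible
W1_tel, W2_tel = G @ W1, W2 @ np.linalg.inv(G)

out_before = W2 @ (W1 @ X)
out_after = W2_tel @ (W1_tel @ X)
# out_before and out_after agree: same function, different parameters.
```

Teleportation searches over such `G` for a symmetry-equivalent point where, e.g., the gradient norm or curvature is more favorable, and continues optimizing from there.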


Haldane Bundles: A Dataset for Learning to Predict the Chern Number of Line Bundles on the Torus
(Poster)
Characteristic classes, which are abstract topological invariants associated with vector bundles, have become an important notion in modern physics with surprising real-world consequences. As a representative example, the incredible properties of topological insulators, which are insulators in their bulk but conductors on their surface, can be completely characterized by a specific characteristic class associated with their electronic band structure, the first Chern class. Given their importance to next-generation computing and the computational challenge of calculating them using first-principles approaches, there is a need to develop machine learning approaches to predict the characteristic classes associated with a material system. To aid in this program, we introduce the *Haldane bundle dataset*, which consists of synthetically generated complex line bundles on the $2$-torus. We envision this dataset, which is not as challenging as noisy and sparsely measured real-world datasets but (as we show) is still difficult for off-the-shelf architectures, to be a testing ground for architectures that incorporate the rich topological and geometric priors underlying characteristic classes.

Cody Tipton · Elizabeth Coda · Davis Brown · Alyson Bittner · Caitlin Hutten · Grayson Jorgenson · Tegan Emerson · Henry Kvinge 🔗 
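For readers unfamiliar with the target quantity, here is a standard way to compute a Chern number numerically: the Fukui-Hatsugai-Suzuki lattice method, demonstrated on a simple two-band model (the QWZ model) rather than on the dataset's line bundles. This is background illustration, not the dataset's generation code.

```python
import numpy as np

def lower_band_state(kx, ky, m):
    """Lowest eigenvector of a two-band Hamiltonian H(k) = d(k) . sigma
    (the QWZ model, a standard toy Chern insulator on the torus)."""
    d = np.array([np.sin(kx), np.sin(ky), m + np.cos(kx) + np.cos(ky)])
    H = np.array([[d[2], d[0] - 1j * d[1]],
                  [d[0] + 1j * d[1], -d[2]]])
    _, vecs = np.linalg.eigh(H)
    return vecs[:, 0]

def chern_number(m, N=24):
    """Fukui-Hatsugai-Suzuki method: sum the Berry flux (phase of the product
    of link overlaps) over every plaquette of a discretized Brillouin torus."""
    ks = 2 * np.pi * np.arange(N) / N
    u = np.array([[lower_band_state(kx, ky, m) for ky in ks] for kx in ks])
    total = 0.0
    for i in range(N):
        for j in range(N):
            ip, jp = (i + 1) % N, (j + 1) % N
            prod = (np.vdot(u[i, j], u[ip, j]) *
                    np.vdot(u[ip, j], u[ip, jp]) *
                    np.vdot(u[ip, jp], u[i, jp]) *
                    np.vdot(u[i, jp], u[i, j]))
            total += np.angle(prod)   # gauge-invariant flux through plaquette
    return total / (2 * np.pi)

c = chern_number(m=1.0)   # quantized to an integer of magnitude 1 here
```

The per-plaquette flux is gauge-invariant, so the arbitrary eigenvector phases returned by `eigh` do not affect the result, and the lattice sum is exactly quantized to an integer.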


How Capable Can a Transformer Become? A Study on Synthetic, Interpretable Tasks
(Poster)
Transformers trained on huge text corpora exhibit a remarkable set of capabilities. Given the inherent compositional nature of language, one can expect the model to learn to compose these capabilities, potentially yielding a combinatorial explosion of the operations it can perform on an input. Motivated by the above, in this paper we aim to assess "how capable can a transformer become?". We train Transformer models on a data-generating process that involves compositions of a set of well-defined monolithic capabilities and show that: (1) Transformers generalize to exponentially or even combinatorially many functions not seen in the training data; (2) composing functions by generating intermediate outputs is more effective at generalizing to unseen compositions; (3) the training data has a significant impact on the model's ability to compose functions; and (4) attention layers in the latter half of the model seem critical to compositionality.
Rahul Ramesh · Mikail Khona · Robert Dick · Hidenori Tanaka · Ekdeep S Lubana 🔗 


Structure-wise Uncertainty for Curvilinear Image Segmentation
(Poster)
Segmenting curvilinear structures like blood vessels and roads poses significant challenges due to their intricate geometry and weak signals. To expedite large-scale annotation, it is essential to adopt semi-automatic methods such as proofreading by human experts. In this abstract, we focus on estimating uncertainty for such tasks, so that highly uncertain, and thus error-prone, structures can be identified for human annotators to verify. Unlike prior work that generates pixel-wise uncertainty maps, we believe it is essential to measure uncertainty in the units of topological structures, e.g., small pieces of connections and branches. To realize this, we employ tools from topological data analysis, specifically discrete Morse theory (DMT), to first extract the structures and then reason about their uncertainties. On multiple 2D and 3D datasets, our methodology generates superior structure-wise uncertainty maps compared to existing models.
Saumya Gupta · Xiaoling Hu · Chao Chen 🔗 


On the Varied Faces of Overparameterization in Supervised and Self-Supervised Learning
(Poster)
The quality of the representations learned by neural networks depends on several factors, including the loss function, learning algorithm, and model architecture. In this work, we use information-geometric measures to assess representation quality in a principled manner. We demonstrate that the sensitivity of learned representations to input perturbations, measured by the spectral norm of the feature Jacobian, provides valuable information about downstream generalization. On the other hand, measuring the coefficient of spectral decay observed in the eigenspectrum of the feature covariance provides insights into the global representation geometry. First, we empirically establish an equivalence between these notions of representation quality and show that they are inversely correlated. Second, our analysis reveals the varying roles that overparameterization plays in improving generalization. Unlike in supervised learning, we observe that increasing model width leads to higher discriminability and less smoothness in the self-supervised regime. Furthermore, we report that there is no observable double descent phenomenon in SSL with non-contrastive objectives for commonly used parameterization regimes, which opens up new opportunities for tight asymptotic analysis. Taken together, our results provide a loss-aware characterization of the differing roles of overparameterization in supervised and self-supervised learning.
Matteo Gamba · Arna Ghosh · Kumar Krishna Agrawal · Blake Richards · Hossein Azizpour · Mårten Björkman 🔗 


Geometric Epitope and Paratope Prediction
(Poster)
Antibody-antigen interactions play a crucial role in identifying and neutralizing harmful foreign molecules. In this paper, we investigate the optimal representation for predicting the binding sites in the two molecules and emphasize the importance of geometric information. Specifically, we compare different geometric deep learning methods applied to proteins' inner (I-GEP) and outer (O-GEP) structures. We incorporate 3D coordinates and spectral geometric descriptors as input features to fully leverage the geometric information. Our research suggests that surface-based models are more efficient than other methods, and our O-GEP experiments have achieved state-of-the-art results with significant performance improvements.
Marco Pegoraro · Clémentine Dominé · Emanuele Rodolà · Petar Veličković · AndreeaIoana Deac 🔗 


RelWire: Metric-Based Graph Rewiring
(Poster)
Oversquashing is a major hurdle to the application of geometric deep learning and graph neural networks in real applications. Recent work has found connections between oversquashing and commute times, effective resistance, and the eigengap of the underlying graph. Graph rewiring is the most promising technique to alleviate this issue. Some prior work adds edges locally to highly negatively curved subgraphs. These local changes, however, have a small effect on global statistics such as commute times and the eigengap. Other prior work uses the spectrum of the graph Laplacian to target rewiring that increases the eigengap. These approaches, however, make large structural and topological changes to the underlying graph. We use ideas from geometric group theory to present RelWire, a rewiring technique based on the geometry of the graph. We derive topological connections for RelWire. We then rewire different real-world molecule datasets and show that RelWire is Pareto optimal: it has the best balance between improving the eigengap and commute times and minimizing changes to the topology of the underlying graph.
Rishi Sonthalia · Anna Gilbert · Matthew Durham 🔗 
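The commute-time and effective-resistance statistics mentioned above are cheap to compute from the Laplacian pseudoinverse; a small sketch (assuming an unweighted, undirected, connected graph; the function name is illustrative):

```python
import numpy as np

def commute_times(A):
    """Pairwise commute times from the Laplacian pseudoinverse:
    R_eff(u, v) = L+_uu + L+_vv - 2 L+_uv, and C(u, v) = 2 |E| * R_eff(u, v),
    the global quantities rewiring methods like RelWire aim to shrink."""
    A = np.asarray(A, dtype=float)
    L = np.diag(A.sum(axis=1)) - A
    Lp = np.linalg.pinv(L)
    d = np.diag(Lp)
    R = d[:, None] + d[None, :] - 2 * Lp   # effective resistances
    return A.sum() * R                     # A.sum() equals 2 |E|

# Path graph 1-2-3: resistance between the endpoints is 2 (two unit
# resistors in series), |E| = 2, so their commute time is 2 * 2 * 2 = 8.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])
C = commute_times(A)
```

Adding an edge between the endpoints would put the two resistors in parallel with it and sharply reduce both the resistance and the commute time, which is the global effect local curvature-based rewiring struggles to achieve.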


Sheaf-based Positional Encodings for Graph Neural Networks
(Poster)
Graph Neural Networks (GNNs) work directly with graphstructured data, capitalising on relational information among entities. One limitation of GNNs is their reliance on local interactions among connected nodes. GNNs may generate identical node embeddings for similar local neighbourhoods and fail to distinguish structurally distinct graphs. Positional encodings help to break the locality constraint by informing the nodes of their global positions in the graph. Furthermore, they are required by Graph Transformers to encode structural information. However, existing positional encodings based on the graph Laplacian only encode structural information and are typically fixed. To address these limitations, we propose a novel approach to design positional encodings using sheaf theory. The sheaf Laplacian can be learnt from node data, allowing it to encode both the structure and semantic information. We present two methodologies for creating sheafbased positional encodings, showcasing their efficacy in node and graph tasks. Our work advances the integration of sheaves in graph learning, paving the way for innovative GNN techniques that draw inspiration from geometry and topology. 
Yu He · Cristian Bodnar · Pietro Lió 🔗 
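A minimal sketch of the sheaf Laplacian underlying such encodings (a toy construction with fixed rather than learned restriction maps; all names are illustrative): each edge carries a pair of d x d restriction maps, and eigenvectors of the resulting block matrix play the role that graph-Laplacian eigenvectors play in standard positional encodings.

```python
import numpy as np

def sheaf_laplacian(edges, restrictions, n, d):
    """Block sheaf Laplacian: each edge e = (u, v) carries restriction maps
    F_u, F_v (d x d). Diagonal blocks accumulate F^T F; off-diagonal blocks
    are -F_u^T F_v, giving a symmetric positive semidefinite (n*d x n*d) matrix."""
    L = np.zeros((n * d, n * d))
    for (u, v), (Fu, Fv) in zip(edges, restrictions):
        bu, bv = slice(u * d, (u + 1) * d), slice(v * d, (v + 1) * d)
        L[bu, bu] += Fu.T @ Fu
        L[bv, bv] += Fv.T @ Fv
        L[bu, bv] -= Fu.T @ Fv
        L[bv, bu] -= Fv.T @ Fu
    return L

# Triangle graph with 2-dimensional stalks and random restriction maps.
rng = np.random.default_rng(0)
edges = [(0, 1), (1, 2), (0, 2)]
restrictions = [(rng.normal(size=(2, 2)), rng.normal(size=(2, 2)))
                for _ in edges]
L = sheaf_laplacian(edges, restrictions, n=3, d=2)
vals, vecs = np.linalg.eigh(L)
pe = vecs[:, :4]   # sheaf-based positional encoding: leading eigenvectors
```

With identity restriction maps this reduces to d copies of the ordinary graph Laplacian; learning the maps from node data is what lets the encoding carry semantic as well as structural information.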


Structural Similarities Between Language Models and Neural Response Measurements
(Poster)
Large language models have complicated internal dynamics, but they induce representations of words and phrases whose geometry we can study. Human language processing is also opaque, but neural response measurements can provide (noisy) recordings of activations during listening or reading, from which we can extract similar representations of words and phrases. Here we study the extent to which the geometries induced by these two kinds of representations share similarities in the context of brain decoding. We find that the larger neural language models get, the more structurally similar their representations are to neural response measurements from brain imaging.
Jiaang Li · Antonia Karamolegkou · Yova Kementchedjhieva · Mostafa Abdou · Sune Lehmann · Anders Søgaard 🔗 


INRFormer: Neuron Permutation Equivariant Transformer on Implicit Neural Representations
(Poster)
Implicit Neural Representations (INRs) have demonstrated both precision in continuous data representation and compactness in encapsulating high-dimensional data. Yet, much of contemporary research remains centered on data reconstruction using INRs, with limited exploration into processing INRs themselves. In this paper, we endeavor to develop a model tailored to process INRs explicitly for computer vision tasks. We conceptualize INRs as computational graphs with neurons as nodes and weights as edges. To process INR graphs, we propose INRFormer, consisting of alternating node blocks and edge blocks. Within the node block, we further propose Sliding Layer Attention (SLA), which performs attention on nodes of three sequential INR layers. This sliding mechanism of the SLA across INR layers enables each layer's nodes to access a broader scope of the entire graph's information. In the edge block, every edge's feature vector is concatenated with the features of its two linked nodes, followed by a projection via an MLP. Ultimately, we formulate visual recognition as INR-to-INR (inr2inr) translations: INRFormer transforms the input INR, which maps coordinates to image pixels, into a target INR, which maps the coordinates to labels. We demonstrate INRFormer on CIFAR-10.
Lei Zhou · Varun Belagali · Joseph Bae · Prateek Prasanna · Dimitris Samaras 🔗 


From Charts to Atlas: Merging Latent Spaces into One
(Poster)
Models trained on semantically related datasets and tasks exhibit comparable inter-sample relations within their latent spaces. In this study, we investigate the aggregation of such latent spaces to create a unified space encompassing the combined information. To this end, we introduce Relative Latent Space Aggregation (RLSA), a two-step approach that first renders the spaces comparable using relative representations, and then aggregates them via a simple mean. We carefully divide a classification problem into a series of learning tasks under three different settings: sharing samples, classes, or neither. We then train a model on each task and aggregate the resulting latent spaces. We compare the aggregated space with that derived from an end-to-end model trained over all tasks and show that the two spaces are similar. We then observe that the aggregated space is better suited for classification, and empirically demonstrate that this is due to the unique imprints left by task-specific embedders within the representations. We finally test our framework in scenarios where no shared region exists and show that it can still be used to merge the spaces, albeit with diminished benefits over naive merging.
Donato Crisostomi · Irene Cannistraci · Luca Moschella · Pietro Barbiero · Marco Ciccone · Pietro Lió · Emanuele Rodolà 🔗 
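The two steps of RLSA can be sketched in a few lines (an illustrative reconstruction from the abstract, assuming cosine-similarity relative representations as in the relative-representations literature; the data and anchors are toy):

```python
import numpy as np

def relative_representation(Z, anchors):
    """Step 1 (sketch): re-express every sample by its cosine similarities to
    a shared set of anchor samples, which removes rotation and rescaling
    mismatches between independently trained latent spaces."""
    Zn = Z / np.linalg.norm(Z, axis=1, keepdims=True)
    return Zn @ Zn[anchors].T           # (n_samples, n_anchors)

def rlsa_aggregate(spaces, anchors):
    """Step 2: once the spaces are comparable, aggregate via a simple mean."""
    return np.mean([relative_representation(Z, anchors) for Z in spaces], axis=0)

rng = np.random.default_rng(0)
Z = rng.normal(size=(10, 6))
Q, _ = np.linalg.qr(rng.normal(size=(6, 6)))
spaces = [Z, 3.0 * Z @ Q]               # the same space up to rotation and scale
merged = rlsa_aggregate(spaces, anchors=[0, 1, 2])
# Relative representations are invariant to the rotation and rescaling, so
# the aggregate coincides with either space's relative representation.
```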


Growing Brains in Recurrent Neural Networks for Multiple Cognitive Tasks
(Poster)
Recurrent neural networks (RNNs) trained on a diverse ensemble of cognitive tasks, as described by Yang et al. (2019) and Khona et al. (2023), have been shown to exhibit functional modularity, where neurons organize into discrete functional clusters, each specialized for specific shared computational subtasks. However, these RNNs do not demonstrate anatomical modularity, where these functionally specialized clusters also have a distinct spatial organization. This contrasts with the human brain, which has both functional and anatomical modularity. Is there a way to train RNNs to make them more like brains in this regard? We apply a recent machine learning method, brain-inspired modular training (BIMT), to encourage neural connectivity to be local in space. Consequently, the hidden-neuron organization of the RNN forms spatial structures reminiscent of those of the brain: spatial clusters which correspond to functional clusters. Compared to standard $L_1$ regularization and to the absence of regularization, BIMT exhibits superior performance by optimally balancing task performance and sparsity. This balance is quantified both in terms of the number of active neurons and the cumulative wiring length. In addition to achieving brain-like organization in RNNs, our findings also suggest that BIMT holds promise for applications in neuromorphic computing and for enhancing the interpretability of neural network architectures.

Ziming Liu · Mikail Khona · Ila Fiete · Max Tegmark 🔗 


Are “Hierarchical” Visual Representations Hierarchical?
(Poster)
Learned visual representations often capture large amounts of semantic information for accurate downstream applications. Human understanding of the world is fundamentally grounded in hierarchy. To mimic this and further improve representation capabilities, the community has explored "hierarchical" visual representations that aim at modeling the underlying hierarchy of the visual world. In this work, we set out to investigate whether hierarchical visual representations truly capture the human-perceived hierarchy better than standard learned representations. To this end, we create HierNet, a suite of 12 datasets spanning 3 kinds of hierarchy from the BREEDS subset of ImageNet. After extensive evaluation of Hyperbolic and Matryoshka Representations across training setups, we conclude that they do not capture hierarchy any better than the standard representations, but they can assist in other aspects like search efficiency and interpretability. Our benchmark and datasets are open-sourced at https://github.com/ethanlshen/HierNet.
Ethan Shen · Ali Farhadi · Aditya Kusupati 🔗 


Homological Convolutional Neural Networks
(
Poster
)
>
link
Deep learning methods have demonstrated outstanding performance on classification and regression tasks on homogeneous data types (e.g., image, audio, and text data). However, tabular data still pose a challenge, with classic machine learning approaches often being computationally cheaper than, and as effective as, increasingly complex deep learning architectures. The challenge arises from the fact that, in tabular data, the correlation among features is weaker than that arising from spatial or semantic relationships in images or natural language, and the dependency structures need to be modeled without any prior information. In this work, we propose a novel deep learning architecture that exploits the data's structural organization through topologically constrained network representations to gain relational information from sparse tabular inputs. The resulting model leverages the power of convolution and is centered on a limited number of concepts from network topology to guarantee: (i) a data-centric and deterministic building pipeline; (ii) a high level of interpretability over the inference process; and (iii) adequate room for scalability. We test our model on $18$ benchmark datasets against $5$ classic machine learning and $3$ deep learning models, demonstrating that our approach reaches state-of-the-art performance on these challenging datasets. The code to reproduce all our experiments is provided at https://github.com/FinancialComputingUCL/HomologicalCNN.

Antonio Briola · Yuanrong Wang · Silvia Bartolucci · Tomaso Aste 🔗 


Visual Scene Representation with Hierarchical Equivariant Sparse Coding
(
Poster
)
>
link
We propose a hierarchical neural network architecture for unsupervised learning of equivariant part-whole decompositions of visual scenes. In contrast to the global equivariance of group-equivariant networks, the proposed architecture exhibits equivariance to part-whole transformations throughout the hierarchy, which we term hierarchical equivariance. The model achieves such internal representations via hierarchical Bayesian inference, which gives rise to rich bottom-up, top-down, and lateral information flows, hypothesized to underlie the mechanisms of perceptual inference in visual cortex. We demonstrate these useful properties of the model on a simple dataset of scenes with multiple objects under independent rotations and translations. 
Christian A Shewmake · Domas Buracas · Hansen Lillemark · Jinho Shin · Erik Bekkers · Nina Miolane · Bruno Olshausen 🔗 


Symmetry-based Learning of Radiance Fields for Rigid Objects
(
Poster
)
>
link
In this work, we present SymObjectRF, a symmetry-based method that learns object-centric representations for rigid objects from a single dynamic scene without handcrafted annotations. SymObjectRF learns the appearance and surface geometry of all dynamic objects in their canonical poses and represents each object within its canonical pose using a canonical object field (COF). SymObjectRF imposes group equivariance on the rendering pipeline by transforming 3D point samples from world coordinates to object canonical poses. Subsequently, a permutation-invariant compositional renderer combines the color and density values queried from the learned COFs and reconstructs the input scene via volume rendering. SymObjectRF is then optimized by minimizing a scene reconstruction loss. We show the feasibility of SymObjectRF in learning object-centric representations both theoretically and empirically. 
Zhiwei Han · Stefan Matthes · Hao Shen · Yuanting Liu 🔗 


Decorrelating neurons using persistence
(
Poster
)
>
link
We propose a novel way to regularise deep learning models by reducing high correlations between neurons. For this, we present two regularisation terms computed from the weights of a minimum spanning tree of the clique whose vertices are the neurons of a given network (or a sample of those), where weights on edges are correlation dissimilarities. We explore their efficacy by performing a set of proof-of-concept experiments, for which our new regularisation terms outperform some popular ones. We demonstrate that, in these experiments, naive minimisation of all correlations between neurons obtains lower accuracies than our regularisation terms. This suggests that redundancies play a significant role in artificial neural networks, as evidenced by some studies in neuroscience for real networks. We include a proof of differentiability of our regularisers, thus developing the first effective topological persistence-based regularisation terms that consider the whole set of neurons and that can be applied to a feedforward architecture in any deep learning task such as classification, data generation, or regression. 
Rubén Ballester · Carles Casacuberta · Sergio Escalera 🔗 
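To make the construction concrete, here is a minimal sketch of the core quantity the abstract describes: the minimum spanning tree of the complete graph on neurons, with correlation-dissimilarity edge weights. The dissimilarity $1-|\rho|$, the Prim's-algorithm implementation, and the toy activations are our own illustrative choices; the paper's actual regularisation terms are functions of these MST weights but are not reproduced here.

```python
import numpy as np

def correlation_dissimilarity(acts):
    """acts: (samples, neurons) activation matrix. Returns the pairwise
    dissimilarity matrix 1 - |Pearson correlation| (0 = perfectly correlated)."""
    return 1.0 - np.abs(np.corrcoef(acts, rowvar=False))

def mst_total_weight(d):
    """Total edge weight of a minimum spanning tree of the complete graph
    with distance matrix d, via Prim's algorithm."""
    n = d.shape[0]
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True
    best = d[0].copy()          # cheapest known edge into the tree, per node
    total = 0.0
    for _ in range(n - 1):
        cand = np.where(in_tree, np.inf, best)
        j = int(np.argmin(cand))
        total += cand[j]
        in_tree[j] = True
        best = np.minimum(best, d[j])
    return total

rng = np.random.default_rng(1)
acts_ind = rng.normal(size=(200, 10))                                     # near-independent neurons
acts_cor = rng.normal(size=(200, 1)) + 0.05 * rng.normal(size=(200, 10))  # highly redundant neurons

# Highly correlated neurons sit close together in correlation space,
# so their MST is much lighter than that of decorrelated neurons.
w_ind = mst_total_weight(correlation_dissimilarity(acts_ind))
w_cor = mst_total_weight(correlation_dissimilarity(acts_cor))
print(w_cor, "<", w_ind)
```

A regulariser built from these MST edge weights can thus detect, and penalise, redundancy among neurons without comparing all pairs directly.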


Scalar Invariant Networks with Zero Bias
(
Poster
)
>
link
Just like weights, bias terms are learnable parameters in many popular machine learning models, including neural networks. Biases are believed to enhance the representational power of neural networks, enabling them to tackle various tasks in computer vision. Nevertheless, we argue that biases can be disregarded for some image-related tasks such as image classification, by considering the intrinsic distribution of images in the input space and desired model properties from first principles. Our empirical results suggest that zero-bias neural networks can perform comparably to normal networks for practical image classification tasks. Furthermore, we demonstrate that zero-bias neural networks possess a valuable property known as scalar (multiplicative) invariance. This implies that the network's predictions remain unchanged even when the contrast of the input image is altered. We further extend the scalar invariance property to more general cases, thereby attaining robustness within specific convex regions of the input space. We believe dropping bias terms can be considered as a geometric prior when designing neural network architectures for image classification, which shares the spirit of adopting convolution as a translation-invariance prior. 
Chuqin Geng · Xiaojie Xu · Haolin Ye · Xujie Si 🔗 
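The scalar-invariance property described above is easy to verify numerically: a ReLU network with no bias terms is positively homogeneous, so rescaling the input (e.g. changing image contrast) rescales the logits and leaves the predicted class unchanged. The tiny two-layer network below is our own toy stand-in for the models in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# A bias-free two-layer ReLU "classifier": logits = W2 @ relu(W1 @ x).
W1 = rng.normal(size=(32, 64))
W2 = rng.normal(size=(10, 32))

def logits(x):
    return W2 @ np.maximum(W1 @ x, 0.0)

x = rng.normal(size=64)
c = 2.5  # any positive "contrast" factor

# Positive homogeneity: logits(c x) = c * logits(x) for c > 0,
# because both linear maps and ReLU commute with positive scaling.
print(np.allclose(logits(c * x), c * logits(x)))          # True
print(np.argmax(logits(c * x)) == np.argmax(logits(x)))   # prediction unchanged
```

Any bias term would break this identity, since an additive constant does not scale with the input.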


Fast Temporal Wavelet Graph Neural Networks
(
Poster
)
>
link
Spatiotemporal signal forecasting plays an important role in numerous domains, especially in neuroscience and transportation. The task is challenging due to the highly intricate spatial structure, as well as the nonlinear temporal dynamics, of the network. To facilitate reliable and timely forecasts for the human brain and traffic networks, we propose the Fast Temporal Wavelet Graph Neural Network (FTWGNN), which is both time- and memory-efficient for learning tasks on time-series data with an underlying graph structure, thanks to multiresolution analysis and wavelet theory on discrete spaces. We employ Multiresolution Matrix Factorization (MMF) (Kondor et al., 2014) to factorize the highly dense graph structure and compute the corresponding sparse wavelet basis, which allows us to construct a fast wavelet convolution as the backbone of our novel architecture. Experimental results on the real-world PEMS-BAY and METR-LA traffic datasets and the AJILE12 ECoG dataset show that FTWGNN is competitive with the state of the art while maintaining a low computational footprint. Our PyTorch implementation is publicly available at https://github.com/HySonLab/TWGNN 
Duc Thien Nguyen · Tuan Nguyen · Truong Son Hy · Risi Kondor 🔗 


Manifold-augmented Eikonal Equations: Geodesic Distances and Flows on Differentiable Manifolds
(
Poster
)
>
link
Manifolds discovered by machine learning models provide a compact representation of the underlying data. Geodesics on these manifolds define locally length-minimising curves and provide a notion of distance, which is key for reduced-order modelling, statistical inference, and interpolation. In this work, we propose a model-based parameterisation for distance fields and geodesic flows on manifolds, exploiting solutions of a manifold-augmented Eikonal equation. We demonstrate how the geometry of the manifold impacts the distance field, and exploit the geodesic flow to obtain globally length-minimising curves directly. This work opens opportunities for statistics and reduced-order modelling on differentiable manifolds. 
Daniel Kelshaw · Luca Magri 🔗 
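For intuition, geodesic distance fields of the kind described above can be approximated discretely: sample the manifold, connect neighbouring samples with edges weighted by their 3D (embedded) lengths, and run Dijkstra. This graph-based stand-in is not the paper's model-based Eikonal parameterisation; the bump surface and grid resolution are illustrative.

```python
import heapq
import numpy as np

def surface(u, v):
    """A toy embedded manifold: a Gaussian bump z = exp(-(u^2 + v^2))."""
    return np.exp(-(u ** 2 + v ** 2))

def geodesic_field(n=40, extent=2.0):
    """Approximate geodesic distance from one corner of the sampled surface
    to every other sample, via Dijkstra on the 8-connected grid graph with
    3D edge lengths -- a discrete analogue of solving |grad d| = 1."""
    u = np.linspace(-extent, extent, n)
    pts = np.array([[a, b, surface(a, b)] for a in u for b in u]).reshape(n, n, 3)
    dist = np.full((n, n), np.inf)
    dist[0, 0] = 0.0
    pq = [(0.0, 0, 0)]
    steps = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, 1), (1, -1), (-1, 1), (-1, -1)]
    while pq:
        d, i, j = heapq.heappop(pq)
        if d > dist[i, j]:
            continue  # stale queue entry
        for di, dj in steps:
            a, b = i + di, j + dj
            if 0 <= a < n and 0 <= b < n:
                nd = d + float(np.linalg.norm(pts[a, b] - pts[i, j]))
                if nd < dist[a, b]:
                    dist[a, b] = nd
                    heapq.heappush(pq, (nd, a, b))
    return dist

d = geodesic_field()
```

Since each 3D edge is at least as long as its projection onto the parameter plane, the corner-to-corner distance is bounded below by the flat diagonal $4\sqrt{2} \approx 5.66$; the bump's extra elevation makes the geodesic strictly longer, which is exactly the "geometry impacts the distance field" effect the abstract describes.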


Pitfalls in Measuring Neural Transferability
(
Poster
)
>
link
Transferability scores quantify the aptness of a pretrained model for a downstream task and help in selecting an optimal pretrained model for transfer learning. This work aims to draw attention to significant shortcomings of state-of-the-art transferability scores. To this end, we propose neural-collapse-based transferability scores that analyse the intra-class variability collapse and inter-class discriminative ability of the penultimate embedding space of a pretrained model. Experimentation across the image and audio domains demonstrates that such a simple variability analysis of the feature space is more than enough to satisfy the current definition of transferability scores, and that a new, more general definition of transferability is required. Building on these results, we highlight new research directions and postulate characteristics of an ideal transferability measure that will be helpful in streamlining future studies targeting this problem. 
Suryaka Suresh · Vinayak Abrol · Anshul Thakur 🔗 
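A simplified version of the variability analysis the abstract alludes to can be written in a few lines: compare within-class scatter to between-class scatter in the penultimate embedding space. The exact scores proposed in the paper may differ; the ratio below, and the synthetic features, are only an illustrative proxy.

```python
import numpy as np

def variability_score(feats, labels):
    """Within-class to between-class scatter ratio of penultimate features.
    Lower = tighter, better-separated class clusters (more 'collapsed'),
    which the abstract associates with transferability. Illustrative proxy."""
    mu = feats.mean(axis=0)
    within, between = 0.0, 0.0
    for c in np.unique(labels):
        fc = feats[labels == c]
        mc = fc.mean(axis=0)
        within += np.sum((fc - mc) ** 2)
        between += len(fc) * np.sum((mc - mu) ** 2)
    return within / between

rng = np.random.default_rng(0)
centers = 5.0 * rng.normal(size=(5, 16))          # 5 classes, 16-dim features
labels = np.repeat(np.arange(5), 40)
tight = centers[labels] + 0.1 * rng.normal(size=(200, 16))   # collapsed embedding
loose = centers[labels] + 3.0 * rng.normal(size=(200, 16))   # diffuse embedding

print(variability_score(tight, labels), variability_score(loose, labels))
```

The point of the abstract is that a measure this simple already matches what current transferability scores capture, which is why the authors argue for a broader definition.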


Random Field Augmentations for Self-Supervised Representation Learning
(
Poster
)
>
link
Self-supervised representation learning is heavily dependent on data augmentations to specify the invariances encoded in representations. Previous work has shown that applying diverse data augmentations is crucial to downstream performance, but augmentation techniques remain underexplored. In this work, we propose a new family of local transformations based on Gaussian random fields to generate image augmentations for self-supervised representation learning. These transformations generalize the well-established affine and color transformations (translation, rotation, color jitter, etc.) and greatly increase the space of augmentations by allowing transformation parameter values to vary from pixel to pixel. The parameters are treated as continuous functions of spatial coordinates, and modeled as independent Gaussian random fields. Empirical results show the effectiveness of the new transformations for self-supervised representation learning. Specifically, we achieve a 1.7% top-1 accuracy improvement over the baseline on ImageNet downstream classification, and a 3.6% improvement on out-of-distribution iNaturalist downstream classification. However, due to the flexibility of the new transformations, learned representations are sensitive to hyperparameters. While mild transformations improve representations, we observe that strong transformations can degrade the structure of an image, indicating that balancing the diversity and strength of augmentations is important for improving generalization of learned representations. 
Philip Mansfield · Arash Afkanpour · Warren Morningstar · Karan Singhal 🔗 
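A bare-bones version of such a pixel-wise transformation can be sketched with smoothed white noise standing in for the Gaussian random fields; here the field parameterises a per-pixel translation. The blur-based field construction, nearest-neighbour resampling, and all parameter values are our own simplifications of the approach described above.

```python
import numpy as np

def smooth_field(shape, sigma, rng):
    """Approximate a Gaussian random field by blurring white noise with a
    separable Gaussian kernel, then normalising to [-1, 1]."""
    r = int(3 * sigma)
    k = np.exp(-0.5 * (np.arange(-r, r + 1) / sigma) ** 2)
    k /= k.sum()
    f = rng.normal(size=shape)
    f = np.apply_along_axis(lambda v: np.convolve(v, k, mode="same"), 0, f)
    f = np.apply_along_axis(lambda v: np.convolve(v, k, mode="same"), 1, f)
    return f / (np.abs(f).max() + 1e-12)

def random_field_warp(img, strength=4.0, sigma=8.0, rng=None):
    """Translate every pixel by a smoothly varying displacement (dy, dx) drawn
    from two independent random fields (nearest-neighbour resampling)."""
    rng = rng or np.random.default_rng()
    h, w = img.shape
    dy = strength * smooth_field((h, w), sigma, rng)
    dx = strength * smooth_field((h, w), sigma, rng)
    yy, xx = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    ys = np.clip(np.round(yy + dy).astype(int), 0, h - 1)
    xs = np.clip(np.round(xx + dx).astype(int), 0, w - 1)
    return img[ys, xs]

img = np.zeros((64, 64))
img[24:40, 24:40] = 1.0                     # a toy "image": a white square
warped = random_field_warp(img, rng=np.random.default_rng(0))
```

Setting `strength=0` recovers the identity, and a spatially constant field recovers an ordinary global translation, which is the sense in which these fields generalize the standard affine augmentations.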


Changes in the geometry of hippocampal representations across brain states
(
Poster
)
>
link
The hippocampus (HPC) is a key structure underlying the brain's capacity to learn and generalize. One pervasive phenomenon in the brain, but missing in AI, is the presence of different gross brain states. It is known that these brain states give rise to diverse modes of information processing that are imperative for the hippocampus to learn and function, but the mechanisms by which they do so remain unknown. To study this, we harnessed the power of recently developed dimensionality reduction techniques to shed light on how HPC representations change across brain states. We compared the geometry of HPC neuronal representations as rodents learned to generalize across different environments, and showed that HPC representations can support both pattern separation and generalization. Next, we compared HPC activity during different stages of sleep. Consistent with the literature, we found a robust recapitulation of previous awake experience during non-rapid eye movement (NREM) sleep. Interestingly, such geometric correspondence to previous awake experience was not observed during rapid eye movement (REM) sleep, suggesting a very different mode of information processing. This is the first known report of UMAP analysis of hippocampal neuronal data during REM sleep. We propose that characterizing and contrasting the geometry of hippocampal representations during different brain states can help us understand the brain's mechanisms for learning and, in the future, can even help design the next generation of AI systems that learn and generalize better. 
Wannan Yang · Chen Sun · Gyorgy Buzsaki 🔗 


EntropyMCMC: Sampling from Flat Basins with Ease
(
Poster
)
>
link
Bayesian deep learning relies on the quality of posterior distribution estimation. However, the posterior of deep neural networks is highly multimodal in nature, with local modes exhibiting varying generalization performance. Given a practical budget, sampling from the original posterior can lead to suboptimal performance, as some samples may become trapped in "bad" modes and suffer from overfitting. Leveraging the observation that "good" modes with low generalization error often reside in flat basins of the energy landscape, we propose to bias sampling of the posterior toward these flat regions. Specifically, we introduce an auxiliary guiding variable, whose stationary distribution resembles a smoothed posterior free from sharp modes, to lead the MCMC sampler to flat basins. We prove the convergence of our method and further show that it converges faster than several existing flatness-aware methods in the strongly convex setting. Empirical results demonstrate that our method can successfully sample from flat basins of the posterior, and outperforms all compared baselines on multiple benchmarks including classification, calibration, and out-of-distribution detection. 
Bolian Li · Ruqi Zhang 🔗 


Roto-translation Equivariant YOLO for Aerial Images
(
Poster
)
>
link
This work introduces EqYOLO, an equivariant one-stage object detector based on YOLOv8 that incorporates group convolutions to handle rotational transformations. We show the benefit of using equivariant transforms to improve detection performance on rotated data over the regular YOLOv8 model, while dividing the number of parameters to train by a factor greater than three. 
Benjamin Maurel · Samy Blusseau · Santiago VelascoForero · Teodora Petrisor 🔗 


Full-dimensional Characterisation of Time-Warped Spike-Time Stimulus-Response Distribution Geometries
(
Poster
)
>
link
Characterising the representation of sensory stimuli in the brain is a fundamental scientific endeavor, which can illuminate principles of information coding. Most characterisations reduce the dimensionality of neural data by converting spike trains to firing rates or binned spike counts, applying explicitly named methods of "dimensionality reduction", or collapsing trial-to-trial variability. Characterisation of the full-dimensional geometry of timing-based representations may provide unexpected insights into how complex high-dimensional information is encoded. Recent research shows that the distribution of representations elicited over trials of a single stimulus can be geometrically characterised without the application of dimensionality reduction, maintaining the temporal spiking information of individual neurons in a cell assembly and illuminating rich geometric structure. We extend these results, showing that precise spike-time patterns for larger cell assemblies are time-warped (i.e. stretched or compressed) on each trial. Moreover, by geometrically characterising distributions of large spike-time patterns, our analysis supports the hypothesis that the degree to which a spike-time pattern is time-warped depends on the cortical area's background activity level on a single trial. Finally, we suggest that the proliferation of large electrophysiology datasets and the increasing concentration of "neural geometrists" create ideal conditions for the characterisation of full-dimensional spike-time representations, in complement to dimensionality-reduction approaches. 
James Isbister 🔗 


Emergence of Latent Binary Encoding in Deep Neural Network Classifiers
(
Poster
)
>
link
We observe the emergence of binary encoding within the latent space of deep neural network classifiers. Such binary encoding is induced by introducing a linear penultimate layer, which is equipped during training with a loss function that grows as $\exp(\vec{x}^2)$, where $\vec{x}$ are the coordinates in the latent space. The phenomenon we describe represents a specific instance of a well-documented occurrence known as \textit{neural collapse}, which arises in the terminal phase of training and entails the collapse of latent class means to the vertices of a simplex equiangular tight frame (ETF). We show that binary encoding accelerates convergence toward the simplex ETF and enhances classification accuracy.

Luigi Sbailò · Luca Ghiringhelli 🔗 


Testing Assumptions Underlying a Unified Theory for the Origin of Grid Cells
(
Poster
)
>
link
Representing and reasoning about physical space is fundamental to animal survival, and the mammalian lineage expresses a wealth of specialized neural representations that encode space. Grid cells, whose discovery earned a Nobel prize, are a striking example: a grid cell is a neuron that fires if and only if the animal is spatially located at the vertices of a regular triangular lattice that tiles all explored two-dimensional environments. Significant theoretical work has gone into understanding why mammals have learned these particular representations, and recent work has proposed a “unified theory for the computational and mechanistic origin of grid cells,” claiming to answer why the mammalian lineage has learned grid cells. However, the Unified Theory makes a series of highly specific assumptions about the target readouts of grid cells, putatively place cells. In this work, we explicitly identify what these mathematical assumptions are, then test two of the critical assumptions using biological place cell data. At both the population and single-cell levels, we find evidence suggesting that neither of the assumptions is likely to hold in biological neural representations. These results call the Unified Theory into question, suggesting that biological grid cells likely have a different origin than those obtained in trained artificial neural networks. 
Rylan Schaeffer · Mikail Khona · Adrian Bertagnoli · Sanmi Koyejo · Ila Fiete 🔗 


SO(3)-Equivariant Representation Learning in 2D Images
(
Poster
)
>
link
Imaging physical objects that are free to rotate and translate in 3D is challenging. While an object's pose and location do not change its nature, varying them presents problems for current vision models. Equivariant models account for these nuisance transformations, but current architectures only model either 2D transformations of 2D signals or 3D transformations of 3D signals. Here, we propose a novel convolutional layer consisting of 2D projections of 3D filters that models 3D equivariances of 2D signals—critical for capturing the full space of spatial transformations of objects in imaging domains such as cryo-EM. We additionally present methods for aggregating our rotation-specific outputs. We demonstrate improvement on several tasks, including particle picking and pose estimation. 
Darnell Granberry · Alireza Nasiri · Jiayi Shou · Alex J. Noble · Tristan Bepler 🔗 


Self-Supervised Latent Symmetry Discovery via Class-Pose Decomposition
(
Poster
)
>
link
In this paper, we explore the discovery of latent symmetries of data in a self-supervised manner. By considering sequences of observations undergoing uniform motion, we can extract a shared group transformation from the latent observations. In contrast to previous work, we utilize a latent space in which the group and orbit components are decomposed. We show that this construction facilitates more accurate identification of the properties of the underlying group, which consequently results in improved performance on a set of sequential prediction tasks. 
Gustaf Tegnér · Hedvig Kjellstrom 🔗 


Discovering Latent Causes and Memory Modification: A Computational Approach Using Symmetry and Geometry
(
Poster
)
>
link
We learn from our experiences, even though they are never exactly the same. This implies that we need to assess their similarity to apply what we have learned from one experience to another. It is proposed that we “cluster” our experiences based on hidden latent causes that we infer. It is also suggested that surprises, which occur when our predictions are incorrect, help us categorize our experiences into distinct groups. In this paper, we develop a computational theory that emulates these processes based on two basic concepts from intuitive physics and Gestalt psychology using symmetry and geometry. We apply our approach to simple tasks that involve inductive reasoning. Remarkably, the output of our computational approach aligns closely with human responses. 
Arif Dönmez 🔗 


On the Information Geometry of Vision Transformers
(
Poster
)
>
link
Understanding the structure of high-dimensional representations learned by Vision Transformers (ViTs) provides a pathway toward developing a mechanistic understanding and further improving architecture design. In this work, we leverage tools from information geometry to characterize representation quality at a per-token (intra-token) level as well as across pairs of tokens (inter-token) in ViTs pretrained for object classification. In particular, we observe that these high-dimensional tokens exhibit a characteristic spectral decay in the feature covariance matrix. By measuring the rate of this decay (denoted by $\alpha$) for each token across transformer blocks, we discover an $\alpha$ signature, indicative of a transition from lower to higher effective dimensionality. We also demonstrate that tokens can be clustered based on their $\alpha$ signature, revealing that tokens corresponding to nearby spatial patches of the original image exhibit similar $\alpha$ trajectories. Furthermore, to measure complexity at the sequence level, we aggregate the correlation between pairs of tokens independently at each transformer block. A higher average correlation indicates a significant overlap between token representations and lower effective complexity. Notably, we observe a U-shaped trend across the model hierarchy, suggesting that token representations are most expressive in the intermediate blocks. Our findings provide a framework for understanding information processing in ViTs while providing tools to prune/merge tokens across blocks, thereby making the architectures more efficient.

Sonia Joseph · Kumar Krishna Agrawal · Arna Ghosh · Blake Richards 🔗 
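The spectral-decay statistic $\alpha$ can be estimated along the lines below: take the eigenspectrum of the token feature covariance and fit a power law $\lambda_k \propto k^{-\alpha}$ in log-log space. The fitting window and the synthetic features with a known spectrum are our own illustrative choices, not the paper's exact estimator.

```python
import numpy as np

def alpha_decay(feats, k0=2, k1=30):
    """Fit the eigenspectrum decay exponent alpha, lambda_k ~ k^(-alpha),
    by linear regression in log-log space over eigenvalue indices [k0, k1]."""
    cov = np.cov(feats, rowvar=False)
    ev = np.sort(np.linalg.eigvalsh(cov))[::-1]      # descending eigenvalues
    k = np.arange(1, len(ev) + 1)
    sl = slice(k0 - 1, k1)
    slope, _ = np.polyfit(np.log(k[sl]), np.log(ev[sl]), 1)
    return -slope

# Synthetic "token features" with a known spectrum lambda_k = k^(-1.2).
rng = np.random.default_rng(0)
d, n = 64, 5000
feats = rng.normal(size=(n, d)) * (np.arange(1, d + 1) ** -0.6)  # std_k = k^(-0.6)
print(alpha_decay(feats))   # recovers roughly 1.2
```

A faster decay (larger $\alpha$) means variance is concentrated in fewer directions, i.e. a lower effective dimensionality, which is the quantity tracked across transformer blocks in the abstract.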


The Variability of Representations in Mice and Humans Changes with Learning, Engagement, and Attention
(
Poster
)
>
link
In responding to a visual stimulus, cortical neurons exhibit a high degree of variability, and this variability can be correlated across neurons. In this study, we use recordings from both mice and humans to systematically characterize how the variability in the representation of visual stimuli changes with learning, engagement, and attention. We observe that in mice, familiarization with a set of images over many weeks reduces the variability of responses, but does not change its shape. Further, switching from passive to active task engagement changes the overall shape by shrinking the neural variability only along the task-relevant direction, leading to a higher signal-to-noise ratio. In a selective attention task in humans, wherein multiple distributions are compared, a higher signal-to-noise ratio is obtained via a different mechanism, mainly by increasing the signal of the attended category. These findings show that representation variability can be adjusted to task needs. A potential, speculative role for variability, consistent with these findings, is that it aids generalization. 
Praveen Venkatesh · Corbett Bennett · Sam Gale · Juri Minxha · Hristos Courellis · Greggory Heller · Tamina Ramirez · Severine Durand · Ueli Rutishauser · Shawn Olsen · Stefan Mihalas



Explicit Neural Surfaces: Learning Continuous Geometry with Deformation Fields
(
Poster
)
>
link
We introduce Explicit Neural Surfaces (ENS), an efficient smooth surface representation that directly encodes topology with a deformation field from a known base domain. We apply this representation to reconstruct explicit surfaces from multiple views, where we use a series of neural deformation fields to progressively transform the base domain into a target shape. By using meshes as discrete surface proxies, we train the deformation fields through efficient differentiable rasterization. Using a fixed base domain allows us to use Laplace-Beltrami eigenfunctions as an intrinsic positional encoding alongside standard extrinsic Fourier features, with which our approach can capture fine surface details. Compared to implicit surfaces, ENS trains faster and has several orders of magnitude faster inference times. The explicit nature of our approach also allows higher-quality mesh extraction whilst maintaining competitive surface reconstruction performance and real-time capabilities. 
Thomas Walker · Octave Mariotti · Amir Vaxman · Hakan Bilen 🔗 


Symmetric Models for Radar Response Modeling
(
Poster
)
>
link
Many radar applications require complex radar signature models that incorporate characteristics of an object's shape and dynamics as well as sensing effects. Even though high-fidelity, first-principles radar simulators are available, they tend to be resource-intensive and do not easily support the requirements of agile and large-scale AI development and evaluation frameworks. Deep learning represents an attractive alternative to these numerical methods, but can have large data requirements and limited generalization ability. In this work, we present the Radar Equivariant Model (REM), the first $SO(3)$-equivariant model for predicting radar responses from object meshes. By constraining our model to the symmetries inherent to radar sensing, REM achieves a high level of reconstruction of signals generated by a first-principles radar model and shows improved performance and sample efficiency over other encoder-decoder models.

Colin Kohler · Nathan Vaska · Ramya Muthukrishnan · Whangbong Choi · Jung Yeon Park · Justin Goodwin · Rajmonda Caceres · Robin Walters 🔗 


The Surprising Effectiveness of Equivariant Models in Domains with Latent Symmetry
(
Poster
)
>
link
Extensive work has demonstrated that equivariant neural networks can significantly improve sample efficiency and generalization by enforcing an inductive bias in the network architecture. These applications typically assume that the domain symmetry is fully described by explicit transformations of the model inputs and outputs. However, many real-life applications contain only latent or partial symmetries which cannot be easily described by simple transformations of the input. In these cases, it is necessary to \emph{learn} symmetry in the environment instead of imposing it mathematically on the network architecture. We discover, surprisingly, that imposing equivariance constraints that do not exactly match the domain symmetry is very helpful in learning the true symmetry in the environment. We differentiate between \emph{extrinsic} and \emph{incorrect} symmetry constraints and show that while imposing incorrect symmetry can impede the model's performance, imposing extrinsic symmetry can actually improve performance. We demonstrate that an equivariant model can significantly outperform non-equivariant methods on domains with latent symmetries. 
Dian Wang · Jung Yeon Park · Neel Sortur · Lawson Wong · Robin Walters · Robert Platt 🔗 


Large language models partially converge toward human-like concept organization
(
Poster
)
>
link
Large language models show human-like performance in knowledge extraction, reasoning, and dialogue, but it remains controversial whether this performance is best explained by memorization and pattern matching, or whether it reflects human-like inferential semantics and world knowledge. Knowledge bases such as WikiData provide large-scale, high-quality representations of inferential semantics and world knowledge. We show that large language models learn to organize concepts in ways that are strikingly similar to how concepts are organized in such knowledge bases. Knowledge bases model collective, institutional knowledge, and large language models seem to induce such knowledge from raw text. We show that bigger and better models exhibit more human-like concept organization, across four families of language models and three knowledge graph embeddings. 
Jonathan Gabel Christiansen · Mathias Gammelgaard · Anders Søgaard 🔗 


Cayley Graph Propagation
(
Poster
)
>
link
In spite of the plethora of success stories with graph neural networks (GNNs) on modelling graph-structured data, they are notoriously vulnerable to tasks which necessitate mixing of information between distant pairs of nodes, especially in the presence of bottlenecks in the graph. For this reason, a significant body of research has dedicated itself to discovering or precomputing graph structures which ameliorate such bottlenecks. Bottleneck-free graphs are well-known in the mathematical community as *expander graphs*, with prior work—Expander Graph Propagation (EGP)—proposing the use of a well-known expander graph family—the Cayley graphs of the $\mathrm{SL}(2,\mathbb{Z}_n)$ special linear group—as a computational template for GNNs. However, despite its solid theoretical grounding, the actual computational graphs used by EGP are *truncated* Cayley graphs, which causes them to lose their expansion properties. In this work, we propose to use the full Cayley graph within EGP, recovering significant improvements on datasets from the Open Graph Benchmark (OGB). Our empirical evidence suggests that retaining all the nodes of the expander graph can benefit graph representation learning, which may provide valuable insight for future models.

Joseph Wilson · Petar Veličković 🔗 
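The full Cayley graphs in question are easy to materialise: breadth-first search from the identity over generators of $\mathrm{SL}(2,\mathbb{Z}_n)$ enumerates every vertex and edge. The generator choice below uses the standard elementary matrices and their inverses (an assumption for illustration; EGP's construction details may differ).

```python
from collections import deque

import numpy as np

def cayley_graph_sl2(n):
    """Vertices and undirected edges of the full Cayley graph of SL(2, Z_n)
    under the two elementary generators and their inverses, via BFS from
    the identity matrix."""
    gens = [np.array(g) % n for g in
            ([[1, 1], [0, 1]], [[1, 0], [1, 1]],
             [[1, -1], [0, 1]], [[1, 0], [-1, 1]])]
    start = ((1, 0), (0, 1))            # identity, as a hashable tuple
    verts, edges = {start}, set()
    queue = deque([start])
    while queue:
        m = queue.popleft()
        M = np.array(m)
        for g in gens:
            t = tuple(map(tuple, (g @ M) % n))   # left-multiply by a generator
            edges.add(frozenset((m, t)))
            if t not in verts:
                verts.add(t)
                queue.append(t)
    return verts, edges

verts, edges = cayley_graph_sl2(5)
print(len(verts), len(edges))   # 120 vertices: |SL(2, Z_5)| = 5 * (5**2 - 1)
```

Because every vertex keeps all four of its generator neighbours, this is the untruncated graph whose retention the abstract argues for; truncating BFS at a fixed vertex budget is what destroys the expansion properties.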


Curvature Fields from Shading Fields
(
Poster
)
>
link
We re-examine the estimation of 3D shape from images that are caused by shading of diffuse Lambertian surfaces. We propose a neural model that is motivated by the well-documented perceptual effect in which shape is perceived from shading without a precise perception of lighting. Our model operates independently in each receptive field and produces a scalar statistic of surface curvature for that field. The model's architecture builds on previous mathematical analyses of lighting-invariant shape constraints, and it leverages geometric structure to provide equivariance under image rotations and translations. Applying our model in parallel across a dense set of receptive fields produces a curvature field that we show is quite stable under changes to a surface's albedo pattern (texture) and also to changes in lighting, even when lighting varies spatially across the surface. 
Xinran Han · Todd Zickler 🔗 


A Comparison of Equivariant Vision Models with ImageNet Pretraining
(
Poster
)
>
link
Neural networks pretrained on large datasets provide useful embeddings for downstream tasks and allow researchers to iterate with less compute. For computer vision tasks, ImageNet-pretrained models can be easily downloaded for fine-tuning. However, no such pretrained models are available that are equivariant to image transformations. In this work, we implement several equivariant versions of the residual network architecture and publicly release the weights after training on ImageNet. Additionally, we perform a comparison of enforced vs. learned equivariance in the largest data regime to date. 
David Klee · Jung Yeon Park · Robert Platt · Robin Walters 🔗 


Almost Equivariance via Lie Algebra Convolutions
(
Poster
)
>
link
Recently, the $\textit{equivariance}$ of models with respect to a group action has become an important topic of research in machine learning. Analysis of the built-in equivariance of existing neural network architectures, as well as the study of methods for building model architectures that explicitly ``bake in'' equivariance, have become significant research areas in their own right. However, imbuing an architecture with a specific group equivariance imposes a strong prior on the types of data transformations that the model expects to see. While strictly equivariant models enforce symmetries, such as those due to rotations or translations, real-world data does not always follow such strict equivariances, be it due to noise in the data or underlying physical laws that encode only approximate or partial symmetries. In such cases, the prior of strict equivariance can actually prove too strong and cause models to underperform on real-world data. Therefore, in this work we study a closely related topic, that of $\textit{almost equivariance}$. We give a practical method for encoding almost equivariance in models by appealing to the Lie algebra of a Lie group and defining $\textit{Lie algebra convolutions}$. We demonstrate that Lie algebra convolutions offer several benefits over Lie group convolutions, including being computationally tractable and well-defined for non-compact groups. Finally, we demonstrate the validity of our approach by benchmarking against datasets in fully equivariant and almost equivariant settings.

Daniel McNeela 🔗 
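The Lie-algebra starting point can be illustrated with the textbook case of $\mathrm{SO}(2)$: exponentiating the single generator of $\mathfrak{so}(2)$ recovers the rotation matrices. This is only the standard exponential map, not the paper's convolution operator:

```python
import numpy as np

def expm_so2(theta):
    """Exponentiate theta * L, where L = [[0, -1], [1, 0]] spans the Lie
    algebra so(2). The closed form exp(theta L) = cos(theta) I + sin(theta) L
    is the usual 2D rotation matrix; working in the (vector-space) Lie algebra
    rather than the group is the basic move behind Lie algebra convolutions."""
    L = np.array([[0.0, -1.0], [1.0, 0.0]])
    return np.cos(theta) * np.eye(2) + np.sin(theta) * L

R = expm_so2(np.pi / 2)  # quarter-turn rotation matrix
```

The algebra is a flat vector space even when the group is curved or non-compact, which is why convolving there can remain tractable.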


Deep Ridgelet Transform: Voice with Koopman Operator Constructively Proves Universality of Formal Deep Networks
(
Poster
)
>
link
We identify hidden layers inside a deep neural network (DNN) with group actions on the data domain, and formulate a formal deep network as a dual voice transform with respect to the Koopman operator, a linear representation of the group action. Based on these group-theoretic arguments, particularly by using Schur's lemma, we give a simple proof of the universality of DNNs. 
Sho Sonoda · Yuka Hashimoto · Isao Ishikawa · Masahiro Ikeda 🔗 


Learning Symmetrization for Equivariance with Orbit Distance Minimization
(
Poster
)
>
link
We present a general framework for symmetrizing an arbitrary neural network architecture, making it equivariant with respect to a given group. We build upon the symmetrization proposals of Kim et al. (2023) and Kaba et al. (2023), and improve them by replacing their conversion of neural features into group representations with an optimization whose loss intuitively measures the distance between group orbits. This change makes our approach applicable to a broader range of matrix groups, such as the Lorentz group O(1, 3), than these two proposals. We experimentally show our method's competitiveness on the SO(2) image classification task, and also its increased generality on the task with O(1, 3). Our implementation will be made accessible at https://github.com/tiendatnguyenvision/Orbitsymmetrize. 
Dat Nguyen · Jinwoo Kim · Hongseok Yang · Seunghoon Hong 🔗 
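For intuition, symmetrization by plain group averaging (the baseline idea that these works refine) can be sketched for the finite rotation group $C_4$. This toy version uses exact averaging over the four rotations rather than the paper's learned, orbit-distance-based map:

```python
import numpy as np

def symmetrize_c4(f, x):
    """Group-average an arbitrary map f over the cyclic rotation group C4.
    f takes a 2D array to a 2D array of the same shape; the averaged map
    F(x) = (1/4) * sum_k rot(-k) f(rot(k) x) commutes with 90-degree rotations."""
    outs = []
    for k in range(4):
        gx = np.rot90(x, k)               # act on the input with g
        outs.append(np.rot90(f(gx), -k))  # undo the action on the output
    return np.mean(outs, axis=0)

# usage: start from a map with no symmetry at all
f = lambda z: z ** 2 + np.roll(z, 1, axis=0)
x = np.arange(16.0).reshape(4, 4)
lhs = symmetrize_c4(f, np.rot90(x))       # F(g . x)
rhs = np.rot90(symmetrize_c4(f, x))       # g . F(x)
```

Exact averaging is only feasible for small finite groups; for matrix groups like O(1, 3) the orbit is infinite, which is the regime the optimization-based approach targets.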


Algebraic Topological Networks via the Persistent Local Homology Sheaf
(
Poster
)
>
link
In this work, we introduce a novel approach based on algebraic topology to enhance graph convolution and attention modules by incorporating local topological properties of the data. To do so, we consider the framework of sheaf neural networks, which has previously been leveraged to incorporate additional structure into graph neural networks' features and construct more expressive, non-isotropic messages. Specifically, given an input simplicial complex (e.g. generated by the cliques of a graph or the neighbors in a point cloud), we construct its local homology sheaf, which assigns to each node the vector space of its local homology. The intermediate features of our networks live in these vector spaces, and we leverage the associated sheaf Laplacian to construct more complex linear messages between them. Moreover, we extend this approach by considering the persistent version of local homology associated with a weighted simplicial complex (e.g., built from pairwise distances of node embeddings). This i) solves the problem of the lack of a natural choice of basis for the local homology vector spaces, and ii) makes the sheaf itself differentiable, which enables our models to directly optimize the topology of their intermediate features. 
Gabriele Cesa · Arash Behboodi 🔗 


Neural Lattice Reduction: A Self-Supervised Geometric Deep Learning Approach
(
Poster
)
>
link
Lattice reduction is a combinatorial optimization problem aimed at finding the most orthogonal basis in a given lattice. In this work, we address lattice reduction via deep learning methods. We design a deep neural model that outputs factorized unimodular matrices and train it in a self-supervised manner by penalizing non-orthogonal lattice bases. We incorporate the symmetries of lattice reduction into the model by making it invariant and equivariant with respect to appropriate continuous and discrete groups. 
Giovanni Luca Marchetti · Gabriele Cesa · Kumar Pratik · Arash Behboodi 🔗 
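One standard way to penalize non-orthogonal lattice bases is the orthogonality defect, which equals 1 exactly when the basis is orthogonal. The sketch below uses it as a self-supervised score; this is one plausible choice of penalty, not necessarily the paper's exact objective:

```python
import numpy as np

def orthogonality_defect(B):
    """Orthogonality defect of a lattice basis B (columns are basis vectors):
    prod_i ||b_i|| / |det B|.  By Hadamard's inequality it is >= 1, with
    equality iff the basis is orthogonal, so it is a natural self-supervised
    loss for lattice reduction."""
    norms = np.linalg.norm(B, axis=0)
    return np.prod(norms) / abs(np.linalg.det(B))

B = np.array([[1.0, 1.0],
              [0.0, 1.0]])   # a non-orthogonal basis of Z^2
U = np.array([[1, -1],
              [0,  1]])      # unimodular: integer entries, det = 1
B_reduced = B @ U            # right-multiplication preserves the lattice
# the defect drops from sqrt(2) (for B) to 1 (B_reduced is orthogonal)
```

Because unimodular matrices leave the lattice unchanged, a model that outputs factorized unimodular matrices can be trained end-to-end to push this defect toward its minimum.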


Opening Remarks
(
Opening Remarks
)
>

🔗 

