Towards unified representation learning and sampling for molecular sciences
Abstract
Both empirical experience and theory show that as dimensionality grows, the amount of data needed to achieve a given accuracy with generative sampling explodes: high-dimensional generative models become data-hungry and slow to converge. In molecular systems, this burden is amplified by rare events, since we often do not know a priori which slow modes matter, or how to identify and describe them. This motivates a unified approach that learns the right low-dimensional descriptions of the underlying physics and couples them directly to generative sampling in an end-to-end trainable pipeline. In this talk, I will present concrete realizations of this idea, coupling the State Predictive Information Bottleneck (SPIB) framework with different generative samplers. Applications to model potentials, proteins, Lennard–Jones clusters, and RNA will illustrate how such unified representation learning and sampling can accelerate rare-event exploration and improve generalization across thermodynamic states and conditions outside the training data.