Universally Converging Representations of Matter Across Scientific Foundation Models
Abstract
Scientific foundation models have emerged across physics, chemistry, and biology, all seeking to learn rich features of matter relevant for downstream use cases. However, it remains unclear whether these models, which differ vastly in modality, architecture, and training data, are converging to a universal representation of matter. We extract and analyze embeddings of molecules, materials, and proteins from roughly 50 models, using three complementary alignment metrics to probe these learned representations. We find modest cross-modality alignment for molecules and materials, but strong alignment between protein models. Alongside significant alignment within modalities, this provides evidence that models are converging toward a single, optimal representation space. However, we also find that training data, rather than model architecture, is the dominant factor shaping latent spaces. In addition, materials models' embeddings of out-of-distribution structures align more strongly with one another than their embeddings of structures drawn from within their training data distributions, suggesting that these models remain data-limited and fall short of true foundation model status. Lastly, we observe that non-equivariant models trained to output conservative fields align strongly with equivariant and invariant models, indicating the degree to which learned data features can mimic architectural inductive biases. Together, these findings lead us to propose representation alignment as a dynamic benchmark for foundation-level generalizability in scientific models, shaped by modality, data, and architecture.
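The abstract does not name the three alignment metrics, so the sketch below is only a hedged illustration of what comparing two models' embeddings of the same structures can look like: it uses linear centered kernel alignment (CKA), one widely used representation-alignment metric, with synthetic data. The function name and data are assumptions for illustration, not the paper's implementation.

```python
# Hedged sketch: linear CKA between two models' embeddings of the same
# n structures. CKA is an assumed example metric, not necessarily one of
# the three metrics used in this work.
import numpy as np

def linear_cka(X: np.ndarray, Y: np.ndarray) -> float:
    """Linear centered kernel alignment between embeddings X (n, d1) and Y (n, d2)."""
    X = X - X.mean(axis=0, keepdims=True)  # center each feature dimension
    Y = Y - Y.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(Y.T @ X, ord="fro") ** 2   # HSIC-style cross term
    self_x = np.linalg.norm(X.T @ X, ord="fro")
    self_y = np.linalg.norm(Y.T @ Y, ord="fro")
    return float(cross / (self_x * self_y))

# Toy usage: a rotated copy of an embedding space aligns perfectly,
# since linear CKA is invariant to orthogonal transforms and isotropic scaling.
rng = np.random.default_rng(0)
emb_a = rng.normal(size=(1000, 256))                    # "model A" embeddings
q, _ = np.linalg.qr(rng.normal(size=(256, 256)))        # random orthogonal matrix
emb_b = emb_a @ q                                       # "model B" embeddings
print(linear_cka(emb_a, emb_b))                         # ~1.0
print(linear_cka(emb_a, rng.normal(size=(1000, 128))))  # near 0 for unrelated spaces
```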