Measure Before You Look: Grounding Embeddings Through Manifold Metrics
Abstract
Dimensionality reduction methods are routinely employed across scientific disciplines to make high-dimensional data amenable to analysis. Despite their widespread use, we often lack tools to assess whether the resulting embeddings are faithful to the underlying manifold structure of the data. Without such an assessment, it is difficult to quantify how much an embedding preserves or distorts that structure. We introduce a suite of complementary geometric metrics to quantitatively audit embedding fidelity across neighborhood sizes: Tangent Space Approximation (TSA), Local Intrinsic Dimensionality (LID), and Participation Ratio (PR). We compare the estimated dimensionality at each sample before and after embedding; points whose values remain similar across the transformation are deemed geometrically faithful and thus representative of true manifold structure in the data. Across synthetic and biological datasets, we show that these metrics expose distinct embedding failure modes: TSA is most sensitive to small-scale geometric distortions, LID captures heterogeneity in mixed-density regions, and PR diagnoses global variance structure. Finally, we demonstrate that applying Jacobian Frobenius penalties during autoencoder refinement of embeddings contracts tangent spaces, reduces disagreement between metrics, and improves alignment with intrinsic manifold geometry. We argue for moving beyond visual heuristics toward principled, geometry-based choices that inform method selection, improve representations, and motivate geometry-aware objectives for representation learning.
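As an illustration of the per-sample, before-and-after comparison described above, the following is a minimal sketch for one of the three metrics, PR. It assumes the common definition of the participation ratio, PR = (sum_i λ_i)^2 / sum_i λ_i^2 over the eigenvalues λ_i of a local covariance matrix, and a brute-force k-nearest-neighbor neighborhood. The function names, the toy data, and the choice k = 20 are illustrative assumptions rather than the implementation used in this work.

```python
import numpy as np

def participation_ratio(points):
    """PR = (sum λ)^2 / sum λ^2 over covariance eigenvalues of `points`.

    Assumed (standard) definition; not confirmed by the source text.
    """
    centered = points - points.mean(axis=0)
    # Squared singular values are proportional to the covariance eigenvalues;
    # the proportionality constant cancels in the ratio.
    s = np.linalg.svd(centered, compute_uv=False)
    lam = s ** 2
    total = lam.sum()
    if total == 0:
        return 0.0
    return float(total ** 2 / (lam ** 2).sum())

def local_participation_ratio(X, k):
    """Per-sample PR over each point's k nearest neighbors (brute force)."""
    sq = (X ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T  # squared distances
    idx = np.argsort(d2, axis=1)[:, :k + 1]         # k neighbors plus the point
    return np.array([participation_ratio(X[nbrs]) for nbrs in idx])

# Hypothetical toy data: a 2-D plane mapped linearly into 10-D.
rng = np.random.default_rng(0)
Z = rng.normal(size=(300, 2))   # stands in for a low-dimensional embedding
A = rng.normal(size=(2, 10))
X = Z @ A                       # stands in for the high-dimensional data

pr_high = local_participation_ratio(X, k=20)  # before embedding
pr_low = local_participation_ratio(Z, k=20)   # after embedding
gap = np.abs(pr_high - pr_low)                # per-sample disagreement
```

Under these assumptions, a small entry of `gap` indicates that the local variance structure around that sample is similar before and after embedding, while a large entry flags a sample whose neighborhood was distorted. Repeating the computation over several values of `k` gives the comparison across neighborhood sizes.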