Curvature Estimation on Data Manifolds via Diffusion-augmented Sampling
Abstract
Data geometry is fundamental to machine learning and data analysis, yet practical tools for characterizing the geometry of data manifolds remain limited. While intrinsic dimension estimation is well studied, curvature, a key measure of local manifold structure, is far harder to estimate from noisy, sparsely sampled data. We introduce a diffusion-based framework for curvature estimation that aims to mitigate the challenges posed by low sample density. We train a diffusion model to learn a latent representation of the manifold, which we then probe to augment the raw dataset and obtain a denser sample. Compared with state-of-the-art curvature estimators applied directly to the raw data, diffusion-augmented estimators perform better on heterogeneous manifolds, provided the diffusion model is of high fidelity.
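The augment-then-estimate idea can be illustrated with a toy sketch on the unit circle, whose true curvature is 1. Everything here is an assumption made for illustration and is not the model or estimator used in this work: `oracle_augment` is a hypothetical stand-in for probing a trained diffusion model (it "denoises" by projecting onto the known circle, which a real model would have to learn from data), and `circle_fit_curvature` is a simple least-squares circle fit rather than a state-of-the-art estimator.

```python
import numpy as np

def circle_fit_curvature(pts):
    """Curvature (1/radius) of the least-squares circle through 2-D points."""
    # Algebraic fit: |p|^2 = 2 c.p + d with d = r^2 - |c|^2 is linear in (c, d).
    A = np.hstack([2.0 * pts, np.ones((len(pts), 1))])
    b = np.sum(pts**2, axis=1)
    sol, *_ = np.linalg.lstsq(A, b, rcond=None)
    center, d = sol[:2], sol[2]
    radius = np.sqrt(d + center @ center)
    return 1.0 / radius

def local_curvatures(sample, queries, k):
    """Fit a circle to the k nearest sample points of each query point."""
    out = []
    for q in queries:
        idx = np.argsort(np.linalg.norm(sample - q, axis=1))[:k]
        out.append(circle_fit_curvature(sample[idx]))
    return np.array(out)

def oracle_augment(raw, n_new, rng, jitter=0.3):
    """ASSUMPTION: hypothetical stand-in for probing a trained diffusion model.

    Perturbs randomly chosen raw points, then "denoises" them by projecting
    onto the unit circle, i.e. it uses the true manifold as an oracle.
    """
    seeds = raw[rng.integers(0, len(raw), size=n_new)]
    noisy = seeds + jitter * rng.normal(size=seeds.shape)
    return noisy / np.linalg.norm(noisy, axis=1, keepdims=True)

rng = np.random.default_rng(0)
theta = rng.uniform(0.0, 2.0 * np.pi, size=30)
clean = np.column_stack([np.cos(theta), np.sin(theta)])  # query points on the circle
raw = clean + 0.05 * rng.normal(size=clean.shape)        # sparse, noisy sample
dense = oracle_augment(raw, 2000, rng)                   # denser, augmented sample

# Mean absolute error of the local curvature estimates (true curvature is 1).
err_raw = np.mean(np.abs(local_curvatures(raw, clean, k=5) - 1.0))
err_dense = np.mean(np.abs(local_curvatures(dense, clean, k=20) - 1.0))
print(f"mean abs curvature error: raw={err_raw:.3g}, augmented={err_dense:.3g}")
```

Because the stand-in denoiser is exact, this sketch shows only the idealized limit of a perfect generative model; with a learned model, the quality of the augmented estimate would depend on how faithfully the model captures the manifold.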