Symbolic Regression via Order-Invariant Embeddings and Sparse Decoding
Krish Malik · Eric Reinhardt · Victor Baules · Nobuchika Okada · Sergei Gleyzer
Abstract
Symbolic regression can be a powerful tool in the physical sciences, where it is used to automatically discover governing equations and compact analytic laws from experimental or simulated data. We present a neural pipeline for symbolic regression that combines data sampling, structured tree representations of equations, specialized embeddings that capture relationships independently of input order, a sparse-attention sequence model, and constant refinement via Broyden–Fletcher–Goldfarb–Shanno (BFGS) optimization. Applied to the AI Feynman dataset, the system achieves strong numerical fidelity ($R^2 \approx 0.98$, RMSE $< 0.01$) and high token-level accuracy (99.6\%), but its ability to exactly recover full symbolic expressions remains limited ($\sim$20\%).
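The order-invariant embeddings mentioned above can be illustrated with a minimal sketch: any per-point featurization followed by a symmetric pooling operation (mean, sum, max) yields a representation that is unchanged under permutation of the sampled data points. The random-Fourier-feature projection below is a hypothetical stand-in for the paper's learned embedding, chosen only to make the invariance property concrete.

```python
import numpy as np

def order_invariant_embedding(points, dim=8):
    """Embed a set of (x, y) samples so the result does not depend on the
    order of the rows. Hypothetical sketch: a fixed random projection with a
    sinusoidal nonlinearity, followed by mean pooling. Any symmetric pooling
    makes the embedding permutation-invariant."""
    rng = np.random.default_rng(0)                 # fixed projection, for illustration
    W = rng.normal(size=(points.shape[1], dim))
    feats = np.sin(points @ W)                     # per-point features
    return feats.mean(axis=0)                      # symmetric pooling => order-invariant

pts = np.array([[0.1, 0.2], [0.5, -0.3], [1.0, 0.7]])
emb_fwd = order_invariant_embedding(pts)
emb_rev = order_invariant_embedding(pts[::-1])     # same points, reversed order
assert np.allclose(emb_fwd, emb_rev)
```

The assertion holds because mean pooling commutes with row permutation; a learned model would replace the fixed projection with trainable per-point encoders.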
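The BFGS constant-refinement step can likewise be sketched: once the decoder predicts an expression skeleton, its numeric constants are fit to the sampled data by minimizing a squared-error loss. The skeleton $c_0 \sin(c_1 x)$ and the synthetic data below are illustrative assumptions, not the paper's setup; only the use of BFGS for constant fitting comes from the abstract.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical skeleton predicted by the decoder: f(x) = c0 * sin(c1 * x),
# with constants c0, c1 left free for refinement.
def skeleton(c, x):
    return c[0] * np.sin(c[1] * x)

x = np.linspace(0.0, 2.0 * np.pi, 200)
y = 2.5 * np.sin(1.3 * x)                      # synthetic "observed" data

def mse(c):
    # Squared-error objective over the sampled points
    return np.mean((skeleton(c, x) - y) ** 2)

# Refine the constants with BFGS from a generic initial guess
res = minimize(mse, x0=[1.0, 1.0], method="BFGS")
refined = res.x                                # fitted (c0, c1)
```

Because the objective is non-convex in the frequency constant, practical pipelines often restart BFGS from several initial guesses and keep the lowest-loss fit.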