Do LLMs dream of differential equation discovery?
Abstract
Large Language Models (LLMs) have shown promise in symbolic regression tasks, yet their application to partial differential equation (PDE) discovery faces fundamental challenges. Unlike traditional symbolic regression, where candidate models directly generate predictions of the data and thus receive fast feedback, PDE discovery requires solving implicit equations and extracting derivative information from physical field data, capabilities that current LLMs lack out of the box. We address these challenges through three key contributions: (1) reformulating PDE discovery as a code generation task that leverages LLMs' programming capabilities, (2) developing an optimal data representation format that preserves physical field properties while fitting within context limitations and enabling derivative extraction, and (3) integrating LLMs into a meta-learning framework with the EPDE (Evolutionary Partial Differential Equation) algorithm, in which LLMs serve as informed oracles that suggest physically plausible equation forms. Our approach bridges the gap between LLMs' theoretical knowledge of differential equations and the practical requirements of scientific discovery from data. We demonstrate that properly formatted physical field data, combined with code generation prompts, enables general-purpose LLMs to participate meaningfully in the equation discovery process even though they were not specifically trained for this task. This work establishes a foundation for leveraging pre-trained LLMs in automated scientific discovery while acknowledging current limitations and the need for hybrid human-AI validation approaches.
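To make contribution (2) more concrete, the sketch below illustrates one way the data-representation idea described above could look in code: a physical field is downsampled so it fits within a context window, serialized as fixed-precision text for the prompt, and kept on a regular grid so that derivatives can still be extracted numerically. This is a minimal sketch under assumed conventions; the function name format_field_for_llm, the max_points parameter, and the serialization format are illustrative assumptions, not the representation actually used in the paper.

```python
# Minimal sketch (illustrative, not the paper's implementation): compress a
# physical field u(t, x) into a compact, prompt-sized text table while keeping
# enough grid structure for numerical derivative extraction.
import numpy as np

def format_field_for_llm(u, t, x, max_points=32):
    """Downsample u with shape (len(t), len(x)) and serialize it for a prompt."""
    # Uniformly subsample each axis to at most max_points samples.
    ti = np.linspace(0, len(t) - 1, min(len(t), max_points)).astype(int)
    xi = np.linspace(0, len(x) - 1, min(len(x), max_points)).astype(int)
    u_small = u[np.ix_(ti, xi)]

    # Finite-difference derivatives on the subsampled grid, so that terms
    # such as u_t and u_x remain recoverable from the compressed data.
    u_t = np.gradient(u_small, t[ti], axis=0)
    u_x = np.gradient(u_small, x[xi], axis=1)

    # Fixed-precision text serialization keeps the prompt short.
    as_text = "\n".join(" ".join(f"{v:.4g}" for v in row) for row in u_small)
    return as_text, u_t, u_x

# Example: a travelling wave u(t, x) = sin(x - t) on a 200 x 400 grid.
t = np.linspace(0, 2 * np.pi, 200)
x = np.linspace(0, 4 * np.pi, 400)
u = np.sin(x[None, :] - t[:, None])
table, u_t, u_x = format_field_for_llm(u, t, x)
```

The design intent sketched here is that the compressed text table is what enters the LLM prompt, while the numerically extracted derivatives remain available to whatever downstream component (e.g., an evolutionary search over candidate equation terms) validates the suggested equation forms against the data.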