AI for Automated Equation Discovery and Interpretation: Understanding Bio-Mining Dynamics in Chilean Copper Tailings
José Vásquez-Bastías · Maria Villarroel
Abstract
Mining activities generate tailings enriched with heavy metals, posing severe ecological and public health challenges. These extreme environments harbor unique microbial communities with specialized adaptations that remain poorly understood. We analyzed publicly available metabarcoding data from the Ovejería Tailings Dam (Chile) to model relationships between microbial abundances and metal concentrations. Using symbolic regression (SR), we generated human-readable equations that predict genus-level abundances from environmental variables and, conversely, reconstruct environmental profiles from microbial composition. Of the 154 detected genera, 98 had equations that met a held-out filter requiring loss ≤ 5% of the target genus's mean abundance under leave-one-site-out evaluation, indicating that approximately 64% (98/154) of genus-level diversity can be reliably explained by environmental predictors. Variable-use analysis revealed consistent patterns: available manganese and lead, together with total chromium and molybdenum, accounted for nearly 40% of all predictor occurrences. To enhance interpretability, we paired SR outputs with large language models (GPT-2 as a general baseline and Qwen2.5-Math-7B as a math-specialized model), each provided with the equation context, a taxonomic description, and definitions of the environmental variables. Explanations were evaluated using two criteria: C1, mathematical fidelity, and C2, biological plausibility and clarity. Qwen2.5-Math-7B consistently produced coherent, biologically meaningful interpretations linking mathematical structure to ecological mechanisms such as metal tolerance, nitrogen fixation, and potential pathogenicity, while GPT-2 often drifted off-topic. Together, these findings demonstrate an interpretable, scalable framework that combines symbolic regression and LLM-based reasoning to uncover hidden ecological patterns in metal-rich microbiomes.
Beyond predictive performance, this approach contributes a transparent AI methodology for environmental monitoring and biotechnological innovation in mining-impacted ecosystems, and it serves as a Latin American case study in explainable and responsible AI.
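The held-out filter described in the abstract (loss at most 5% of the target genus's mean abundance under leave-one-site-out evaluation) can be illustrated with a minimal sketch. Several details here are assumptions rather than the authors' implementation: the abstract does not name the loss, so mean absolute error is used; a one-variable least-squares line stands in for the symbolic regression fit; and the names `fit_linear`, `loso_loss`, and `passes_filter` are hypothetical.

```python
from statistics import mean


def fit_linear(xs, ys):
    """Stand-in for a symbolic-regression fit (assumption): least squares y = a*x + b."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    b = my - a * mx
    return lambda x: a * x + b


def loso_loss(samples, fit=fit_linear):
    """Leave-one-site-out mean absolute error (the loss choice is an assumption).

    samples: list of (site, x, y) tuples, where x is one environmental
    variable and y is the abundance of the target genus.
    """
    sites = sorted({site for site, _, _ in samples})
    errors = []
    for held_out in sites:
        # Refit on every site except the held-out one, then score on that site only.
        train = [(x, y) for site, x, y in samples if site != held_out]
        test = [(x, y) for site, x, y in samples if site == held_out]
        model = fit([x for x, _ in train], [y for _, y in train])
        errors.extend(abs(model(x) - y) for x, y in test)
    return mean(errors)


def passes_filter(samples, threshold=0.05, fit=fit_linear):
    """True if the held-out loss is at most `threshold` times the genus mean abundance."""
    mean_abundance = mean(y for _, _, y in samples)
    return loso_loss(samples, fit) <= threshold * mean_abundance
```

Applied to every genus, the fraction of genera for which `passes_filter` returns `True` corresponds to the 98/154 proportion reported above. Holding out whole sites, rather than individual samples, keeps samples from the same site out of both the training and evaluation sets at once.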