Spacier: A Dataset for Modeling Electrostatic Poisson–Boltzmann Atomic Solvation Potentials
Abstract
Electrostatic solvation free energy is central to biomolecular modeling, yet existing datasets for machine learning are limited in size, resolution, or scope. We introduce Spacier, a benchmark dataset for Poisson–Boltzmann (PB)-based electrostatics with atomic-level annotations across diverse molecular systems, from small molecules to large protein complexes. Unlike existing solvation datasets, Spacier emphasizes atomic precision in water while covering a broad spectrum of system sizes. Grounded in the PB equation, Spacier enables evaluation of both molecular learning methods and neural PDE solvers under standardized preprocessing, metrics, and loss functions. We further propose a charge-weighted regression objective that improves training stability by mitigating variance in atomic potentials. Baseline experiments with a U-Net, a Fourier Neural Operator, and a graph neural network demonstrate competitive accuracy and scalability, but also reveal limitations in robustness and generalization. By framing solvation modeling as a physically grounded benchmark task, Spacier provides a foundation for advancing machine learning and PDE-based methods in biomolecular electrostatics. The Spacier dataset is available at doi.org/10.5281/zenodo.15867553, and the source code for reproducing our experiments is available at github.com/yxwu21/PBGNN.