Reasoning With a Star: A Heliophysics Dataset and Benchmark for Agentic Scientific Reasoning
Kevin Lee · Russell Spiewak · James Walsh
Abstract
We present Reasoning With a Star, a heliophysics dataset for LLM reasoning, together with an initial benchmarking approach. The data are drawn from NASA/UCAR Living With a Star problem sets and manually compiled into a readily consumable question-and-answer structure with context, unit-aware numeric ground truth, metadata (expected type, required units, format hints), and acceptance checks. A programmatic grader scores answers via symbolic equivalence, unit-converted numeric tolerance, and schema rules. We provide a single-shot baseline and initial agent-based benchmarks, and find that performance improves when tasks are decomposed across patterns of agents.
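To illustrate what a unit-converted numeric tolerance check might look like, the following is a minimal sketch and not the authors' grader. The function name `grade_numeric`, the small conversion table, and the 1% relative tolerance are all assumptions made for this example; the actual grader's units, tolerances, and interfaces are not specified in the abstract.

```python
import math

# Hypothetical conversion factors to SI base units (assumed for illustration).
TO_SI = {
    "m": 1.0,
    "km": 1.0e3,
    "AU": 1.495978707e11,  # astronomical unit in metres
    "s": 1.0,
    "min": 60.0,
    "hr": 3600.0,
}

# Hypothetical dimension tags so that length is never compared with time.
DIMENSION = {
    "m": "length", "km": "length", "AU": "length",
    "s": "time", "min": "time", "hr": "time",
}


def grade_numeric(answer_value, answer_unit, truth_value, truth_unit, rel_tol=0.01):
    """Return True if the answer matches the ground truth after unit conversion.

    Both values are converted to SI and compared with a relative tolerance.
    Unknown units or mismatched dimensions are rejected.
    """
    if answer_unit not in TO_SI or truth_unit not in TO_SI:
        return False
    if DIMENSION[answer_unit] != DIMENSION[truth_unit]:
        return False
    answer_si = answer_value * TO_SI[answer_unit]
    truth_si = truth_value * TO_SI[truth_unit]
    return math.isclose(answer_si, truth_si, rel_tol=rel_tol)
```

For example, under these assumptions an answer of 8.3 min would be accepted against a ground truth of 499 s, while an answer given in kilometres would be rejected against a ground truth given in seconds.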