Poster in Workshop: Machine Learning and the Physical Sciences

Reasoning With a Star: A Heliophysics Dataset and Benchmark for Agentic Scientific Reasoning

Kevin Lee ⋅ Russell Spiewak ⋅ James Walsh


Abstract

We present Reasoning With a Star, a heliophysics dataset for evaluating LLM reasoning, together with an initial benchmarking approach. The data are drawn from NASA/UCAR Living With a Star problem sets and manually compiled into a readily consumable question-and-answer structure with context, unit-aware numeric ground truth, metadata (expected type, required units, format hints), and acceptance checks. A programmatic grader checks answers via symbolic equivalence, unit-converted numeric tolerance, and schema rules. We provide a single-shot baseline and initial agent-based benchmarks, and find that performance improves when tasks are decomposed using multi-agent patterns.
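To make the grading idea concrete, here is a minimal sketch of what a unit-converted numeric tolerance check combined with a simple schema rule could look like. It is not the authors' grader: the function names (`to_unit`, `grade_numeric`), the answer layout (a dict with `value` and `unit`), the 1% relative tolerance, and the small length-only conversion table are all assumptions made for illustration. Symbolic equivalence is left out to keep the example dependency-free.

```python
import math

# Hypothetical conversion table: factors to meters (length units only, for brevity).
TO_METERS = {
    "m": 1.0,
    "km": 1.0e3,
    "Mm": 1.0e6,
    "R_sun": 6.957e8,        # IAU nominal solar radius in meters
    "AU": 1.495978707e11,    # astronomical unit in meters
}


def to_unit(value, unit, target_unit):
    """Convert a length from `unit` to `target_unit` via meters."""
    return value * TO_METERS[unit] / TO_METERS[target_unit]


def grade_numeric(answer, truth, rel_tol=0.01):
    """Return True if `answer` matches `truth` after unit conversion.

    Both arguments are assumed to be dicts of the form
    {"value": <number>, "unit": <str>}. The tolerance is illustrative.
    """
    # Schema rule: exactly the expected keys must be present.
    if not isinstance(answer, dict) or set(answer) != {"value", "unit"}:
        return False
    value, unit = answer["value"], answer["unit"]
    # Schema rule: the value must be a real number (bool is rejected explicitly).
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return False
    # Schema rule: the unit must be one the grader knows how to convert.
    if unit not in TO_METERS:
        return False
    # Compare in the unit of the ground truth, within a relative tolerance.
    converted = to_unit(value, unit, truth["unit"])
    return math.isclose(converted, truth["value"], rel_tol=rel_tol)


if __name__ == "__main__":
    truth = {"value": 1.0, "unit": "AU"}
    print(grade_numeric({"value": 149597870.7, "unit": "km"}, truth))
    print(grade_numeric({"value": 1.0, "unit": "km"}, truth))
```

Converting to the ground-truth unit before comparing means a correct answer is accepted regardless of the unit it was reported in, while an answer with the right number and the wrong unit is rejected.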
