NeurIPS ProofNet: A Benchmark for Autoformalizing and Formally Proving Undergraduate-Level Mathematics Problems

Poster
in
Workshop: MATH-AI: Toward Human-Level Mathematical Reasoning

ProofNet: A Benchmark for Autoformalizing and Formally Proving Undergraduate-Level Mathematics Problems

Zhangir Azerbayev · Bartosz Piotrowski · Jeremy Avigad

[ Abstract ]

[ Poster]

Abstract:

We introduce \textsf{ProofNet}, a benchmark for autoformalization and formal proving of undergraduate-level mathematics. The \textsf{ProofNet} benchmarks consists of 297 theorem statements expressed in both natural language and the Lean 3 theorem prover, 100 of which are also accompanied by natural language proofs. The problems are primarily drawn from popular undergraduate pure mathematics textbooks, and cover topics such as real and complex analysis, linear algebra, abstract algebra, and topology. We intend for \textsf{ProofNet} to be a challenging benchmark that will drive progress in autoformalization and automatic theorem proving. We report baseline results on the autoformalization of statements using few-shot learning with large language models.

Chat is not available.

Poster in Workshop: MATH-AI: Toward Human-Level Mathematical Reasoning

ProofNet: A Benchmark for Autoformalizing and Formally Proving Undergraduate-Level Mathematics Problems

Zhangir Azerbayev · Bartosz Piotrowski · Jeremy Avigad

Poster
in
Workshop: MATH-AI: Toward Human-Level Mathematical Reasoning