AntiderivBench: Evaluating language models on indefinite integration
Bartosz Piotrowski · Kaiyu Yang
Abstract
We present AntiderivBench, a benchmark of indefinite integration problems extracted from the challenging annual MIT Integration Bee competition. We evaluate a number of frontier closed models as well as smaller open-source models on it. Additionally, we create more challenging versions of the benchmark by symbolically manipulating the original competition problems. We envision that the benchmark will be useful for evaluating the reasoning capabilities of LLMs and for experimenting with LLM post-training pipelines that rely on verifiable rewards.
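As a minimal sketch of why indefinite integration yields a verifiable reward, the snippet below checks a candidate antiderivative by differentiating it and comparing against the integrand with SymPy. The function name and the specific example are illustrative assumptions, not the benchmark's actual grading code.

```python
import sympy as sp

x = sp.symbols('x')

def is_correct_antiderivative(candidate, integrand, var=x):
    # A candidate F is a correct antiderivative of f if F' - f simplifies to 0;
    # the constant of integration vanishes under differentiation.
    # (Hypothetical checker; the benchmark's own verifier may differ.)
    return sp.simplify(sp.diff(candidate, var) - integrand) == 0

# Example: sin(x)**2 / 2 is an antiderivative of sin(x)*cos(x).
integrand = sp.sin(x) * sp.cos(x)
candidate = sp.sin(x)**2 / 2
print(is_correct_antiderivative(candidate, integrand))  # True
```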