AntiderivBench: Evaluating language models on indefinite integration
Bartosz Piotrowski · Kaiyu Yang
Abstract
We present AntiderivBench, a benchmark of indefinite integration problems extracted from the challenging annual MIT Integration Bee competition. We evaluate a number of frontier closed models as well as smaller open-source models on it. Additionally, we create more challenging versions of the benchmark by symbolically manipulating the original competition problems. We envision that the benchmark will be useful for evaluating the reasoning capabilities of LLMs and for experimenting with LLM post-training pipelines that rely on verifiable rewards.
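As a minimal sketch of why indefinite integration yields a verifiable reward, the snippet below checks a candidate antiderivative by differentiating it and comparing against the integrand with SymPy. The function name and the specific example are illustrative assumptions, not the benchmark's actual grading code.

```python
import sympy as sp

x = sp.symbols('x')

def is_correct_antiderivative(candidate, integrand, var=x):
    # A candidate F is a correct antiderivative of f if F' - f simplifies to 0;
    # the constant of integration vanishes under differentiation.
    # (Hypothetical checker; the benchmark's own verifier may differ.)
    return sp.simplify(sp.diff(candidate, var) - integrand) == 0

# Example: sin(x)**2 / 2 is an antiderivative of sin(x)*cos(x).
integrand = sp.sin(x) * sp.cos(x)
candidate = sp.sin(x)**2 / 2
print(is_correct_antiderivative(candidate, integrand))  # True
```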