Timezone: »
Theorem proving in natural mathematical language – the mixture of symbolic and natural language used by humans – plays a central role in mathematical advances and education, and tests aspects of reasoning that are core to intelligence. Yet it has remained underexplored with modern generative models. We study large-scale language models on two new generation tasks: suggesting the next step in a mathematical proof, and full proof generation. We develop NaturalProver, a language model that generates proofs by conditioning on background references (e.g. theorems and definitions that are either retrieved or human-provided), and optionally enforces their presence with constrained decoding. On theorems from the NaturalProofs benchmark, NaturalProver improves the quality of next-step suggestions and generated proofs over fine-tuned GPT-3, according to human evaluations from university-level mathematics students. NaturalProver is capable of proving some theorems that require short (2-6 step) proofs, and providing next-step suggestions that are rated as correct and useful over 40% of the time, which is to our knowledge the first demonstration of these capabilities using neural language models.
Author Information
Sean Welleck (University of Washington)
Jiacheng Liu (Department of Computer Science, University of Washington)
Ximing Lu (Department of Computer Science, University of Washington)
Hannaneh Hajishirzi (University of Washington)
Yejin Choi (University of Washington)
More from the Same Authors
-
2021 : CommonsenseQA 2.0: Exposing the Limits of AI through Gamification »
Alon Talmor · Ori Yoran · Ronan Le Bras · Chandra Bhagavatula · Yoav Goldberg · Yejin Choi · Jonathan Berant -
2021 : NaturalProofs: Mathematical Theorem Proving in Natural Language »
Sean Welleck · Jiacheng Liu · Ronan Le Bras · Hanna Hajishirzi · Yejin Choi · Kyunghyun Cho -
2021 : Towards Grounded Natural Language Proof Generation »
Sean Welleck · Jiacheng Liu · Yejin Choi -
2022 : Draft, Sketch, and Prove: Guiding Formal Theorem Provers with Informal Proofs »
Albert Jiang · Sean Welleck · Jin Peng Zhou · Timothee Lacroix · Jiacheng Liu · Wenda Li · Mateja Jamnik · Guillaume Lample · Yuhuai Wu -
2022 : Information-Theoretic Evaluation of Free-Text Rationales with Conditional $\mathcal{V}$-Information »
Hanjie Chen · Faeze Brahman · Xiang Ren · Yangfeng Ji · Yejin Choi · Swabha Swayamdipta -
2023 Poster: Localized Symbolic Knowledge Distillation for Visual Commonsense Models »
Jae Sung Park · Jack Hessel · Khyathi Chandu · Paul Pu Liang · Ximing Lu · Qiuyuan Huang · Peter West · Jianfeng Gao · Ali Farhadi · Yejin Choi -
2023 Poster: Self-Refine: Iterative Refinement with Self-Feedback »
Aman Madaan · Niket Tandon · Prakhar Gupta · Skyler Hallinan · Luyu Gao · Sarah Wiegreffe · Uri Alon · Nouha Dziri · Shrimai Prabhumoye · Yiming Yang · Shashank Gupta · Bodhisattwa Prasad Majumder · Katherine Hermann · Sean Welleck · Amir Yazdanbakhsh · Peter Clark -
2023 Poster: SwiftSage: A Generative Agent with Fast and Slow Thinking for Complex Interactive Tasks »
Bill Yuchen Lin · Yicheng Fu · Karina Yang · Prithviraj (Raj) Ammanabrolu · Faeze Brahman · Shiyu Huang · Chandra Bhagavatula · Yejin Choi · Xiang Ren -
2023 Poster: Faith and Fate: Limits of Transformers on Compositionality »
Nouha Dziri · Ximing Lu · Melanie Sclar · Xiang (Lorraine) Li · Liwei Jiang · Bill Yuchen Lin · Sean Welleck · Peter West · Chandra Bhagavatula · Ronan Le Bras · Jena Hwang · Soumya Sanyal · Xiang Ren · Allyson Ettinger · Zaid Harchaoui · Yejin Choi -
2023 Poster: Fine-Grained Human Feedback Gives Better Rewards for Language Model Training »
Zeqiu Wu · Yushi Hu · Weijia Shi · Nouha Dziri · Alane Suhr · Prithviraj (Raj) Ammanabrolu · Noah Smith · Mari Ostendorf · Hannaneh Hajishirzi -
2023 Poster: How Far Can Camels Go? Exploring the State of Instruction Tuning on Open Resources »
Yizhong Wang · Hamish Ivison · Pradeep Dasigi · Jack Hessel · Tushar Khot · Khyathi Chandu · David Wadden · Kelsey MacMillan · Noah Smith · Iz Beltagy · Hannaneh Hajishirzi -
2023 Poster: Multimodal C4: An Open, Billion-scale Corpus of Images Interleaved with Text »
Wanrong Zhu · Jack Hessel · Anas Awadalla · Samir Yitzhak Gadre · Jesse Dodge · Alex Fang · Youngjae Yu · Ludwig Schmidt · William Yang Wang · Yejin Choi -
2023 Poster: DataComp: In search of the next generation of multimodal datasets »
Samir Yitzhak Gadre · Gabriel Ilharco · Alex Fang · Jonathan Hayase · Georgios Smyrnis · Thao Nguyen · Ryan Marten · Mitchell Wortsman · Dhruba Ghosh · Jieyu Zhang · Eyal Orgad · Rahim Entezari · Giannis Daras · Sarah Pratt · Vivek Ramanujan · Yonatan Bitton · Kalyani Marathe · Stephen Mussmann · Richard Vencu · Mehdi Cherti · Ranjay Krishna · Pang Wei Koh · Olga Saukh · Alexander Ratner · Shuran Song · Hannaneh Hajishirzi · Ali Farhadi · Romain Beaumont · Sewoong Oh · Alex Dimakis · Jenia Jitsev · Yair Carmon · Vaishaal Shankar · Ludwig Schmidt -
2023 Poster: GenEval: An object-focused framework for evaluating text-to-image alignment »
Dhruba Ghosh · Hannaneh Hajishirzi · Ludwig Schmidt -
2023 Poster: RealTime QA: What's the Answer Right Now? »
Jungo Kasai · Keisuke Sakaguchi · yoichi takahashi · Ronan Le Bras · Akari Asai · Xinyan Yu · Dragomir Radev · Noah Smith · Yejin Choi · Kentaro Inui -
2023 Oral: DataComp: In search of the next generation of multimodal datasets »
Samir Yitzhak Gadre · Gabriel Ilharco · Alex Fang · Jonathan Hayase · Georgios Smyrnis · Thao Nguyen · Ryan Marten · Mitchell Wortsman · Dhruba Ghosh · Jieyu Zhang · Eyal Orgad · Rahim Entezari · Giannis Daras · Sarah Pratt · Vivek Ramanujan · Yonatan Bitton · Kalyani Marathe · Stephen Mussmann · Richard Vencu · Mehdi Cherti · Ranjay Krishna · Pang Wei Koh · Olga Saukh · Alexander Ratner · Shuran Song · Hannaneh Hajishirzi · Ali Farhadi · Romain Beaumont · Sewoong Oh · Alex Dimakis · Jenia Jitsev · Yair Carmon · Vaishaal Shankar · Ludwig Schmidt -
2023 Workshop: MATH-AI: The 3rd Workshop on Mathematical Reasoning and AI »
Zhenwen Liang · Albert Q. Jiang · Katie Collins · Pan Lu · Kaiyu Yang · Sean Welleck · James McClelland -
2023 Workshop: AI meets Moral Philosophy and Moral Psychology: An Interdisciplinary Dialogue about Computational Ethics »
Sydney Levine · Liwei Jiang · Jared Moore · Zhijing Jin · Yejin Choi -
2022 Workshop: MATH-AI: Toward Human-Level Mathematical Reasoning »
Pan Lu · Swaroop Mishra · Sean Welleck · Yuhuai Wu · Hannaneh Hajishirzi · Percy Liang -
2022 Poster: Patching open-vocabulary models by interpolating weights »
Gabriel Ilharco · Mitchell Wortsman · Samir Yitzhak Gadre · Shuran Song · Hannaneh Hajishirzi · Simon Kornblith · Ali Farhadi · Ludwig Schmidt -
2022 Poster: COLD Decoding: Energy-based Constrained Text Generation with Langevin Dynamics »
Lianhui Qin · Sean Welleck · Daniel Khashabi · Yejin Choi -
2022 Poster: QUARK: Controllable Text Generation with Reinforced Unlearning »
Ximing Lu · Sean Welleck · Jack Hessel · Liwei Jiang · Lianhui Qin · Peter West · Prithviraj Ammanabrolu · Yejin Choi -
2021 : Towards Grounded Natural Language Proof Generation »
Jiacheng Liu -
2021 : Poster Session 1 »
Jiaqi Chen · Tanglin Xia · Sean Welleck · Jiacheng Liu · Ran Gong · Shifeng Huang · Wei Yu · Tracy Jia Shen -
2021 Workshop: Math AI for Education (MATHAI4ED): Bridging the Gap Between Research and Smart Education »
Pan Lu · Yuhuai Wu · Sean Welleck · Xiaodan Liang · Eric Xing · James McClelland -
2021 : Panel Discussion »
Pascal Poupart · Ali Ghodsi · Luke Zettlemoyer · Sameer Singh · Kevin Duh · Yejin Choi · Lu Hou -
2021 : Battling with Larger Models through Grounding and Searching »
Yejin Choi -
2021 Oral: MERLOT: Multimodal Neural Script Knowledge Models »
Rowan Zellers · Ximing Lu · Jack Hessel · Youngjae Yu · Jae Sung Park · Jize Cao · Ali Farhadi · Yejin Choi -
2021 : NaturalProofs: Mathematical Theorem Proving in Natural Language »
Sean Welleck · Jiacheng Liu · Ronan Le Bras · Hanna Hajishirzi · Yejin Choi · Kyunghyun Cho -
2021 Poster: Divergence Frontiers for Generative Models: Sample Complexity, Quantization Effects, and Frontier Integrals »
Lang Liu · Krishna Pillutla · Sean Welleck · Sewoong Oh · Yejin Choi · Zaid Harchaoui -
2021 Poster: MERLOT: Multimodal Neural Script Knowledge Models »
Rowan Zellers · Ximing Lu · Jack Hessel · Youngjae Yu · Jae Sung Park · Jize Cao · Ali Farhadi · Yejin Choi -
2021 Poster: MAUVE: Measuring the Gap Between Neural Text and Human Text using Divergence Frontiers »
Krishna Pillutla · Swabha Swayamdipta · Rowan Zellers · John Thickstun · Sean Welleck · Yejin Choi · Zaid Harchaoui -
2021 : CommonsenseQA 2.0: Exposing the Limits of AI through Gamification »
Alon Talmor · Ori Yoran · Ronan Le Bras · Chandra Bhagavatula · Yoav Goldberg · Yejin Choi · Jonathan Berant -
2021 Oral: MAUVE: Measuring the Gap Between Neural Text and Human Text using Divergence Frontiers »
Krishna Pillutla · Swabha Swayamdipta · Rowan Zellers · John Thickstun · Sean Welleck · Yejin Choi · Zaid Harchaoui -
2020 : Panel Discussion & Closing »
Yejin Choi · Alexei Efros · Chelsea Finn · Kristen Grauman · Quoc V Le · Yann LeCun · Ruslan Salakhutdinov · Eric Xing -
2020 : QA: Yejin Choi »
Yejin Choi -
2020 : Invited Talk: Yejin Choi »
Yejin Choi -
2020 : Adversarial, Socially Aware, and Commonsensical Data »
Yejin Choi -
2019 : Invited Talk (Yejin Choi) »
Yejin Choi -
2019 : Yejin Choi »
Yejin Choi -
2019 Poster: Defending Against Neural Fake News »
Rowan Zellers · Ari Holtzman · Hannah Rashkin · Yonatan Bisk · Ali Farhadi · Franziska Roesner · Yejin Choi -
2018 Poster: Loss Functions for Multiset Prediction »
Sean Welleck · Zixin Yao · Yu Gai · Jialin Mao · Zheng Zhang · Kyunghyun Cho -
2017 Poster: Saliency-based Sequential Image Attention with Multiset Prediction »
Sean Welleck · Jialin Mao · Kyunghyun Cho · Zheng Zhang