Mathematical reasoning is a unique aspect of human intelligence and a fundamental building block for scientific and intellectual pursuits. However, learning mathematics is often a challenging human endeavor that relies on expert instructors to create, teach, and evaluate mathematical material. From an educational perspective, AI systems that aid in this process offer increased inclusion and accessibility, efficiency, and understanding of mathematics. Moreover, building systems capable of understanding, creating, and using mathematics offers a unique setting for studying reasoning in AI. This workshop will investigate the intersection of mathematics education and AI, including applications to teaching, evaluation, and assisting. Enabling these applications requires not only innovations in math AI research, but also a better understanding of the challenges in real-world education scenarios. Hence, we will bring together a group of experts from a diverse set of backgrounds, institutions, and disciplines to drive progress on these and other real-world education scenarios, and to discuss the promise and challenge of integrating mathematical AI into education.
Tue 8:55 a.m. - 9:00 a.m.

Introduction and Opening Remarks
(Remarks)

Tue 9:00 a.m. - 9:01 a.m.

Introduction of the talk speaker
(Introduction)

Tue 9:01 a.m. - 9:26 a.m.

Solving Math Problems by Joint Parsing and Cognitive Reasoning
(Invited Talk)
Song-Chun Zhu

Tue 9:26 a.m. - 9:30 a.m.

Talk Q&A
(Q&A)

Tue 9:30 a.m. - 9:31 a.m.

Introduction of the talk speaker
(Introduction)

Tue 9:31 a.m. - 9:56 a.m.

Natural Language Processing meets Educational Data Science
(Invited Talk)
Mrinmaya Sachan

Tue 9:56 a.m. - 10:00 a.m.

Talk Q&A
(Q&A)

Tue 10:00 a.m. - 10:30 a.m.

Poster Session 1
(Poster Session)
Please join us in GatherTown for our poster session. The posters are as follows:
33833 Geometric Question Answering Towards Multimodal Numerical Reasoning
33832 Towards Diagram Understanding and Cognitive Reasoning in Icon Question Answering
33830 Towards Grounded Natural Language Proof Generation
33828 Theorem-Aware Geometry Problem Solving with Symbolic Reasoning and Theorem Prediction
33827 REAL2: An end-to-end memory-augmented solver for math word problems
33826 GeoRE: A Relation Extraction Dataset for Chinese Geometry Problems
33823 MathBERT: A Pretrained Language Model for General NLP Tasks in Mathematics Education
Jiaqi Chen · Tony Xia · Sean Welleck · Jiacheng (Gary) Liu · Ran Gong · Shifeng Huang · Wei Yu · Tracy Jia Shen

Tue 10:30 a.m. - 11:00 a.m.

Coffee Break
(Break)

Tue 11:00 a.m. - 12:00 p.m.

Interview with Stephen Wolfram
(Interview)
Stephen Wolfram · Danielle R Mayer

Tue 12:00 p.m. - 1:00 p.m.

Lunch Break
(Break)

Tue 1:00 p.m. - 1:01 p.m.

Introduction of the talk speaker
(Introduction)

Tue 1:01 p.m. - 1:26 p.m.

Understanding and Knowledge Extraction from Mathematical and Scientific Text
(Invited Talk)
Hanna Hajishirzi

Tue 1:26 p.m. - 1:30 p.m.

Talk Q&A
(Q&A)

Tue 1:30 p.m. - 1:31 p.m.

Introduction of the talk speaker
(Introduction)

Tue 1:31 p.m. - 1:56 p.m.

Freeform Grading of Math Assignments: A case study in collaboration with Art of Problem Solving
(Invited Talk)
Yuri Burda

Tue 1:56 p.m. - 2:00 p.m.

Talk Q&A
(Q&A)

Tue 2:00 p.m. - 2:01 p.m.

Introduction of the talk speaker
(Introduction)

Tue 2:01 p.m. - 2:26 p.m.

FACT: An automated teaching assistant for middle school math classrooms
(Invited Talk)
Kurt VanLehn

Tue 2:26 p.m. - 2:30 p.m.

Talk Q&A
(Q&A)

Tue 2:30 p.m. - 3:00 p.m.

Poster Session 2
(Poster Session)
Please join us in GatherTown for our poster session. The posters are as follows:
33831 Gamifying Math Education using Object Detection
33829 Who Gets the Benefit of the Doubt? Racial Bias in Machine Learning Algorithms Applied to Secondary School Math Education
33825 Phygital Math Learning with Handwriting for Kids
33824 Exploring Student Representation For Neural Cognitive Diagnosis
33822 An Empirical Study of Finding Similar Exercises
33821 Evaluation of mathematical questioning strategies using data collected through weak supervision
Yueqiu Sun · Haewon Jeong · Nrupatunga . · Hengyao Bao · Tongwen Huang · Debajyoti Datta

Tue 3:00 p.m. - 3:30 p.m.

Coffee Break
(Break)

Tue 3:30 p.m. - 3:31 p.m.

Introduction of the talk speaker
(Introduction)

Tue 3:31 p.m. - 3:56 p.m.

Weaving AI Into Education
(Invited Talk)
Sumeet Singh

Tue 3:56 p.m. - 4:00 p.m.

Talk Q&A
(Q&A)

Tue 4:00 p.m. - 4:01 p.m.

Introduction of the contributed talk speaker
(Introduction)

Tue 4:01 p.m. - 4:16 p.m.

MathBERT: A Pretrained Language Model for General NLP Tasks in Mathematics Education
(Contributed Talk)
Best Paper Award for NeurIPS 2021 MathAI4Ed Workshop. Since the introduction of the original BERT (i.e., BASE BERT), researchers have developed various customized BERT models with improved performance for specific domains and tasks by exploiting the benefits of transfer learning. Due to the nature of mathematical texts, which often use domain-specific vocabulary along with equations and math symbols, we posit that the development of a new BERT model for mathematics would be useful for many mathematical downstream tasks. In this paper, we introduce our multi-institutional effort (i.e., two learning platforms and three academic institutions in the US) toward this need: MathBERT, a model created by pretraining the BASE BERT model on a large mathematical corpus ranging from pre-kindergarten (pre-k), to high-school, to college graduate level mathematical content. In addition, we select three general NLP tasks that are often used in mathematics education: prediction of knowledge component, auto-grading open-ended Q&A, and knowledge tracing, to demonstrate the superiority of MathBERT over BASE BERT. Our experiments show that MathBERT outperforms prior best methods by 1.2-22% and BASE BERT by 2-8% on these tasks. In addition, we build a mathematics-specific vocabulary, mathVocab, to train with MathBERT. We release MathBERT for public usage at: https://github.com/tbs17/MathBERT.
Tracy Jia Shen

Tue 4:16 p.m. - 4:20 p.m.

Contributed Talk Q&A
(Q&A)

Tue 4:20 p.m. - 4:21 p.m.

Introduction of the contributed talk speaker
(Introduction)

Tue 4:21 p.m. - 4:36 p.m.

Towards Grounded Natural Language Proof Generation
(Contributed Talk)
When a student is working on a mathematical proof, it is often helpful to receive suggestions about how to proceed. To this end, we provide an initial study of two generation tasks in natural mathematical language: suggesting the next step in a proof, and full-proof generation. As proofs are grounded in past results (e.g. theorems, definitions), we study knowledge-grounded generation methods, and find that conditioning on retrieved or ground-truth knowledge greatly improves generations. We characterize error types and provide directions for future research.
Jiacheng Liu

Tue 4:36 p.m. - 4:40 p.m.

Contributed Talk Q&A
(Q&A)

Tue 4:40 p.m. - 5:00 p.m.

Coffee Break
(Break)

Tue 5:00 p.m. - 6:00 p.m.

Panel Discussion
(Panel)
Jo Boaler · Yuri Burda · Chris Piech · Sumeet Singh · Kurt VanLehn

Tue 6:00 p.m. - 6:05 p.m.

Closing Remarks
(Remarks)


Evaluation of mathematical questioning strategies using data collected through weak supervision
(Poster)
High-fidelity, AI-based simulated classroom systems enable teachers to rehearse effective teaching strategies. However, dialogue-oriented, open-ended conversations, such as teaching a student about scale factor, can be difficult to model. This paper presents a high-fidelity, AI-based classroom simulator to help teachers rehearse research-based mathematical questioning skills. We take a human-centered approach to designing our system, relying on advances in deep learning, uncertainty quantification, and natural language processing while acknowledging the limitations of conversational agents for specific pedagogical needs. Using experts' input directly during the simulation, we demonstrate how a high conversation success rate and high user satisfaction can be achieved.
Debajyoti Datta · Maria Phillips · Jim P. Bywater · Jennifer L. Chiu · Ginger S. Watson · Laura E Barnes · Donald Brown


An Empirical Study of Finding Similar Exercises
(Poster)
Educational artificial intelligence aims to benefit tasks in the education domain, such as intelligent test paper generation and consolidation exercises, where the main underlying technique is how to match exercises, known as the finding similar exercises (FSE) problem.
Most existing approaches emphasize their models' ability to represent exercises; unfortunately, many challenges remain, such as the scarcity of data, insufficient understanding of exercises, and high label noise. We release a Chinese education pretrained language model, BERT$_{Edu}$, for the label-scarce dataset and introduce exercise normalization to overcome the diversity of mathematical formulas and terms in exercises. We discover new auxiliary tasks in an innovative way, based on problem-solving ideas, and propose a very effective MoE-enhanced multi-task model for the FSE task to attain a better understanding of exercises. In addition, confidence learning is used to prune the training set and overcome high noise in the labeled data. Experiments show that the methods proposed in this paper are very effective.
Tongwen Huang · Li Xihua


MathBERT: A Pretrained Language Model for General NLP Tasks in Mathematics Education
(Poster)
Tracy Jia Shen · Michiharu Yamashita · Ethan Prihar · Neil Heffernan · Xintao Wu · Ben Graff · Dongwon Lee


Exploring Student Representation For Neural Cognitive Diagnosis
(Poster)
Cognitive diagnosis, the goal of which is to obtain the proficiency level of students on specific knowledge concepts, is a fundamental task in smart educational systems. Previous works usually represent each student as a trainable knowledge proficiency vector, which cannot capture the relations of concepts and the basic profile (e.g. memory or comprehension) of students. In this paper, we propose a method of student representation that explores the hierarchical relations of knowledge concepts and student embedding. Specifically, since the proficiency on parent knowledge concepts reflects the correlation between knowledge concepts, we get the first knowledge proficiency with a parent-child concepts projection layer. In addition, a low-dimensional dense vector is adopted as the embedding of each student, from which we obtain the second knowledge proficiency with a fully connected layer. Then, we combine the two proficiency vectors to get the final representation of students. Experiments show the effectiveness of the proposed representation method.
Hengyao Bao · Li Xihua


Phygital Math Learning with Handwriting for Kids
(Poster)
To provide fun learning and concept apprehension for online education, the content and experience are of prime importance. In this work, we present Phygital (Physical + Digital) math learning through handwriting with traditional pen and paper, which is vital for a child's cognitive and motor skill development. Our system provides interactive educational content for 3-10 year old kids, with real-time feedback and evaluation that recognizes handwriting at high precision/recall. The real-time feedback, along with a virtual assisting character, is developed in line with a child's thinking ability and age. Our system is used across geographies at a huge scale.
Nrupatunga . · Aashish Kumar · Anoop Kolar Rajagopal


GeoRE: A Relation Extraction Dataset for Chinese Geometry Problems
(Poster)
Relation extraction is an important foundation for many natural language understanding applications, as well as geometry problem solving. In this paper, we present GeoRE, a relation extraction dataset for Chinese geometry problems. To the best of our knowledge, GeoRE is the first Chinese relation extraction dataset about geometry problems. It consists of 12,901 geometry problems on 43 shapes, covering 19 positional relations and 4 quantitative relations. We experiment with various state-of-the-art (SOTA) models, and the best model achieves an F1 score of only 70.3% on GeoRE. This shows that GeoRE presents a challenge for future research.
Wei Yu · Shuyu Miao · Xun Zhou · Jingdong Liu · Yongfu Zha · Yongjian Zhang · Mengzhu Wang · Xiaodong Wang


REAL2: An end-to-end memory-augmented solver for math word problems
(Poster)
The task of solving math word problems (MWPs) has recently shown encouraging progress, e.g. in Recall and Learn (REAL), which solves a problem by retrieving the most similar questions based on a pretrained memory module. In this article, we verify the effectiveness of different neural memory modules that can be trained end-to-end. Specifically, we first propose a Top-N pre-ranking process to retrieve candidate questions based on a Word2Vec model, and then we utilize a trainable memory module to re-rank the candidates to obtain the most similar Top-K questions. With this simple modification, we establish a stronger framework, REAL2, that achieves state-of-the-art results. Code will be made public, and we hope it will make research on analogical learning in the MWP task more accessible.
Shifeng Huang · Jiawei Wang · Jiao Xu · Da Cao · Ming Yang


Theorem-Aware Geometry Problem Solving with Symbolic Reasoning and Theorem Prediction
(Poster)
Geometry problem solving is challenging as it requires abstract problem understanding and symbolic reasoning with axiomatic knowledge. However, current datasets are either small in scale or not publicly available. Thus, we construct a new large-scale benchmark, Geometry3K, consisting of 3,002 geometry problems with dense annotation in formal language. We further propose a novel geometry solving approach with formal language and symbolic reasoning, called Interpretable Geometry Problem Solver (Inter-GPS). Inter-GPS first parses the problem text and diagram into formal language automatically via rule-based text parsing and neural object detection, respectively. Unlike implicit learning in existing methods, Inter-GPS incorporates theorem knowledge as conditional rules and performs symbolic reasoning step by step. Also, a theorem predictor is designed to infer the theorem application sequence fed to the symbolic solver, giving a more efficient and reasonable search path. Extensive experiments on the Geometry3K and GEOS datasets demonstrate that Inter-GPS achieves significant improvements over existing methods. The project is available at https://lupantech.github.io/intergps.
Pan Lu · Ran Gong · Shibiao Jiang · Liang Qiu · Siyuan Huang · Xiaodan Liang · Song-Chun Zhu


Who Gets the Benefit of the Doubt? Racial Bias in Machine Learning Algorithms Applied to Secondary School Math Education
(Poster)
Machine learning algorithms are rapidly being adopted to aid pedagogical decision-making in applications ranging from grading to student placement. Are these algorithms fair? We prove that, for predicting students' math performance, the standard machine learning practice of selecting a model that maximizes predictive accuracy can result in algorithms that give significantly more benefit of the doubt to White and Asian students and are more punitive to Black, Hispanic, and Native American students. This disparity is masked by comparatively high predictive accuracy across both groups. We suggest new interventions that help close this performance gap and do not require the use of a different algorithm for each student group. Together, our results suggest new best practices for applying machine learning to education-related applications.
Haewon Jeong · Michael D. Wu · Nilanjana Dasgupta · Muriel Medard · Flavio Calmon


Towards Grounded Natural Language Proof Generation
(Poster)
Sean Welleck · Jiacheng (Gary) Liu · Yejin Choi


Gamifying Math Education using Object Detection
(Poster)
Manipulatives used in the right way help improve the understanding of mathematical concepts, leading to better learning outcomes. In this paper, we present a phygital (physical + digital), curriculum-inspired teaching system for kids aged 5-8 to learn geometry using shape tile manipulatives. Combining smaller shapes to form larger ones is an important skill kids learn early on, and it requires shape tiles to be placed close to each other in the play area. This introduces a challenge of oriented object detection for densely packed objects with arbitrary orientations. Leveraging simulated data for neural network training and lightweight mobile architectures, we enable our system to understand user interactions and provide real-time audio-visual feedback. Experimental results show that our network runs in real time with high precision/recall on consumer devices, thereby providing a consistent and enjoyable learning experience.
Rohit Nambiar · Yueqiu Sun · Vivek Vidyasagaran


Towards Diagram Understanding and Cognitive Reasoning in Icon Question Answering
(Poster)
Current visual question answering (VQA) tasks mainly consider answering human-annotated questions for natural images. However, aside from natural images, abstract diagrams with semantic richness are still understudied in visual understanding and reasoning research. In this work, we introduce a new challenge of Icon Question Answering (IconQA) with the goal of answering a question in an icon image context. We release IconQA, a large-scale dataset that consists of 107,439 questions, which highlights the importance of abstract diagram understanding and comprehensive cognitive reasoning. IconQA requires not only perception skills like object recognition and text understanding, but also diverse cognitive reasoning skills, such as geometric reasoning, commonsense reasoning, and arithmetic reasoning. To help potential IconQA models learn semantic representations for icon images, we further release an icon dataset, Icon645, which contains 645,687 colored icons in 377 classes. We conduct extensive user studies and blind experiments and reproduce a wide range of advanced VQA methods to benchmark the IconQA task. Also, we develop a strong IconQA baseline, Patch-TRM, that applies a pyramid cross-modal Transformer with input diagram embeddings pretrained on the icon dataset. IconQA and Icon645 are available at https://iconqa.github.io.
Pan Lu · Liang Qiu · Jiaqi Chen · Tanglin Xia · Yizhou Zhao · Wei Zhang · Zhou Yu · Xiaodan Liang · Song-Chun Zhu


Geometric Question Answering Towards Multimodal Numerical Reasoning
(Poster)
Automatic math problem solving has recently attracted increasing attention as a longstanding AI benchmark. In this paper, we focus on solving geometric problems, which requires a comprehensive understanding of textual descriptions, visual diagrams, and theorem knowledge. However, existing methods are highly dependent on handcrafted rules and have only been evaluated on small-scale datasets. Therefore, we propose a Geometric Question Answering dataset, GeoQA, containing 5,010 geometric problems with corresponding annotated programs, which illustrate the solving process of the given problems. Compared with another publicly available dataset, GeoS, GeoQA is 25 times larger, and its program annotations can provide a practical testbed for future research on explicit and explainable numerical reasoning. Moreover, we introduce a Neural Geometric Solver (NGS) to address geometric problems by comprehensively parsing multimodal information and generating interpretable programs. We further add multiple self-supervised auxiliary tasks on NGS to enhance cross-modal semantic representation. Extensive experiments on GeoQA validate the effectiveness of our proposed NGS and auxiliary tasks. However, the results are still significantly lower than human performance, which leaves large room for future research.
Jiaqi Chen · Jianheng Tang · Jinghui Qin · Xiaodan Liang · Lingbo Liu · Eric Xing · Liang Lin