Skip to yearly menu bar Skip to main content

Workshop: MATH-AI: Toward Human-Level Mathematical Reasoning

Learn to Select Good Examples with Reinforcement Learning for Semi-structured Mathematical Reasoning

Pan Lu · Liang Qiu · Kai-Wei Chang · Ying Nian Wu · Song-Chun Zhu · Tanmay Rajpurohit · Peter Clark · Ashwin Kalyan


Recent large pre-trained language models such as GPT-3 have achieved remarkable progress on mathematical reasoning tasks written in text form, such as math word problems (MWP). However, it is unknown if models can handle more complex problems that involve heterogeneous information, such as tabular data. To fill the gap, we present Tabular Math Word Problems (TabMWP), a new dataset containing 38,431 open-domain problems that require mathematical reasoning on both textual and tabular data, where each question is aligned with a tabular context. We evaluate different pre-trained models on TabMWP, including the GPT-3 model in a few-shot setting. As earlier studies suggest, since few-shot GPT-3 relies on the selection of in-context examples, its performance is unstable and can degrade to near chance. This issue is more severe when handling complex problems like TabMWP. To mitigate this, we further propose a novel approach, PromptPG, which utilizes policy gradient to learn to select good in-context examples from a small amount of training data. Experimental results show that our method outperforms the best baseline by 5.31% in accuracy and reduces the prediction variance significantly compared to random selection.

Chat is not available.