Mathematical reasoning is a unique aspect of human intelligence and a fundamental building block for scientific and intellectual pursuits. However, learning mathematics is often a challenging human endeavor that relies on expert instructors to create, teach and evaluate mathematical material. From an educational perspective, AI systems that aid in this process offer increased inclusion and accessibility, efficiency, and understanding of mathematics. Moreover, building systems capable of understanding, creating, and using mathematics offers a unique setting for studying reasoning in AI. This workshop will investigate the intersection of mathematics education and AI.
Sat 6:55 a.m. - 7:00 a.m. | Introduction and Opening Remarks (Opening Remarks)
Sat 7:00 a.m. - 7:30 a.m. | Reasoning and Abstraction as Challenges for AI (Invited Talk)
Cezary Kaliszyk
Sat 7:30 a.m. - 8:00 a.m. | Length Generalization in Quantitative Reasoning (Invited Talk)
Behnam Neyshabur
Sat 8:00 a.m. - 8:30 a.m. | Has Progress on Math been Surprising? (Invited Talk)
In 2021, we commissioned forecasters to predict progress on ML benchmarks, including the MATH dataset for mathematical problem-solving. Progress on MATH ended up being much faster than predicted. I'll discuss what we should and shouldn't take away from this, my own predictions for future progress, and general implications for predicting future developments in ML.
Jacob Steinhardt
Sat 8:30 a.m. - 10:00 a.m. | Poster Session
Sat 10:00 a.m. - 11:00 a.m. | Lunch Break
Sat 11:00 a.m. - 11:20 a.m. | Teaching Algorithmic Reasoning via In-context Learning (Contributed Talk)
Large language models (LLMs) have shown increasing in-context learning capabilities through scaling up model and data size. Despite this progress, LLMs are still unable to solve algorithmic reasoning problems. While providing a rationale with the final answer has led to further improvements in multi-step reasoning problems, Anil et al. (2022) showed that even simple algorithmic reasoning tasks such as parity are far from solved. In this work, we identify and study four key stages for successfully teaching algorithmic reasoning to LLMs: (1) formulating algorithms as skills, (2) teaching multiple skills simultaneously (skill accumulation), (3) teaching how to combine skills (skill composition), and (4) teaching how to use skills as tools. We show that it is possible to teach algorithmic reasoning to LLMs via in-context learning, which we refer to as algorithmic prompting. We evaluate our approach on a variety of arithmetic and quantitative reasoning tasks, and demonstrate significant boosts in performance over existing prompting techniques. In particular, for long parity, addition, multiplication, and subtraction, we achieve an error reduction of approximately 10x, 9x, 5x, and 2x respectively compared to the best available baselines.
Hattie Zhou · Azade Nova · Aaron Courville · Hugo Larochelle · Behnam Neyshabur · Hanie Sedghi
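For a flavor of what an algorithmic prompt might look like, here is a minimal sketch (our own illustration, not the authors' prompts): the in-context example spells out every step of the addition algorithm, including carries, before posing a new problem.

```python
# A hypothetical algorithmic prompt for multi-digit addition. The
# worked example demonstrates each step of the algorithm (digit-wise
# sums and carries) rather than only the final answer.
PROMPT = """\
Problem: 47 + 38
Step 1: ones digits: 7 + 8 = 15. Write 5, carry 1.
Step 2: tens digits: 4 + 3 + 1 (carry) = 8. Write 8.
Answer: 85

Problem: 56 + 27
"""

# completion = llm.generate(PROMPT)  # hypothetical LM call
print(PROMPT)
```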
Sat 11:20 a.m. - 11:40 a.m. | Solving Math Word Problems with Process-based and Outcome-based Feedback (Contributed Talk)
Recent work has shown that prompting language models to generate reasoning steps improves performance on many reasoning tasks. When moving beyond prompting, this raises the question of how we should supervise the finetuning of such models: outcome-based approaches which supervise the final result, or process-based approaches which supervise the reasoning process itself? Differences between these approaches might naturally be expected not just in final-answer errors but also in reasoning errors, which can be difficult to detect and are problematic in many real-world domains such as education. We run the first comprehensive comparison between process- and outcome-based approaches trained on a natural language task, GSM8K. We find that pure outcome-based supervision produces similar final-answer error rates with less label supervision. However, for correct reasoning steps we find it necessary to use process-based supervision or supervision from learned reward models that emulate process-based feedback. In total, we improve the previous best results from 16.8% → 12.7% final-answer error and from 14.0% → 3.4% reasoning error among final-answer-correct solutions.
Jonathan Uesato · Nate Kushman · Ramana Kumar · H. Francis Song · Noah Siegel · Lisa Wang · Antonia Creswell · Geoffrey Irving · Irina Higgins
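To make the two supervision regimes concrete, here is a small illustration (ours, with made-up labels) of the signals each approach attaches to a single model-generated solution:

```python
# Hypothetical sketch contrasting the two supervision signals for one
# model-generated GSM8K-style solution. All values are illustrative.

solution_steps = [
    "There are 3 boxes with 4 pens each, so 3 * 4 = 12 pens.",
    "Giving away 5 pens leaves 12 - 5 = 7 pens.",
]
final_answer, gold_answer = 7, 7

# Outcome-based supervision: one cheap label per solution.
outcome_label = float(final_answer == gold_answer)  # 1.0

# Process-based supervision: one (more expensive) label per reasoning
# step, e.g. from annotators or a learned per-step reward model.
step_labels = [1.0, 1.0]  # each step judged individually

# A reward model trained on step_labels can emulate process feedback
# at scale; the paper finds this is needed to reduce reasoning errors
# even when outcome-only training matches final-answer accuracy.
```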
Sat 11:40 a.m. - 12:00 p.m. | ProofNet: A Benchmark for Autoformalizing and Formally Proving Undergraduate-Level Mathematics Problems (Contributed Talk)
Zhangir Azerbayev · Bartosz Piotrowski · Jeremy Avigad
Sat 12:00 p.m. - 12:30 p.m. | Towards Systematic Reasoning with Language Models (Invited Talk)
Mathematics requires systematic reasoning, namely the step-wise application of knowledge in a sound manner to reach a conclusion. Can language models (LMs) perform this kind of systematic reasoning with knowledge provided to them? Or, even more ambitiously, can LMs reason systematically with their own internal knowledge acquired during pretraining? In this talk, I'll attempt to answer these questions, illustrated with our recent work on using LMs for logical deduction, proof generation, and multistep textual entailment problems. While progress has been made, there is still a way to go. To illustrate this, I'll conclude by posing a (currently unsolved) grand challenge to the math reasoning community - answering Fermi problems - which requires combining systematic reasoning, mathematics, and world knowledge.
Peter Clark
Sat 12:30 p.m. - 1:00 p.m. | Coffee Break
Sat 1:00 p.m. - 1:30 p.m. | Leveraging Maths to Understand Transformers (Invited Talk)
Francois Charton
Sat 1:30 p.m. - 2:00 p.m. | Learning Mathematical Reasoning for Education (Invited Talk)
Noah Goodman
Sat 2:00 p.m. - 2:55 p.m. | MATH-AI: Toward Human-Level Mathematical Reasoning (Discussion Panel)
Francois Charton · Noah Goodman · Behnam Neyshabur · Talia Ringer · Daniel Selsam
Sat 2:55 p.m. - 3:00 p.m. | Closing Remarks
Poster: Neural Combinatorial Logic Circuit Synthesis from Input-Output Examples
We propose a novel, fully explainable neural approach to the synthesis of combinatorial logic circuits from input-output examples. The key advantage of our method is that it readily extends to inductive scenarios, where the set of examples is incomplete but still indicative of the desired behaviour. Our method can be employed for a virtually arbitrary choice of atoms - from logic gates to FPGA blocks - as long as they can be formulated in a differentiable fashion, and consistently yields good results for the synthesis of practical circuits of increasing size. In particular, we succeed in learning a number of arithmetic, bitwise, and signal-routing operations, and even generalise towards the correct behaviour in inductive scenarios. Our method, attacking a discrete logical synthesis problem with an explainable neural approach, hints at a wider promise for synthesis and reasoning-related tasks.
Peter Belcak · Roger Wattenhofer
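One way to make the choice of gate differentiable is sketched below as a PyTorch-style module (our illustration; the paper's exact parameterization may differ): each learned "atom" is a softmax mixture over fixed two-input truth tables, which can be discretized to a single gate after training.

```python
# A minimal sketch of a differentiable logic "atom": each candidate
# gate is a fixed 2-input truth table, and the module learns a softmax
# mixture over gates. Gate set and names are illustrative.
import torch
import torch.nn as nn

GATES = torch.tensor([  # outputs for inputs (0,0),(0,1),(1,0),(1,1)
    [0., 0., 0., 1.],   # AND
    [0., 1., 1., 1.],   # OR
    [0., 1., 1., 0.],   # XOR
    [1., 1., 1., 0.],   # NAND
])

class SoftGate(nn.Module):
    def __init__(self):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(len(GATES)))

    def forward(self, a, b):
        # Probability of each input combination under soft inputs a, b.
        combo = torch.stack([(1-a)*(1-b), (1-a)*b, a*(1-b), a*b], dim=-1)
        table = torch.softmax(self.logits, dim=0) @ GATES  # soft truth table
        return combo @ table  # soft output in [0, 1]

gate = SoftGate()
print(gate(torch.tensor([0., 1.]), torch.tensor([1., 1.])))
```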
Poster: Automatic Generation of Socratic Questions for Learning to Solve Math Word Problems
Socratic questioning is an educational method that allows students to discover answers to complex problems by asking them a series of thoughtful questions. Generating didactically sound questions is challenging, requiring an understanding of the reasoning process involved in the problem. We hypothesize that such a questioning strategy can not only enhance human performance but also assist math word problem (MWP) solvers. In this work, we explore the ability of large language models (LMs) to generate sequential questions for guiding math word problem solving. We propose various guided question generation schemes based on input conditioning and reinforcement learning. On both automatic and human quality evaluations, we find that LMs constrained with desirable question properties generate superior questions and improve the overall performance of a math word problem solver.
Kumar Shridhar · Jakub Macina · Menna El-Assady · Tanmay Sinha · Mrinmaya Sachan
Poster: Generating Reflexive Polytopes via Sequence Modeling
We train neural network sequence models to generate reflexive lattice polytopes. We demonstrate that they can generate mathematical objects satisfying various geometric properties. We use the completeness of our datasets to give evidence that the models are understanding some underlying structure of the data.
Bernt Ivar Utstøl Nødland
Poster: A Causal Framework to Quantify Robustness of Mathematical Reasoning with Language Models
We have recently witnessed a number of impressive results on hard mathematical reasoning problems with large language models (LLMs). At the same time, the robustness of these models has also been called into question. Building on the idea of behavioral testing, we propose a novel framework which pins down the causal effect of each factor in the input, e.g., the surface form of the problem text, the operands, and the math operators, on the output. By grounding the behavioral analysis in a causal graph describing an intuitive reasoning process, we study the behavior of LLMs in terms of robustness and sensitivity to direct interventions in the input space. We apply our framework to a test bed of bivariate math word problems. Our analysis shows that robustness does not appear to continuously improve as a function of scale, but that the recent LLM, GPT-3-Instruct (175B), achieves a dramatic improvement in both robustness and sensitivity compared to all other GPT variants.
Alessandro Stolfo · Zhijing Jin · Kumar Shridhar · Bernhard Schölkopf · Mrinmaya Sachan
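The intervention idea can be illustrated with a toy generator (ours; the templates and helper names are made up): change one factor of a two-operand problem at a time and compare the model's answers across variants.

```python
# A minimal sketch of the behavioral-testing idea: intervene on one
# input factor of a bivariate math word problem (operands or surface
# text) while holding the others fixed, then contrast model outputs.

TEMPLATES = [
    "Alice has {a} apples and buys {b} more. How many apples does she have?",
    "A box holds {a} apples; {b} more are added. How many apples in total?",
]

def make_problem(a, b, template_id):
    return TEMPLATES[template_id].format(a=a, b=b)

base = dict(a=3, b=5, template_id=0)
operand_shift = dict(base, a=7)        # answer *should* change
surface_shift = dict(base, template_id=1)  # answer should *not* change

for name, factors in [("base", base), ("operand", operand_shift),
                      ("surface", surface_shift)]:
    print(name, "->", make_problem(**factors))

# Feeding each variant to an LM and comparing its answer distributions
# estimates sensitivity (to operands) and robustness (to phrasing).
```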
Poster: What is my math transformer doing? Three results on interpretability and generalization
We investigate the failure cases and out-of-distribution behavior of transformers trained on matrix inversion, eigendecomposition, and eigenvalue calculation. We show that incorrect model predictions still retain deep mathematical properties of the solution (e.g. correct eigenvalues, unit norm of eigenvectors), and that almost all model failures can be attributed to, and predicted from, properties of the problem or solution. This demonstrates that, when in doubt, math transformers do not hallucinate crazy solutions (as was sometimes proposed) but remain "roughly right."
Francois Charton
Poster: Learning to Understand Plane Geometry Diagram
Geometry diagram parsing plays a key role in geometry problem solving, wherein primitive extraction and relation parsing remain challenging due to the complex layout and between-primitive relationships. In this paper, we propose a powerful diagram parser based on deep learning and graph reasoning. Specifically, a modified instance segmentation method is proposed to extract geometric primitives, and a graph neural network (GNN) is leveraged to perform relation parsing and primitive classification, incorporating geometric features and prior knowledge. All the modules are integrated into an end-to-end model called PGDPNet that performs all the sub-tasks simultaneously. In addition, we build a new large-scale geometry diagram dataset named PGDP5K with primitive-level annotations. Experiments on PGDP5K and an existing dataset, IMP-Geometry3K, show that our model outperforms state-of-the-art methods in four sub-tasks remarkably. The full version of this paper has been accepted by IJCAI 2022.
Ming-Liang Zhang · Fei Yin · Yihan Hao · Cheng-Lin Liu
Poster: Lemma: Bootstrapping High-Level Mathematical Reasoning with Learned Symbolic Abstractions
Humans tame the complexity of mathematical reasoning by developing hierarchies of abstractions. With proper abstractions, solutions to hard problems can be expressed concisely, thus making them more likely to be found. In this paper, we propose Learning Mathematical Abstractions (LEMMA): an algorithm that implements this idea for reinforcement learning agents in mathematical domains. LEMMA augments Expert Iteration with an abstraction step, where solutions found so far are revisited and rewritten in terms of new higher-level actions, which then become available to solve new problems. We evaluate LEMMA on two mathematical reasoning tasks - equation solving and fraction simplification - in a step-by-step fashion. In these two domains, LEMMA improves the ability of an existing agent, both solving more problems and generalizing more effectively to harder problems than those seen during training.
Zhening Li · Gabriel Poesia Reis e Silva · Omar Costilla Reyes · Noah Goodman · Armando Solar-Lezama
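A toy, runnable illustration of the abstraction step in isolation (our sketch; LEMMA's actual abstraction discovery is more sophisticated): mine the most frequent pair of consecutive actions across solution traces and rewrite the traces using it as a new higher-level action.

```python
# Mine a frequent action bigram from solution traces and rewrite the
# traces with it as a new higher-level action. Purely illustrative.
from collections import Counter

def mine_abstraction(traces):
    bigrams = Counter(tuple(t[i:i+2]) for t in traces
                      for i in range(len(t) - 1))
    return max(bigrams, key=bigrams.get)

def rewrite(trace, abstraction, name):
    out, i = [], 0
    while i < len(trace):
        if tuple(trace[i:i+2]) == abstraction:
            out.append(name); i += 2
        else:
            out.append(trace[i]); i += 1
    return out

traces = [["sub_both_sides", "div_both_sides", "simplify"],
          ["expand", "sub_both_sides", "div_both_sides"]]
ab = mine_abstraction(traces)  # ('sub_both_sides', 'div_both_sides')
print([rewrite(t, ab, "isolate_var") for t in traces])
```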
Poster: MWP-BERT: A Numeracy-augmented Pre-trained Encoder for Math Word Problems
Math word problem (MWP) solving faces a dilemma in number representation learning. In order to avoid the number representation issue and reduce the search space of feasible solutions, existing works on MWP solving usually replace real numbers with symbolic placeholders to focus on logical reasoning. However, instead of the number value itself, it is the reusable numerical property that matters more in numerical reasoning. Therefore, we argue that injecting numerical properties into symbolic placeholders with a contextualized representation learning scheme can provide a way out of this dilemma. In this work, we introduce this idea to popular pre-trained language model (PLM) techniques and build MWP-BERT, an effective contextual number representation PLM. We demonstrate the effectiveness of MWP-BERT on MWP solving and several MWP-specific understanding tasks on both English and Chinese benchmarks.
Zhenwen Liang · Jipeng Zhang · Lei Wang · Wei Qin · Jie Shao · Xiangliang Zhang
Poster: Inversely Eliciting Numerical Reasoning in Language Models via Solving Linear Systems
Recent language models have struggled to generalize to a large range of numbers in numerical reasoning. In this paper, we propose a novel method that leverages simple numbers as anchors to characterize the implicitly inferred arithmetic expressions from language models, and then explicitly applies the expressions to the original numbers to get the answers. Experimental results on several numerical reasoning benchmarks demonstrate that our approach is highly effective. More importantly, our approach works in the inference phase without extra model training, making it highly portable and achieving significant and consistent performance benefits across a variety of language models in zero-shot, few-shot, and fine-tuning scenarios.
Fan Zhou · Haoyu Dong · Qian Liu · Zhoujun Cheng · Shi Han · Dongmei Zhang
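The core trick can be shown in a few lines under the simplifying assumption that the implicit expression is linear in the two operands; `ask_model` below is a hypothetical stand-in for querying the LM with anchor numbers substituted into the problem text.

```python
# Probe the model with simple anchor numbers, recover the implicit
# linear expression y = w1*x1 + w2*x2 + c by solving a linear system,
# then apply it to the original (large) operands.
import numpy as np

def ask_model(x1, x2):            # placeholder: pretend the LM computes
    return 3 * x1 + 2 * x2 + 1    # an (unknown to us) linear expression

probes = [(0, 0), (1, 0), (0, 1)]
A = np.array([[x1, x2, 1.0] for x1, x2 in probes])
y = np.array([ask_model(x1, x2) for x1, x2 in probes])

w1, w2, c = np.linalg.solve(A, y)  # recover the implicit expression

# Evaluate the recovered expression on the original operands,
# sidestepping the LM's weakness at arithmetic on large numbers.
x1, x2 = 1234, 5678
print(w1 * x1 + w2 * x2 + c)       # 3*1234 + 2*5678 + 1 = 15059.0
```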
Poster: EuclidNet: Deep Visual Reasoning for Constructible Problems in Geometry
In this paper, we present a deep learning-based framework for solving geometric construction problems through visual reasoning, which is useful for automated geometry theorem proving. Constructible problems in geometry often ask for the sequence of straightedge-and-compass constructions needed to reach a given goal from some initial setup. Our EuclidNet framework leverages the Mask R-CNN neural network architecture to extract visual features from the initial setup and the goal configuration with extra points of intersection, and then generates possible construction steps as intermediary data models that are used as feedback in the training process for further refinement of the construction step sequence. This process is repeated recursively until either a solution is found, in which case we backtrack the path for a step-by-step construction guide, or the problem is identified as unsolvable. Our EuclidNet framework is validated on complex Japanese Sangaku geometry problems, demonstrating its capacity to leverage backtracking for deep visual reasoning on challenging problems.
Man Fai Wong · Xintong Qi · Chee-Wei Tan
Poster: Estimating Numbers without Regression
Despite recent successes in language models, their ability to represent numbers is insufficient. Humans conceptualize numbers based on their magnitudes, effectively projecting them on a number line, whereas subword tokenization fails to explicitly capture magnitude by splitting numbers into arbitrary chunks. To alleviate this shortcoming, alternative approaches have been proposed that modify numbers at various stages of the language modeling pipeline. These methods change either (1) the notation in which numbers are written (e.g., scientific vs. decimal), (2) the vocabulary used to represent numbers, or (3) the entire architecture of the underlying language model, to directly regress to a desired number. In this work, we show that a potential trade-off to the more complex architectural changes is to simply change the model's vocabulary instead, e.g., introduce a new token for numbers in the range 10-100. In the context of masked number prediction, we find that a carefully designed tokenization scheme is both the simplest to implement and sufficient, i.e., with performance similar to the state-of-the-art approach that requires significant architectural changes. Finally, we evaluate the various number representation schemes on the downstream task of numerical fact estimation (for Fermi problems) in a zero-shot setting and find similar trends, i.e., changes at the tokenization level achieve near state-of-the-art results while requiring minimal resources compared to other number representation schemes.
Avijit Thawani · Jay Pujara · Ashwin Kalyan
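A magnitude-bucket tokenizer of the kind described might look like the sketch below (ours; the token names are made up), mapping every number to one token per order of magnitude:

```python
# Map numbers to magnitude-bucket tokens, e.g. everything in [10, 100)
# becomes "[NUM_1e1]", instead of arbitrary subword chunks.
import math
import re

def magnitude_token(value: float) -> str:
    if value == 0:
        return "[NUM_0]"
    exponent = math.floor(math.log10(abs(value)))
    return f"[NUM_1e{exponent}]"

def tokenize_numbers(text: str) -> str:
    return re.sub(r"\d+\.?\d*",
                  lambda m: magnitude_token(float(m.group())), text)

print(tokenize_numbers("The tower is 42 m tall and weighs 3500 kg."))
# -> "The tower is [NUM_1e1] m tall and weighs [NUM_1e3] kg."
```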
Poster: Learn to Select Good Examples with Reinforcement Learning for Semi-structured Mathematical Reasoning
Recent large pre-trained language models such as GPT-3 have achieved remarkable progress on mathematical reasoning tasks written in text form, such as math word problems (MWP). However, it is unknown if models can handle more complex problems that involve heterogeneous information, such as tabular data. To fill the gap, we present Tabular Math Word Problems (TabMWP), a new dataset containing 38,431 open-domain problems that require mathematical reasoning on both textual and tabular data, where each question is aligned with a tabular context. We evaluate different pre-trained models on TabMWP, including the GPT-3 model in a few-shot setting. As earlier studies suggest, since few-shot GPT-3 relies on the selection of in-context examples, its performance is unstable and can degrade to near chance. This issue is more severe when handling complex problems like TabMWP. To mitigate this, we further propose a novel approach, PromptPG, which utilizes policy gradient to learn to select good in-context examples from a small amount of training data. Experimental results show that our method outperforms the best baseline by 5.31% in accuracy and reduces the prediction variance significantly compared to random selection.
Pan Lu · Liang Qiu · Kai-Wei Chang · Ying Nian Wu · Song-Chun Zhu · Tanmay Rajpurohit · Peter Clark · Ashwin Kalyan
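The policy-gradient idea behind PromptPG can be sketched compactly (our illustration; the scoring network, reward, and candidate pool are all stand-ins): score the candidate examples, sample a few for the prompt, and reinforce selections that lead the LM to a correct answer.

```python
# A compact REINFORCE sketch for learning to pick in-context examples.
import torch
import torch.nn as nn

n_candidates, embed_dim, k = 20, 16, 2
cand_embeds = torch.randn(n_candidates, embed_dim)  # candidate examples
scorer = nn.Linear(embed_dim, 1)                    # policy network
opt = torch.optim.Adam(scorer.parameters(), lr=1e-2)

def reward(question_embed, chosen_idx):  # placeholder: 1.0 if the LM,
    return torch.rand(()).item()         # prompted this way, answers correctly

for step in range(100):
    q = torch.randn(embed_dim)                    # a training question
    logits = scorer(cand_embeds * q).squeeze(-1)  # score each candidate
    probs = torch.softmax(logits, dim=0)
    idx = torch.multinomial(probs, k)             # sample k examples
    r = reward(q, idx)
    loss = -r * torch.log(probs[idx]).sum()       # REINFORCE objective
    opt.zero_grad(); loss.backward(); opt.step()
```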
Poster: Draft, Sketch, and Prove: Guiding Formal Theorem Provers with Informal Proofs
The formalization of existing mathematical proofs is a notoriously difficult process. Despite decades of research on automation and proof assistants, writing formal proofs remains arduous and only accessible to a few experts. While previous studies to automate formalization focused on powerful search algorithms, no attempts were made to take advantage of available informal proofs. In this work, we introduce Draft, Sketch, and Prove (DSP), a method that maps informal proofs to formal proof sketches, and uses the sketches to guide an automated prover by directing its search to easier sub-problems. We investigate two relevant setups where informal proofs are either written by humans or generated by a language model. Our experiments and ablation studies show that large language models are able to produce well-structured formal sketches that follow the same reasoning steps as the informal proofs. Guiding an automated prover with these sketches enhances its performance from 20.9% to 39.3% on a collection of mathematical competition problems.
Albert Jiang · Sean Welleck · Jin Peng Zhou · Timothee Lacroix · Jiacheng Liu · Wenda Li · Mateja Jamnik · Guillaume Lample · Yuhuai Wu
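Schematically, the method reduces to three stages; the stubbed sketch below (ours) captures only the control flow described in the abstract, with every component replaced by a placeholder.

```python
# Control-flow sketch of Draft, Sketch, and Prove; all functions are
# stand-ins for an LM or an off-the-shelf automated prover.
def draft(problem):                    # LM writes an informal proof
    return "informal proof text"

def sketch(problem, informal_proof):   # LM maps it to a formal sketch:
    return ["subgoal_1", "subgoal_2"]  # a partial proof with open subgoals

def prove(subgoal):                    # automated prover fills each gap
    return True                        # placeholder result

def dsp(problem):
    informal = draft(problem)
    subgoals = sketch(problem, informal)
    return all(prove(g) for g in subgoals)

print(dsp("competition_problem_statement"))
```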
Poster: Overcoming Barriers to Skill Injection in Language Modeling: Case Study in Arithmetic
Through their transfer learning abilities, highly parameterized large pre-trained language models have dominated the NLP landscape for a multitude of downstream language tasks. Though linguistically proficient, the inability of these models to incorporate the learning of non-linguistic entities (numerals and arithmetic reasoning) limits their usage for tasks that require numeric comprehension or strict mathematical reasoning. However, as we illustrate in this paper, building a general-purpose language model that also happens to be proficient in mathematical reasoning is not as straightforward as training it on a numeric dataset. In this work, we develop a novel framework that enables language models to be mathematically proficient while retaining their linguistic prowess. Specifically, we offer information-theoretic interventions to overcome the catastrophic forgetting of linguistic skills that occurs while injecting non-linguistic skills into language models.
Mandar Sharma · Nikhil Muralidhar · Naren Ramakrishnan
Poster: Teaching Algorithmic Reasoning via In-context Learning
Large language models (LLMs) have shown increasing in-context learning capabilities through scaling up model and data size. Despite this progress, LLMs are still unable to solve algorithmic reasoning problems. While providing a rationale with the final answer has led to further improvements in multi-step reasoning problems, Anil et al. (2022) showed that even simple algorithmic reasoning tasks such as parity are far from solved. In this work, we identify and study four key stages for successfully teaching algorithmic reasoning to LLMs: (1) formulating algorithms as skills, (2) teaching multiple skills simultaneously (skill accumulation), (3) teaching how to combine skills (skill composition), and (4) teaching how to use skills as tools. We show that it is possible to teach algorithmic reasoning to LLMs via in-context learning, which we refer to as algorithmic prompting. We evaluate our approach on a variety of arithmetic and quantitative reasoning tasks, and demonstrate significant boosts in performance over existing prompting techniques. In particular, for long parity, addition, multiplication, and subtraction, we achieve an error reduction of approximately 10x, 9x, 5x, and 2x respectively compared to the best available baselines.
Hattie Zhou · Azade Nova · Aaron Courville · Hugo Larochelle · Behnam Neyshabur · Hanie Sedghi
Poster: Broken Neural Scaling Laws
We present a smoothly broken power law functional form that accurately models the scaling behaviors of deep neural networks (i.e. how the evaluation metric of interest varies as the amount of compute used for training, number of model parameters, or training dataset size varies) for each task within a large and diverse set of upstream and downstream tasks, in zero-shot, prompted, and fine-tuned settings. This set includes large-scale vision and unsupervised language tasks, arithmetic, and reinforcement learning. This functional form yields extrapolations of scaling behavior that often are an order of magnitude more accurate than the ones obtained by other functional forms for neural scaling behavior. Moreover, this functional form accurately models the non-monotonic transitions present in the scaling behavior of phenomena such as double descent and the delayed, sharp inflection points present in the scaling behavior of tasks such as arithmetic. Lastly, we use this functional form to glean insights about the limit of the predictability of scaling behavior.
Ethan Caballero · Kshitij Gupta · Irina Rish · David Krueger
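For reference, the smoothly broken power law takes roughly the following form (as we recall it from the arXiv version of the paper; treat the notation as a sketch rather than a definitive statement):

```latex
% y: evaluation metric; x: compute, parameters, or dataset size.
% a: limiting value; b, c_0: scale and initial slope; for each of the
% n breaks, d_i is its location, f_i its sharpness, and c_i the change
% in slope across it.
y = a + b \, x^{-c_0} \prod_{i=1}^{n}
      \left( 1 + \left( \frac{x}{d_i} \right)^{1/f_i} \right)^{-c_i f_i}
```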
Poster: Towards automating formalisation of theorem statements using large language models
Mathematics formalisation is the task of translating mathematics written in natural language (i.e., definitions, theorem statements, proofs), as found in books and papers, into a formal language that can then be checked for correctness by a program. It is a thriving activity today; however, formalisation remains cumbersome. In this paper, we explore the abilities of a large language model (Codex) to help with formalisation in the Lean theorem prover. We find that with careful input-dependent prompt selection and postprocessing, Codex is able to formalise short mathematical statements at undergrad level with nearly 75% accuracy over a set of 120 theorem statements.
Siddhartha Gadgil · Anand Tadipatri · Navin Goyal · Ayush Agrawal · Ashvni Narayanan
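Input-dependent prompt selection can be sketched as nearest-neighbour retrieval over a bank of already-formalised examples (our illustration; the encoder and the bank contents are placeholders):

```python
# Embed the target statement, retrieve the most similar formalised
# examples, and assemble them into a few-shot prompt for the LM.
import numpy as np

bank = [  # (natural-language statement, Lean formalisation)
    ("The sum of two even integers is even.", "theorem ..."),
    ("Every prime greater than 2 is odd.", "theorem ..."),
]

def embed(text):  # stand-in for a sentence encoder
    rng = np.random.default_rng(abs(hash(text)) % 2**32)
    return rng.standard_normal(64)

def build_prompt(statement, k=1):
    q = embed(statement)
    sims = [float(q @ embed(nl)) for nl, _ in bank]
    top = np.argsort(sims)[-k:]
    shots = "\n\n".join(f"NL: {bank[i][0]}\nLean: {bank[i][1]}" for i in top)
    return f"{shots}\n\nNL: {statement}\nLean:"

print(build_prompt("Odd plus odd is even."))
```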
Poster: Graph neural networks for Ramsey graphs
Ramsey-like problems are ubiquitous in extremal combinatorics and occupy a central place in the field. In simple terms, Ramsey theory asks for the minimum size of a large graph structure such that some sought substructure - generally a clique or an independent set - is guaranteed to exist. Due to considerations of computational complexity, brute-force approaches to solving these problems are usually infeasible, as the substructures cannot be checked in polynomial time. At the same time, we seek extremal graphs that completely avoid such substructures to better understand the graph theory governing their occurrence. We investigate the feasibility of graph neural networks (GNNs) for indicating and refining search procedures for finding these special classes of Ramsey-extremal graphs, which are of interest to mathematicians.
Amur Ghose · Amit Levi · Yingxue Zhang
Poster: Improving Compositional Generalization in Math Word Problem Solving
Compositional generalization refers to a model's capability to generalize to newly composed input data based on the data components observed during training. It has triggered a series of compositional generalization analyses on different tasks, as generalization is an important aspect of language and problem-solving skills. However, the corresponding discussion of math word problems (MWPs) is limited. In this manuscript, we study compositional generalization in MWP solving. Specifically, we first introduce a data-splitting method to create compositional splits from existing MWP datasets. Meanwhile, we synthesize data to isolate the effect of compositions. To improve compositional generalization in MWP solving, we propose an iterative data augmentation method that includes diverse compositional variation in the training data and can collaborate with MWP methods. In the evaluation, we examine a set of methods and find that all of them encounter severe performance loss on the evaluated datasets. We also find that our data augmentation method can significantly improve the compositional generalization of general MWP methods.
Yunshi Lan · Lei Wang · Jing Jiang · Ee-peng Lim
Poster: ProofNet: A Benchmark for Autoformalizing and Formally Proving Undergraduate-Level Mathematics Problems
We introduce ProofNet, a benchmark for autoformalization and formal proving of undergraduate-level mathematics. The ProofNet benchmark consists of 297 theorem statements expressed in both natural language and the Lean 3 theorem prover, 100 of which are also accompanied by natural language proofs. The problems are primarily drawn from popular undergraduate pure mathematics textbooks, and cover topics such as real and complex analysis, linear algebra, abstract algebra, and topology. We intend for ProofNet to be a challenging benchmark that will drive progress in autoformalization and automatic theorem proving. We report baseline results on the autoformalization of statements using few-shot learning with large language models.
Zhangir Azerbayev · Bartosz Piotrowski · Jeremy Avigad
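For flavor, here is a Lean 3 statement in the style the benchmark pairs with natural language (this example is ours, not drawn from ProofNet, and the mathlib lemma name is our best recollection):

```lean
-- "The sum of two even integers is even."
theorem sum_of_evens_is_even (a b : ℤ) (ha : even a) (hb : even b) :
  even (a + b) :=
even.add ha hb  -- mathlib's even.add, if the name matches your version
```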
Poster: Learning to Reason With Relational Abstractions
Large language models have recently shown promising progress in mathematical reasoning when fine-tuned with human-generated sequences that walk through a solution step by step. However, these solution sequences are not formally structured, and the resulting model-generated sequences may not reflect the kind of systematic reasoning we might expect an expert human to produce. In this paper, we study how to build stronger reasoning capability in language models using the idea of relational abstractions. We introduce new types of sequences that more explicitly provide an abstract characterization of the transitions through intermediate solution steps to the goal state. We find that models that are supplied with such sequences as prompts can solve tasks with significantly higher accuracy, and that models trained to produce such sequences solve problems better than those trained with previously used human-generated sequences and other baselines. Our work thus takes several steps toward elucidating and improving how language models perform on tasks requiring multi-step mathematical reasoning.
Andrew Nam · James McClelland · Mengye Ren · Chelsea Finn
Poster: Out-of-Distribution Generalization in Algorithmic Reasoning Through Curriculum Learning
Out-of-distribution generalization (OODG) is a longstanding challenge for neural networks, and is quite apparent in tasks with well-defined variables and rules, where explicit use of the rules can solve problems independently of the particular values of the variables. Large transformer-based language models have pushed the boundaries of how well neural networks can generalize to novel inputs, but their complexity obfuscates how they achieve such robustness. As a step toward understanding how transformer-based systems generalize, we explore the question of OODG in smaller-scale transformers. Using a reasoning task based on the puzzle Sudoku, we show that OODG can occur on complex problems if the training set includes examples sampled from the whole distribution of simpler component tasks.
Andrew Nam · Mustafa Abdool · Trevor Maxfield · James McClelland
Poster: On the Abilities of Mathematical Extrapolation with Implicit Models
Deep neural networks excel at a variety of different tasks, often surpassing human intelligence. However, when presented with out-of-distribution data, these models tend to break down even on the simplest tasks. In this paper, we compare implicitly-defined and classical deep learning models on a series of mathematical extrapolation tasks, where the models are tested with out-of-distribution samples during inference time. Throughout our experiments, implicit models greatly outperform classical deep learning networks that overfit the training distribution. We showcase the unique advantages of implicit models for extrapolation, which stem from their flexible and selective framework: thanks to their potentially unlimited depth, implicit models not only adapt well to out-of-distribution inputs but also capture the underlying structure of the inputs much better.
Alicia Tsai · Juliette Decugis · Ashwin Ganesh · Max Emerling · Laurent El Ghaoui
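An implicitly-defined layer can be sketched as a fixed-point solve whose effective depth adapts to the input (our toy example; real implicit models are trained with implicit differentiation rather than backpropagation through the loop):

```python
# Instead of a fixed stack of layers, the output z* solves
# z = tanh(W z + U x), found here by fixed-point iteration.
import numpy as np

rng = np.random.default_rng(0)
W = 0.3 * rng.standard_normal((8, 8)) / np.sqrt(8)  # keep the map contractive
U = rng.standard_normal((8, 4))

def implicit_layer(x, tol=1e-6, max_iter=500):
    z = np.zeros(8)
    for _ in range(max_iter):
        z_new = np.tanh(W @ z + U @ x)
        if np.linalg.norm(z_new - z) < tol:
            break
        z = z_new
    return z

print(implicit_layer(rng.standard_normal(4))[:3])
```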
Poster: Program Synthesis for Integer Sequence Generation
Recent advances in program synthesis have shown success with methods that employ deep learning on synthetic data generated from domain-specific languages (DSLs). In this work, we propose an algorithm for program synthesis that extends these methods. It uses transfer learning from pre-trained language models, and employs a policy improvement operator based on policy-guided search. This hybrid approach combats the challenges of searching a large language space with sparse rewards. We show its effectiveness on the task of integer sequence generation, a special case of programming-by-examples with fixed inputs. Our preliminary results demonstrate that the inclusion of policy-guided search leads to a 1.6% increase in the number of correct programs compared to supervised baselines.
Natasha Butt · Auke Wiggers · Taco Cohen · Max Welling
Poster: LILA: A Unified Benchmark for Mathematical Reasoning
Mathematical reasoning skills are essential for general-purpose intelligent systems to perform tasks from grocery shopping to climate modeling. Towards evaluating and improving AI systems in this domain, we propose LILA, a unified mathematical reasoning benchmark consisting of 23 diverse tasks along four dimensions: (i) mathematical abilities, e.g., arithmetic, calculus; (ii) language format, e.g., question-answering, fill-in-the-blanks; (iii) language diversity, e.g., no language, simple language; and (iv) external knowledge, e.g., commonsense, physics. We construct our benchmark by extending 20 existing datasets, collecting task instructions and solutions in the form of Python programs, thereby obtaining explainable solutions in addition to the correct answer. We introduce two evaluation datasets to measure out-of-distribution performance and robustness to language perturbation. Finally, we introduce BHASKARA and its variants, a family of mathematical reasoning models fine-tuned on LILA. Importantly, we find that multi-tasking leads to significant improvements (average relative improvement of 21.83% in F1 score vs. single-task models), while the best-performing model only obtains 60.40%, indicating the room for improvement in general mathematical reasoning and understanding.
Swaroop Mishra · Matthew Finlayson · Pan Lu · Leonard Tang · Sean Welleck · Chitta Baral · Tanmay Rajpurohit · Oyvind Tafjord · Ashish Sabharwal · Peter Clark · Ashwin Kalyan
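Each LILA instance pairs a question with a Python program as its explainable solution; here is a made-up example in that spirit (not taken from the dataset):

```python
# Question: "A pack has 12 pencils. How many pencils are in 7 packs?"
def solution():
    pencils_per_pack = 12
    packs = 7
    return pencils_per_pack * packs

print(solution())  # 84
```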
Poster: Solving Math Word Problems with Process-based and Outcome-based Feedback
Recent work has shown that prompting language models to generate reasoning steps improves performance on many reasoning tasks. When moving beyond prompting, this raises the question of how we should supervise the finetuning of such models: outcome-based approaches which supervise the final result, or process-based approaches which supervise the reasoning process itself? Differences between these approaches might naturally be expected not just in final-answer errors but also in reasoning errors, which can be difficult to detect and are problematic in many real-world domains such as education. We run the first comprehensive comparison between process- and outcome-based approaches trained on a natural language task, GSM8K. We find that pure outcome-based supervision produces similar final-answer error rates with less label supervision. However, for correct reasoning steps we find it necessary to use process-based supervision or supervision from learned reward models that emulate process-based feedback. In total, we improve the previous best results from 16.8% to 12.7% final-answer error and from 14.0% to 3.4% reasoning error among final-answer-correct solutions.
Jonathan Uesato · Nate Kushman · Ramana Kumar · H. Francis Song · Noah Siegel · Lisa Wang · Antonia Creswell · Geoffrey Irving · Irina Higgins
Author Information
Pan Lu (UCLA; AI2)
Swaroop Mishra (Arizona State University)
Sean Welleck (University of Washington)
Yuhuai Wu (Google)
Hannaneh Hajishirzi (University of Washington)
Percy Liang (Stanford University)

Percy Liang is an Assistant Professor of Computer Science at Stanford University (B.S. from MIT, 2004; Ph.D. from UC Berkeley, 2011). His research spans machine learning and natural language processing, with the goal of developing trustworthy agents that can communicate effectively with people and improve over time through interaction. Specific topics include question answering, dialogue, program induction, interactive learning, and reliable machine learning. His awards include the IJCAI Computers and Thought Award (2016), an NSF CAREER Award (2016), a Sloan Research Fellowship (2015), and a Microsoft Research Faculty Fellowship (2014).