Poster
Estimating decision tree learnability with polylogarithmic sample complexity
Guy Blanc · Neha Gupta · Jane Lange · Li-Yang Tan
We show that top-down decision tree learning heuristics (such as ID3, C4.5, and CART) are amenable to highly efficient {\sl learnability estimation}: for monotone target functions, the error of the decision tree hypothesis constructed by these heuristics can be estimated with {\sl polylogarithmically} many labeled examples, exponentially fewer than the number necessary to run these heuristics, and indeed, exponentially fewer than the information-theoretic minimum required to learn a good decision tree. This adds to a small but growing list of fundamental learning algorithms that have been shown to be amenable to learnability estimation.
En route to this result, we design and analyze sample-efficient {\sl minibatch} versions of top-down decision tree learning heuristics and show that they achieve the same provable guarantees as the full-batch versions. We further give ``active local'' versions of these heuristics: given a test point $x^\star$, we show how the label $T(x^\star)$ of the decision tree hypothesis $T$ can be computed with polylogarithmically many labeled examples, exponentially smaller than the number necessary to learn~$T$.
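The estimation and active-local procedures themselves are the paper's contribution and are not reproduced here. As a rough illustration of the kind of top-down, impurity-based heuristic (in the style of CART's Gini criterion) with minibatch split selection that the second paragraph refers to, here is a minimal Python sketch. The parameter choices (minibatch_size, depth, the Gini criterion) and the toy target are illustrative assumptions, not the paper's algorithm or its guarantees.

```python
import random


def gini(labels):
    """Gini impurity of a sequence of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split(batch):
    """Coordinate whose split most reduces Gini impurity on this minibatch."""
    n_features = len(batch[0][0])
    base = gini([y for _, y in batch])
    best_i, best_gain = None, 0.0
    for i in range(n_features):
        left = [y for x, y in batch if x[i] == 0]
        right = [y for x, y in batch if x[i] == 1]
        w = len(left) / len(batch)
        gain = base - (w * gini(left) + (1.0 - w) * gini(right))
        if gain > best_gain:
            best_i, best_gain = i, gain
    return best_i


def grow_tree(data, depth, minibatch_size=64):
    """Grow a tree top-down; each node's split is chosen from a small random
    minibatch of the labeled examples reaching that node."""
    ys = [y for _, y in data]
    majority = int(sum(ys) * 2 >= len(ys)) if ys else 0
    if depth == 0 or len(data) < 2 or gini(ys) == 0.0:
        return ("leaf", majority)
    batch = random.sample(data, min(minibatch_size, len(data)))
    i = best_split(batch)
    if i is None:
        return ("leaf", majority)
    left = [(x, y) for x, y in data if x[i] == 0]
    right = [(x, y) for x, y in data if x[i] == 1]
    if not left or not right:
        return ("leaf", majority)
    return ("node", i,
            grow_tree(left, depth - 1, minibatch_size),
            grow_tree(right, depth - 1, minibatch_size))


def evaluate(tree, x):
    """Compute the hypothesis label T(x) by walking the tree."""
    while tree[0] == "node":
        _, i, left, right = tree
        tree = left if x[i] == 0 else right
    return tree[1]


if __name__ == "__main__":
    random.seed(0)

    def target(x):  # toy monotone target: majority of the first three bits
        return int(x[0] + x[1] + x[2] >= 2)

    def sample_point():
        return tuple(random.randint(0, 1) for _ in range(8))

    data = [(x, target(x)) for x in (sample_point() for _ in range(5000))]
    tree = grow_tree(data, depth=4)
    test = [sample_point() for _ in range(2000)]
    err = sum(evaluate(tree, x) != target(x) for x in test) / len(test)
    print(f"held-out error: {err:.3f}")
```

For simplicity this sketch routes the full dataset to each child after a split is chosen; only the split selection uses minibatches. It is meant solely to make the "minibatch top-down heuristic" terminology concrete, not to reflect the sample-complexity analysis in the paper.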
Author Information
Guy Blanc (Stanford University)
Neha Gupta (Stanford University)
Jane Lange (MIT)
Li-Yang Tan (Stanford University)
More from the Same Authors
- 2021 Poster: Provably efficient, succinct, and precise explanations
  Guy Blanc · Jane Lange · Li-Yang Tan
- 2020 Poster: Universal guarantees for decision tree induction via a higher-order splitting criterion
  Guy Blanc · Neha Gupta · Jane Lange · Li-Yang Tan
- 2018 Poster: Exploiting Numerical Sparsity for Efficient Learning: Faster Eigenvector Computation and Regression
  Neha Gupta · Aaron Sidford