Timezone: »
Poster
Decision trees as partitioning machines to characterize their generalization properties
Jean-Samuel Leboeuf · Frédéric LeBlanc · Mario Marchand
Decision trees are popular machine learning models that are simple to build and easy to interpret. Even though algorithms to learn decision trees date back to almost 50 years, key properties affecting their generalization error are still weakly bounded. Hence, we revisit binary decision trees on real-valued features from the perspective of partitions of the data. We introduce the notion of partitioning function, and we relate it to the growth function and to the VC dimension. Using this new concept, we are able to find the exact VC dimension of decision stumps, which is given by the largest integer $d$ such that $2\ell \ge \binom{d}{\floor{\frac{d}{2}}}$, where $\ell$ is the number of real-valued features. We provide a recursive expression to bound the partitioning functions, resulting in a upper bound on the growth function of any decision tree structure. This allows us to show that the VC dimension of a binary tree structure with $N$ internal nodes is of order $N \log(N\ell)$. Finally, we elaborate a pruning algorithm based on these results that performs better than the CART algorithm on a number of datasets, with the advantage that no cross-validation is required.
Author Information
Jean-Samuel Leboeuf (Université Laval)
Frédéric LeBlanc (Université de Moncton)
Mario Marchand (Université Laval)
More from the Same Authors
-
2021 Spotlight: Generalization Bounds For Meta-Learning: An Information-Theoretic Analysis »
Qi CHEN · Changjian Shui · Mario Marchand -
2023 Poster: On the Stability-Plasticity Dilemma in Continual Meta-Learning: Theory and Algorithm »
Qi CHEN · Changjian Shui · Ligong Han · Mario Marchand -
2021 Poster: Generalization Bounds For Meta-Learning: An Information-Theoretic Analysis »
Qi CHEN · Changjian Shui · Mario Marchand -
2014 Poster: Multilabel Structured Output Learning with Random Spanning Trees of Max-Margin Markov Networks »
Mario Marchand · Hongyu Su · Emilie Morvant · Juho Rousu · John Shawe-Taylor -
2009 Poster: From PAC-Bayes Bounds to KL Regularization »
Pascal Germain · Alexandre Lacasse · Francois Laviolette · Mario Marchand · Sara Shanian -
2006 Poster: A PAC-Bayes Risk Bound for General Loss Functions »
Pascal Germain · Alexandre Lacasse · Francois Laviolette · Mario Marchand -
2006 Poster: PAC-Bayes Bounds for the Risk of the Majority Vote and the Variance of the Gibbs Classifier »
Alexandre Lacasse · Francois Laviolette · Mario Marchand · Pascal Germain · Nicolas Usunier -
2006 Spotlight: PAC-Bayes Bounds for the Risk of the Majority Vote and the Variance of the Gibbs Classifier »
Alexandre Lacasse · Francois Laviolette · Mario Marchand · Pascal Germain · Nicolas Usunier