Decision Trees (DTs) and Random Forests (RFs) are powerful discriminative learners and tools of central importance to the everyday machine learning practitioner and data scientist. Due to their discriminative nature, however, they lack principled methods to process inputs with missing features or to detect outliers, so they must be paired with imputation techniques or a separate generative model. In this paper, we demonstrate that DTs and RFs can naturally be interpreted as generative models, by drawing a connection to Probabilistic Circuits, a prominent class of tractable probabilistic models. This reinterpretation equips them with a full joint distribution over the feature space and leads to Generative Decision Trees (GeDTs) and Generative Forests (GeFs), a family of novel hybrid generative-discriminative models. This family of models retains the overall characteristics of DTs and RFs while additionally being able to handle missing features by means of marginalisation. Under certain assumptions, frequently made for Bayes consistency results, we show that consistency in GeDTs and GeFs extends to any pattern of missing input features, provided the features are missing at random. Empirically, we show that our models often outperform common routines to treat missing data, such as K-nearest neighbour imputation, and moreover, that our models can naturally detect outliers by monitoring the marginal probability of input features.
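The idea of handling missing features by marginalisation can be illustrated with a toy sketch. This is not the authors' implementation: it assumes, purely for illustration, that each leaf of a tree carries a mixture weight, a class distribution, and a uniform density over its own cell, so that the tree defines a joint distribution as a mixture over leaves. The names `Leaf`, `joint`, `marginal`, and `predict_proba`, as well as the example tree and its numbers, are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class Leaf:
    weight: float                    # fraction of training points in this leaf
    box: List[Tuple[float, float]]   # per-feature [low, high) bounds of the cell
    class_probs: List[float]         # p(y | leaf)

    def density(self, x: Sequence[Optional[float]]) -> float:
        """Density of the observed features under this leaf.

        Assumption: features are independent and uniform inside the cell.
        A feature given as None is missing and is marginalised out, which
        contributes a factor of 1 because each 1-D density integrates to 1.
        """
        d = 1.0
        for value, (low, high) in zip(x, self.box):
            if value is None:
                continue
            if not (low <= value < high):
                return 0.0  # observed value lies outside this leaf's cell
            d *= 1.0 / (high - low)
        return d


def joint(leaves: Sequence[Leaf], x: Sequence[Optional[float]]) -> List[float]:
    """Return p(x_observed, y = c) for every class c, summing over leaves."""
    out = [0.0] * len(leaves[0].class_probs)
    for leaf in leaves:
        d = leaf.weight * leaf.density(x)
        for c, p in enumerate(leaf.class_probs):
            out[c] += d * p
    return out


def marginal(leaves: Sequence[Leaf], x: Sequence[Optional[float]]) -> float:
    """Marginal density p(x_observed); low values suggest an outlier."""
    return sum(joint(leaves, x))


def predict_proba(leaves: Sequence[Leaf], x: Sequence[Optional[float]]) -> List[float]:
    """Class posterior p(y | x_observed) via Bayes' rule."""
    j = joint(leaves, x)
    z = sum(j)
    if z == 0.0:
        raise ValueError("input has zero density under the model")
    return [v / z for v in j]


# Hypothetical tree on [0, 1]^2 with a single split on feature 0 at 0.5.
leaves = [
    Leaf(weight=0.6, box=[(0.0, 0.5), (0.0, 1.0)], class_probs=[0.9, 0.1]),
    Leaf(weight=0.4, box=[(0.5, 1.0), (0.0, 1.0)], class_probs=[0.2, 0.8]),
]

full = predict_proba(leaves, [0.25, 0.7])     # all features observed
partial = predict_proba(leaves, [None, 0.7])  # feature 0 missing
outlier_score = marginal(leaves, [1.5, 0.7])  # outside the training range
```

With all features observed, the input falls into exactly one leaf and the prediction coincides with that leaf's class distribution, as in an ordinary decision tree. With feature 0 missing, both leaves contribute in proportion to their weights, so no imputation step is needed. The marginal density doubles as a simple outlier score in this sketch.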
Author Information
Alvaro Correia (Eindhoven University of Technology)
Robert Peharz (University of Cambridge)
Cassio de Campos (Eindhoven University of Technology)
More from the Same Authors
- 2022: Panel
  Guy Van den Broeck · Cassio de Campos · Denis Maua · Kristian Kersting · Rianne van den Berg
- 2020 Session: Orals & Spotlights Track 26: Graph/Relational/Theory
  Joan Bruna · Cassio de Campos
- 2019 Poster: Bayesian Learning of Sum-Product Networks
  Martin Trapp · Robert Peharz · Hong Ge · Franz Pernkopf · Zoubin Ghahramani
- 2016 Poster: Learning Treewidth-Bounded Bayesian Networks with Thousands of Variables
  Mauro Scanagatta · Giorgio Corani · Cassio de Campos · Marco Zaffalon
- 2015 Poster: Learning Bayesian Networks with Thousands of Variables
  Mauro Scanagatta · Cassio de Campos · Giorgio Corani · Marco Zaffalon
- 2014 Poster: Advances in Learning Bayesian Networks of Bounded Treewidth
  Siqi Nie · Denis Maua · Cassio P de Campos · Qiang Ji
- 2014 Spotlight: Advances in Learning Bayesian Networks of Bounded Treewidth
  Siqi Nie · Denis Maua · Cassio P de Campos · Qiang Ji
- 2014 Poster: Global Sensitivity Analysis for MAP Inference in Graphical Models
  Jasper De Bock · Cassio P de Campos · Alessandro Antonucci
- 2011 Poster: Solving Decision Problems with Limited Information
  Denis Maua · Cassio P de Campos
- 2011 Spotlight: Solving Decision Problems with Limited Information
  Denis Maua · Cassio P de Campos