Timezone: »
Several recent publications report advances in training optimal decision trees (ODTs) using mixed-integer programs (MIPs), due to algorithmic advances in integer programming and a growing interest in addressing the inherent suboptimality of heuristic approaches such as CART. In this paper, we propose a novel MIP formulation, based on 1-norm support vector machine model, to train a binary oblique ODT for classification problems. We further present techniques, such as cutting planes, to tighten its linear relaxation, to improve run times to reach optimality. Using 36 datasets from the University of California Irvine Machine Learning Repository, we demonstrate that our training approach outperforms its counterparts from literature in terms of out-of-sample performance (around 10% improvement in mean out-of-sample testing accuracy). Towards our goal of developing a scalable framework to train multivariate ODT on large datasets, we propose a new linear programming based data selection method to choose a subset of the data, and use it to train a decision tree through our proposed MIP model. We conclude this paper with extensive numerical testing results, that showcase the generalization performance of our new MIP formulation, and the improvement in mean out-of-sample accuracy on large datasets.
Author Information
Haoran Zhu (University of Wisconsin-Madison)
Pavankumar Murali (IBM)
Dzung Phan (IBM Research, T. J. Watson Research Center)
Lam Nguyen (IBM Research, Thomas J. Watson Research Center)
Jayant Kalagnanam (IBM Research)
More from the Same Authors
-
2021 : Math Programming based Reinforcement Learning for Multi-Echelon Inventory Management »
Pavithra Harsha · Ashish Jagmohan · Jayant Kalagnanam · Brian Quanz · Divya Singhvi -
2022 : c-MBA: Adversarial Attack for Cooperative MARL Using Learned Dynamics Model »
Nhan H Pham · Lam Nguyen · Jie Chen · Thanh Lam Hoang · Subhro Das · Lily Weng -
2021 Workshop: New Frontiers in Federated Learning: Privacy, Fairness, Robustness, Personalization and Data Ownership »
Nghia Hoang · Lam Nguyen · Pin-Yu Chen · Tsui-Wei Weng · Sara Magliacane · Bryan Kian Hsiang Low · Anoop Deoras -
2021 Poster: On the Equivalence between Neural Network and Support Vector Machine »
Yilan Chen · Wei Huang · Lam Nguyen · Tsui-Wei Weng -
2021 Poster: FedDR – Randomized Douglas-Rachford Splitting Algorithms for Nonconvex Federated Composite Optimization »
Quoc Tran Dinh · Nhan H Pham · Dzung Phan · Lam Nguyen -
2021 Poster: Ensembling Graph Predictions for AMR Parsing »
Thanh Lam Hoang · Gabriele Picco · Yufang Hou · Young-Suk Lee · Lam Nguyen · Dzung Phan · Vanessa Lopez · Ramon Fernandez Astudillo -
2021 Poster: Cardinality-Regularized Hawkes-Granger Model »
Tsuyoshi Ide · Georgios Kollias · Dzung Phan · Naoki Abe -
2020 Poster: Hybrid Variance-Reduced SGD Algorithms For Minimax Problems with Nonconvex-Linear Function »
Quoc Tran Dinh · Deyi Liu · Lam Nguyen -
2019 Poster: Tight Dimension Independent Lower Bound on the Expected Convergence Rate for Diminishing Step Sizes in SGD »
Ha Nguyen · Lam Nguyen · Marten van Dijk -
2018 Poster: Neural Interaction Transparency (NIT): Disentangling Learned Interactions for Improved Interpretability »
Michael Tsang · Hanpeng Liu · Sanjay Purushotham · Pavankumar Murali · Yan Liu