When do random forests fail?
Cheng Tang · Damien Garreau · Ulrike von Luxburg

Thu Dec 06 02:00 PM -- 04:00 PM (PST) @ Room 517 AB #109

Random forests are learning algorithms that build large collections of random trees and make predictions by averaging the individual tree predictions. In this paper, we consider various tree constructions and examine how the choice of parameters affects the generalization error of the resulting random forests as the sample size goes to infinity. We show that subsampling of data points during the tree construction phase is important: Forests can become inconsistent with either no subsampling or too severe subsampling. As a consequence, even highly randomized trees can lead to inconsistent forests if no subsampling is used, which implies that some of the commonly used setups for random forests can be inconsistent.
As a second consequence, we show that trees that perform well in nearest-neighbor search can be a poor choice for random forests.
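As an illustrative sketch (not the paper's exact tree construction), the subsampling regimes discussed above can be contrasted with scikit-learn's `RandomForestRegressor`: with `bootstrap=False` every tree is grown on the full sample (no subsampling), while `bootstrap=True` with `max_samples` controls how many points each tree sees. The dataset and parameter values here are arbitrary assumptions for demonstration only.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Toy 1-d regression data (hypothetical example, not from the paper).
rng = np.random.RandomState(0)
X = rng.uniform(-1, 1, size=(500, 1))
y = np.sin(3 * X[:, 0]) + 0.1 * rng.randn(500)

# No subsampling: every tree is trained on all n data points.
full = RandomForestRegressor(n_estimators=50, bootstrap=False, random_state=0)

# Moderate subsampling: each tree sees a random half of the data.
sub = RandomForestRegressor(
    n_estimators=50, bootstrap=True, max_samples=0.5, random_state=0
)

full.fit(X, y)
sub.fit(X, y)

# The forest prediction is the average of the individual tree predictions.
print(full.predict(X[:3]))
print(sub.predict(X[:3]))
```

The paper's point is that the choice of subsampling rate is not cosmetic: it can decide whether the forest is consistent as the sample size grows, which this kind of experiment can only hint at empirically.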

Author Information

Cheng Tang (George Washington University)
Damien Garreau (Max Planck Institute)
Ulrike von Luxburg (University of Tübingen)