

Poster

When do random forests fail?

Cheng Tang · Damien Garreau · Ulrike von Luxburg

Room 517 AB #109

Keywords: [ Boosting and Ensemble Methods ] [ Regression ] [ Learning Theory ]


Abstract:

Random forests are learning algorithms that build large collections of random trees and make predictions by averaging the individual tree predictions. In this paper, we consider various tree constructions and examine how the choice of parameters affects the generalization error of the resulting random forests as the sample size goes to infinity. We show that subsampling of data points during the tree construction phase is important: forests can become inconsistent either with no subsampling or with overly severe subsampling. As a consequence, even highly randomized trees can lead to inconsistent forests if no subsampling is used, which implies that some commonly used setups for random forests can be inconsistent.
As a second consequence, we show that trees that perform well in nearest-neighbor search can be a poor choice for random forests.
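A minimal sketch of the subsampling knob the abstract refers to, assuming scikit-learn's RandomForestRegressor (its CART-style trees and the max_samples parameter are illustration choices, not the tree constructions analyzed in the paper): one forest whose trees each see only a fraction of the data, and one whose trees all see the full training set.

```python
# Illustrative sketch only (assumes scikit-learn); it contrasts a forest built
# with data subsampling against one built without it, mirroring the parameter
# the abstract discusses rather than reproducing the paper's analysis.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(2000, 5))
y = np.sin(3 * X[:, 0]) + 0.1 * rng.standard_normal(2000)  # noisy 1-d signal

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each tree is fit on a bootstrap subsample of roughly half the training data.
subsampled = RandomForestRegressor(
    n_estimators=200, bootstrap=True, max_samples=0.5, random_state=0
).fit(X_train, y_train)

# Each tree is fit on the entire training set (no subsampling).
no_subsampling = RandomForestRegressor(
    n_estimators=200, bootstrap=False, random_state=0
).fit(X_train, y_train)

print("R^2 with subsampling:   ", subsampled.score(X_test, y_test))
print("R^2 without subsampling:", no_subsampling.score(X_test, y_test))
```

On a single finite dataset both variants may score similarly; the paper's point concerns consistency as the sample size grows, where the choice of subsampling rate can make the difference between a consistent and an inconsistent forest.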
