Workshop: Your Model is Wrong: Robustness and misspecification in probabilistic modeling

Influential Observations in Bayesian Regression Tree Models

Matthew Pratola


BCART (Bayesian Classification and Regression Trees) and BART (Bayesian Additive Regression Trees) are popular modern regression models. Their popularity stems from their ability to flexibly model complex responses that depend on high-dimensional inputs while also quantifying uncertainty. However, surprisingly little work has been done to evaluate how sensitive these models are to violations of their modeling assumptions. In particular, we consider influential observations and propose methods for detecting them and for adjusting predictions so that they are not unduly affected by such problematic data. We consider two detection diagnostics for Bayesian tree models, one an analogue of Cook's distance and the other a divergence measure, and then propose an importance sampling algorithm that re-weights previously sampled posterior draws to remove the effects of influential data. Finally, we demonstrate our methods on real-world data where blind application of models can lead to poor predictions.
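To make the re-weighting idea concrete, the following is a minimal sketch of generic case-deletion importance sampling, not necessarily the algorithm presented in the talk. It assumes a Gaussian likelihood and that, for a suspect observation y_i, each posterior draw supplies a fitted mean and a noise standard deviation. Because p(theta | y without y_i) is proportional to p(theta | y) / p(y_i | theta), each existing draw is weighted by the inverse of its likelihood for y_i. The function names are hypothetical, and the divergence shown is one possible weight-based estimate, included only for illustration.

```python
import numpy as np

def case_deletion_log_weights(y_i, mu_draws, sigma_draws):
    """Normalized log importance weights targeting the posterior without y_i.

    Assumption: Gaussian likelihood; mu_draws[s] and sigma_draws[s] are the
    fitted mean at x_i and the noise s.d. under posterior draw s.
    """
    mu_draws = np.asarray(mu_draws, dtype=float)
    sigma_draws = np.asarray(sigma_draws, dtype=float)
    # Gaussian log-likelihood of y_i under each posterior draw.
    log_lik = (-0.5 * np.log(2.0 * np.pi) - np.log(sigma_draws)
               - 0.5 * ((y_i - mu_draws) / sigma_draws) ** 2)
    # Deleting y_i divides the posterior by its likelihood contribution.
    log_w = -log_lik
    log_w = log_w - log_w.max()                 # numerical stability
    return log_w - np.log(np.exp(log_w).sum())  # weights now sum to 1

def reweighted_mean(pred_draws, log_w):
    """Posterior mean of a prediction under the case-deleted posterior."""
    return float(np.sum(np.exp(log_w) * np.asarray(pred_draws, dtype=float)))

def kl_deleted_vs_full(log_w):
    """Weight-based estimate of KL(case-deleted posterior || full posterior).

    It is zero when all weights are equal, i.e. when y_i has no influence.
    """
    w = np.exp(log_w)
    return float(np.sum(w * (log_w + np.log(log_w.size))))

# Toy usage: two posterior draws, one of which fits y_i = 1.0 exactly.
log_w = case_deletion_log_weights(1.0, mu_draws=[0.0, 1.0], sigma_draws=[1.0, 1.0])
print(reweighted_mean([0.0, 1.0], log_w), kl_deleted_vs_full(log_w))
```

In this kind of scheme a highly influential observation tends to concentrate the weight on a few draws, so the effective sample size is worth checking before trusting the re-weighted predictions.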
