Towards Well-Calibrated AutoML: A Theoretical Analysis based on Ensemble Diversity
Abstract
Automated Machine Learning (AutoML) streamlines complex machine learning processes, making advanced analytical capabilities accessible to broader audiences. However, most existing AutoML frameworks emphasize predictive accuracy and often neglect probability calibration, which is essential in high-stakes decision-making domains. Miscalibrated probabilities can significantly compromise uncertainty estimates, leading to poor decision outcomes in areas such as medical diagnostics, autonomous driving, and financial risk management. This paper investigates the potential of ensemble-based AutoML methods to naturally improve probability calibration by implicitly increasing the diversity of the models that make up the ensemble. Specifically, we examine AUTO-SKLEARN, a leading ensemble-based AutoML framework, analyzing how its components (base learners, meta-learning, Bayesian optimization, and Caruana ensemble construction) contribute to probability calibration by inducing better \textit{diversity}. Our analysis reveals that ensemble-based AutoML inherently supports mechanisms beneficial for producing well-calibrated probability estimates. By uncovering the structural elements of ensemble AutoML that foster better uncertainty quantification, this work lays the groundwork for further research aimed at developing automated tools that consistently generate predictions with well-calibrated probability estimates in real-world applications.
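To make the mechanism concrete, the following is a minimal sketch (not auto-sklearn's actual implementation) of Caruana-style greedy ensemble selection. Selecting models with replacement to minimize the log loss of the averaged probabilities is one way the construction step can implicitly reward diverse, well-calibrated ensembles; the function names and the choice of log loss as the selection metric are illustrative assumptions.

```python
import numpy as np

def log_loss(probs, y, eps=1e-12):
    """Negative log-likelihood of binary labels y under predicted probabilities."""
    p = np.clip(probs, eps, 1 - eps)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def greedy_ensemble(val_preds, y_val, n_rounds=10):
    """Caruana-style greedy selection sketch.

    val_preds: array of shape (n_models, n_samples) holding each model's
    validation-set probabilities for the positive class. Models are picked
    with replacement; the returned index list defines a uniform average
    over the selected (possibly repeated) models.
    """
    chosen = []
    running_sum = np.zeros(val_preds.shape[1])
    for _ in range(n_rounds):
        # Try adding each candidate model and keep the one whose inclusion
        # yields the lowest log loss of the averaged ensemble probabilities.
        losses = [log_loss((running_sum + p) / (len(chosen) + 1), y_val)
                  for p in val_preds]
        best = int(np.argmin(losses))
        chosen.append(best)
        running_sum += val_preds[best]
    return chosen
```

Because a candidate is only added when it lowers the ensemble's loss, a model that merely duplicates the current ensemble's errors is rarely selected, which is one route by which the procedure favors diversity.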