The Rashomon Set Has It All: Analyzing Trustworthiness of Trees under Multiplicity
Abstract
In practice, many models from a function class can fit a dataset almost equally well. This collection of near-optimal models is known as the Rashomon set. Prior work has shown that the Rashomon set offers flexibility in choosing models aligned with secondary objectives such as interpretability or fairness. However, it is unclear how far this flexibility extends across different trustworthiness criteria, especially given that most trustworthy machine learning systems today still rely on complex, specialized optimization procedures. Is the Rashomon set all you need for trustworthy model selection? Can simply searching the Rashomon set suffice to find models that are not only accurate but also fair, stable, robust, or private, without explicitly optimizing for these criteria? In this paper, we introduce a framework for systematically analyzing trustworthiness within Rashomon sets and conduct extensive experiments on high-stakes tabular datasets. We focus on sparse decision trees, for which the Rashomon set can be fully enumerated. Across seven distinct metrics, we find that the Rashomon set almost always contains models that match or exceed the performance of state-of-the-art methods specifically designed to optimize individual trustworthiness criteria. These results suggest that, for many practical applications, computing the Rashomon set once can serve as an efficient and effective way to identify models that are both highly accurate and trustworthy. Our framework can be a valuable tool both for benchmarking Rashomon sets of decision trees and for studying the trustworthiness properties of interpretable models.