Hold That Exit: Near-Optimal Early-Exit Inference via Recall
Abstract
Early-exit (EE) models improve the efficiency of deep neural networks by attaching auxiliary classifiers to intermediate layers, enabling predictions before the final layer and reducing inference latency and cost. A central challenge, however, is the principled design of provably efficient exit rules, a dimension that remains underexplored in practice, where simple confidence thresholding dominates. We provide theoretical guidance for designing such rules. We prove that exit strategies without recall, including standard thresholding, fail to achieve any constant-factor approximation of the optimal accuracy–latency trade-off. To address this, we formalize and analyze with-recall strategies, which permit revisiting earlier exits to balance accuracy and efficiency. Our results show that recall is indispensable for provable performance guarantees. Empirical evaluations on computer vision tasks further elucidate the structure of optimal exit rules. In these settings, the optimal strategy reduces to adaptive thresholding with recall, offering a theoretical foundation for the practical deployment of early-exit models.
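To make the distinction concrete, the minimal sketch below (illustrative only, not the algorithm or analysis developed in the paper) contrasts a standard threshold rule, which must commit at the first exit whose confidence clears the threshold, with a with-recall rule that may keep computing within a latency budget and then return the most confident prediction among the exits it has already visited. The exit outputs, threshold, and latency budget in the example are hypothetical values chosen for illustration.

from typing import List, Tuple

# Each simulated exit is (predicted label, confidence, cumulative latency up to that exit).
Exit = Tuple[str, float, float]

def exit_without_recall(exits: List[Exit], threshold: float) -> Tuple[str, float]:
    """Threshold rule without recall: commit at the first sufficiently confident exit."""
    for label, conf, latency in exits:
        if conf >= threshold:
            return label, latency
    # No exit was confident enough: fall through to the final classifier.
    label, _, latency = exits[-1]
    return label, latency

def exit_with_recall(exits: List[Exit], threshold: float, budget: float) -> Tuple[str, float]:
    """Threshold rule with recall: keep computing while the latency budget allows,
    remember every exit visited, and on stopping recall the most confident one."""
    best_label, best_conf, spent = None, -1.0, 0.0
    for label, conf, latency in exits:
        if best_label is not None and latency > budget:
            break                          # budget exhausted: stop and recall the best exit so far
        spent = latency
        if conf > best_conf:               # remember the most confident exit visited
            best_label, best_conf = label, conf
        if conf >= threshold:              # confident enough: no need to go deeper
            break
    return best_label, spent

if __name__ == "__main__":
    # Hypothetical exits: deeper exits cost more but are not always more confident.
    exits = [("cat", 0.55, 1.0), ("dog", 0.72, 2.0), ("cat", 0.91, 4.0)]
    print(exit_without_recall(exits, threshold=0.9))               # ('cat', 4.0): pays full latency
    print(exit_with_recall(exits, threshold=0.9, budget=3.0))      # ('dog', 2.0): recalls the best visited exit

The point of the sketch is only the structural difference: the without-recall rule can never return to an exit it has passed, whereas the with-recall rule may stop later yet answer with an earlier, already-computed prediction.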