Deep learning models are poor at signalling failure: they tend to make predictions with high confidence even when they are wrong. This is problematic in real-world applications such as healthcare, self-driving cars, and natural language systems, where the safety implications are considerable, or where the data the model sees at prediction time differs from the training data. There is a pressing need both to understand when models should not make predictions and to improve model robustness to natural changes in the data.
This tutorial gives an overview of the landscape of uncertainty and robustness in deep learning, examining calibration and out-of-distribution generalization as key tasks. We then dive into promising avenues. These include methods that average over multiple neural network predictions, such as Bayesian neural networks, ensembles, and Gaussian processes; methods at the frontier of scale in terms of their parameter or prediction-time efficiency; and methods that encourage key inductive biases, such as data augmentation. We ground these ideas in both empirical understanding and theory, and we provide practical recommendations with baselines and tips & tricks. Finally, we highlight open challenges in the field.
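To make two of these ideas concrete, here is a minimal sketch (an illustrative example, not from the tutorial itself) of averaging the predicted probabilities of an ensemble of networks, together with the expected calibration error (ECE), a standard metric for the calibration task. The function names and the choice of equal-width confidence bins are assumptions made for illustration.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the last axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def ensemble_predict(member_logits):
    # member_logits: array of shape (members, examples, classes).
    # Average the softmax probabilities of the ensemble members,
    # the standard way deep ensembles combine predictions.
    probs = softmax(np.asarray(member_logits, dtype=float))
    return probs.mean(axis=0)

def expected_calibration_error(probs, labels, n_bins=10):
    # Bin predictions by confidence (max probability) and measure
    # the gap between average accuracy and average confidence per bin,
    # weighted by the fraction of examples in each bin.
    conf = probs.max(axis=-1)
    correct = (probs.argmax(axis=-1) == labels).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (conf > lo) & (conf <= hi)
        if mask.any():
            ece += mask.mean() * abs(correct[mask].mean() - conf[mask].mean())
    return ece
```

A well-calibrated model has a low ECE: among examples predicted with, say, 80% confidence, roughly 80% should be correct. Ensembling typically lowers both error and ECE relative to a single network, which is one reason prediction averaging is a strong baseline in this area.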