Tutorials

[ Virtual ]

Conversational AI systems interact with human users while completing user requests or simply engaging in chit-chat. These systems have applications ranging from personal assistance and health assistance to customer service. In this three-part tutorial, we will first give an overview of the state-of-the-art modularized conversational AI approaches that are commonly adopted by task-oriented dialog systems. We will then give an overview of the current sequence-to-sequence, generation-based conversational AI approaches. We will discuss the challenges and shortcomings of vanilla generation-based models, such as the lack of knowledge, consistency, empathy, controllability, and versatility. We will then highlight current work on addressing these challenges and on improving the depth of generation-based ConvAI. In the final part of the tutorial, we will point out remaining challenges of conversational AI and possible directions for future research, including mitigating inappropriate responses and lifelong learning. We will also present an overview of shared tasks and publicly available resources for both modularized and generation-based conversational AI.

Integration and differentiation play key roles in machine learning.

We take a tour of some old and new results on methods and algorithms for integration and differentiation, in particular, for calculating expectations and slopes. We review numerical and Monte-Carlo integration for calculating expectations. We discuss the change-of-variables method leading to normalizing flows and discuss inference in time series to get "there". To get "back again", we review gradients for calculating slopes by the chain rule and automatic differentiation, the basis for backpropagation in neural networks. We discuss backpropagation in three settings: in probabilistic graphical models, through an equality constraint, and with an inequality constraint.

To complete the round-trip, we explore algorithms for calculating gradients of expectations, the basis of methods for variational inference, reinforcement learning, and experimental design.
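As a minimal sketch of this round-trip, the example below estimates an expectation by Monte Carlo and then estimates the gradient of that expectation in two standard ways: the reparameterization (pathwise) estimator and the score-function estimator. The test function f(x) = x^2, the unit-variance Gaussian, and all function names are our own assumptions for illustration and are not taken from the tutorial. For this choice, the exact answers are E[f(x)] = mu^2 + 1 and d/dmu E[f(x)] = 2 mu, which makes the estimators easy to check.

```python
import random


def f(x):
    """Illustrative integrand (an assumption, not from the tutorial)."""
    return x * x


def expectation_mc(mu, n, rng):
    """Monte Carlo estimate of E[f(x)] for x ~ N(mu, 1)."""
    return sum(f(mu + rng.gauss(0.0, 1.0)) for _ in range(n)) / n


def grad_reparam(mu, n, rng):
    """Pathwise estimator: write x = mu + eps, so d f(x) / d mu = f'(x) = 2x."""
    return sum(2.0 * (mu + rng.gauss(0.0, 1.0)) for _ in range(n)) / n


def grad_score(mu, n, rng):
    """Score-function estimator: f(x) * d/dmu log N(x; mu, 1) = f(x) * (x - mu)."""
    total = 0.0
    for _ in range(n):
        eps = rng.gauss(0.0, 1.0)
        total += f(mu + eps) * eps  # (x - mu) equals eps here
    return total / n


if __name__ == "__main__":
    rng = random.Random(0)
    mu = 1.5
    print("E[f]      :", expectation_mc(mu, 100_000, rng), "(exact:", mu * mu + 1, ")")
    print("pathwise  :", grad_reparam(mu, 100_000, rng), "(exact:", 2 * mu, ")")
    print("score fn  :", grad_score(mu, 100_000, rng), "(exact:", 2 * mu, ")")
```

Both gradient estimators are unbiased for this problem, but the score-function estimator typically has noticeably higher variance, which is one reason the choice of estimator matters in variational inference and reinforcement learning.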

There is great interest in generalizing deep learning to more exotic types of data, such as graphs, chemical structures, volumetric images, and omnidirectional images. In each case, the data has nontrivial structure and symmetries, and the challenge is to find the right generalization of classical neural network layers like convolution to reflect this. It has become clear that in all of these cases and more, equivariance to symmetry transformations is the key principle that points us to an effective generalization.

New architectures inspired by this principle have already proved their effectiveness in multiple domains. However, some of the underlying ideas are still foreign to much of the community, partly because of the mathematics involved. The purpose of this tutorial is to bridge this gap by giving a very accessible introduction to this emerging area with many practical examples and details of how to implement equivariant architectures in existing deep learning frameworks.
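To make the equivariance principle concrete, here is a minimal sketch for the simplest case: a one-dimensional circular convolution, which is equivariant to cyclic shifts. Shifting the input and then convolving gives the same result as convolving and then shifting. The function names and the toy signal are our own assumptions for illustration; the group-equivariant architectures covered in the tutorial generalize this idea to richer symmetry groups.

```python
def circular_conv(signal, kernel):
    """Circular (periodic) convolution of a 1-D signal with a kernel."""
    n = len(signal)
    return [
        sum(kernel[k] * signal[(i - k) % n] for k in range(len(kernel)))
        for i in range(n)
    ]


def shift(signal, s):
    """Cyclically shift a signal by s positions to the right."""
    n = len(signal)
    return [signal[(i - s) % n] for i in range(n)]


def positionwise_scale(signal, weights):
    """A layer with position-dependent weights; not shift-equivariant in general."""
    return [w * x for w, x in zip(weights, signal)]


if __name__ == "__main__":
    x = [1, 4, 2, 0, 3, 5]
    kernel = [2, -1, 1]
    weights = [1, 2, 3, 4, 5, 6]

    # Equivariance: conv(shift(x)) equals shift(conv(x)).
    print(circular_conv(shift(x, 2), kernel) == shift(circular_conv(x, kernel), 2))

    # The position-dependent layer fails the same check for this input.
    print(positionwise_scale(shift(x, 2), weights) == shift(positionwise_scale(x, weights), 2))
```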

Timetable:

Part I (Taco Cohen)
0:00 - Introduction to equivariant networks
39:00 - Examples and applications
51:00 - Equivariant convolutions

Part II (Risi Kondor)
0:00 - Introduction
7:50 - Group representations
27:35 - Designing equivariant neurons
45:30 - Fourier theory
56:25 - Implementation

[ Virtual ]

Deep learning models are bad at signalling failure: they tend to make predictions with high confidence, which is problematic in real-world applications such as healthcare, self-driving cars, and natural language systems, where there are considerable safety implications, or where there are discrepancies between the training data and the data that the model makes predictions on. There is a pressing need both for understanding when models should not make predictions and for improving model robustness to natural changes in the data. This tutorial will give an overview of the landscape of uncertainty and robustness in deep learning. Namely, we examine calibration and out-of-distribution generalization as key tasks. We then take a deep dive into promising avenues. These include methods that average over multiple neural network predictions, such as Bayesian neural nets, ensembles, and Gaussian processes; methods on the frontier of scale in terms of their overall parameter or prediction-time efficiency; and methods that encourage key inductive biases, such as data augmentation. We ground these ideas in both empirical understanding and theory, and we provide practical recommendations with baselines and tips & tricks. Finally, we highlight open challenges in the field.
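As an illustration of the calibration task mentioned above, the sketch below computes the expected calibration error (ECE), a common summary of how far stated confidence is from observed accuracy, and shows the averaging step used by ensembles. The equal-width binning, the function names, and the toy numbers are our own assumptions; this is one common formulation rather than the specific recipe recommended in the tutorial.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE with equal-width confidence bins.

    confidences: predicted probability of the chosen class, each in [0, 1].
    correct: booleans, True where the prediction was right.
    """
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[idx].append((conf, ok))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(avg_conf - accuracy)
    return ece


def ensemble_average(member_probs):
    """Average class probabilities across ensemble members for one input."""
    return [sum(ps) / len(ps) for ps in zip(*member_probs)]


if __name__ == "__main__":
    # A model that says "90% sure" but is right only half the time is overconfident.
    overconfident = expected_calibration_error([0.9] * 10, [True] * 5 + [False] * 5)
    well_calibrated = expected_calibration_error([0.9] * 10, [True] * 9 + [False])
    print(overconfident, well_calibrated)
    print(ensemble_average([[0.9, 0.1], [0.5, 0.5]]))
```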

[ Virtual ]

Bayesian probabilistic modelling provides a principled framework for coherent inference and prediction under uncertainty. Approximate inference addresses the key challenge of Bayesian computation, that is, the computation of the intractable posterior distribution and related quantities such as the Bayesian predictive distribution. Significant progress has been made in this field during the past 10 years, which enables a wide application of Bayesian modelling techniques to machine learning tasks in computer vision, natural language processing, reinforcement learning etc.

This tutorial offers a coherent summary of the recent advances in approximate inference. We will start the tutorial with an introduction to the concept of approximate inference and the basics of variational inference. Then we will describe the fundamental aspects of modern approximate inference, including scalable inference, Monte Carlo techniques, amortized inference, approximate posterior design, and optimisation objectives. The connections between these recent advances will also be discussed. Lastly, we will provide application examples of advanced approximate inference techniques for downstream uncertainty estimation and decision-making tasks, and conclude with a discussion of future research directions.

Timetable Tutorial part 1: basics of approximate inference (approx. 30min) Coffee break & live Q&A 1 (approx. 10min) Tutorial part 2: advances 1 (approx. 30min) Coffee break & live …

In this tutorial, we will provide modern perspectives on abstraction and reasoning in AI systems. Traditionally, symbolic and probabilistic methods have dominated the domains of concept formation, abstraction, and automated reasoning. More recently, deep learning-based approaches have led to breakthroughs in some domains, like tackling hard search problems such as games and combinatorial search tasks. However, the resulting systems are still limited in scope and capabilities, especially in producing interpretable results and verifiable abstractions. Here, we will address a set of questions: Why is an ability for conceptual abstraction essential for intelligence, in both humans and machines? How can we get machines to learn flexible and extendable concepts that can transfer between domains? What do we understand by "strong reasoning capabilities" and how do we measure these capabilities in AI systems? How do deep learning-based methods change the landscape of computer-assisted reasoning? What are the failure modes of such methods and possible solutions to these issues?

Schedule

7:00pm - 7:40pm UTC
Speaker: Francois Chollet
Title: Why abstraction is the key, and what we're still missing

7:40pm - 7:50pm UTC
Questions

7:50pm - 8:30pm UTC
Speaker: Melanie Mitchell
Title: Mechanisms of abstraction and analogy in natural and artificial intelligence

8:30pm - …

The field of astrophysics has been an avid consumer—and also a developer—of new methods in data science (maybe even dating back to the invention of Bayesian inference). With constantly growing data volumes, increasingly complex and costly physical models, and demand for extremely precise measurements, astrophysics presents opportunities for innovation in machine learning (ML) methods.

In this tutorial, we will give a sense of the myriad connections between astrophysics and ML, and demonstrate that astrophysics is an ideal sandbox for developing and testing ML applications and innovations. We will also discuss areas where vanilla ML methods fail or require extension or elaboration to be competitive with traditional astronomy techniques.

Astronomical data falls into four broad types: imaging, spectroscopy, time series, and catalogs. We will discuss the scientific understandings and precise measurements that we hope to obtain from these data sets, the challenges specific to each of them, and the successes and opportunities for ML applications in these domains. We will demonstrate how to obtain and start working with current leading-edge public data sets of each type. Participants should expect to do hands-on work with the data during the tutorial (we’ll demo with Python and Jupyter, but any platform can play). By …

Virtually all deep learning is built upon the notion of explicit computation: layers of a network are written in terms of their explicit step-by-step computations used to map inputs to outputs. But a rising trend in deep learning takes a different approach: implicit layers, where one instead specifies the conditions for a layer’s output to satisfy. Such architectures date back to early work on recurrent networks but have recently gained a great deal of attention as the approach behind Neural ODEs, Deep Equilibrium Models (DEQs), FFJORD, optimization layers, SVAEs, implicit meta-learning, and many other approaches. These methods can have substantial conceptual, computational, and modeling benefits: they often make it much easier to specify simple-yet-powerful architectures, can vastly reduce the memory consumption of deep networks, and allow more natural modeling of e.g. continuous-time phenomena.

This tutorial will provide a unified perspective on implicit layers, illustrating how the implicit modeling framework encompasses all the models discussed above, and providing a practical view of how to integrate such approaches into modern deep learning systems. We will cover the history and motivation of implicit layers, discuss how to solve the resulting "forward" inference problem, and then highlight how to compute gradients through such layers …
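A minimal sketch of the idea, under our own simplifying assumptions: a scalar "layer" whose output z is defined implicitly as the fixed point of z = tanh(w z + x). The forward pass solves for z by simple iteration, and the gradient with respect to x comes from the implicit function theorem rather than from backpropagating through the solver iterations. The scalar setting, the tanh nonlinearity, the naive solver, and the function names are illustrative choices, not the tutorial's implementation.

```python
import math


def fixed_point(w, x, tol=1e-12, max_iter=1000):
    """Forward pass: iterate z <- tanh(w * z + x) until convergence.

    For |w| < 1 the map is a contraction, so the iteration converges.
    """
    z = 0.0
    for _ in range(max_iter):
        z_next = math.tanh(w * z + x)
        if abs(z_next - z) < tol:
            return z_next
        z = z_next
    return z


def implicit_grad_x(w, x):
    """Backward pass via the implicit function theorem.

    Differentiating z = tanh(w z + x) gives dz/dx = s / (1 - w * s),
    where s = 1 - z^2 is the tanh derivative evaluated at the fixed point.
    Only the solution z is needed, not the intermediate iterates.
    """
    z = fixed_point(w, x)
    s = 1.0 - z * z
    return s / (1.0 - w * s)


if __name__ == "__main__":
    w, x, h = 0.5, 0.3, 1e-5
    finite_diff = (fixed_point(w, x + h) - fixed_point(w, x - h)) / (2 * h)
    print("implicit gradient :", implicit_grad_x(w, x))
    print("finite difference :", finite_diff)
```

Because the gradient depends only on the converged solution, no intermediate iterates need to be stored, which is the source of the memory savings mentioned above.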

As machine learning is deployed in all aspects of society, it has become increasingly important to ensure stakeholders understand and trust these models. Decision makers must have a clear understanding of the model behavior so they can diagnose errors and potential biases in these models, and decide when and how to employ them. However, most accurate models that are deployed in practice are not interpretable, making it difficult for users to understand where the predictions are coming from, and thus, difficult to trust.

Recent work on explanation techniques in machine learning offers an attractive solution: these techniques provide intuitive explanations for “any” machine learning model by approximating complex machine learning models with simpler ones.

In this tutorial, we will discuss several post hoc explanation methods, and focus on their advantages and shortcomings. We will cover three families of techniques: (a) single instance gradient-based attribution methods (saliency maps), (b) model agnostic explanations via perturbations, such as LIME/SHAP and counterfactual explanations, and (c) surrogate modeling for global interpretability, such as MUSE. For each of these approaches, we will provide their problem setup, prominent methods, example applications, and finally, discuss their vulnerabilities and shortcomings. We will conclude the tutorial with an overview of future …
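To give a flavour of the perturbation-based family, here is a minimal, model-agnostic sketch: each feature is replaced in turn by a baseline value, and the change in the model's output is recorded as that feature's attribution. This is a simple occlusion-style scheme that we chose for illustration; it is not LIME, SHAP, or MUSE, and the function names, the toy linear model, and the zero baseline are all assumptions.

```python
def occlusion_attribution(model, x, baseline):
    """Attribute a prediction to features by perturbing one feature at a time.

    model: any callable mapping a list of feature values to a number.
    x: the instance to explain.
    baseline: reference values used to "switch off" each feature.
    """
    full_output = model(x)
    scores = []
    for i in range(len(x)):
        perturbed = list(x)
        perturbed[i] = baseline[i]  # replace only feature i
        scores.append(full_output - model(perturbed))
    return scores


def toy_model(features):
    """Hypothetical black box; the third feature is deliberately ignored."""
    return 3.0 * features[0] - 2.0 * features[1] + 0.0 * features[2]


if __name__ == "__main__":
    print(occlusion_attribution(toy_model, [1.0, 1.0, 1.0], [0.0, 0.0, 0.0]))
```

For a linear model the attributions recover the weights times the feature values, but for models with feature interactions the result depends on the baseline and on which features are perturbed together, which is one of the shortcomings such methods face.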

Federated learning (FL) is a machine learning setting where many clients (e.g. mobile devices or whole organizations) collaboratively train a model under the orchestration of a central server (e.g. service provider), while keeping the training data decentralized. Similarly, federated analytics (FA) allows data scientists to generate analytical insight from the combined information in distributed datasets without requiring data centralization. Federated approaches embody the principles of focused data collection and minimization, and can mitigate many of the systemic privacy risks and costs resulting from traditional, centralized machine learning and data science approaches.

Motivated by the explosive growth in federated learning and analytics research, this tutorial will provide a gentle introduction to the area. The focus will be on cross-device federated learning, including deep dives on federated optimization and differential privacy, but federated analytics and cross-silo federated learning will also be discussed. In addition to optimization and privacy, we will also introduce personalization, robustness, fairness, and systems challenges in the federated setting, with an emphasis on open problems.
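As a minimal sketch of federated optimization, the example below follows the pattern of federated averaging: each client runs a few local gradient steps on its own data, and the server combines the resulting models weighted by client dataset size, so raw data never leaves the clients. The one-parameter regression model, the learning rate, the toy client datasets, and the function names are our own assumptions; real cross-device systems add client sampling, secure aggregation, and privacy mechanisms that are omitted here.

```python
def local_update(w, data, lr=0.05, epochs=5):
    """Client step: a few gradient-descent epochs on squared error for y = w * x."""
    for _ in range(epochs):
        grad = sum(2.0 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def federated_averaging(clients, rounds=20, w=0.0):
    """Server loop: broadcast w, collect local models, average by dataset size."""
    total = sum(len(data) for data in clients)
    for _ in range(rounds):
        local_models = [local_update(w, data) for data in clients]
        w = sum(len(data) * w_k for data, w_k in zip(clients, local_models)) / total
    return w


if __name__ == "__main__":
    # Three hypothetical clients whose data all follow y = 2 * x.
    clients = [
        [(1.0, 2.0), (2.0, 4.0)],
        [(3.0, 6.0)],
        [(0.5, 1.0), (1.5, 3.0), (1.0, 2.0)],
    ]
    print(federated_averaging(clients))
```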

Website: (https://sites.google.com/view/fl-tutorial/home)

This tutorial will cover policy gradient methods in reinforcement learning, with a focus on understanding foundational ideas from an optimization perspective. We will discuss the policy objective in terms of two properties that are critical for convergence rates when using stochastic gradient approaches: variance and curvature. We will explain how the policy objective can be a particularly difficult optimization problem, as it can have large flat regions and stochastic samples of the gradient can have very high variance. We will first explain how to use standard tools from optimization to reduce the variance of the gradient estimate, as well as techniques to mitigate curvature issues. We will then discuss optimization improvements that leverage more knowledge about the objective, including the Markov property and how to modify the state distribution for more coverage. We will discuss how standard Actor-Critic methods with (off-policy) data re-use provide RL-specific variance reduction approaches. We will then conclude with an overview of what is known theoretically about the policy objective, where we discuss the role of entropy regularization and exploration in mitigating curvature issues. The tutorial website is here: Home (google.com)
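To illustrate the variance issue, here is a minimal sketch on a two-armed bandit with a sigmoid policy. It compares the plain score-function (REINFORCE-style) gradient estimate with the same estimate after subtracting a baseline equal to the expected reward. The baseline is one well-known variance-reduction device that we picked for illustration; the abstract does not say which tools the tutorial covers, and the bandit, the rewards, and the function names are all assumptions.

```python
import math
import random

REWARDS = (1.0, 2.0)  # deterministic reward for arm 0 and arm 1 (illustrative)


def prob_arm1(theta):
    """Policy: probability of choosing arm 1 is sigmoid(theta)."""
    return 1.0 / (1.0 + math.exp(-theta))


def sample_gradient(theta, baseline, rng):
    """One-sample estimate (r - baseline) * d/dtheta log pi(a).

    For this policy, d/dtheta log pi(a) = a - p, with p = sigmoid(theta).
    """
    p = prob_arm1(theta)
    action = 1 if rng.random() < p else 0
    return (REWARDS[action] - baseline) * (action - p)


def mean_and_variance(samples):
    mean = sum(samples) / len(samples)
    variance = sum((s - mean) ** 2 for s in samples) / len(samples)
    return mean, variance


def compare_estimators(theta=0.0, n=20_000, seed=0):
    """Return (mean, variance) without a baseline and with an expected-reward baseline."""
    rng = random.Random(seed)
    p = prob_arm1(theta)
    expected_reward = (1.0 - p) * REWARDS[0] + p * REWARDS[1]
    plain = [sample_gradient(theta, 0.0, rng) for _ in range(n)]
    with_baseline = [sample_gradient(theta, expected_reward, rng) for _ in range(n)]
    return mean_and_variance(plain), mean_and_variance(with_baseline)


if __name__ == "__main__":
    plain, with_baseline = compare_estimators()
    print("no baseline   (mean, var):", plain)
    print("with baseline (mean, var):", with_baseline)
```

Both estimators target the same gradient, p (1 - p) (r1 - r0), but the baseline version has far lower variance on this problem, since subtracting a quantity that does not depend on the action leaves the expectation unchanged.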

Timetable: Nicolas - 40 minute presentation + 10 minute Q&A Martha - 40 minute presentation …