A “sketch” is a data structure that supports a pre-specified set of queries and updates to a database while consuming substantially (often exponentially) less space than the information-theoretic minimum required to store everything seen; it can thus also be viewed as a form of functional compression. The advantages of sketching include lower memory consumption, faster algorithms, and reduced bandwidth requirements in distributed computing environments. A “streaming” algorithm is one that dynamically updates a sketch as data is updated. In this tutorial we sketch (pun intended) a suite of tools from the sketching literature for counting problems, graph problems, finding frequent items, dimensionality reduction, and computational linear algebra, together with a discussion of lower bounds.
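
To make the idea of a sketch concrete, here is a minimal Count-Min sketch for approximate frequency counting. It is one well-known example from the sketching literature, chosen here for illustration and not necessarily the construction the tutorial presents. The width and depth values and the use of salted BLAKE2 hashes are assumptions of this example; the textbook analysis assumes pairwise-independent hash functions.

```python
import hashlib


class CountMinSketch:
    """Approximate frequency counter backed by a depth x width table of integers."""

    def __init__(self, width=272, depth=5):
        # Illustrative sizes: memory is width * depth counters,
        # independent of how many distinct items appear in the stream.
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        # One hash per row, obtained by salting the item with the row number.
        digest = hashlib.blake2b(f"{row}:{item}".encode(), digest_size=8).digest()
        return int.from_bytes(digest, "big") % self.width

    def update(self, item, count=1):
        # Streaming update: touch exactly one counter per row.
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item):
        # Collisions can only inflate a counter, so the minimum over rows
        # is the tightest available upper estimate.
        return min(self.table[row][self._index(item, row)] for row in range(self.depth))


sketch = CountMinSketch()
stream = ["a"] * 50 + ["b"] * 20 + [f"noise{i}" for i in range(1000)]
for x in stream:
    sketch.update(x)

print(sketch.estimate("a"), sketch.estimate("b"))
```

With non-negative updates the estimate never undercounts an item; how much it can overcount depends on the table size and the total stream length.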

Conversational AI systems interact with human users while completing user requests or simply chit-chatting. These systems have applications ranging from personal assistance and health assistance to customer service. In this three-part tutorial, we will first give an overview of the state-of-the-art modularized conversational AI approaches that are commonly adopted by task-oriented dialog systems. We will then give an overview of the current sequence-to-sequence, generation-based conversational AI approaches. We will discuss the challenges and shortcomings of vanilla generation-based models, such as the lack of knowledge, consistency, empathy, controllability, and versatility. We will then highlight current work in addressing these challenges and in improving the depth of generation-based ConvAI. In the final part of the tutorial we will point out remaining challenges of conversational AI and possible directions for future research, including how to mitigate inappropriate responses and lifelong learning. We will also present an overview of shared tasks and publicly available resources for both modularized and generation-based conversational AI.

In recent years machine learning research has been dominated by optimisation-based learning methods (take gradient descent, for example, which is ubiquitous in deep learning). However, while tools that operate under this paradigm have proven to be very powerful, they are often not well suited for tackling complex challenges such as highly non-stationary targets or explicit multi-agent systems. In an attempt to overcome such limitations, some researchers are instead turning towards open-ended methods, and considering how to design the underlying learning dynamics. This tutorial discusses how different tools can be applied to construct and combine adaptive objectives for populations of learners. We begin by providing background on the problem setting, basic tools and philosophy. In a second part we then dive into the basics of evolutionary computation. In particular, we frame the development of evolutionary methods as a focus shift away from gradient-free optimisers in search of more generic and powerful tools for designing learning dynamics. Finally, we provide a more detailed overview of techniques and research around training and evaluating populations of agents.

Integration and differentiation play key roles in machine learning.

We take a tour of some old and new results on methods and algorithms for integration and differentiation, in particular, for calculating expectations and slopes. We review numerical and Monte-Carlo integration for calculating expectations. We discuss the change-of-variables method leading to normalizing flows and discuss inference in time series to get "there".

To get "back again", we review gradients for calculating slopes by the chain rule and automatic differentiation, the basis for backpropagation in neural networks. We discuss backpropagation in three settings: in probabilistic graphical models, through an equality constraint, and with an inequality constraint.

To complete the round-trip, we explore algorithms for calculating gradients of expectations, the basis of methods for variational inference, reinforcement learning, and experimental design.
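
As a small illustration of what "gradients of expectations" means, the sketch below estimates the derivative of E[f(x)] with respect to mu for x ~ N(mu, 1) and f(x) = x^2, where the exact answer is 2*mu. It compares two standard Monte Carlo estimators, the score-function estimator and the pathwise (reparameterization) estimator. The choice of f, the value of mu, the sample size, and the seed are all assumptions of this example.

```python
import random


def f(x):
    return x * x


def score_function_gradient(mu, n_samples, rng):
    """Estimate d/dmu E[f(x)] as the average of f(x) * d/dmu log N(x; mu, 1)."""
    total = 0.0
    for _ in range(n_samples):
        x = rng.gauss(mu, 1.0)
        total += f(x) * (x - mu)  # score of a unit-variance Gaussian is (x - mu)
    return total / n_samples


def pathwise_gradient(mu, n_samples, rng):
    """Estimate the same gradient by writing x = mu + eps and differentiating f."""
    total = 0.0
    for _ in range(n_samples):
        eps = rng.gauss(0.0, 1.0)
        x = mu + eps
        total += 2.0 * x  # f'(x) * dx/dmu, with dx/dmu = 1
    return total / n_samples


rng = random.Random(0)
mu = 1.5
exact = 2.0 * mu  # since E[x^2] = mu^2 + 1
sf = score_function_gradient(mu, 200_000, rng)
pw = pathwise_gradient(mu, 200_000, rng)
print(exact, sf, pw)
```

Both estimators are unbiased for this problem; the pathwise estimator typically has much lower variance here but requires f to be differentiable.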

There is great interest in generalizing deep learning to more exotic types of data, such as graphs, chemical structures, volumetric images, omnidirectional images, etc. In each case, the data has nontrivial structure and symmetries and the challenge is to find the right generalization of classical neural network layers like convolution to reflect this. It has become clear that in all of these cases and more, equivariance to symmetry transformations is the key principle that points us to an effective generalization.

New architectures inspired by this principle have already proved their effectiveness in multiple domains. However, some of the underlying ideas are still foreign to much of the community, partly because of the mathematics involved. The purpose of this tutorial is to bridge this gap by giving a very accessible introduction to this emerging area with many practical examples and details of how to implement equivariant architectures in existing deep learning frameworks.
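
The simplest instance of equivariance can be checked directly: a circular 1D convolution commutes with cyclic shifts, whereas a position-dependent map does not. The sketch below is only an illustration of the principle with hypothetical helper names (`circular_conv`, `shift`, `position_ramp`); it is not taken from the tutorial material.

```python
def circular_conv(signal, kernel):
    """Circular (periodic) convolution of a 1D signal with a kernel."""
    n = len(signal)
    return [
        sum(kernel[k] * signal[(i - k) % n] for k in range(len(kernel)))
        for i in range(n)
    ]


def shift(signal, s):
    """Cyclically shift a signal to the right by s positions."""
    n = len(signal)
    return [signal[(i - s) % n] for i in range(n)]


def position_ramp(signal):
    """A map that depends on absolute position, so it is not shift-equivariant."""
    return [i * v for i, v in enumerate(signal)]


x = [1, 0, 2, 5, 3, 0]
kernel = [1, -2, 1]

# Equivariance: transforming the input and then applying the layer
# gives the same result as applying the layer and then transforming the output.
conv_is_equivariant = circular_conv(shift(x, 2), kernel) == shift(circular_conv(x, kernel), 2)
ramp_is_equivariant = position_ramp(shift(x, 2)) == shift(position_ramp(x), 2)
print(conv_is_equivariant, ramp_is_equivariant)
```

Integer inputs are used so the comparison is exact; with floating-point data one would compare up to a tolerance instead.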

Timetable:

Part I (Taco Cohen)
0:00 - Introduction to equivariant networks
39:00 - Examples and applications
51:00 - Equivariant convolutions

Part II (Risi Kondor)
0:00 - Introduction
7:50 - Group Representations
27:35 - Designing equivariant Neurons
45:30 - Fourier theory
56:25 - Implementation

The evaluation and optimization of machine learning systems have largely adopted well-known performance metrics like accuracy (for classification) or squared error (for regression). While these metrics are reusable across a variety of machine learning tasks, they make strong assumptions often not observed when situated in a broader technical or sociotechnical system. This is especially true in systems that interact with large populations of humans attempting to complete a goal or satisfy a need (e.g. search, recommendation, game-playing). In this tutorial, we will present methods for developing evaluation metrics grounded in what users expect of the system and how they respond to system decisions. The goal of this tutorial is both to share methods for designing user-based quantitative metrics and to motivate new research into optimizing for these more structured metrics.

The brain remains the only known example of a truly general-purpose intelligent system. The study of human and animal cognition has revealed key insights, such as the ideas of parallel distributed processing, biological vision, and learning from reward signals, that have heavily influenced the design of artificial learning systems. Many AI researchers continue to look to neuroscience as a source of inspiration and insight. A key difficulty is that neuroscience is a vast and heterogeneous area of study, encompassing a bewildering array of subfields. In this tutorial, we will seek to provide both a broad overview of neuroscience as a whole and a focused look at two areas -- computational cognitive neuroscience and the neuroscience of learning in circuits -- that we believe are particularly relevant for AI researchers today. We will conclude by highlighting several ongoing lines of work that seek to import insights from these areas of neuroscience into AI, and vice versa.

Deep learning models are bad at signalling failure: They tend to make predictions with high confidence, and this is problematic in real-world applications such as healthcare, self-driving cars, and natural language systems, where there are considerable safety implications, or where there are discrepancies between the training data and data that the model makes predictions on. There is a pressing need both for understanding when models should not make predictions and for improving model robustness to natural changes in the data. This tutorial will give an overview of the landscape of uncertainty and robustness in deep learning. Namely, we examine calibration and out-of-distribution generalization as key tasks. We then take a deep dive into promising avenues. These include methods which average over multiple neural network predictions such as Bayesian neural nets, ensembles, and Gaussian processes; methods on the frontier of scale in terms of their overall parameter or prediction-time efficiency; and methods which encourage key inductive biases such as data augmentation. We ground these ideas in both empirical understanding and theory, and we provide practical recommendations with baselines and tips & tricks. Finally, we highlight open challenges in the field.
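
Calibration can be quantified in several ways. One common choice, used here purely as an illustration, is the expected calibration error (ECE) with equal-width confidence bins: predictions are grouped by confidence, and the gap between average confidence and accuracy is averaged over bins. The function name, the bin count, and the toy data are assumptions of this sketch.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: weighted average over bins of |mean confidence - accuracy|."""
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        # Equal-width bins on [0, 1]; a confidence of exactly 1.0 goes in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, hit))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(h for _, h in members) / len(members)
        ece += len(members) / n * abs(avg_conf - accuracy)
    return ece


# An overconfident model: 95% confidence but only 6 of 10 predictions correct.
overconfident = expected_calibration_error([0.95] * 10, [1] * 6 + [0] * 4)

# A calibrated model on this toy data: 75% confidence, 3 of 4 correct.
calibrated = expected_calibration_error([0.75] * 4, [1, 1, 1, 0])

print(overconfident, calibrated)
```

A model that is confidently wrong produces a large gap in its high-confidence bins, which is the failure mode described above.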

Reinforcement learning (RL) provides a mathematical formalism for learning-based control that allows for acquisition of near-optimal behaviors by optimizing user-specified reward functions. While RL methods have received considerable attention recently due to impressive applications in many areas, the fact that RL requires a fundamentally online learning paradigm is one of the biggest obstacles to its widespread adoption. Online interaction is often impractical, because data collection is expensive (e.g., in robotics, or educational agents) or dangerous (e.g., in autonomous driving, or healthcare). An alternate approach is to utilize RL algorithms that effectively leverage previously collected experience without requiring online interaction. This has been referred to as batch RL, offline RL, or data-driven RL. Such algorithms hold tremendous promise for making it possible to turn datasets into powerful decision-making engines, similarly to how datasets have proven key to the success of supervised learning in vision and NLP. In this tutorial, we aim to provide the audience with the conceptual tools needed to both utilize offline RL as a tool, and to conduct research in this exciting area. We aim to provide an understanding of the challenges in offline RL, particularly in the context of modern deep RL methods, and describe some potential …

Bayesian probabilistic modelling provides a principled framework for coherent inference and prediction under uncertainty. Approximate inference addresses the key challenge of Bayesian computation, that is, the computation of the intractable posterior distribution and related quantities such as the Bayesian predictive distribution. Significant progress has been made in this field during the past 10 years, which enables a wide application of Bayesian modelling techniques to machine learning tasks in computer vision, natural language processing, reinforcement learning, etc.

This tutorial offers a coherent summary of the recent advances in approximate inference. We will start the tutorial with an introduction to the concept of approximate inference and the basics of variational inference. Then we will describe the fundamental aspects of modern approximate inference, including scalable inference, Monte Carlo techniques, amortized inference, approximate posterior design, and optimisation objectives. The connections between these recent advances will also be discussed. Lastly, we will provide application examples of advanced approximate inference techniques to downstream uncertainty estimation and decision-making tasks and conclude with a discussion on future research directions.

Timetable
Tutorial part 1: basics of approximate inference (approx. 30min)
Coffee break & live Q&A 1 (approx. 10min)
Tutorial part 2: advances 1 (approx. 30min)
Coffee break & live …

The field of astrophysics has been an avid consumer—and also a developer—of new methods in data science (maybe even dating back to the invention of Bayesian inference). With constantly growing data volumes, increasingly complex and costly physical models, and demand for extremely precise measurements, astrophysics presents opportunities for innovation in machine learning (ML) methods.

In this tutorial, we will give a sense of the myriad connections between astrophysics and ML, and demonstrate that astrophysics is an ideal sandbox for developing and testing ML applications and innovations. We will also discuss areas where vanilla ML methods fail or require extension or elaboration to be competitive with traditional astronomy techniques.

Astronomical data falls into four broad types: imaging, spectroscopy, time series, and catalogs. We will discuss the scientific understandings and precise measurements that we hope to obtain from these data sets, the challenges specific to each of them, and the successes and opportunities for ML applications in these domains. We will demonstrate how to obtain and start working with current leading-edge public data sets of each type. Participants should expect to do hands-on work with the data during the tutorial (we’ll demo with Python and Jupyter, but any platform can play). By …

In this tutorial, we will provide modern perspectives on abstraction and reasoning in AI systems. Traditionally, symbolic and probabilistic methods have dominated the domains of concept formation, abstraction, and automated reasoning. More recently, deep learning-based approaches have led to breakthroughs in some domains, like tackling hard search problems such as games and combinatorial search tasks. However, the resulting systems are still limited in scope and capabilities, especially in producing interpretable results and verifiable abstractions. Here, we will address a set of questions: Why is an ability for conceptual abstraction essential for intelligence, in both humans and machines? How can we get machines to learn flexible and extendable concepts that can transfer between domains? What do we understand by "strong reasoning capabilities" and how do we measure these capabilities in AI systems? How do deep learning-based methods change the landscape of computer-assisted reasoning? What are the failure modes of such methods and possible solutions to these issues?

Schedule

7:00pm - 7:40pm UTC
Speaker: Francois Chollet
Title: Why abstraction is the key, and what we're still missing

7:40pm - 7:50pm UTC
Questions

7:50pm - 8:30pm UTC
Speaker: Melanie Mitchell
Title: Mechanisms of abstraction and analogy in natural and artificial intelligence

8:30pm - …

This tutorial will cover policy gradient methods in reinforcement learning, with a focus on understanding foundational ideas from an optimization perspective. We will discuss the policy objective in terms of two properties that are critical for convergence rates when using stochastic gradient approaches: variance and curvature. We will explain how the policy objective can be a particularly difficult optimization problem, as it can have large flat regions and stochastic samples of the gradient can have very high variance. We will first explain how to use standard tools from optimization to reduce the variance of the gradient estimate, as well as techniques to mitigate curvature issues. We will then discuss optimization improvements that leverage more knowledge about the objective, including the Markov property and how to modify the state distribution for more coverage. We will discuss how standard Actor-Critic methods with (off-policy) data re-use provide RL-specific variance reduction approaches. We will then conclude with an overview of what is known theoretically about the policy objective, where we discuss the role of entropy-regularization and exploration for mitigating curvature issues.
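
The effect of variance reduction on a policy gradient estimate can be seen in a tiny example. The sketch below takes a two-armed bandit with a softmax policy and computes, by exact enumeration over actions, the mean and variance of the single-sample REINFORCE estimate with and without a constant baseline. The reward values, the baseline choice, and the helper names are assumptions made for illustration.

```python
import math


def softmax(theta):
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]


def grad_estimator_stats(theta, rewards, baseline):
    """Exact mean and variance of the one-sample estimate of dJ/dtheta[0]."""
    pi = softmax(theta)
    # For a softmax policy, d log pi(a) / d theta[0] = 1[a == 0] - pi[0].
    samples = [
        (rewards[a] - baseline) * ((1.0 if a == 0 else 0.0) - pi[0])
        for a in range(len(pi))
    ]
    mean = sum(p * g for p, g in zip(pi, samples))
    var = sum(p * (g - mean) ** 2 for p, g in zip(pi, samples))
    return mean, var


theta = [0.0, 0.0]     # uniform policy over two actions
rewards = [10.0, 11.0] # deterministic reward per action (illustrative values)

mean_plain, var_plain = grad_estimator_stats(theta, rewards, baseline=0.0)
mean_base, var_base = grad_estimator_stats(theta, rewards, baseline=10.5)
print(mean_plain, var_plain)
print(mean_base, var_base)
```

Subtracting a baseline that does not depend on the action leaves the expected gradient unchanged while shrinking the variance, which is why the two means agree and only the variances differ.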

Federated learning (FL) is a machine learning setting where many clients (e.g. mobile devices or whole organizations) collaboratively train a model under the orchestration of a central server (e.g. service provider), while keeping the training data decentralized. Similarly, federated analytics (FA) allows data scientists to generate analytical insight from the combined information in distributed datasets without requiring data centralization. Federated approaches embody the principles of focused data collection and minimization, and can mitigate many of the systemic privacy risks and costs resulting from traditional, centralized machine learning and data science approaches.

Motivated by the explosive growth in federated learning and analytics research, this tutorial will provide a gentle introduction to the area. The focus will be on cross-device federated learning, including deep dives on federated optimization and differential privacy, but federated analytics and cross-silo federated learning will also be discussed. In addition to optimization and privacy, we will also introduce personalization, robustness, fairness, and systems challenges in the federated setting with an emphasis on open problems.
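
A minimal sketch of federated optimization, in the style of federated averaging, may help fix ideas: each client trains locally on its own data, and the server only averages the resulting model parameters. The one-parameter linear model, the noise-free toy data, the learning rate, and the function names are assumptions of this example; real systems add client sampling, secure aggregation, and privacy mechanisms that are omitted here.

```python
def local_update(w, data, lr=0.05, steps=5):
    """Run a few gradient steps on one client's mean squared error for y = w * x."""
    for _ in range(steps):
        grad = sum(2.0 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def federated_averaging(clients, rounds=20, w0=0.0):
    w = w0
    total = sum(len(data) for data in clients)
    for _ in range(rounds):
        # Each client starts from the current global model and trains locally;
        # raw data never leaves the client.
        local_models = [local_update(w, data) for data in clients]
        # The server aggregates parameters, weighted by local dataset size.
        w = sum(len(data) * w_k for data, w_k in zip(clients, local_models)) / total
    return w


true_w = 3.0
clients = [
    [(x, true_w * x) for x in (1.0, 2.0)],
    [(x, true_w * x) for x in (0.5, 1.5, 2.5)],
    [(x, true_w * x) for x in (1.0,)],
]
w_global = federated_averaging(clients)
print(w_global)
```

Because all clients here share the same underlying relationship, the averaged model recovers it; with heterogeneous client data the local updates can pull in different directions, which is one of the optimization challenges in this setting.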

Virtually all deep learning is built upon the notion of explicit computation: layers of a network are written in terms of their explicit step-by-step computations used to map inputs to outputs. But a rising trend in deep learning takes a different approach: implicit layers, where one instead specifies the conditions for a layer’s output to satisfy. Such architectures date back to early work on recurrent networks but have recently gained a great deal of attention as the approach behind Neural ODEs, Deep Equilibrium Models (DEQs), FFJORD, optimization layers, SVAEs, implicit meta-learning, and many other approaches. These methods can have substantial conceptual, computational, and modeling benefits: they often make it much easier to specify simple-yet-powerful architectures, can vastly reduce the memory consumption of deep networks, and allow more natural modeling of e.g. continuous-time phenomena.

This tutorial will provide a unified perspective on implicit layers, illustrating how the implicit modeling framework encompasses all the models discussed above, and providing a practical view of how to integrate such approaches into modern deep learning systems. We will cover the history and motivation of implicit layers, discuss how to solve the resulting "forward" inference problem, and then highlight how to compute gradients through such layers …
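
A scalar toy example shows both halves of the implicit-layer recipe. The layer output z* is defined by the condition z = tanh(w*z + x); the forward pass finds it by fixed-point iteration, and the gradient follows from the implicit function theorem without differentiating through the iterations. The specific layer, the parameter values, and the plain iteration solver are assumptions of this sketch.

```python
import math


def forward(x, w, tol=1e-12, max_iter=1000):
    """Solve z = tanh(w * z + x) by fixed-point iteration (converges for |w| < 1)."""
    z = 0.0
    for _ in range(max_iter):
        z_next = math.tanh(w * z + x)
        if abs(z_next - z) < tol:
            return z_next
        z = z_next
    return z


def implicit_grad_x(x, w):
    """dz*/dx from the implicit function theorem, using only the solution z*."""
    z = forward(x, w)
    s = 1.0 - z * z  # tanh'(w z* + x), since z* = tanh(w z* + x)
    # Differentiating z = tanh(w z + x) gives dz/dx = s * (w * dz/dx + 1).
    return s / (1.0 - s * w)


x, w = 0.3, 0.5
z_star = forward(x, w)
g = implicit_grad_x(x, w)

# Central finite difference as an independent check on the analytic gradient.
h = 1e-6
fd = (forward(x + h, w) - forward(x - h, w)) / (2 * h)
print(z_star, g, fd)
```

The backward computation needs only the final z*, not the intermediate iterates, which is the source of the memory savings mentioned above.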

As machine learning is deployed in all aspects of society, it has become increasingly important to ensure stakeholders understand and trust these models. Decision makers must have a clear understanding of the model behavior so they can diagnose errors and potential biases in these models, and decide when and how to employ them. However, most accurate models that are deployed in practice are not interpretable, making it difficult for users to understand where the predictions are coming from, and thus, difficult to trust.

Recent work on explanation techniques in machine learning offers an attractive solution: they provide intuitive explanations for “any” machine learning model by approximating complex machine learning models with simpler ones.

In this tutorial, we will discuss several post hoc explanation methods, and focus on their advantages and shortcomings. We will cover three families of techniques: (a) single instance gradient-based attribution methods (saliency maps), (b) model agnostic explanations via perturbations, such as LIME/SHAP and counterfactual explanations, and (c) surrogate modeling for global interpretability, such as MUSE. For each of these approaches, we will provide their problem setup, prominent methods, example applications, and finally, discuss their vulnerabilities and shortcomings. We will conclude the tutorial with an overview of future …
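
The perturbation idea behind family (b) can be sketched in a few lines: replace one feature at a time with a baseline value and record how much the model output changes. This is a simple occlusion-style attribution written for illustration; it is not LIME or SHAP themselves, and the `black_box` model, the input, and the all-zeros baseline are hypothetical.

```python
def occlusion_attribution(model, x, baseline):
    """Attribute a prediction to features by replacing each one with a baseline value."""
    full = model(x)
    attributions = []
    for i in range(len(x)):
        perturbed = list(x)
        perturbed[i] = baseline[i]
        # The drop in output when feature i is removed is its attribution.
        attributions.append(full - model(perturbed))
    return attributions


def black_box(x):
    # Stand-in for an opaque model; only its input-output behaviour is used.
    return 2.0 * x[0] - 1.0 * x[1] + 0.0 * x[2] + 4.0


x = [3.0, 2.0, 5.0]
attr = occlusion_attribution(black_box, x, baseline=[0.0, 0.0, 0.0])
print(attr)
```

The result depends on the choice of baseline, and perturbing one feature at a time ignores interactions between features, which are among the shortcomings such methods can exhibit.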

Website for the federated learning and analytics tutorial: (https://sites.google.com/view/fl-tutorial/home)

The website for the policy gradient tutorial is here: Home (google.com)

Timetable:
Nicolas - 40 minute presentation + 10 minute Q&A
Martha - 40 minute presentation …