We’ve all been there. A creative spark leads to a beautiful idea. We love the idea, we nurture it, and name it. The idea is elegant: all who hear it fawn over it. The idea is justified: all of the literature we have read supports it. But, lo and behold: once we sit down to implement the idea, it doesn’t work. We check our code for software bugs. We rederive our derivations. We try again and still, it doesn’t work. We Can’t Believe It’s Not Better [1].
In this workshop, we will encourage probabilistic machine learning researchers who Can’t Believe It’s Not Better to share their beautiful idea, tell us why it should work, and hypothesize why it does not in practice. We also welcome work that highlights pathologies or unexpected behaviors in well-established practices. This workshop will stress the quality and thoroughness of the scientific procedure, promoting transparency, deeper understanding, and more principled science.
Focusing on the probabilistic machine learning community will facilitate this endeavor, not only by gathering experts that speak the same language, but also by exploiting the modularity of the probabilistic framework. Probabilistic machine learning separates modeling assumptions, inference, and model checking into distinct phases [2]; this facilitates criticism when the final outcome does not meet prior expectations. We aim to create an open-minded and diverse space for researchers to share unexpected or negative results and help one another improve their ideas.
Sat 4:45 a.m. – 5:00 a.m.

Intro
(Welcome Intro)

Aaron Schein · Melanie F. Pradier
Sat 5:00 a.m. – 5:30 a.m.

Invited Talk: Max Welling – The LIAR (Learning with Interval Arithmetic Regularization) is Dead
(Talk)
Two years ago we embarked on a project called LIAR. LIAR was going to quantify the uncertainty of a network through interval arithmetic (IA) calculations (which are an official IEEE standard). IA has the beautiful property that the result of your computation is guaranteed to lie in a computed interval, and as such it quantifies very precisely the numerical precision of your computation. Captivated by this elegant idea, we applied it to neural networks. In particular, the idea was to add a regularization term to the objective that would try to keep the interval of the network’s output small. This is particularly interesting in the context of quantization, where we quite naturally have intervals for the weights, activations, and inputs due to their limited precision. By training a full-precision neural network with intervals that represent the quantization error, and by encouraging the network to keep the resultant variation in the predictions small, we hoped to learn networks that were inherently robust to quantization noise. So much for the good news. In this talk I will try to reconstruct the process of how the project ended up on the scrap pile. I will also try to distill some “lessons learned” from this project and hopefully deliver some advice for those who are going through a similar situation. I still can’t believe it didn’t work better ;)
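The mechanism described in the abstract can be sketched in a few lines: propagate elementwise input intervals through an affine layer and a ReLU, then penalize the width of the output interval. This is an illustrative reconstruction under a toy setup of our own, not the actual LIAR code; all weights and intervals below are invented.

```python
def affine_interval(lo, hi, w, b):
    """Propagate elementwise bounds [lo, hi] through y = w @ x + b.
    A positive weight maps the upper input bound to the upper output
    bound; a negative weight swaps the roles of the two bounds."""
    y_lo, y_hi = [], []
    for row, bias in zip(w, b):
        acc_lo = acc_hi = bias
        for wj, lj, hj in zip(row, lo, hi):
            if wj >= 0:
                acc_lo += wj * lj
                acc_hi += wj * hj
            else:
                acc_lo += wj * hj
                acc_hi += wj * lj
        y_lo.append(acc_lo)
        y_hi.append(acc_hi)
    return y_lo, y_hi

def relu_interval(lo, hi):
    """ReLU is monotone, so it maps interval bounds to interval bounds directly."""
    return [max(0.0, v) for v in lo], [max(0.0, v) for v in hi]

def width_penalty(lo, hi):
    """Candidate regularizer: mean width of the output interval."""
    return sum(h - l for l, h in zip(lo, hi)) / len(lo)

# Input [1, 2] with a quantization error of +/- 0.1 per coordinate.
lo, hi = [0.9, 1.9], [1.1, 2.1]
w, b = [[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]
out_lo, out_hi = relu_interval(*affine_interval(lo, hi, w, b))
penalty = width_penalty(out_lo, out_hi)
```

In the training scheme the talk describes, a term like `penalty` would be added to the loss so that the network learns to keep its output insensitive to the quantization intervals.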
Max Welling
Sat 5:30 a.m. – 6:00 a.m.

Invited Talk: Danielle Belgrave – Machine Learning for Personalised Healthcare: Why is it not better?
(Talk)
This talk presents an overview of probabilistic graphical modelling as a strategy for understanding heterogeneous subgroups of patients. The identification of such subgroups may elucidate underlying causal mechanisms, which may lead to more targeted treatment and intervention strategies. We will look at (1) the ideal of personalisation within the context of machine learning for healthcare, (2) the journey “from the ideal to the reality”, and (3) some of the possible pathways to progress for making the ideal of personalised healthcare a reality. The last part of this talk focuses on the pipeline of personalisation and looks at how probabilistic graphical models form part of that pipeline.
Danielle Belgrave
Sat 6:00 a.m. – 6:30 a.m.

Invited Talk: Mike Hughes – The Case for Prediction Constrained Training
(Talk)
This talk considers adding supervision to well-known generative latent variable models (LVMs), including both classic LVMs (e.g. mixture models, topic models) and more recent “deep” flavors (e.g. variational autoencoders). The standard way to add supervision to LVMs would be to treat the added label as another observed variable generated by the graphical model, and then maximize the joint likelihood of both labels and features. We find that across many models, this standard supervision leads to surprisingly negligible improvement in prediction quality over a more naive baseline that first fits an unsupervised model, and then makes predictions given that model’s learned low-dimensional representation. We can’t believe it is not better! Further, this problem is not properly solved by previous approaches that just upweight or “replicate” labels in the generative model (the problem is not just that we have more observed features than labels). Instead, we suggest the problem is related to model misspecification, and that the joint likelihood objective does not properly encode the desired performance goals at test time (we care about predicting labels from features, but not features from labels). This motivates a new training objective we call prediction constrained training, which can prioritize the label-from-feature prediction task while still delivering reasonable generative models for the observed features. We highlight promising results of our proposed prediction-constrained framework including recent extensions to semi-supervised VAEs and model-based reinforcement learning.
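As a sketch of the idea, prediction-constrained training trades the joint likelihood for a constrained objective; the notation below is ours and schematic, not the talk's exact formulation:

```latex
% Maximize the generative likelihood subject to a bound on prediction loss:
\max_{\theta} \; \log p(x \mid \theta)
\quad \text{s.t.} \quad \mathcal{L}\big(y,\, \hat{y}(x;\theta)\big) \le \epsilon ,
% in practice optimized via an unconstrained Lagrangian form, where the
% multiplier \lambda prioritizes the label-from-feature prediction task:
\max_{\theta} \; \log p(x \mid \theta) \;-\; \lambda\, \mathcal{L}\big(y,\, \hat{y}(x;\theta)\big) .
```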
Michael Hughes
Sat 6:30 a.m. – 6:33 a.m.

Margot Selosse – A bumpy journey: exploring deep Gaussian mixture models
(Spotlight Talk)
The deep Gaussian mixture model (DGMM) is a framework directly inspired by the finite mixture of factor analysers model (MFA) and by deep learning architectures composed of multiple layers. The MFA is a generative model that considers a data point as arising from a latent variable (termed the score) which is sampled from a standard multivariate Gaussian distribution and then transformed linearly. The linear transformation matrix (termed the loading matrix) is specific to a component in the finite mixture. The DGMM consists of stacking MFA layers, in the sense that the latent scores are no longer assumed to be drawn from a standard Gaussian, but rather from a mixture of factor analysers model. Thus the latent scores are at one point considered to be the input of an MFA and also to have latent scores themselves. Only the latent scores of the DGMM's last layer are considered to be drawn from a standard multivariate Gaussian distribution. In recent years, the DGMM gained prominence in the literature: intuitively, this model should be able to capture distributions more precisely than a simple Gaussian mixture model. We show in this work that while the DGMM is an original and novel idea, in certain cases it is challenging to infer its parameters. In addition, we give some insights into the probable reasons for this difficulty. Experimental results are provided on GitHub: https://github.com/ansubmissions/ICBINB, alongside an R package that implements the algorithm and a number of ready-to-run R scripts.
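The layered generative story is easy to see in code. Below is a toy ancestral sampler for a two-layer model with scalar scores (so each "loading matrix" is a single number); all components, weights, and dimensions are invented for illustration and are not taken from the paper.

```python
import random

# Each component: (mixing weight, loading, mean, noise standard deviation).
LAYER2 = [(0.5, 2.0, -3.0, 0.1), (0.5, 0.5, 3.0, 0.1)]   # deepest layer
LAYER1 = [(0.3, 1.0, 0.0, 0.2), (0.7, -1.5, 1.0, 0.2)]   # output layer

def sample_layer(score, components, rng):
    """One MFA layer: pick a component, apply its (scalar) loading, add noise."""
    weights = [w for w, _, _, _ in components]
    _, loading, mean, sd = rng.choices(components, weights=weights)[0]
    return loading * score + mean + rng.gauss(0.0, sd)

def sample_dgmm(rng):
    z2 = rng.gauss(0.0, 1.0)             # only the last layer's score is N(0, 1)
    z1 = sample_layer(z2, LAYER2, rng)   # the layer-1 score is itself an MFA draw
    return sample_layer(z1, LAYER1, rng) # observed data point

rng = random.Random(0)
draws = [sample_dgmm(rng) for _ in range(5)]
```

Sampling is the easy direction; as the abstract notes, the hard part is inference, where each intermediate score must be inferred through a mixture at every layer.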
Margot Selosse
Sat 6:33 a.m. – 6:36 a.m.

Diana Cai – Power posteriors do not reliably learn the number of components in a finite mixture
(Spotlight Talk)
Scientists and engineers are often interested in learning the number of subpopulations (or components) present in a data set. Data science folk wisdom tells us that a finite mixture model (FMM) with a prior on the number of components will fail to recover the true, data-generating number of components under model misspecification. But practitioners still widely use FMMs to learn the number of components, and statistical machine learning papers can be found recommending such an approach. Increasingly, though, data science papers suggest potential alternatives beyond vanilla FMMs, such as power posteriors, coarsening, and related methods. In this work we start by adding rigor to folk wisdom and proving that, under even the slightest model misspecification, the FMM component-count posterior diverges: the posterior probability of any particular finite number of latent components converges to 0 in the limit of infinite data. We use the same theoretical techniques to show that power posteriors with fixed power face the same undesirable divergence, and we provide a proof for the case where the power converges to a nonzero constant. We illustrate the practical consequences of our theory on simulated and real data. We conjecture how our methods may be applied to lend insight into other component-count robustification techniques.
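For reference, a power posterior tempers the likelihood term of Bayes' rule; in standard notation (ours, not the paper's):

```latex
\pi_{\zeta}(\theta \mid x_{1:n}) \;\propto\; \pi(\theta) \prod_{i=1}^{n} p(x_i \mid \theta)^{\zeta},
\qquad 0 < \zeta \le 1 ,
% \zeta = 1 recovers the ordinary posterior; smaller \zeta downweights the
% likelihood and is intended to add robustness to misspecification.
```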
Diana Cai
Sat 6:36 a.m. – 6:39 a.m.

W. Ronny Huang – Understanding Generalization through Visualizations
(Spotlight Talk)
The power of neural networks lies in their ability to generalize to unseen data, yet the underlying reasons for this phenomenon remain elusive. Numerous rigorous attempts have been made to explain generalization, but available bounds are still quite loose, and analysis does not always lead to true understanding. The goal of this work is to make generalization more intuitive. Using visualization methods, we discuss the mystery of generalization, the geometry of loss landscapes, and how the curse (or, rather, the blessing) of dimensionality causes optimizers to settle into minima that generalize well.
W. Ronny Huang
Sat 6:39 a.m. – 6:42 a.m.

Udari Madhushani – It Doesn’t Get Better and Here’s Why: A Fundamental Drawback in Natural Extensions of UCB to Multiagent Bandits
(Spotlight Talk)
We identify a fundamental drawback of natural extensions of Upper Confidence Bound (UCB) algorithms to the multiagent bandit problem, in which multiple agents facing the same explore-exploit problem can share information. We provide theoretical guarantees that when agents use a natural extension of the UCB sampling rule, sharing information about the optimal option degrades their performance. For K the number of agents and T the time horizon, we prove that when agents share information only about the optimal option, they suffer an expected group cumulative regret of O(K log T + K log K), whereas when they do not share any information they only suffer a group regret of O(K log T). Further, while information sharing about all options yields much better performance than no information sharing, we show that including information about the optimal option is not as good as sharing information only about suboptimal options.
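For context, the single-agent UCB1 sampling rule that these multiagent extensions build on can be sketched as follows; this is a generic textbook sketch with made-up Bernoulli arms, not the paper's algorithm.

```python
import math
import random

def ucb1_counts(means, horizon, rng):
    """Play UCB1 on Bernoulli arms and return how often each arm was pulled.
    Sampling rule: empirical mean plus a sqrt(2 log t / n) exploration bonus."""
    counts = [0] * len(means)
    totals = [0.0] * len(means)
    for t in range(1, horizon + 1):
        if t <= len(means):
            arm = t - 1  # initialization: play each arm once
        else:
            arm = max(range(len(means)),
                      key=lambda a: totals[a] / counts[a]
                      + math.sqrt(2.0 * math.log(t) / counts[a]))
        counts[arm] += 1
        totals[arm] += 1.0 if rng.random() < means[arm] else 0.0
    return counts

counts = ucb1_counts([0.9, 0.5], horizon=2000, rng=random.Random(0))
```

The multiagent variants the talk analyzes modify the counts and totals above to also include observations shared by other agents; the paper's point is that which observations are shared (optimal vs. suboptimal options) changes the group regret.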
Udari Madhushani
Sat 6:42 a.m. – 6:45 a.m.

Erik Jones – Selective Classification Can Magnify Disparities Across Groups
(Spotlight Talk)
Selective classification, in which models are allowed to abstain on uncertain predictions, is a natural approach to improving accuracy in settings where errors are costly but abstentions are manageable. In this paper, we find that while selective classification can improve average accuracies, it can simultaneously magnify existing accuracy disparities between various groups within a population, especially in the presence of spurious correlations. We observe this behavior consistently across five datasets from computer vision and NLP. Surprisingly, increasing the abstention rate can even decrease accuracies on some groups. To better understand when selective classification improves or worsens accuracy on a group, we study its margin distribution, which captures the model’s confidences over all predictions. For example, when the margin distribution is symmetric, we prove that whether selective classification monotonically improves or worsens accuracy is fully determined by the accuracy at full coverage (i.e., without any abstentions) and whether the distribution satisfies a property we term left-log-concavity. Our analysis also shows that selective classification tends to magnify accuracy disparities that are present at full coverage. Fortunately, we find that it uniformly improves each group when applied to distributionally robust models that achieve similar full-coverage accuracies across groups. Altogether, our results imply that selective classification should be used with care and underscore the importance of models that perform equally well across groups at full coverage.
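The abstention mechanism studied here is simple to state in code: rank predictions by confidence (|margin|) and abstain on the least confident fraction. The tiny example below is fabricated to show how a group with confidently wrong predictions (e.g. due to a spurious correlation) can get worse under abstention.

```python
def selective_accuracy(margins, correct, abstain_frac):
    """Keep the most confident (largest |margin|) fraction and score accuracy."""
    order = sorted(range(len(margins)), key=lambda i: -abs(margins[i]))
    kept = order[: max(1, round(len(order) * (1.0 - abstain_frac)))]
    return sum(correct[i] for i in kept) / len(kept)

# A group whose high-confidence predictions happen to be the wrong ones:
margins, correct = [0.9, 0.9, 0.1, 0.1], [0, 0, 1, 1]
print(selective_accuracy(margins, correct, 0.0))  # full coverage: 0.5
print(selective_accuracy(margins, correct, 0.5))  # 50% abstention: 0.0
```

For a group whose confident predictions are correct, the same abstention rate raises accuracy instead, so the gap between the two groups widens, which is the disparity-magnification effect the paper documents.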
Erik Jones
Sat 6:45 a.m. – 6:48 a.m.

Yannick Rudolph – Graph Conditional Variational Models: Too Complex for Multiagent Trajectories?
(Spotlight Talk)
Recent advances in modeling multiagent trajectories combine graph architectures such as graph neural networks (GNNs) with conditional variational models (CVMs) such as variational RNNs (VRNNs). Originally, CVMs were proposed to facilitate learning with multimodal and structured data, and thus seem to perfectly match the requirements of multimodal multiagent trajectories with their structured output spaces. Empirical results of VRNNs on trajectory data support this assumption. In this paper, we revisit experiments and proposed architectures with additional rigour, ablation runs, and baselines. In contrast to common belief, we show that both historic and current results with CVMs on trajectory data are misleading. Given a neural network with a graph architecture and/or structured output function, variational autoencoding does not contribute statistically significantly to empirical performance. Instead, we show that well-known emission functions do contribute, while coming with less complexity, engineering effort, and computation time.
Yannick Rudolph
Sat 6:50 a.m. – 7:00 a.m.

Coffee Break (Gather.town available: https://bit.ly/3gxkLA7)
(Coffee Break)

Sat 7:00 a.m. – 8:00 a.m.

Poster Session in gather.town: https://bit.ly/3gxkLA7
(Poster Session)
Link to access the gather town: https://bit.ly/3gxkLA7 
Sat 8:00 a.m. – 8:15 a.m.

Charline Le Lan – Perfect density models cannot guarantee anomaly detection
(Contributed Talk)
Thanks to the tractability of their likelihood, some deep generative models show promise for seemingly straightforward but important applications like anomaly detection, uncertainty estimation, and active learning. However, the likelihood values empirically attributed to anomalies conflict with the expectations these proposed applications suggest. In this paper, we take a closer look at the behavior of distribution densities and show that these quantities carry less meaningful information than previously thought, beyond estimation issues or the curse of dimensionality. We conclude that the use of these likelihoods for out-of-distribution detection relies on strong and implicit hypotheses, and highlight the necessity of explicitly formulating these assumptions for reliable anomaly detection.
Charline Le Lan
Sat 8:15 a.m. – 8:30 a.m.

Fan Bao – Variational (Gradient) Estimate of the Score Function in Energy-based Latent Variable Models
(Contributed Talk)
The learning and evaluation of energy-based latent variable models (EBLVMs) without any structural assumptions are highly challenging, because the true posteriors and the partition functions in such models are generally intractable. This paper presents variational estimates of the score function and its gradient with respect to the model parameters in a general EBLVM, referred to as VaES and VaGES respectively. The variational posterior is trained to minimize a certain divergence to the true model posterior, and the bias in both estimates can be theoretically bounded by this divergence. With minimal model assumptions, VaES and VaGES can be applied to kernelized Stein discrepancy (KSD)- and score matching (SM)-based methods to learn EBLVMs. In addition, VaES can also be used to estimate the exact Fisher divergence between the data and general EBLVMs.
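To make the estimated quantity concrete: for an EBLVM with energy E_θ(x, z), the score of the marginal is a posterior expectation, and it is this intractable posterior that the variational posterior stands in for. The notation below is generic and ours rather than the paper's:

```latex
% With p_\theta(x) = \frac{1}{Z(\theta)} \int e^{-E_\theta(x,z)}\, dz ,
\nabla_x \log p_\theta(x) \;=\; -\,\mathbb{E}_{p_\theta(z \mid x)}\!\left[ \nabla_x E_\theta(x, z) \right],
% and a VaES-style estimate replaces the intractable p_\theta(z \mid x)
% with a learned variational posterior q_\phi(z \mid x).
```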
Fan Bao
Sat 8:30 a.m. – 8:45 a.m.

Emilio Jorge – Inferential Induction: A Novel Framework for Bayesian Reinforcement Learning
(Contributed Talk)
Bayesian Reinforcement Learning (BRL) offers a decision-theoretic solution to the reinforcement learning problem. While “model-based” BRL algorithms focus on maintaining a posterior distribution over models, “model-free” BRL methods try to estimate value function distributions, but make strong implicit assumptions or approximations. We describe a novel Bayesian framework, inferential induction, for correctly inferring value function distributions from data, which leads to a new family of BRL algorithms. We design an algorithm, Bayesian Backwards Induction (BBI), within this framework. We experimentally demonstrate that BBI is competitive with the state of the art. However, its advantage relative to existing BRL model-free methods is not as great as we had expected, particularly when the additional computational burden is taken into account.
Emilio Jorge
Sat 9:00 a.m. – 10:00 a.m.

Lunch Break (Gather.town available: https://bit.ly/3gxkLA7)
(Lunch)

Sat 10:00 a.m. – 10:30 a.m.

Invited Talk: Andrew Gelman – It Doesn’t Work, But The Alternative Is Even Worse: Living With Approximate Computation
(Talk)
We can’t fit the models we want to fit because it takes too long to fit them on our computer. Also, we don’t know what models we want to fit until we try a few. I share some stories of struggles with data-partitioning and parameter-partitioning algorithms, what kinda worked and what didn’t.
Andrew Gelman
Sat 10:30 a.m. – 11:00 a.m.

Invited Talk: Roger Grosse – Why Isn’t Everyone Using Second-Order Optimization?
(Talk)
In the pre-AlexNet days of deep learning, second-order optimization gave dramatic speedups and enabled training of deep architectures that seemed to be inaccessible to first-order optimization. But today, despite algorithmic advances such as K-FAC, nearly all modern neural net architectures are trained with variants of SGD and Adam. What’s holding us back from using second-order optimization? I’ll discuss three challenges to applying second-order optimization to modern neural nets: difficulty of implementation, implicit regularization effects of gradient descent, and the effect of gradient noise. All of these factors are significant, though not in the ways commonly believed.
Roger Grosse
Sat 11:00 a.m. – 11:30 a.m.

Invited Talk: Weiwei Pan – What are Useful Uncertainties for Deep Learning and How Do We Get Them?
(Talk)
While deep learning has demonstrable success on many tasks, the point estimates provided by standard deep models can lead to overfitting and provide no uncertainty quantification on predictions. However, when models are applied to critical domains such as autonomous driving, precision health care, or criminal justice, reliable measurements of a model’s predictive uncertainty may be as crucial as correctness of its predictions. In this talk, we examine a number of deep (Bayesian) models that promise to capture complex forms of predictive uncertainty, as well as the metrics commonly used to evaluate such uncertainties. We aim to highlight the strengths and limitations of these models and metrics; we also discuss ideas to improve both in ways that are meaningful for downstream tasks.
Weiwei Pan
Sat 11:30 a.m. – 11:33 a.m.

Vincent Fortuin – Bayesian Neural Network Priors Revisited
(Spotlight Talk)
Isotropic Gaussian priors are the de facto standard for modern Bayesian neural network inference. However, there has been recent controversy over the question of whether they might be to blame for the undesirable cold posterior effect. We study this question empirically and find that for densely connected networks, Gaussian priors are indeed less well suited than more heavy-tailed ones. Conversely, for convolutional architectures, Gaussian priors seem to perform well and thus cannot fully explain the cold posterior effect. These findings coincide with the empirical maximum-likelihood weight distributions discovered by standard gradient-based training.
Vincent Fortuin
Sat 11:33 a.m. – 11:36 a.m.

Ziyu Wang – Further Analysis of Outlier Detection with Deep Generative Models
(Spotlight Talk)
The recent, counterintuitive discovery that deep generative models (DGMs) can frequently assign a higher likelihood to outliers has implications both for outlier detection applications and for our overall understanding of generative modeling. In this work, we present a possible explanation for this phenomenon, starting from the observation that a model's typical set and high-density region may not coincide. From this vantage point we propose a novel outlier test, the empirical success of which suggests that the failure of existing likelihood-based outlier tests does not necessarily imply that the corresponding generative model is uncalibrated. We also conduct additional experiments to help disentangle the impact of low-level texture versus high-level semantics in differentiating outliers. In aggregate, these results suggest that modifications to the standard evaluation practices and benchmarks commonly applied in the literature are needed.
Ziyu Wang
Sat 11:36 a.m. – 11:39 a.m.

Siwen Yan – The Curious Case of Stacking Boosted Relational Dependency Networks
(Spotlight Talk)
Reducing bias during learning and inference is an important requirement for achieving generalizable and better-performing models. The method of stacking took the first step towards creating such models by reducing inference bias, but the question of combining stacking with a model that reduces learning bias is still largely unanswered. In statistical relational learning, ensemble models of relational trees such as boosted relational dependency networks (RDN-Boost) have been shown to reduce the learning bias. We combine RDN-Boost and stacking methods with the aim of reducing both learning and inference bias, subsequently resulting in better overall performance. However, our evaluation on three relational data sets shows no significant performance improvement over the baseline models.
Siwen Yan
Sat 11:39 a.m. – 11:42 a.m.

Maurice Frank – Problems using deep generative models for probabilistic audio source separation
(Spotlight Talk)
Recent advancements in deep generative modeling make it possible to learn prior distributions from complex data that subsequently can be used for Bayesian inference. However, we find that distributions learned by deep generative models for audio signals do not exhibit the right properties that are necessary for tasks like audio source separation using a probabilistic approach. We observe that the learned prior distributions are either discriminative and extremely peaked, or smooth and non-discriminative. We quantify this behavior for two types of deep generative models on two audio datasets.
Maurice Frank
Sat 11:42 a.m. – 11:45 a.m.

Ramiro Camino – Oversampling Tabular Data with Deep Generative Models: Is it worth the effort?
(Spotlight Talk)
In practice, machine learning experts are often confronted with imbalanced data. Without accounting for the imbalance, common classifiers perform poorly and standard evaluation metrics mislead practitioners about the model's performance. A common method to treat imbalanced datasets is under- and oversampling. In this process, samples are either removed from the majority class or synthetic samples are added to the minority class. In this paper, we follow up on recent developments in deep learning: we take proposed deep generative models and study their ability to provide realistic samples that improve performance on imbalanced classification tasks via oversampling. Across 160K+ experiments, we show that the improvements in terms of performance metrics, while shown to be significant when ranking the methods as in the literature, are often minor in absolute terms, especially compared to the required effort. Furthermore, we notice that a large part of the improvement is due to undersampling, not oversampling.
Ramiro Camino
Sat 11:45 a.m. – 11:48 a.m.

Ângelo Gregório Lovatto – Decision-Aware Model Learning for Actor-Critic Methods: When Theory Does Not Meet Practice
(Spotlight Talk)
Actor-Critic methods are a prominent class of modern reinforcement learning algorithms based on the classic Policy Iteration procedure. Despite many successful cases, Actor-Critic methods tend to require a gigantic number of experiences and can be very unstable. Recent approaches have advocated learning and using a world model to improve sample efficiency and reduce reliance on the value function estimate. However, learning an accurate dynamics model of the world remains challenging, often requiring computationally costly and data-hungry models. More recent work has shown that learning an everywhere-accurate model is unnecessary and often detrimental to the overall task; instead, the agent should improve the world model on task-critical regions. For example, in Iterative Value-Aware Model Learning, the authors extend model-based value iteration by incorporating the value function (estimate) into the model loss function, showing that the novel model objective reflects improved performance in the end task. Therefore, it seems natural to expect that model-based Actor-Critic methods can benefit equally from learning value-aware models, improving overall task performance or reducing the need for large, expensive models. However, we show empirically that combining Actor-Critic and value-aware model learning can be quite difficult, and that naive approaches such as maximum likelihood estimation often achieve superior performance with less computational cost. Our results suggest that, despite theoretical guarantees, learning a value-aware model in continuous domains does not ensure better performance on the overall task.
Ângelo Lovatto
Sat 11:50 a.m. – 12:00 p.m.

Coffee Break (Gather.town available: https://bit.ly/3gxkLA7)
(Break)

Sat 12:00 p.m. – 12:15 p.m.

Tin D. Nguyen – Independent versus truncated finite approximations for Bayesian nonparametric inference
(Contributed Talk)
Bayesian nonparametric models based on completely random measures (CRMs) offer flexibility when the number of clusters or latent components in a data set is unknown. However, managing the infinite dimensionality of CRMs often leads to slow computation during inference. Practical inference typically relies on either integrating out the infinite-dimensional parameter or using a finite approximation: a truncated finite approximation (TFA) or an independent finite approximation (IFA). The atom weights of TFAs are constructed sequentially, while the atoms of IFAs are independent, which facilitates more convenient inference schemes. While the approximation error of TFAs has been systematically addressed, there has not yet been a similar study of IFAs. We quantify the approximation error between IFAs and two common target nonparametric priors (the beta-Bernoulli process and the Dirichlet process mixture model) and prove that, in the worst case, TFAs provide more component-efficient approximations than IFAs. However, in experiments on image denoising and topic modeling tasks with real data, we find that the error of Bayesian approximation methods overwhelms any finite approximation error, and IFAs perform very similarly to TFAs.
Tin Nguyen
Sat 12:15 p.m. – 12:30 p.m.

Ricky T. Q. Chen – Self-Tuning Stochastic Optimization with Curvature-Aware Gradient Filtering
(Contributed Talk)
Standard first-order stochastic optimization algorithms base their updates solely on the average minibatch gradient, and it has been shown that tracking additional quantities such as the curvature can help desensitize common hyperparameters. Based on this intuition, we explore the use of exact per-sample Hessian-vector products and gradients to construct optimizers that are self-tuning and hyperparameter-free. Based on a dynamics model of the gradient, we derive a process which leads to a curvature-corrected, noise-adaptive online gradient estimate. The smoothness of our updates makes them more amenable to simple step size selection schemes, which we also base on our estimated quantities. We prove that our model-based procedure converges in the noisy quadratic setting. Though we do not see similar gains in deep learning tasks, we can match the performance of well-tuned optimizers; ultimately, this is an interesting step towards constructing self-tuning optimizers.
Tian Qi Chen
Sat 12:30 p.m. – 12:45 p.m.

Elliott Gordon-Rodriguez – Uses and Abuses of the Cross-Entropy Loss: Case Studies in Modern Deep Learning
(Contributed Talk)
Modern deep learning is primarily an experimental science, in which empirical advances occasionally come at the expense of probabilistic rigor. Here we focus on one such example; namely, the use of the categorical cross-entropy loss to model data that is not strictly categorical, but rather takes values on the simplex. This practice is standard in neural network architectures with label smoothing and in actor-mimic reinforcement learning, amongst others. Drawing on the recently discovered continuous-categorical distribution, we propose probabilistically inspired alternatives to these models, providing an approach that is more principled and theoretically appealing. Through careful experimentation, including an ablation study, we identify the potential for outperformance in these models, thereby highlighting the importance of a proper probabilistic treatment, as well as illustrating some of the failure modes thereof.
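The practice in question is applying the categorical cross-entropy H(t, p) = -Σ_k t_k log p_k even when the target t is a soft point on the simplex rather than a one-hot vector. A minimal illustration (all targets and probabilities invented):

```python
import math

def cross_entropy(target, probs):
    """Categorical cross-entropy; runs the same whether target is one-hot or soft."""
    return -sum(t * math.log(p) for t, p in zip(target, probs))

probs = [0.1, 0.8, 0.1]       # model output over 3 classes
hard = [0.0, 1.0, 0.0]        # one-hot target: truly categorical data
smoothed = [0.05, 0.9, 0.05]  # label-smoothed target: a point on the simplex
loss_hard = cross_entropy(hard, probs)
loss_soft = cross_entropy(smoothed, probs)
```

The code happily accepts the smoothed target, which is precisely the paper's point: the loss is well defined on simplex-valued data, but it is no longer the negative log-likelihood of a proper distribution over that data, which is the gap the continuous-categorical distribution is meant to close.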
Elliott Gordon-Rodriguez
Sat 12:45 p.m. – 1:45 p.m.

Poster Session (in gather.town): https://bit.ly/3gxkLA7
(Poster Session (in gather.town))
Link to access the Gather.town: https://neurips.gather.town/app/5163xhrHdSWrUZsG/ICBINB 
Sat 1:15 p.m. – 1:45 p.m.

Breakout Discussions (in gather.town): https://bit.ly/3gxkLA7
Link to access the Gather.town: https://neurips.gather.town/app/5163xhrHdSWrUZsG/ICBINB 
Sat 1:45 p.m. – 2:45 p.m.

Panel & Closing
(Panel)
A panel discussion moderated by Hanna Wallach (MSR New York). Panelists: Tamara Broderick (MIT), Laurent Dinh (Google Brain), Neil Lawrence (Cambridge), Kristian Lum (Human Rights Data Analysis Group), and Sinead Williamson (UT Austin).
Tamara Broderick · Laurent Dinh · Neil Lawrence · Kristian Lum · Hanna Wallach · Sinead Williamson