NeurIPS 2020 Demonstrations Track


NeurIPS 2020 Accepted demonstrations


Below you will find a list of the accepted demonstrations for NeurIPS 2020. Please note that all information is subject to change.


MONICA: MObile Neural voIce Command Assistant for mobile games

Authors: Youshin Lim, Yoonseok Hong, Shounan An, Jaegeon Jo, Hanook Lee, Suhyeon Jeong, Yoohyun Eum, Sunwoo Im, Insoo Oh

Deep-learning-based on-device automatic speech recognition (ASR) has recently shown breakthrough progress. However, the literature contains no concrete work on integrating on-device ASR into mobile games as a voice user interface. The difficulty in deploying ASR in mobile games is that most game users want a voice command interface that responds quickly, with no delay. This creates a need for an on-device ASR system that uses minimal memory and CPU resources. To this end, we propose a transformer-based on-device ASR system named MObile Neural voIce Command Assistant (MONICA) for mobile games. With MONICA, users can perform game actions using voice commands only, such as "enter the monster dungeon", "start the auto-quest", "open the inventory", etc. To the best of our knowledge, this is the first work to address an on-device ASR task for mobile games at the service level. MONICA reduces the number of parameters in the neural network to 10% and speeds up inference by more than 5 times compared to the baseline transformer model, with minimal degradation in recognition accuracy. We perform a web-based interactive live demonstration of MONICA as a voice user interface for an online chess game. Also, a demonstration video shows MONICA integrated into A3: Still Alive, a major game from Netmarble serviced in South Korea. MONICA will soon be in service as a voice command interface for all A3 users. Finally, we release a mobile application so that you can download and test the efficiency of MONICA on your mobile device.

tspDB: Time Series Predict DB

Authors: Anish Agarwal, Abdullah Alomar, Devavrat Shah

An important goal in Systems for ML is to make ML broadly accessible. Arguably, the major bottleneck is not the lack of access to prediction algorithms, for which many excellent open-source ML libraries exist. Rather, it is the complex data engineering required to take data from a datastore or database (DB) into a particular work environment format (e.g. spark data-frame) so that a prediction algorithm can be trained, and to do so in a scalable manner. This is further exacerbated as ML algorithms are now trained on large volumes of data, yet we need predictions in real-time. This is especially true in a variety of time-series applications such as finance and real-time control systems.

To ease this bottleneck, we showcase tspDB, a system that enables predictive query functionality in any existing time-series relational DB (available as open source). Specifically, tspDB enables two types of predictive queries for time series data: (i) imputing a missing/noisy observation for a data point we do observe; (ii) forecasting a data point in the future. In tspDB the ML workflow is entirely abstracted away from the user; instead, a single interface is exposed that answers both predictive queries and standard SQL SELECT queries. Pleasingly, we find that tspDB statistically outperforms industry-standard deep-learning-based time series methods (e.g. DeepAR, LSTMs) on benchmark time series datasets; further, tspDB's computational performance is close to the time it takes to just insert and read data from PostgreSQL, making it a real-time prediction system.

The demo itself will be run entirely through a Google Colab notebook that users can access through a browser and will require no software installation. The notebook will walk through how to use tspDB to make predictive SQL queries on retail, energy and financial data, and how to measure its computational performance with respect to standard SQL queries. A pre-recording of the entire demo will also be provided.

Probing Embedding Spaces in Deep Neural Networks

Authors: Junior Rojas, Bilal Alsallakh, Edward Wang, Sara Zhang, Jonathan Reynolds, Narine Kokhlikyan, Vivek Miglani, Carlos Araya, Tony Chu, Orion Reblitz-Richardson

We demonstrate an interactive UI to explore neural embedding spaces by probing directions in these spaces, determined by component analysis such as PCA and ICA. It provides (1) Fluid overview+detail exploration of these directions with multi-modal viewers to inspect individual samples including images, audio clips, words, and sample attributes, (2) A dedicated view to analyze the development of embedding spaces over multiple layers, and (3) A dedicated view to compare the embedding spaces across different models.

IBM Federated Learning Community Edition: An Interactive Demonstration

Authors: Laura Wynter, Chaitanya Kumar, Pengqian Yu, Mikhail Yurochkin, Amogh Tarcar

Federated Learning (FL) is a means to train machine learning models without centralizing data. To deal with the ever-growing demands for training data whilst respecting data privacy and confidentiality, it has become important to move from centralized to federated machine learning. The IBM Federated Learning Community Edition is one means for achieving this goal; it is a platform and library, free to use for non-commercial purposes, with built-in features that facilitate enterprise-strength applications. This interactive demo session highlights several featured algorithms available only in the IBM Federated Learning Community Edition, and provides tutorials, audience-interactive examples, and a guest speaker from the tech company Persistent Systems who has used the IBM Federated Learning Community Edition for Covid-19 outcome prediction for hospitals.

MolDesigner: Interactive Design of Efficacious Drugs with Deep Learning

Authors: Kexin Huang, Tianfan Fu, Dawood Khan, Ali Abid, Ali Abdalla, Abubaker Abid, Lucas Glass, Marinka Zitnik, Cao Xiao, Jimeng Sun 

The efficacy of a drug depends on its binding affinity to the therapeutic target and on its pharmacokinetics. Deep learning (DL) has demonstrated remarkable progress in predicting drug efficacy. We develop MolDesigner, a human-in-the-loop web user interface (UI), to help drug developers leverage DL predictions to design more effective drugs. A developer can draw a drug molecule in the interface. In the backend, more than 17 state-of-the-art DL models generate predictions on indices that are crucial for a drug's efficacy. Based on these predictions, drug developers can edit the drug molecule and iterate until they are satisfied. MolDesigner can make predictions in real time, with a latency of less than a second.

MosAIc: Finding Artistic Connections across Culture with Conditional Image Retrieval

Authors: Mark Hamilton, Stephanie Fu, Mindren Lu, Johnny Bui, Margaret Wang, Felix Tran, Marina Rogers, Darius Bopp, Chris Hoder, Lei Zhang, William Freeman

We introduce MosAIc, an interactive website that allows users to discover hidden connections between works of art across culture, media, artists, and time. MosAIc finds "visual analogies", or works of art with the same semantic structure but very different cultural and artistic context, within the combined works of the Metropolitan Museum of Art and the Rijksmuseum. Users can take any work from the collection and find analogous works in particular genres, cultures, or media of art. Our approach finds visual analogies that mirror larger scale cultural trends, such as the flows of artistic techniques across the globe due to trade routes. Our approach is based on generalizing deep image retrieval methods to flexibly adapt to logical filters and predicates. This allows image retrieval methods to find close matches in different regions of the image collection, an approach we call "Conditional Image Retrieval".

RetaiL: Open your own grocery store to reduce waste

Authors: Sami Jullien, Sebastian Schelter, Maarten de Rijke 

Food waste is a major societal, environmental, and financial problem, and grocery stores are among the main actors. Policies for reducing food waste in grocery stores are complex because of the large number of uncertain, heterogeneous factors involved, such as demand that cannot be fully predicted. Directly comparing food waste reduction policies through field experimentation runs contrary to the very goal of reducing food waste.

This is why we propose RetaiL, a new simulation framework, to optimise grocery store restocking for waste reduction. RetaiL offers its users the possibility to create synthetic product data, based on real data from a European retailer. It then matches simulated customer demand to a restocking policy for those items, and evaluates a utility function based on generated waste, item availability to customers and sales. This allows RetaiL to function as a new Reinforcement Learning Task, where the agent has to act on restocking level given the state of the store, and receives this utility function as a reward.

In this demo, we let you open your own grocery store and manage its orders to the warehouse. Can you help in the fight against food waste?

PrototypeML: Visual Design of Arbitrarily Complex Neural Networks

Author: Daniel Harris

Neural network architectures are most often conceptually designed and described in visual terms, but are implemented by writing error-prone code. PrototypeML is a neural network development environment that bridges the dichotomy between the design and development processes: it provides a highly intuitive visual neural network design interface that supports (yet abstracts) the full dynamic graph capabilities of the PyTorch deep learning framework, reduces model design and development time, makes debugging easier, and automates many framework and code writing idiosyncrasies. Through a hybrid code and visual approach, PrototypeML resolves deep learning development deficiencies without limiting network expressiveness or reducing code quality, and provides real-world benefits for research, industry and teaching.

Join us for a live overview (and Q&A) of the PrototypeML platform during the conference, and explore the on-demand interactive platform demonstration:

A Knowledge Graph Reasoning Prototype

Authors: Lihui Liu, Boxin Du, Heng Ji, Hanghang Tong

Reasoning is a fundamental capability for distilling valuable information from knowledge graphs. Existing work has primarily focused on point-wise reasoning, including search, link prediction, entity prediction, subgraph matching and so on. We introduce comparative reasoning over knowledge graphs, which aims to infer the commonality and inconsistency with respect to multiple clues.

We develop a large-scale prototype system that integrates various point-wise reasoning functions as well as the newly proposed comparative reasoning capability over knowledge graphs. We present both the system architecture and its key functions.

Shared Interest: Human Annotations vs. AI Saliency

Authors: Angie Boggust, Benjamin Hoover, Arvind Satyanarayan, Hendrik Strobelt

As deep learning is applied to high-stakes scenarios, it is increasingly important that a model is not only making accurate decisions, but doing so for the right reasons. Common explainability methods provide pixel attributions as an explanation for a model's decision on a single image; however, using input-level explanations to understand patterns in model behavior is challenging for large datasets, as it requires selecting and analyzing an interesting subset of inputs. Utilizing human-generated ground truth object locations, we introduce metrics for ranking inputs based on the correspondence between the input's ground truth location and the explainability method's explanation region. Our methodology is agnostic to model architecture, explanation method, and dataset, allowing it to be applied to many tasks. We demo our method on two high-profile scenarios: a widely used image classification model and a melanoma prediction model, showing it surfaces patterns in model behavior by aligning model explanations with human annotations.
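
As a minimal sketch of what such a ranking metric could look like, the following computes intersection-over-union (IoU) between a ground truth region and an explanation region, each given as a binary mask, and orders inputs by that score. IoU is an assumption chosen for illustration; the abstract does not name the specific metrics, and the function names `iou` and `rank_by_agreement` are hypothetical.

```python
from typing import List, Sequence, Tuple

Mask = Sequence[Sequence[int]]  # 2D binary mask: 1 = pixel inside the region


def iou(ground_truth: Mask, explanation: Mask) -> float:
    """Intersection-over-union of two equally sized binary masks."""
    intersection = 0
    union = 0
    for gt_row, ex_row in zip(ground_truth, explanation):
        for gt, ex in zip(gt_row, ex_row):
            intersection += 1 if (gt and ex) else 0
            union += 1 if (gt or ex) else 0
    # Two empty masks have no union; treat that case as zero agreement.
    return intersection / union if union else 0.0


def rank_by_agreement(pairs: List[Tuple[Mask, Mask]]) -> List[int]:
    """Return input indices ordered from lowest to highest IoU.

    Each pair is (ground truth mask, explanation mask) for one input.
    """
    scores = [iou(gt, ex) for gt, ex in pairs]
    return sorted(range(len(pairs)), key=lambda i: scores[i])
```

Sorting from lowest to highest score puts the inputs where the explanation disagrees most with the human annotation first, which is one plausible way to surface cases worth inspecting.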

LMdiff: A Visual Diff Tool to Compare Language Models

Authors: Hendrik Strobelt, Benjamin Hoover, Arvind Satyanarayan, Sebastian Gehrmann

Recently, large language models (LMs) have been shown to sample mostly coherent long-form text. This astonishing level of fluency has driven an increasing interest in understanding how these models work and, in particular, how to interpret and evaluate them. Additionally, the growing use of sophisticated LM frameworks has lowered the threshold for users to train new models or to fine-tune existing models for transfer learning. However, selecting the best LM from the expanding selection of pre-trained deep LM architectures is challenging, as there are few tools available to qualitatively compare models for specialized use cases, e.g. to answer questions like: "What parts of a domain-specific text can the fine-tuned model capture better than the general model?"

We introduce LMdiff: an interactive visual analysis tool for comparing LMs by qualitatively inspecting concrete samples generated by another model or drawn from a reference corpus. We provide an offline method to search for interesting samples, a live demo, and source code for the demo session that supports multiple models and allows users to upload their own example text.

AI Assisted Data Labeling

Authors: Michael Desmond, Evelyn Duesterwald, Krissy Brimijoin, Michael Muller, Aabhas Sharma, Narendra Nath Joshi, Qian Pan, Casey Dugan, Zahra Ashktorab, Michelle Brachman

Human-in-the-loop data labeling is generally considered a tedious, error-prone and expensive activity. Automation of the labeling task is desirable, but current approaches can conflict with principles of trust and human agency. We are developing a data labeling experience where the human labeler transparently interacts with an AI assistant to reach automation readiness, at which point the remainder of the labeling task can be delegated to a virtual assistant. Our approach combines semi-supervised learning, active learning, and human-machine decision tracking to reduce labeling effort and support reliable automation. The demo takes participants through an online end-to-end AI-assisted data labeling experience, starting with manual labeling, then assisted labeling, and ultimately transitioning to automated labeling via a system of readiness checkpoints.

Automated dataset extraction from SEC filings

Authors: Rohit Dube, Rohit Khandekar, Ishaq Hult 

Automated extraction and analysis of key information from unstructured documents is a central problem in information retrieval. Businesses are often inundated with large volumes of documents like financial statements, contracts and agreements, invoices and customer lists, which are generally meant for human comprehension and consumption, and hence automation becomes non-trivial.

Currently, the information in such documents is extracted by some combination of manual work and proprietary scripts that often break when something changes, leading to low efficiency, high labor cost, and inconsistencies in the output. Investment banks, fund managers, marketing agencies, and investors spend millions to either buy the data or outsource the whole process, even though the data is publicly available for free.

We describe a capability for automated extraction and real-time analysis of datasets from a large corpus of documents containing running text and tables. The current version of our product works with millions of HTML documents from Securities and Exchange Commission (SEC) filings. These filings contain mandatory disclosures from US corporations, such as financial information, executive compensation, mergers and acquisitions, and key management changes.

Our algorithm extracts information from millions of documents, normalizes and stores it in an efficient queryable format, interprets input queries and looks up relevant documents to compose an answer.

Generating Novelty in Open-World Multi-Agent Strategic Board Games

Authors: Shilpa Thomas, Mayank Kejriwal

We propose a demonstration of GNOME (Generating Novelty in Open-world Multi-agent Environments), an experimental platform that is designed to test the effectiveness of multi-agent AI systems when faced with novelty. GNOME separates the development of AI gameplaying agents from the simulator, allowing unanticipated novelty (in essence, novelty that is not subject to model-selection bias). Through the demonstration, we also hope to foster an open discussion on AI robustness and the nature of novelty in real-world environments. GNOME will employ a creative audience-interaction methodology well-suited to a virtual conference, as we will expose the facilities of the simulator (including live simulation) through a Web GUI.

Fast and Automatic Visual Label Conflict Resolution

Authors: Narendra Nath Joshi, Aabhas Sharma, Michelle Brachman, Qian Pan, Michael Muller, Michael Desmond, Krissy Brimijoin, Zahra Ashktorab, Evelyn Duesterwald, Casey Dugan

Even with the rise of unsupervised learning and weak supervision techniques, human-labeled data is still a necessary part of machine learning pipelines in many real-world contexts and applications. This often involves using crowdworkers for the laborious task of labeling large amounts of data. The process is largely asynchronous and can lead to conflict among the workers, where individual labelers submit labels that disagree with each other for a given data item. When such noisy data is fed to a machine learning model, the accuracy and performance (on test data) of the overall system can suffer. One popular workaround is to discard the data items with conflicts entirely. This, however, wastes expensive, human-supplied data. Moreover, the data points with conflicting labels are often the ones that are crucial in determining the decision boundaries of the model itself. Another possibility is to automate conflict resolution. Here, however, given that humans themselves disagree, state-of-the-art models cannot be expected to solve the problem reliably. In practice, therefore, it becomes imperative for a human to step in and resolve the conflict. Because conflict resolution is a non-trivial task, the assistance of expensive subject matter experts (SMEs) is required. To help manage the SMEs' time more efficiently, we propose an intelligent approach to resolving label conflicts: the conflicts are automatically re-ranked so that those with the most missing information useful to the model are displayed first, complete with ML assistance to auto-resolve easy conflicts and explanations that justify decisions and improve explainability.

DeepRacing AI - Autonomous Motorsport Racing

Authors: Trent Weiss, Madhur Behl

We propose a demonstration of our novel DeepRacing framework at NeurIPS 2020 as a platform for training and evaluating high-speed autonomous race cars.
DeepRacing uses the immensely popular and photo-realistic Formula One racing game and converts it into a simulation environment for autonomous racing.

We will demo both autonomous racing of the F1 car in the game, using control inputs predicted by machine-learned driving policies, and the tagging of images from the driver's point of view with various state information (such as the position, velocity, and control values for the racing agents) to enable generation of labelled datasets for supervised machine learning. We will demonstrate this technology in a real-time web broadcast with interactive inputs from the NeurIPS audience.

ColliFlow: A Library for Executing Collaborative Intelligence Graphs

Authors: Mateen Ulhaq, Ivan Bajic

Collaborative intelligence is a technique for using more than one computing device to perform a computational task. A possible application of this technique is to assist mobile client edge devices in performing inference of deep learning models by sharing the workload with a server. In one typical setup, the mobile device performs a partial inference of the model, up to an intermediate layer. The output tensor of this intermediate layer is then transmitted over a network (e.g. WiFi, LTE, 5G) to a server, which completes the remaining inference and then transmits the result back to the client. Such a strategy can reduce network usage, resulting in reduced bandwidth costs, lower energy consumption, and faster inference, and can provide better privacy guarantees. A working implementation of this was shown in our demo at NeurIPS 2019. This year, we present a library that will enable researchers and developers to create collaborative intelligence systems themselves quickly and easily.
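
The typical setup described above can be illustrated with a small sketch in plain Python. It does not use the ColliFlow API; the toy layers, the JSON serialization, and the names `client_infer` and `server_infer` are all assumptions made for illustration, with the network hop replaced by passing bytes between two functions.

```python
import json
from typing import Callable, List

Layer = Callable[[List[float]], List[float]]

# Toy "model": hypothetical stand-ins for real neural network layers.
MODEL: List[Layer] = [
    lambda x: [2.0 * v for v in x],      # scale
    lambda x: [max(0.0, v) for v in x],  # ReLU
    lambda x: [v + 1.0 for v in x],      # bias
    lambda x: [sum(x)],                  # reduce to a single output
]


def run_layers(layers: List[Layer], x: List[float]) -> List[float]:
    """Apply the given layers in order."""
    for layer in layers:
        x = layer(x)
    return x


def client_infer(x: List[float], split: int) -> bytes:
    """Run layers [0, split) on the client and serialize the intermediate tensor."""
    features = run_layers(MODEL[:split], x)
    return json.dumps(features).encode("utf-8")  # bytes that would go over the network


def server_infer(payload: bytes, split: int) -> List[float]:
    """Deserialize the intermediate tensor and run the remaining layers."""
    features = json.loads(payload.decode("utf-8"))
    return run_layers(MODEL[split:], features)
```

Calling `server_infer(client_infer(x, split), split)` should give the same result as running the whole model on one device, for any split point; a real system would additionally compress the intermediate tensor before transmission.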

This demo presents a new library for developing and deploying collaborative intelligence systems. Computational and communication subprocesses are expressed as a directed acyclic graph. Expressing the entire process as a computational graph provides several advantages including modularity, graph serializability and transmission, and easier scheduling and optimization.
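
To make the graph formulation concrete, here is a minimal sketch that expresses a client/server pipeline as a directed acyclic graph and executes it in dependency order using the Python standard library. This is not the ColliFlow API: the node names, the `GRAPH` layout, and the `execute` helper are hypothetical, and everything runs in one process.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each node maps to (function, names of the nodes whose outputs it consumes).
# All node names and functions are hypothetical placeholders.
GRAPH = {
    "input": (lambda: [1.0, -2.0, 3.0], []),
    "client_layers": (lambda x: [max(0.0, v) for v in x], ["input"]),
    "encode": (lambda x: ",".join(str(v) for v in x), ["client_layers"]),
    "decode": (lambda s: [float(v) for v in s.split(",")], ["encode"]),
    "server_layers": (lambda x: sum(x), ["decode"]),
}


def execute(graph):
    """Run every node after its dependencies and return all node outputs."""
    sorter = TopologicalSorter({name: deps for name, (_, deps) in graph.items()})
    results = {}
    for name in sorter.static_order():
        func, deps = graph[name]
        results[name] = func(*(results[d] for d in deps))
    return results
```

Because the graph is plain data, it can be inspected, serialized, or partitioned between devices before execution, which is the kind of modularity the paragraph above refers to.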

Library features include: graph definition via a functional API inspired by Keras and PyTorch, over-the-network execution of graphs that span across multiple devices, API for Android (Kotlin/Java) edge clients and servers (Python), integration with Reactive Extensions (Rx), optimal scheduling for low latency and high throughput, asynchronous execution and multi-threading support, backpressure handling, and modules for network transmission of compressed feature tensor data.

Musical Speech: A Transformer-based Composition Tool

Authors: Jason d'Eon, Sri Harsha Dumpala, Chandramouli Shama Sastry, Daniel Oore, Mengyu Yang, Sageev Oore

In this demo we propose a compositional tool that generates musical sequences based on the prosody of speech recorded by the user. The tool allows any user, regardless of musical training, to use their own speech to generate musical melodies, while hearing the direct connection between their recorded speech and the resulting music. This is achieved with a pipeline combining speech-based signal processing [1,2], musical heuristics, and a set of transformer models [3,4] trained for new musical tasks. Importantly, the pipeline is designed to work with any kind of speech input and does not require a paired dataset for training the transformer models.

Our approach consists of the following steps:

  1. Estimate the F0 values and loudness envelope of the speech signal.
  2. Convert this into a sequence of musical constraints derived from the speech signal.
  3. Apply one or more transformer models—each trained on different musical tasks or datasets—to this constraint sequence to produce musical sequences that follow or accompany the speech patterns in a variety of ways.
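
Step 2 above can be sketched as follows, assuming the F0 track is already available as one frequency estimate per frame. The mapping from frequency to MIDI note number is the standard equal-temperament formula (A4 = 440 Hz = note 69); representing the constraints as (note, frame count) pairs and the function names are assumptions for illustration, not the authors' actual constraint format.

```python
import math
from typing import List, Optional, Tuple


def hz_to_midi(f0_hz: float) -> int:
    """Nearest MIDI note number for a frequency (A4 = 440 Hz = note 69)."""
    return round(69 + 12 * math.log2(f0_hz / 440.0))


def f0_track_to_notes(f0_track: List[Optional[float]]) -> List[Optional[int]]:
    """Map per-frame F0 estimates to notes; None or 0 marks an unvoiced frame."""
    return [hz_to_midi(f) if f else None for f in f0_track]


def collapse_repeats(notes: List[Optional[int]]) -> List[Tuple[Optional[int], int]]:
    """Merge consecutive identical frames into (note, frame_count) pairs."""
    segments: List[Tuple[Optional[int], int]] = []
    for note in notes:
        if segments and segments[-1][0] == note:
            segments[-1] = (note, segments[-1][1] + 1)
        else:
            segments.append((note, 1))
    return segments
```

The resulting sequence of notes and durations is the kind of input a downstream model could be conditioned on; the loudness envelope from step 1 could be attached to each segment in the same way.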

The demo is self-explanatory: the audience can interact with the system either by providing a live recording through a web-based recording interface or by uploading a pre-recorded speech sample. The system then provides a visualization of the formant contours extracted from the provided speech sample, the set of note constraints obtained from the speech, and the sequence of musical notes generated by the transformers. The audience can also listen to, and interactively mix the levels (volume) of, the input speech sample, the initial note sequences, and the musical sequences generated by the transformer models.

[1] Rabiner & Huang. Fundamentals of speech recognition.
[2] Dumpala et al. Sine-wave speech as pre-processing for downstream tasks. Symp. FRSM 2020.
[3] Vaswani et al. Attention is all you need. NeurIPS 2017.
[4] Huang et al. Music Transformer. ICLR 2018.

xLP: Explainable Link Prediction Demo

Authors: Balaji Ganesan, Matheen Ahmed Pasha, Srinivasa Parkala, Neeraj R Singh, Gayatri Mishra, Sumit Bhatia, Hima Patel, Somashekar Naganna, Sameep Mehta

Explaining neural model predictions to users requires creativity, especially in enterprise applications, where there are costs associated with users' time and their trust in the model predictions is critical for adoption. For link prediction in master data management, we have built a number of explainability solutions drawing from research in interpretability, fact verification, path ranking, neuro-symbolic reasoning and self-explaining AI. In this demo, we present explanations for link prediction in a creative way, to allow users to choose the explanations they are more comfortable with.

Coreference Resolution for Neutralizing Gendered Pronouns 

Author: Parth Raghav

Gender neutralization is an important task in text anonymization and in generating gender-free descriptions of people and objects. We demonstrate a web tool that uses coreference resolution and a heuristic to neutralize long gendered texts.

Project Website:

Program committee

We are very grateful to the colleagues who helped us review and select demonstration proposals:

Arijit    Mukherjee    TCS Innovation Labs
Chin-Yi    Cheng    Autodesk Research
Daniel    Toyama    DeepMind
Denis    Gudovskiy    Panasonic
Dhruv    Karthik    University of Pennsylvania
Dong-Ok    Won    Korea University
Hejia    Zhang    University of Southern California
Hendrik    Strobelt    MIT-IBM Watson AI Lab
Hugo Jair    Escalante    INAOE
Ivan    Bajic    Simon Fraser University
Jens    Tuyls    UC Irvine
Joachim    Giesen    Friedrich Schiller University Jena
Jun    Gao    University of Toronto, Nvidia
Katja    Hofmann    Microsoft Research
Luca    Rigazio    Panasonic
Mateen    Ulhaq    Simon Fraser University
Matthew    O'Kelly    University of Pennsylvania
Sameer    Singh    University of California, Irvine
Shaobo    Hou    DeepMind
Shibl    Mourad    DeepMind
Shikai    Luo    Didi Chuxing
Sören    Laue    Friedrich Schiller University Jena / Data Assessment Solutions GmbH Hannover
Vincent    Herrmann    University of Music Karlsruhe
Wei    Zhang    IBM Research
Zain    Shah    Vidi Labs