Workshop
Deep Generative Models and Downstream Applications
José Miguel Hernández-Lobato · Yingzhen Li · Yichuan Zhang · Cheng Zhang · Austin Tripp · Weiwei Pan · Oren Rippel

Tue Dec 14 06:00 AM -- 03:00 PM (PST)
Event URL: https://dgms-and-applications.github.io/2021/

Deep generative models (DGMs) have become an important research branch in deep learning, encompassing a broad family of methods such as variational autoencoders, generative adversarial networks, normalizing flows, energy-based models and autoregressive models. Many of these methods have been shown to achieve state-of-the-art results in the generation of synthetic data of different types such as text, speech, images, music and molecules. Beyond generating synthetic data, however, DGMs are of particular relevance in many practical downstream applications. A few examples are imputation and acquisition of missing data, anomaly detection, data denoising, compressed sensing, data compression, image super-resolution, molecule optimization, interpretation of machine learning methods, identifying causal structures in data, and generation of molecular structures. At present, however, there seems to be a disconnect between researchers working on new DGM-based methods and researchers applying such methods to practical problems like the ones mentioned above. This workshop aims to fill this gap by bringing the two communities together.

Tue 6:00 a.m. - 6:10 a.m.
Opening remarks (Presentation)
Tue 6:10 a.m. - 6:25 a.m.
Invited talk #1: Aapo Hyvärinen (Presentation)
Aapo Hyvärinen
Tue 6:25 a.m. - 6:30 a.m.
Q&A Invited Talk #1 (Q&A)
Tue 6:30 a.m. - 6:45 a.m.
Invited talk #2: Finale Doshi-Velez (Presentation)
Finale Doshi-Velez
Tue 6:45 a.m. - 6:50 a.m.
Q&A Invited Talk #2 (Q&A)
Tue 6:50 a.m. - 7:05 a.m.
Invited Talk #3: Rianne van den Berg (Presentation)   
Rianne van den Berg
Tue 7:05 a.m. - 7:10 a.m.
Q&A Invited Talk #3 (Q&A)
Tue 7:10 a.m. - 7:20 a.m.
Contributed poster talk #1 (Presentation)

Energy-based modeling is a promising approach to unsupervised learning, which yields many downstream applications from a single model. The main difficulty in learning energy-based models with "contrastive approaches" is the generation of samples from the current energy function at each iteration. Many advances have been made to accomplish this subroutine cheaply. Nevertheless, all such sampling paradigms run MCMC targeting the current model, which requires infinitely long chains to generate samples from the true energy distribution and is problematic in practice. In this paper, we propose an alternative approach to obtaining these samples that avoids crude MCMC sampling from the current model. We accomplish this by viewing the evolution of the modeling distribution as (i) the evolution of the energy function, and (ii) the evolution of the samples from this distribution along some vector field. We then derive this time-dependent vector field such that the particles following it are approximately distributed as the current model, thereby matching the evolution of the particles with the evolution of the energy function prescribed by the learning procedure. Importantly, unlike Monte Carlo sampling, our method aims to match the current distribution in finite time. Finally, we demonstrate its effectiveness empirically in comparison to MCMC-based learning methods.
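
For readers less familiar with the contrastive setup the abstract refers to, the sampling subroutine arises from the standard maximum-likelihood gradient of an energy-based model (a textbook identity, not specific to this paper):

```latex
% For  p_\theta(x) = e^{-E_\theta(x)} / Z_\theta,  the likelihood gradient is
\nabla_\theta \, \mathbb{E}_{x \sim p_{\mathrm{data}}}\!\left[\log p_\theta(x)\right]
  = \mathbb{E}_{x \sim p_\theta}\!\left[\nabla_\theta E_\theta(x)\right]
  - \mathbb{E}_{x \sim p_{\mathrm{data}}}\!\left[\nabla_\theta E_\theta(x)\right]
```

The first expectation is taken under the current model, which is what forces a sampling step (typically MCMC) at every iteration; the paper replaces that step with particles transported along a derived vector field.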

Tue 7:20 a.m. - 7:30 a.m.
Contributed poster talk #2 (Presentation)

Diffusion probabilistic models have been shown to generate state-of-the-art results on several competitive image synthesis benchmarks, but lack a low-dimensional, interpretable latent space and are slow at generation. On the other hand, variational autoencoders (VAEs) have access to a low-dimensional latent space but exhibit poor sample quality. Despite recent advances, VAEs usually require high-dimensional hierarchies of latent codes to generate high-quality samples. We present VAEDM, a novel generative framework for refining VAE-generated samples using diffusion models, while also presenting a novel conditional forward-process parameterization for diffusion models. We show that the resulting parameterization can improve upon the unconditional diffusion model in terms of sampling efficiency during inference, while also equipping diffusion models with the low-dimensional VAE-inferred latent code. Furthermore, we show that the proposed model exhibits out-of-the-box capabilities for downstream tasks such as image super-resolution and denoising.

Tue 7:30 a.m. - 7:35 a.m.
Contributed poster talk #1-2 Q&A (Q&A)
Tue 7:35 a.m. - 8:00 a.m.
Break #1 (Break)
Tue 8:00 a.m. - 8:15 a.m.
Invited talk #4: Chris Williams (Presentation)   
Chris Williams
Tue 8:15 a.m. - 8:20 a.m.
Q&A Invited Talk #4 (Q&A)
Tue 8:20 a.m. - 8:35 a.m.
Invited talk #5: Mihaela van der Schaar (Presentation)
Mihaela van der Schaar
Tue 8:35 a.m. - 8:40 a.m.
Q&A Invited Talk #5 (Q&A)
Tue 8:40 a.m. - 8:55 a.m.
Invited Talk #6: Luisa Zintgraf (Presentation)   
Luisa Zintgraf
Tue 8:55 a.m. - 9:00 a.m.
Q&A Invited Talk #6 (Q&A)
Tue 9:00 a.m. - 9:10 a.m.
Contributed poster talk #3 (Presentation)
Neural Compressors (NCs) are codecs that leverage neural networks and entropy coding to achieve competitive compression performance for images, audio, and other data types. These compressors exploit parallel hardware, and are particularly well suited to compressing i.i.d. batches of data. The average number of bits needed to represent each example is at least the well-known cross-entropy. However, the cross-entropy bound assumes the order of the compressed examples in a batch is preserved, which in many applications is not necessary. The number of bits used to implicitly store the order information is the logarithm of the number of unique permutations of the dataset. In this work, we present a method that allows any codec to compress below the cross-entropy by exactly the number of bits needed to store the order, at the expense of shuffling the dataset in the process. Conceptually, our method applies bits-back coding to a latent variable model with observed symbol counts (i.e. a multiset) and a latent permutation defining the ordering, and does not require retraining any models. Experiments with lossless NCs and lossy off-the-shelf codecs such as WebP achieve savings of up to $7.6\%$ on Binarized MNIST, while adding only $10\%$ extra compute time.
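
The order-information accounting behind the claimed savings is easy to reproduce: a batch of n distinct examples carries log2(n!) bits of ordering that an order-agnostic (multiset) code need not pay for. A minimal sketch of that calculation (not the authors' bits-back codec):

```python
import math

def order_bits_per_example(n: int) -> float:
    """Bits per example implicitly spent storing the order of a
    batch of n distinct examples: log2(n!) / n."""
    # log2(n!) computed stably as lgamma(n + 1) / ln(2)
    return math.lgamma(n + 1) / math.log(2) / n

# A batch of 10,000 examples carries ~11.8 bits/example of pure
# order information that shuffling-based multiset coding can reclaim.
print(f"{order_bits_per_example(10_000):.1f} bits/example")
```
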
Tue 9:10 a.m. - 9:20 a.m.
Contributed poster talk #4 (Presentation)

Controllable audio synthesis is a core element of creative sound design. Recent advances in AI have made high-fidelity neural audio synthesis achievable. However, the high temporal resolution of audio and our perceptual sensitivity to small irregularities in waveforms make synthesizing at high sampling rates a complex and computationally intensive task, prohibiting real-time, controllable synthesis within many approaches. In this work we aim to shed light on the potential of conditional implicit neural representations (CINRs) as lightweight backbones in generative frameworks for audio synthesis. Implicit neural representations (INRs) are neural networks used to approximate low-dimensional functions, trained to represent a single geometric object by mapping input coordinates to structural information at input locations. In contrast with other neural methods for representing geometric objects, the memory required to parameterize the object is independent of resolution, and only scales with its complexity. A corollary of this is that INRs have infinite resolution, as they can be sampled at arbitrary resolutions. To apply the concept of INRs in the generative domain, we frame generative modelling as learning a distribution of continuous functions. This can be achieved by introducing conditioning methods to INRs. Our experiments show that periodic conditional INRs (PCINRs) learn faster and generally produce quantitatively better audio reconstructions than transposed convolutional neural networks with equal parameter counts. However, their performance is very sensitive to activation scaling hyperparameters. When learning to represent more uniform sets, PCINRs tend to introduce artificial high-frequency components in reconstructions. We validate that this noise can be minimized by applying standard weight regularization during training or decreasing the compositional depth of PCINRs, and suggest directions for future research.
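
As a concrete picture of the INR idea, here is a minimal sine-activated ("periodic") network that maps a time coordinate to an amplitude; the conditioning mechanisms the paper studies are omitted, and all names and sizes are illustrative:

```python
import torch
import torch.nn as nn

class SineLayer(nn.Module):
    """Linear layer with sine activation; w0 scales the activation
    frequency -- the kind of scaling hyperparameter the abstract
    reports PCINRs are sensitive to."""
    def __init__(self, in_dim: int, out_dim: int, w0: float = 30.0):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)
        self.w0 = w0

    def forward(self, x):
        return torch.sin(self.w0 * self.linear(x))

class AudioINR(nn.Module):
    """Tiny INR: maps a time coordinate t in [-1, 1] to an amplitude."""
    def __init__(self, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            SineLayer(1, hidden),
            SineLayer(hidden, hidden),
            nn.Linear(hidden, 1),
        )

    def forward(self, t):
        return self.net(t)

t = torch.linspace(-1.0, 1.0, 16_000).unsqueeze(-1)  # 1 s at 16 kHz
wave = AudioINR()(t)  # amplitudes; t can be sampled at any resolution
```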

Tue 9:20 a.m. - 9:25 a.m.
Contributed poster talk #3-4 Q&A (Q&A)
Tue 9:25 a.m. - 10:00 a.m.
Break #2 (Break)
Tue 10:00 a.m. - 11:00 a.m.
Poster session #1 (poster session (gathertown))
Tue 11:00 a.m. - 11:30 a.m.
Panel Discussion (Discussion Panel)
Tue 11:30 a.m. - 11:45 a.m.
Invited Talk #7: Romain Lopez (Presentation)   
Romain Lopez
Tue 11:45 a.m. - 11:50 a.m.
Q&A Invited Talk #7 (Q&A)
Tue 11:50 a.m. - 12:10 p.m.
Break #3 (Break)
Tue 12:10 p.m. - 12:25 p.m.
Invited talk #8: Alex Anderson (Presentation)
Alexander Anderson
Tue 12:25 p.m. - 12:30 p.m.
Q&A Invited Talk #8 (Q&A)
Tue 12:30 p.m. - 12:40 p.m.
Contributed poster talk #5 (Presentation)

Generative adversarial networks (GANs) are notably difficult to train since the parameters can get stuck in a local optimum. As a result, methods often suffer not only from degraded convergence speed but also from limitations in the representational power of the trained network. Existing optimization methods that stabilize convergence require multiple gradient computations per iteration. We propose AGE, an alternating extra-gradient method with nonlinear gradient extrapolation that overcomes these computational inefficiencies and exhibits better convergence properties. It estimates the lookahead step using a nonlinear mixing of past gradient sequences. Empirical results on CIFAR10, CelebA, and several synthetic datasets demonstrate that the introduced approach significantly improves convergence and yields better generative models.
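
For orientation, the classic extra-gradient step that AGE builds on (and whose double gradient evaluation it avoids) looks like the following sketch on a toy bilinear game; this is the textbook baseline, not the paper's nonlinear extrapolation:

```python
import torch

def extragradient_step(params, grad_fn, lr=0.1):
    """Vanilla extra-gradient: take a lookahead step, re-evaluate the
    gradient there, and apply that gradient to the original iterate.
    Note the two grad_fn calls per step -- the cost AGE removes by
    extrapolating the lookahead from past gradients instead."""
    g = grad_fn(params)
    lookahead = [p - lr * gp for p, gp in zip(params, g)]
    g_look = grad_fn(lookahead)
    return [p - lr * gl for p, gl in zip(params, g_look)]

# Toy min-max game min_x max_y x*y: simultaneous gradient descent/ascent
# spirals away from the equilibrium at the origin; extra-gradient converges.
x, y = torch.tensor([1.0]), torch.tensor([1.0])
def grad_fn(params):
    x, y = params
    return [y, -x]  # descent direction for x, ascent (negated) for y
for _ in range(200):
    x, y = extragradient_step([x, y], grad_fn)
print(x.item(), y.item())  # both shrink toward 0
```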

Tue 12:40 p.m. - 12:50 p.m.
Contributed poster talk #6 (Presentation)
Photo-acid generators (PAGs) are compounds that release acids ($H^+$ ions) when exposed to light. These compounds are critical components of the photolithography processes that are used in the manufacture of semiconductor logic and memory chips. The exponential increase in the demand for semiconductors has highlighted the need for discovering novel photo-acid generators. While de novo molecule design using deep generative models has been widely employed for drug discovery and material design, its application to the creation of novel photo-acid generators poses several unique challenges, such as lack of property labels. In this paper, we highlight these challenges and propose a generative modeling approach that utilizes conditional generation from a pre-trained deep autoencoder and expert-in-the-loop techniques. The validity of the proposed approach was evaluated with the help of subject matter experts, indicating the promise of such an approach for applications beyond the creation of novel photo-acid generators.
Tue 12:50 p.m. - 12:55 p.m.
Contributed poster talk #5-6 Q&A (Q&A)
Tue 12:55 p.m. - 1:10 p.m.
Invited talk #9: Zhifeng Kong (Presentation)   
Zhifeng Kong
Tue 1:10 p.m. - 1:15 p.m.
Q&A Invited Talk #9 (Q&A)
Tue 1:15 p.m. - 1:30 p.m.
Invited talk #10: Johannes Ballé (Presentation)   
Johannes Ballé
Tue 1:30 p.m. - 1:35 p.m.
Q&A Invited Talk #10 (Q&A)
Tue 1:35 p.m. - 1:45 p.m.
Contributed poster talk #7 (Presentation)

Machine learning models are commonly trained end-to-end and in a supervised setting, using paired (input, output) data. Examples include recent super-resolution methods that train on pairs of (low-resolution, high-resolution) images. However, these end-to-end approaches require re-training every time there is a distribution shift in the inputs (e.g., night images vs. daylight) or relevant latent variables (e.g., camera blur or hand motion). In this work, we leverage state-of-the-art (SOTA) generative models (here StyleGAN2) for building powerful image priors, which enable application of Bayes' theorem for many downstream reconstruction tasks. Our method, "Bayesian Reconstruction through Generative Models" (BRGM), uses a single pre-trained generator model to solve different image restoration tasks, i.e., super-resolution and in-painting, by combining it with different forward corruption models. We keep the weights of the generator model fixed, and reconstruct the image by estimating the Bayesian maximum a-posteriori (MAP) estimate over the input latent vector that generated the reconstructed image. We further use variational inference to approximate the posterior distribution over the latent vectors, from which we sample multiple solutions. We demonstrate BRGM on three large and diverse datasets: (i) 60,000 images from the Flickr Faces High Quality dataset, (ii) 240,000 chest X-rays from MIMIC III, and (iii) a combined collection of 5 brain MRI datasets with 7,329 scans. Across all three datasets and without any dataset-specific hyperparameter tuning, our simple approach yields performance competitive with current task-specific state-of-the-art methods on super-resolution and in-painting, while being more generalisable and without requiring any training. Our source code and pre-trained models are available online.
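
A minimal sketch of the fixed-generator MAP step at the heart of this approach (the generator G, corruption model f, and all hyperparameters here are placeholders; BRGM's actual prior and its variational posterior are richer):

```python
import torch

def map_reconstruct(G, f, y, latent_dim=512, steps=500, lr=0.05, lam=1e-3):
    """MAP estimate of the latent z for an observation y ~ f(x) + noise,
    with a frozen pretrained generator G and a known differentiable
    corruption model f (e.g. downsampling for super-resolution, or a
    mask for in-painting). Minimizes a data term plus a Gaussian-prior
    penalty on z, i.e. the negative log-posterior up to constants."""
    z = torch.zeros(1, latent_dim, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = ((f(G(z)) - y) ** 2).mean() + lam * (z ** 2).sum()
        loss.backward()
        opt.step()
    return G(z).detach()  # reconstruction; G's weights were never updated
```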

Tue 1:45 p.m. - 1:55 p.m.
Contributed poster talk #8 (Presentation)

In this work we address the problem of knowledge graph (KG) construction from text, proposing a novel end-to-end multi-stage Grapher system that separates the overall generation process into two stages. The graph nodes are generated first using a pretrained language model, followed by a simple edge construction head, enabling efficient KG extraction from textual descriptions. For each stage we propose several architectural choices that can be used depending on the available training resources. We evaluated Grapher on the recent WebNLG 2020 Challenge dataset, achieving new state-of-the-art results on the text-to-RDF generation task, as well as on the recent large-scale TEKGEN dataset, showing strong overall performance. We believe that the proposed Grapher system can serve as a viable KG construction alternative to the existing linearization or sampling-based graph generation approaches.

Tue 1:55 p.m. - 2:00 p.m.
Contributed poster talk #7-8 Q&A (Q&A)
Tue 2:00 p.m. - 3:00 p.m.
Poster session #2 (poster session (gathertown))

Liquid state estimation is important for robotics tasks such as pouring; however, estimating the state of transparent liquids is a challenging problem. We propose a novel segmentation pipeline that can segment transparent liquids such as water from a static, RGB image without requiring any manual annotations or heating of the liquid for training. Instead, we use a generative model that is capable of translating unpaired images of colored liquids into synthetically generated transparent liquid images. Segmentation labels of colored liquids are obtained automatically using background subtraction. We use paired samples of synthetically generated transparent liquid images and background subtraction for our segmentation pipeline. Our experiments show that we are able to accurately predict a segmentation mask for transparent liquids without requiring any manual annotations. We demonstrate the utility of transparent liquid segmentation in a robotic pouring task that controls pouring by perceiving liquid height in a transparent cup. Accompanying video and supplementary information can be found at https://sites.google.com/view/roboticliquidpouring


Black-box optimization problems are ubiquitous and of importance in many critical areas of science and engineering. Bayesian optimization (BO) has emerged over the past years as one of the most successful techniques for optimizing expensive black-box objectives. However, efficient scaling of BO to high-dimensional settings has proven to be extremely challenging. Traditional strategies based on projecting high-dimensional input data to a lower-dimensional manifold, such as variational autoencoders (VAEs) and generative adversarial networks (GANs), have improved BO performance in high-dimensional regimes, but their dependence on excessive labeled input data has been widely reported. In this work, we target the data-greedy nature of deep generative models by constructing uncertainty-aware task-specific labeled data augmentations using Gaussian processes (GPs). Our approach outperforms existing state-of-the-art methods on machine learning tasks and demonstrates more informative data representation with limited supervision.


Constructing novel molecules from scratch using deep generative models provides a useful alternative to traditional virtual screening methods, which are limited to searching already discovered chemicals. In particular, molecular optimisation combined with sampling guided by reinforcement learning seems like a promising path for discovering novel molecular designs and allows for domain-specific customization of the desired solutions. The choice of a chemically relevant reward function and the exhaustive assessment of its properties remains a challenging task. We introduce a reward function that gives enough flexibility to quantify the biological activity with respect to a selected protein target, drug-likeness and synthesizability, and incorporates a custom index of penalised physico-chemical properties. In order to customise the hyper-parameters influencing the RL agent's performance, we propose a methodology that quantifies the chemical relevance of the reward function through the chemical relevance of the samples it yields. We assess the performance of the reward function by docking the molecules with relevant protein targets and quantify the difference from ground-truth samples using the Wasserstein distance.


The physical processes of stars are encoded in their periodic pulsations. Millions of variable stars will be observed by the upcoming Vera Rubin Observatory. Here, we present a convolutional autoencoder-based pipeline as an automatic approach to search for anomalous periodic variables within the Zwicky Transient Facility catalog of periodic variable stars. We encode their light curves using the convolutional autoencoder, and we use an isolation forest to score each periodic variable star in the latent space. Our overall most anomalous events share some similarities: they are mostly cool, red, highly variable, and irregularly oscillating periodic variables. Observational data suggest that they are most likely young and massive Red Giant or Asymptotic Giant Branch stars. Furthermore, we use the learned latent features for the classification of periodic variables through a hierarchical random forest. This novel semi-supervised approach allows astronomers to identify the most anomalous events within a given physical class, accelerating the potential for scientific discovery.
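
The latent-space scoring stage is straightforward to reproduce with off-the-shelf tools; a sketch with placeholder latent codes standing in for the autoencoder's encodings of light curves:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Stand-in for the latent codes a convolutional autoencoder would give,
# one row per encoded light curve (values here are random placeholders).
rng = np.random.default_rng(0)
latents = rng.normal(size=(100_000, 32))

forest = IsolationForest(n_estimators=200, random_state=0).fit(latents)
scores = forest.score_samples(latents)     # lower score = more anomalous
most_anomalous = np.argsort(scores)[:100]  # candidate stars to inspect
```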


Sketches are a medium to convey a visual scene from an individual's creative perspective. The addition of color substantially enhances the overall expressivity of a sketch. This paper proposes two methods to mimic human-drawn colored sketches by utilizing the Contour Drawing Dataset. Our first approach renders colored outline sketches by applying image processing techniques aided by k-means color clustering. The second method uses a generative adversarial network to develop a model that can generate colored sketches from previously unobserved images. We assess the results obtained through quantitative and qualitative evaluations.
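
The k-means color clustering step of the first approach can be sketched directly (a generic color quantizer, with the rendering of outlines left out):

```python
import numpy as np
from sklearn.cluster import KMeans

def quantize_colors(image: np.ndarray, k: int = 8) -> np.ndarray:
    """Reduce an (H, W, 3) RGB image to its k dominant colors by
    clustering pixels in RGB space and snapping each to its centroid."""
    h, w, _ = image.shape
    pixels = image.reshape(-1, 3).astype(np.float64)
    km = KMeans(n_clusters=k, n_init=4, random_state=0).fit(pixels)
    palette = km.cluster_centers_.astype(image.dtype)
    return palette[km.labels_].reshape(h, w, 3)
```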


Periodic signals play an important role in daily life. Although conventional sequential models have shown remarkable success in various fields, they still fall short in modeling periodicity; they either collapse, diverge or ignore details. In this paper, we introduce a novel framework inspired by Fourier series to generate periodic signals. We first decompose the given signals into multiple sines and cosines and then conditionally generate periodic signals from the resulting components. We demonstrate our model's efficacy on three tasks: reconstruction, imputation and conditional generation. Our model outperforms baselines on all tasks and shows more stable and refined results.
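
The decomposition step is classical Fourier analysis; a sketch that keeps only the strongest sine/cosine components of a signal (the paper's generator then conditions on such components):

```python
import numpy as np

def top_fourier_components(signal: np.ndarray, k: int = 8) -> np.ndarray:
    """Keep the k largest-magnitude Fourier coefficients of a 1-D
    signal and return the reconstruction from those components alone."""
    n = len(signal)
    coeffs = np.fft.rfft(signal)
    weak = np.argsort(np.abs(coeffs))[:-k]  # all but the top-k indices
    coeffs[weak] = 0.0
    return np.fft.irfft(coeffs, n=n)

t = np.linspace(0.0, 1.0, 1024, endpoint=False)
x = np.sin(2 * np.pi * 5 * t) + 0.5 * np.cos(2 * np.pi * 13 * t)
x_hat = top_fourier_components(x + 0.1 * np.random.randn(1024), k=4)
```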


We introduce Palette, a simple unified framework for image-to-image translation using conditional diffusion models. We demonstrate the effectiveness of Palette on four distinct image-to-image translation tasks, namely colorization, inpainting, uncropping, and JPEG decompression. We propose a unified evaluation protocol across these tasks on the ImageNet dataset and report several perceptual evaluation metrics, including FID, Inception Score, the classification accuracy of a pre-trained classifier, and perceptual distance with respect to the reference image. We outperform existing GAN and autoregressive models on colorization, demonstrating the versatility and general applicability of diffusion models to image-to-image tasks without the need for any task-specific loss modification, tuning, or architecture modification.


We consider the multiple-instance learning (MIL) paradigm, a special case of supervised learning where training instances are grouped into bags. In MIL, the hidden instance labels do not have to be the same as the label of the comprising bag. On the other hand, the hybrid modelling approach is known to possess advantages, largely due to the smooth consolidation of discriminative and generative components. In this paper, we investigate whether we can get the best of both worlds (MIL and hybrid modelling), especially in a semi-supervised learning (SSL) setting. We first integrate a variational autoencoder (VAE), a powerful deep generative model, with an attention-based MIL classifier, then evaluate the performance of the resulting model in SSL. We assess the proposed approach on an established benchmark as well as a real-world medical dataset.


We address the task of learning generative models of human gait. As gait motion always follows physical laws, a generative model should produce outputs that comply with them, particularly rigid-body dynamics with contact and friction. We propose a deep generative model combined with a differentiable physics engine, which outputs physically plausible signals by construction. The proposed model is also equipped with a policy network conditioned on each sample. We show an example application of such a model to style transfer of gait.


Designing new industrial materials with desired properties can be very expensive and time consuming. The main difficulty is to generate compounds that correspond to realistic materials. Indeed, the description of compounds as vectors of components' proportions is characterized by severe sparsity. Furthermore, traditional generative-model validation processes such as visual verification, FID and Inception scores cannot be used in this context. To tackle these issues, we develop an original Binded-VAE model tailored to generate sharp datasets with high sparsity. We validate the model with novel metrics adapted to the problem of compound generation. We show, on a real problem of rubber compound design, that the proposed approach outperforms standard generative models, which opens new perspectives for material design optimization.


We introduce an approach for training variational autoencoders (VAEs) that are certifiably robust to adversarial attack. Specifically, we first derive actionable bounds on the minimal size of an input perturbation required to change a VAE's reconstruction by more than an allowed amount, with these bounds depending on key parameters such as the Lipschitz constants of the encoder and decoder. We then show how these parameters can be controlled, thereby providing a mechanism to ensure a priori that a VAE will attain a desired level of robustness. Moreover, we extend this to a complete practical approach for training such VAEs to ensure our criteria are met. Critically, our method allows one to specify a desired level of robustness upfront and then train a VAE that is guaranteed to achieve this robustness. We further demonstrate that these Lipschitz-constrained VAEs are more robust to attack than standard VAEs in practice.
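
One standard way to control the Lipschitz constants such bounds depend on is spectral normalization of each layer; a sketch of that generic control knob, not the paper's full training procedure (PyTorch's power-iteration estimate of the spectral norm is approximate):

```python
import torch.nn as nn
from torch.nn.utils import spectral_norm

def lipschitz_mlp(dims):
    """MLP whose linear maps are constrained to spectral norm ~1, so
    with 1-Lipschitz activations (ReLU) the whole network is at most
    ~1-Lipschitz end to end."""
    layers = []
    for d_in, d_out in zip(dims[:-1], dims[1:]):
        layers += [spectral_norm(nn.Linear(d_in, d_out)), nn.ReLU()]
    return nn.Sequential(*layers[:-1])  # drop the trailing activation

encoder = lipschitz_mlp([784, 256, 64])  # e.g. an MNIST-sized encoder
```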


Generative adversarial networks (GANs) are a powerful family of models that learn an underlying distribution in order to generate synthetic data. Many existing studies of GANs focus on improving the realness of the generated image data for visual applications, and few concern improving the quality of the generated data for training other classifiers, a task known as the model compatibility problem. As a consequence, existing GANs often prefer generating 'easier' synthetic data that are far from the boundaries of the classifiers, and refrain from generating near-boundary data, which are known to play an important role in training classifiers. To improve GANs in terms of model compatibility, we propose Boundary-Calibration GANs (BCGANs), which leverage the boundary information from a set of classifiers pre-trained on the original data. In particular, we introduce an auxiliary Boundary-Calibration loss (BC-loss) into the generator of a GAN to match the statistics between the posterior distributions of original and generated data with respect to the boundaries of the pre-trained classifiers. The BC-loss is provably unbiased and can easily be coupled with different GAN variants to improve their model compatibility. Experimental results demonstrate that BCGANs not only generate realistic images like original GANs but also achieve superior model compatibility compared to the original GANs.


In the design of instance segmentation networks that reconstruct masks, segmentation is often taken at its literal definition: assigning each pixel a label. This has led to thinking of the problem as one of template matching, with the goal of minimizing the loss between the reconstructed and the ground-truth pixels. Rethinking reconstruction networks as generators, we define the problem of predicting masks as a GAN game: a segmentation network generates the masks, and a discriminator network decides on their quality. To demonstrate this game, we show effective modifications on the general segmentation framework in Mask R-CNN. We find that playing the game in feature space is more effective than in pixel space, leading to stable training between the discriminator and the generator; that predicting object coordinates should be replaced by predicting contextual regions for objects; and that, overall, the adversarial loss helps performance and removes the need for custom per-domain settings. We test our framework in various domains and report on cellphone recycling, autonomous driving, large-scale object detection, and medical glands. We observe that, in general, GANs yield masks that better account for crisper boundaries, clutter, small objects, and details, whether in domains of regular shapes or of heterogeneous and coalescing shapes. Our code for reproducing the results is available publicly.


Classifier guidance is a recently introduced method to trade off mode coverage and sample fidelity in conditional diffusion models post training, in the same spirit as low temperature sampling or truncation in other types of generative models. This method combines the score estimate of a diffusion model with the gradient of an image classifier and thereby requires training an image classifier separate from the diffusion model. We show that guidance can be performed by a pure generative model without such a classifier: we jointly train a conditional and an unconditional diffusion model, and find that it is possible to combine the resulting conditional and unconditional scores to attain a trade-off between sample quality and diversity similar to that obtained using classifier guidance.
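
The combination rule itself is compact; at sampling time the same network is queried twice, once with the condition and once with it dropped (e.g. replaced by a null token), and the two noise estimates are mixed (weighting conventions vary slightly across implementations):

```python
def guided_eps(eps_cond, eps_uncond, w: float):
    """Classifier-free guidance: extrapolate the conditional noise
    estimate away from the unconditional one. w = 0 recovers the
    conditional model; larger w trades diversity for sample fidelity."""
    return (1 + w) * eps_cond - w * eps_uncond
```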


Predicting missing values in tabular data with uncertainty is an essential task in itself, as well as for downstream applications such as personalized decision making. But it is not clear whether state-of-the-art deep generative models for this purpose are well equipped to model the complex relationships that may exist between different features, especially when the subset of observed data is treated as a set. In this work we propose new attention-based models for estimating the joint conditional distribution of randomly missing values in mixed-type tabular data. The models improve on the state-of-the-art Partial Variational Autoencoder (Ma et al., 2019) on a range of imputation and personalized information acquisition tasks.


Hierarchical forecasting problems arise when time series form a group structure that naturally defines aggregation and disaggregation coherence constraints for the predictions. In this work, we explore a new forecast representation, the Poisson Mixture Mesh (PMM), that can produce probabilistic, coherent predictions; it is compatible with neural forecasting innovations, and defines simple aggregation and disaggregation rules capable of accommodating hierarchical structures unknown during its optimization. We perform an empirical evaluation comparing the PMM to other methods on Australian domestic tourism data.


In this paper, we propose a generative model for multivariate time-series anomaly detection (DGHL) that generates windows from a novel hierarchical latent factor representation. These hierarchical factors exploit time-series dynamics to encode information efficiently. DGHL does not rely on an encoder network; instead, it infers and samples latent vectors from the posterior distribution directly with Langevin dynamics. Despite relying on posterior sampling, our approach is computationally more efficient than RNN-based models, with up to 10x shorter training times. DGHL achieved state-of-the-art performance on four popular benchmark datasets, outperforming current reconstruction-based and generative models. With the growth of IoT, settings with corrupted or missing data are increasingly relevant. The proposed model has superior robustness to incomplete data, which we demonstrate with occlusion experiments novel to this literature.
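
Encoder-free latent inference of this kind rests on Langevin dynamics, which needs only the score (gradient of the log-density); a toy sketch on a standard-normal target rather than DGHL's actual posterior:

```python
import numpy as np

def langevin_sample(grad_log_p, z0, steps=1000, step_size=1e-2, seed=0):
    """Unadjusted Langevin dynamics: gradient ascent on log p(z) plus
    properly scaled Gaussian noise yields approximate samples from p."""
    rng = np.random.default_rng(seed)
    z = z0.copy()
    for _ in range(steps):
        z += 0.5 * step_size * grad_log_p(z)
        z += np.sqrt(step_size) * rng.standard_normal(z.shape)
    return z

# Target N(0, I): the score is -z. Starting far away, samples drift back.
z = langevin_sample(lambda z: -z, z0=np.full(16, 5.0))
```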


Deep generative models trained by maximum likelihood remain very popular methods for reasoning about data probabilistically. However, it has been observed that they can assign higher likelihoods to out-of-distribution (OOD) data than in-distribution data, thus calling into question the meaning of these likelihood values. In this work we provide a novel perspective on this phenomenon, decomposing the average likelihood into a KL divergence term and an entropy term. We argue that the latter can explain the curious OOD behaviour mentioned above, suppressing likelihood values on datasets with higher entropy. Although our idea is simple, we have not seen it explored yet in the literature. This analysis provides further explanation for the success of OOD detection methods based on likelihood ratios, as the problematic entropy term cancels out in expectation. Finally, we discuss how this observation relates to recent success in OOD detection with manifold-supported models, for which the above decomposition does not hold.
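
The decomposition in question is a one-line identity, and it also shows why ratios help: for data x ~ p scored under a model q_theta,

```latex
\mathbb{E}_{x \sim p}\left[\log q_\theta(x)\right]
  = -\,\mathrm{KL}\!\left(p \,\|\, q_\theta\right) - H(p),
\qquad
\mathbb{E}_{x \sim p}\!\left[\log \frac{q_\theta(x)}{q_\phi(x)}\right]
  = \mathrm{KL}\!\left(p \,\|\, q_\phi\right) - \mathrm{KL}\!\left(p \,\|\, q_\theta\right).
```

A high-entropy OOD distribution p depresses the average likelihood through H(p) regardless of fit, while the entropy term cancels in the ratio between two models.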


Recently, there has been a renewed interest in returning to the Moon, with many planned missions targeting the south pole. This region is of high scientific and commercial interest, mostly due to the presence of water-ice and other volatiles which could enable our sustainable presence on the Moon and beyond. In order to plan safe and effective crewed and robotic missions, access to high-resolution (<0.5 m) surface imagery is critical. However, the overwhelming majority (99.7%) of existing images over the south pole have spatial resolutions >1 m. Currently, the only way to obtain better images is to launch a new satellite mission to the Moon with better equipment to gather more precise data. In this work we develop an alternative that can be applied directly to previously gathered data, thereby saving substantial resources. It consists of a single-image super-resolution (SR) approach based on generative adversarial networks that is able to super-resolve existing images from 1 m to 0.5 m resolution, unlocking a large catalogue of images (∼50,000) for more accurate mission planning in the region of interest for the upcoming missions. We show that our enhanced images reveal previously unseen hazards such as small craters and boulders, allowing safer traverse planning. Our approach also includes uncertainty estimation, which allows mission planners to understand the reliability of the super-resolved images.


Many self-supervised methods have been proposed with the goal of image anomaly detection. These methods often rely on the paradigm of data augmentation with predefined transformations such as flipping, cropping, and rotation. However, it is not straightforward to apply these techniques to non-image data, such as time series or tabular data, and the performance of existing deep approaches has fallen short of expectations on tasks beyond images. In this work, we propose a novel active learning (AL) scheme that relies on neural autoregressive flows (NAF) for self-supervised anomaly detection, specifically on small-scale data. Unlike other generative models such as GANs or VAEs, flow-based models allow the probability density to be learned explicitly and can thus assign accurate likelihoods to normal data, making them usable for detecting anomalies. The proposed NAF-AL method efficiently generates random samples from the latent space and transforms them into feature space, along with likelihoods, via an invertible mapping. The samples with lower likelihoods are selected and further checked by outlier detection using the Mahalanobis distance. The augmented samples, combined with normal samples, are used to train a better detector that approaches the decision boundaries. Compared with random transformations, NAF-AL can be interpreted as a likelihood-oriented data augmentation that is more efficient and robust. Extensive experiments show that our approach outperforms existing baselines on multiple time-series and tabular datasets, as well as on a real-world application in advanced manufacturing, with significant improvements in anomaly detection accuracy and robustness over the state-of-the-art.


In content-based image retrieval (CBIR) at the category level, the relevance of a retrieved image is evaluated with respect to a category of interest. In the best-case scenario, there is a labeled dataset available that can be used to train models that create the necessary data representations for CBIR. However, this is not always the case. In this paper, we explore the use of disentangled representations learned via weak supervision, using data organized into groups with shared content information. We show that such models attain retrieval performance competitive with unsupervised models. Moreover, since disentangled representations separate the explanatory factors in datasets, the learned representations can also be used for CBIR with respect to other categories that might be of interest but for which little supervision is available.


In anomaly detection (AD), one seeks to identify whether a test sample is abnormal, given a data set of normal samples. A recent and promising approach to AD relies on deep generative models, such as variational autoencoders (VAEs), for unsupervised learning of the normal data distribution. In semi-supervised AD (SSAD), the data also includes a small sample of labeled anomalies. In this work, we propose two variational methods for training VAEs for SSAD. The intuitive idea in both methods is to train the encoder to 'separate' latent vectors for normal data from those for outliers. We show that this idea can be derived from principled probabilistic formulations of the problem, and propose simple and effective algorithms. Our methods can be applied to various data types, as we demonstrate on SSAD datasets ranging from natural images to astronomy and medicine, can be combined with any VAE model architecture, and are naturally compatible with ensembling. When compared to state-of-the-art SSAD methods that are not specific to particular data types, we obtain marked improvements in outlier detection.


The scarcity of network traffic datasets has become a major impediment to recent traffic analysis research. Data collection is often hampered by privacy concerns, leaving researchers with no choice but to capture limited amounts of highly unbalanced network traffic. Furthermore, traffic classes, particularly network attacks, represent the minority, making many techniques such as deep learning prone to failure. We address this issue by proposing a generative adversarial network for balancing minority classes and generating highly customizable attack traffic. The framework regulates the generation process with conditional input vectors by creating flows that inherit characteristics from the original classes while preserving the flexibility to change their properties. We validate the generated samples with four tests. Our results show that the artificially augmented data is indeed similar to the original set and that the customization mechanism aids in the generation of personalized attack samples while remaining close to the original feature distribution.


We study the problem of learning data representations that are private yet informative, i.e., providing information about intended "ally" targets while obfuscating sensitive "adversary" attributes. We propose a novel framework, the Exclusion-Inclusion Generative Adversarial Network (EIGAN), that generalizes adversarial private representation learning (PRL) approaches to generate data encodings that account for multiple (possibly overlapping) ally and adversary targets. Preserving privacy is even more difficult when the data is collected across multiple distributed nodes, which for privacy reasons may not wish to share their data even for PRL training. Thus, learning such data representations at each node in a distributed manner (i.e., without transmitting source data) is of particular importance. This motivates us to develop D-EIGAN, the first distributed PRL method, based on fractional parameter sharing that promotes differentially private parameter sharing and also accounts for communication resource limitations. We theoretically analyze the behavior of adversaries under the optimal EIGAN and D-EIGAN encoders and consider the impact of dependencies among ally and adversary tasks on encoder performance. Our experiments on real-world and synthetic datasets demonstrate the advantages of EIGAN encodings in terms of accuracy, robustness, and scalability; in particular, we show that EIGAN outperforms the previous state-of-the-art by a significant accuracy margin (47% improvement). The experiments further reveal that D-EIGAN's performance is consistent with EIGAN under different node data distributions and is resilient to communication constraints.


The tremendous success of generative models in recent years raises the question of whether they can also be used to perform classification. Generative models have been used as adversarially robust classifiers on simple datasets such as MNIST, but this robustness has not been observed on more complex datasets like CIFAR-10. Additionally, on natural image datasets, previous results have suggested a trade-off between the likelihood of the data and classification accuracy. In this work, we investigate score-based generative models as classifiers for natural images. We show that these models not only obtain competitive likelihood values but simultaneously achieve state-of-the-art classification accuracy for generative classifiers on CIFAR-10. Nevertheless, we find that these models are only slightly, if at all, more robust than discriminative baseline models on out-of-distribution tasks based on common image corruptions. Similarly, and contrary to prior results, we find that score-based models are prone to worst-case distribution shifts in the form of adversarial perturbations. Our work highlights that score-based generative models are closing the gap in classification accuracy compared to standard discriminative models. While they do not yet deliver on the promise of adversarial and out-of-domain robustness, they provide a different approach to classification that warrants further research.

The recently proposed genetic expert-guided learning (GEGL) framework has demonstrated impressive performance on several de novo molecular design tasks. Despite the displayed state-of-the-art results, the proposed system relies on an expert-designed genetic expert. Although hand-crafted experts make it possible to navigate the chemical space efficiently, designing such experts requires a significant amount of effort and might introduce inherent biases which can potentially slow down convergence or even lead to sub-optimal solutions. In this research, we propose a novel genetic expert named InFrag which is free of design rules and can generate new molecules by combining promising molecular fragments. Fragments are obtained by using an additional graph convolutional neural network which computes attributions for each atom of a given molecule. Molecular substructures which contribute positively to the task score are kept and combined to propose novel molecules. We experimentally demonstrate that, within the GEGL framework, our proposed attribution-based genetic expert is either competitive with or outperforms the original expert-designed genetic expert on goal-directed optimization tasks. When limiting the number of optimization rounds to one and three rounds, a performance increase of approximately $43\%$ and $20\%$, respectively, is observed compared to the baseline genetic expert.

In this paper, we propose the Normality-Calibrated Autoencoder (NCAE), which can boost anomaly detection performance on contaminated datasets without any prior information or explicit abnormal samples in the training phase. The NCAE adversarially generates high-confidence normal samples from a latent space with low entropy and leverages them to predict abnormal samples in a training dataset. NCAE is trained to minimise reconstruction errors for uncontaminated samples and maximise reconstruction errors for contaminated samples. The experimental results demonstrate that our method outperforms shallow, hybrid, and deep methods for unsupervised anomaly detection, and achieves performance comparable to semi-supervised methods that use labelled anomaly samples in the training phase. The source code is publicly available at https://github.com/nonamescientist/NCAE_UAD.git.


Variational autoencoders trained to minimize the reconstruction error are sensitive to the posterior collapse problem, in which the approximate posterior distribution becomes equal to the prior. We propose a novel regularization method based on fraternal dropout to prevent posterior collapse. We evaluate our approach using several metrics and observe improvements in all tested configurations.


Deep generative models are becoming widely used across science and industry for a variety of purposes. A common challenge is achieving a precise implicit or explicit representation of the data probability density. Recent proposals have suggested using classifier weights to refine the learned density of deep generative models. We extend this idea to all types of generative models and show how latent space refinement via iterated generative modeling can circumvent topological obstructions and improve precision. This methodology also applies to cases where the target model is non-differentiable and has many internal latent dimensions which must be marginalized over before refinement. We demonstrate our Latent Space Refinement (LaSeR) protocol on a variety of examples, focusing on combinations of normalizing flows and generative adversarial networks.


Predicting future states is a challenging part of a decision-making system because of its inherently uncertain nature. Most works in this literature are based on deep generative networks such as the variational autoencoder, which uses pixel-wise reconstruction in its loss function. Predicting the future with pixel-wise reconstruction can fail to capture the full distribution of high-level representations and results in inaccurate and blurred predictions. In this paper, we propose stochastic video generation with perceptual loss (SVG-PL) to reduce uncertainty and blurriness in future prediction. The proposed model combines a perceptual loss function and a pixel-wise loss function for image reconstruction and future state prediction. The model is built on a variational autoencoder that reduces the high-dimensional input to a latent variable, capturing both the spatial information and the temporal dynamics of future prediction. We show that utilizing perceptual loss for video prediction improves reconstruction ability and results in clearer predictions. Improvements in video prediction could further help the decision-making process in multiple downstream applications.


Recently, there has been an increasing interest in models that generate natural language explanations (NLEs) for their decisions. However, training a model to explain its decisions in natural language requires the acquisition of task-specific NLEs, which is time- and resource-consuming. A potential solution is the out-of-domain transfer of NLEs, where explainability is transferred from a domain with rich data to a domain with scarce data via few-shot transfer learning. In this work, we introduce and compare four approaches for few-shot transfer learning for NLEs. We transfer explainability from the natural language inference domain, where a large dataset of human-written NLEs already exists, to the domains of hard cases of pronoun resolution, and commonsense validation. Our results demonstrate that few-shot transfer far outperforms both zero-shot transfer and single-task training with few examples. We also investigate the scalability of the few-shot transfer of explanations, both in terms of training data and model size.


Modeling and understanding spatiotemporal graphs has been a long-standing research topic in network science, and typically relies on network processes hypothesized from human knowledge. In this paper, we aim at pushing forward the modeling and understanding of spatiotemporal graphs via new disentangled deep generative models. Specifically, a new Bayesian model is proposed that factorizes spatiotemporal graphs into spatial, temporal, and graph factors as well as the factors that explain the interplay among them. A variational objective function and new mutual information thresholding algorithms driven by information bottleneck theory have been proposed to maximize the disentanglement among the factors with theoretical guarantees. Qualitative and quantitative experiments on both synthetic and real-world datasets demonstrate the superiority of the proposed model over the state-of-the-art by up to 69.2% for graph generation and 41.5% for interpretability.


Multi-label classification (MLC) is a prediction task where each sample can have more than one label. We propose a novel contrastive-learning-boosted multi-label prediction model based on a Gaussian mixture variational autoencoder (C-GMVAE), which learns a multimodal prior space and employs a contrastive loss. Many existing methods introduce extra complex neural modules to capture the label correlations, in addition to the prediction modules. We find that by using contrastive learning in the supervised setting, we can exploit label information effectively and learn meaningful feature and label embeddings that capture both the label correlations and the predictive power, without extra neural modules. Our method also adopts the idea of learning and aligning latent spaces for both features and labels. C-GMVAE imposes a Gaussian mixture structure on the latent space to alleviate the posterior collapse and over-regularization issues, in contrast to previous works based on a unimodal prior. C-GMVAE outperforms existing methods on multiple public datasets and can often match other models' full performance with only 50% of the training data. Furthermore, we show that the learnt embeddings provide insights into the interpretation of label-label interactions.

Carla Gomes

The application of deep learning in survival analysis (SA) gives the opportunity to utilize unstructured and high-dimensional data types uncommon in traditional survival methods. This makes it possible to advance methods in fields such as digital health, predictive maintenance and churn analysis, but often yields less interpretable and intuitively understandable models due to the black-box character of deep learning-based approaches. We close this gap by proposing 1) a multi-task variational autoencoder (VAE) with a survival objective, yielding survival-oriented embeddings, and 2) a novel method, HazardWalk, that models hazard factors in the original data space. HazardWalk transforms the latent distribution of our autoencoder into areas of maximized/minimized hazard and then uses the decoder to project changes to the original domain. Our procedure is evaluated on a simulated dataset as well as on a dataset of CT imaging data of patients with liver metastases.


We consider the problem of distilling an image into an ordered set of maximally informative patches, given prior data from the same domain. We cast this problem as one of maximizing a pointwise mutual information (PMI) objective between a subset of an image's patches and the perceptual content of the entire image. We take an image synthesis-based approach, reasoning that the patches that are most informative would also be most useful for predicting other pixel values. We capture this idea with an image completion CNN trained to model the PMI between an image's perceptual content and any of its subregions. Because our PMI objective is a submodular, monotonic function, we can greedily construct patch sets using the CNN to obtain a provably close approximation to the intractable optimal solution. We evaluate our approach on datasets of faces, common objects, and line drawings. For all datasets, we find that surprisingly few patches are needed to reconstruct most images, demonstrating a particular type of redundancy of information in images and new potential for sparse representations. We also show that these minimal patch sets can be used effectively for downstream tasks such as image classification.
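
The greedy construction enabled by submodularity is simple to state; a sketch with an abstract marginal-gain callable standing in for the completion CNN's PMI score (names here are illustrative):

```python
def greedy_patch_set(candidates, gain, budget: int):
    """Greedy maximization of a monotone submodular objective: repeatedly
    add the patch with the largest marginal gain. For such objectives
    the greedy solution is within a (1 - 1/e) factor of the intractable
    optimum (Nemhauser et al., 1978)."""
    selected, remaining = [], list(candidates)
    for _ in range(min(budget, len(remaining))):
        best = max(remaining, key=lambda c: gain(selected, c))
        selected.append(best)
        remaining.remove(best)
    return selected
```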

Author Information

José Miguel Hernández-Lobato (University of Cambridge)
Yingzhen Li (Imperial College London)
Yichuan Zhang (Boltzbit Limited)
Cheng Zhang (Microsoft Research, Cambridge, UK)

Cheng Zhang is a principal researcher at Microsoft Research Cambridge, UK. She leads the Data Efficient Decision Making (Project Azua) team in Microsoft. Before joining Microsoft, she was with the statistical machine learning group of Disney Research Pittsburgh, located at Carnegie Mellon University. She received her Ph.D. from the KTH Royal Institute of Technology. She is interested in advancing machine learning methods, including variational inference, deep generative models, and sequential decision-making under uncertainty, and in adapting machine learning to socially impactful applications such as education and healthcare. She co-organized the Symposium on Advances in Approximate Bayesian Inference from 2017 to 2019.

Austin Tripp (University of Cambridge)
Weiwei Pan (Harvard University)
Oren Rippel (WaveOne, Inc.)
