In recent years, the growth of decision-making applications, where principled handling of uncertainty is of key concern, has led to increased interest in Bayesian techniques. By offering the capacity to assess and propagate uncertainty in a principled manner, Gaussian processes have become a key technique in areas such as Bayesian optimization, active learning, and probabilistic modeling of dynamical systems. In parallel, the need for uncertainty-aware modeling of quantities that vary over space and time has led to large-scale deployment of Gaussian processes, particularly in application areas such as epidemiology. In this workshop, we bring together researchers from different communities to share ideas and success stories. By showcasing key applied challenges, along with recent theoretical advances, we hope to foster connections and prompt fruitful discussion. We invite researchers to submit extended abstracts for contributed talks and posters.

Fri 7:00 a.m. - 7:30 a.m. | Introduction and Opening Remarks (Remarks)
Fri 7:30 a.m. - 8:00 a.m. | Invited Talk: Willie Neiswanger (Invited Talk) | Willie Neiswanger
Fri 8:00 a.m. - 8:30 a.m. | Invited Talk: Marta Blangiardo (Invited Talk) | Marta Blangiardo
Fri 8:30 a.m. - 9:00 a.m. | Coffee and Discussion
Fri 9:00 a.m. - 9:30 a.m. | Invited Talk: Viacheslav Borovitskiy (Invited Talk) | Viacheslav Borovitskiy
Fri 9:30 a.m. - 9:45 a.m. | Multi-Fidelity Experimental Design for Ice-Sheet Simulation (Contributed Talk) | Pierre Thodoroff
Fri 9:45 a.m. - 10:00 a.m. | Gaussian Processes at the Helm(holtz): A Better Way to Model Ocean Currents (Contributed Talk) | Renato Berlinghieri
Fri 10:00 a.m. - 11:00 a.m. | Lunch
Fri 11:00 a.m. - 11:03 a.m. | Bayesian Spatial Clustered Regression for Count Value Data (Lightning Talk) | Guanyu Hu
Fri 11:03 a.m. - 11:06 a.m. | Multi-Mean Gaussian Processes: A novel probabilistic framework for multi-correlated longitudinal data (Lightning Talk) | Arthur Leroy
Fri 11:06 a.m. - 11:09 a.m. | Statistical Downscaling of Sea Surface Temperature Projections with a Multivariate Gaussian Process Model (Lightning Talk) | Ayesha Ekanayaka
Fri 11:09 a.m. - 11:12 a.m. | An Active Learning Reliability Method for Systems with Partially Defined Performance Functions (Lightning Talk) | Jonathan Sadeghi
Fri 11:12 a.m. - 11:15 a.m. | Spatiotemporal modeling of European paleoclimate using doubly sparse Gaussian processes (Lightning Talk) | Seth Axen
Fri 11:15 a.m. - 11:18 a.m. | Towards Improved Learning in Gaussian Processes: The Best of Two Worlds (Lightning Talk) | Ke Li
Fri 11:18 a.m. - 11:21 a.m. | Uncertainty Disentanglement with Non-stationary Heteroscedastic Gaussian Processes for Active Learning (Lightning Talk) | Zeel B Patel
Fri 11:21 a.m. - 11:24 a.m. | Challenges in Gaussian Processes for Non Intrusive Load Monitoring (Lightning Talk) | Aadesh Desai
Fri 11:24 a.m. - 11:27 a.m. | Preprocessing Data of Varying Trial Duration with Linear Time Warping to Extend on the Applicability of SNP-GPFA (Lightning Talk) | Arjan Dhesi
Fri 11:27 a.m. - 11:30 a.m. | Non-Gaussian Process Regression (Lightning Talk) | Yaman Kindap
Fri 11:30 a.m. - 12:00 p.m. | Invited Talk: Jasper Snoek (Invited Talk) | Jasper Snoek
Fri 12:00 p.m. - 12:15 p.m. | Surrogate-Assisted Evolutionary Multi-Objective Optimization for Hardware Design Space Exploration (Contributed Talk) | Renzhi Chen
Fri 12:15 p.m. - 12:45 p.m. | Coffee and Discussion
Fri 12:45 p.m. - 1:15 p.m. | Invited Talk: Paula Moraga (Invited Talk) | Paula Moraga
Fri 1:15 p.m. - 1:30 p.m. | Sparse Bayesian Optimization (Contributed Talk) | Sulin Liu
Fri 1:30 p.m. - 1:45 p.m. | Constraining Gaussian Processes to Systems of Linear Ordinary Differential Equations (Contributed Talk) | Andreas Besginow
Fri 1:45 p.m. - 2:45 p.m. | Poster Session
Fri 2:45 p.m. - 3:15 p.m. | Invited Talk: Carolina Osorio (Invited Talk) | Carolina Osorio
Fri 3:15 p.m. - 3:55 p.m. | Panel Discussion (Panel) | Jacob Gardner · Marta Blangiardo · Viacheslav Borovitskiy · Jasper Snoek · Paula Moraga · Carolina Osorio
Fri 3:55 p.m. - 4:00 p.m. | Closing Remarks (Remarks)

Spatiotemporal modeling of European paleoclimate using doubly sparse Gaussian processes (Poster)
Paleoclimatology—the study of past climate—is relevant beyond climate science itself, such as in archaeology and anthropology for understanding past human dispersal. Information about the Earth's paleoclimate comes from simulations of physical and biogeochemical processes and from proxy records found in naturally occurring archives. Climate-field reconstructions (CFRs) combine these data into a statistical spatial or spatiotemporal model. To date, there exists no consensus spatiotemporal paleoclimate model that is continuous in space and time, produces predictions with uncertainty, and can include data from various sources. A Gaussian process (GP) model would have these desired properties; however, GPs scale unfavorably with data of the magnitude typical for building CFRs. We propose to build on recent advances in sparse spatiotemporal GPs that reduce the computational burden by combining variational methods based on inducing variables with the state-space formulation of GPs. We successfully employ such a doubly sparse GP to construct a probabilistic model of European paleoclimate from the Last Glacial Maximum (LGM) to the mid-Holocene (MH) that synthesizes paleoclimate simulations and fossilized pollen proxy data.
Seth Axen · Alexandra Gessner · Christian Sommer · Nils Weitzel · Álvaro Tejero-Cantero

Identifying latent climate signals using sparse hierarchical Gaussian processes (Poster)
Extracting latent climate signals from multiple climate model simulations is important to estimate future climate change. To tackle this we develop a sparse hierarchical Gaussian process (SHGP), which probabilistically learns a latent distribution from a set of vectors. We use this to predict the latent surface temperature change globally and for central England from an ensemble of climate models, in a scalable manner and with robust uncertainty propagation.
Matt Amos · Thomas Pinder · Paul Young

c-TPE: Generalizing Tree-structured Parzen Estimator with Inequality Constraints for Continuous and Categorical Hyperparameter Optimization (Poster)
Hyperparameter optimization (HPO) is crucial for strong performance of deep learning algorithms. A widely used, versatile HPO method is a variant of Bayesian optimization called the tree-structured Parzen estimator (TPE), which splits data into good and bad groups and uses the density ratio of those groups as an acquisition function (AF). However, real-world applications often have constraints, such as memory requirements or latency. In this paper, we present an extension of TPE to constrained optimization (c-TPE) via simple factorization of AFs. The experiments demonstrate that c-TPE is robust to various constraint levels and exhibits the best average rank performance among existing methods, with statistical significance, on search spaces with categorical parameters across 81 settings.
Shuhei Watanabe · Frank Hutter
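
To make the density-ratio acquisition concrete, below is a minimal one-dimensional sketch of a vanilla (unconstrained) TPE step; the toy objective, the split quantile `gamma`, and the use of scipy's Gaussian KDE are illustrative assumptions, not details from the paper.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Observed configurations and losses (1-D toy problem for illustration).
rng = np.random.default_rng(0)
x = rng.uniform(-5, 5, size=200)
y = (x - 1.0) ** 2 + rng.normal(scale=0.5, size=200)

gamma = 0.25  # fraction of observations treated as "good"
threshold = np.quantile(y, gamma)
good, bad = x[y <= threshold], x[y > threshold]

l = gaussian_kde(good)  # density of good configurations
g = gaussian_kde(bad)   # density of bad configurations

# TPE-style acquisition: prefer candidates where l(x)/g(x) is large.
candidates = rng.uniform(-5, 5, size=1000)
scores = l(candidates) / (g(candidates) + 1e-12)
x_next = candidates[np.argmax(scores)]
print(f"next configuration to evaluate: {x_next:.3f}")
```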

Preferential Bayesian Optimization with Hallucination Believer (Poster)
We study preferential Bayesian optimization (BO), where reliable feedback is limited to pairwise comparison. An important challenge in preferential BO, which uses the Gaussian process (GP) model to represent preference structure, is that the posterior distribution is computationally intractable. Existing preferential BO methods either suffer from poor posterior approximation that ignores the skewness, or require computationally expensive approximation of the exact posterior represented as a skew GP. In this work, we develop a simple and computationally efficient preferential BO algorithm that retains strong optimization performance. The basic idea is to use a posterior additionally conditioned on a random sample drawn from the original posterior itself, called a hallucination, by which we show that a usual GP-based acquisition function can be used while reflecting the skewness of the original posterior. Numerical experiments on various benchmark problems demonstrate the effectiveness of the proposed method.
Shion Takeno · Masahiro Nomura · Masayuki Karasuyama
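
The hallucination mechanism can be sketched on an ordinary Gaussian-likelihood GP (the paper itself works with the skew posterior arising from preferential feedback, which this toy does not reproduce): draw one posterior sample at a random location, then condition on it as if it were observed.

```python
import numpy as np

def rbf(A, B, ls=1.0):
    """Squared-exponential kernel matrix between rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls**2)

def gp_posterior(X, y, Xs, noise=1e-6):
    """Standard GP posterior mean/covariance at test points Xs."""
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks, Kss = rbf(X, Xs), rbf(Xs, Xs)
    sol = np.linalg.solve(K, Ks)
    return sol.T @ y, Kss - Ks.T @ sol

rng = np.random.default_rng(1)
X = rng.uniform(0, 1, (5, 1)); y = np.sin(6 * X[:, 0])
x_h = rng.uniform(0, 1, (1, 1))              # hallucination location
mu_h, cov_h = gp_posterior(X, y, x_h)
f_h = rng.normal(mu_h[0], np.sqrt(cov_h[0, 0]))  # posterior sample

# "Hallucination believer": condition on the sampled value as if it were
# observed, then run any standard GP-based acquisition on the result.
X_aug, y_aug = np.vstack([X, x_h]), np.append(y, f_h)
mu, cov = gp_posterior(X_aug, y_aug, rng.uniform(0, 1, (100, 1)))
```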

Symbolic-Model-Based Reinforcement Learning (Poster)
We investigate using symbolic regression (SR) to model dynamics with mathematical expressions in model-based reinforcement learning (MBRL). While the primary promise of MBRL is to enable sample-efficient learning, most popular MBRL algorithms rely, in order to learn their approximate world model, on black-box over-parametrized neural networks, which are known to be data-hungry and prone to overfitting in the low-data regime. In this paper, we leverage the fact that a large collection of environments considered in RL is governed by physical laws that compose elementary operators, e.g. $\sin{}, \sqrt{\phantom{x}}, \exp{}, \frac{\text{d}}{\text{dt}}$, and we propose to search for a world model in the space of interpretable mathematical expressions with SR. We show empirically on simple domains that MBRL can benefit from the extrapolation capabilities and sample efficiency of SR compared to neural models.
Pierre-alexandre Kamienny · Sylvain Lamprier

An Active Learning Reliability Method for Systems with Partially Defined Performance Functions (Poster)
In engineering design, one often wishes to calculate the probability that the performance of a system is satisfactory under uncertain or variable operating circumstances. State-of-the-art algorithms exist to solve this problem using active learning with Gaussian process models. However, these algorithms cannot be applied to problems which often occur in the autonomous vehicle domain, where the performance of a system may be undefined under certain circumstances. Naive modification of existing algorithms by simply masking undefined values would introduce a discontinuous system performance function and be unsuccessful, because these algorithms are known to fail for discontinuous performance functions. We solve this problem using a hierarchical model for the system performance, where undefined performance is classified before the performance is regressed. This enables active learning Gaussian process methods to be applied to problems where the performance of the system is partially defined, and we demonstrate this by testing our methodology on synthetic numerical examples for the autonomous driving domain.
Jonathan Sadeghi · Romain Mueller · John Redford

Multi-fidelity Bayesian experimental design using power posteriors (Poster)
As experimental tools in the physical and life sciences become increasingly sophisticated and costly, there is a need to optimize the choice of experimental parameters to maximize the informativeness of the data and minimize cost. When designing a scientific experiment, an experimentalist often faces a choice among a suite of data collection modalities or instruments with varying fidelities and costs. Analyzing the tradeoff between high-fidelity, high-cost measurements and low-fidelity, low-cost measurements is often difficult due to complex data collection procedures and budget constraints. Here, we propose an approach for designing such experiments using Bayesian power posteriors, which naturally account for instruments with varying fidelities. Whereas existing approaches for multi-fidelity experimental design are often bespoke for particular data models and involve complicated inference schemes, our approach using power posteriors is generically applicable for any probabilistic model and straightforward to implement. We show that our approach can be combined with a model of experiment cost to allow for multi-fidelity experimental design. We demonstrate our approach through a series of simulated examples and an application to a genomics experiment.
Andrew Jones · Diana Cai · Barbara Engelhardt
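
For reference, the power posterior tempers the likelihood with an exponent; a standard formulation is below, where reading the abstract suggests the tempering parameter $\beta$ is tied to instrument fidelity ($\beta = 1$ recovers the ordinary posterior, $\beta \to 0$ discounts the data toward the prior).

```latex
p_\beta(\theta \mid x) \;=\;
  \frac{p(\theta)\, p(x \mid \theta)^{\beta}}
       {\int p(\theta')\, p(x \mid \theta')^{\beta}\, \mathrm{d}\theta'},
\qquad \beta \in [0, 1].
```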

Bayesian Sequential Experimental Design for a Partially Linear Model with a Gaussian Process Prior (Poster)
We study the problem of sequential experimental design to estimate the parametric component of a partially linear model with a Gaussian process prior. We consider an active learning setting where an experimenter adaptively decides which data to collect to achieve their goal efficiently. The experimenter's goals may vary, such as reducing the classification error probability or improving the accuracy of estimating the parameters of the data generating process. This study aims to improve the accuracy of estimating the parametric component of a partially linear model. Under some assumptions, the parametric component of a partially linear model can be regarded as a causal parameter, the average treatment effect (ATE) or the average causal effect (ACE). We propose a Bayesian sequential experimental design algorithm for a partially linear model with a Gaussian process prior, which can also be viewed as a sequential experimental design tailored to the estimation of ATE or ACE. We show the effectiveness of the proposed method through numerical experiments based on synthetic and semi-synthetic data.
Shunsuke Horii

Fantasizing with Dual GPs in Bayesian Optimization and Active Learning (Poster)
Gaussian Processes (GPs) are popular surrogate models for sequential decision making tasks such as Bayesian Optimization and Active Learning. Such frameworks often exploit well-known cheap methods for conditioning a GP posterior on new data. However, these standard methods cannot be applied to popular but more complex models such as sparse GPs or for non-conjugate likelihoods due to a lack of such update formulas. Using an alternative sparse Dual GP parameterization, we show that these costly computations can be avoided, whilst enjoying one-step updates for non-Gaussian likelihoods. The resulting algorithms allow for cheap batch formulations that work with most acquisition functions.
Paul Chang · Prakhar Verma · ST John · Victor Picheny · Henry Moss · Arno Solin
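
For context, the cheap conditioning the abstract alludes to is the standard rank-one update for an exact GP; the sketch below shows that vanilla case only, since it is precisely this formula that is unavailable for sparse or non-conjugate models without the dual parameterization.

```python
import numpy as np

def fantasize(mu, cov, idx, y_new, noise=1e-2):
    """Standard rank-one conditioning of a Gaussian posterior on one new
    (fantasized) observation at candidate index `idx`.

    mu  : (n,) posterior mean over a grid of candidate points
    cov : (n, n) posterior covariance over the same grid
    """
    k = cov[:, idx]            # cross-covariance with the new point
    s = cov[idx, idx] + noise  # predictive variance at the new point
    mu_new = mu + k * (y_new - mu[idx]) / s
    cov_new = cov - np.outer(k, k) / s
    return mu_new, cov_new
```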

Deep Gaussian Process-based Multi-fidelity Bayesian Optimization for Simulated Chemical Reactors (Poster)
New manufacturing techniques such as 3D printing have recently enabled the creation of previously infeasible chemical reactor designs. Optimizing the geometry of the next generation of chemical reactors is important to understand the underlying physics and to ensure reactor feasibility in the real world. This optimization problem is computationally expensive, nonlinear, and derivative-free, making it challenging to solve. In this work, we apply deep Gaussian processes (DGPs) to model multi-fidelity coiled-tube reactor simulations in a Bayesian optimization setting. By applying a multi-fidelity Bayesian optimization method, the search space of reactor geometries is explored through an amalgam of different-fidelity simulations, which are chosen based on prediction uncertainty and simulation cost, maximizing the use of the computational budget. The use of DGPs provides an end-to-end model for five discrete mesh fidelities, requiring less computational effort to reach good solutions during optimization. The accuracy of simulations for these five fidelities is determined against experimental data obtained from a 3D printed reactor configuration, providing insights into appropriate hyper-parameters. We hope this work provides interesting insight into the practical use of DGP-based multi-fidelity Bayesian optimization for engineering discovery.
Tom Savage · Nausheen Basha · Omar Matar · Antonio del Rio Chanona

Ice Core Dating using Probabilistic Programming (Poster)
Ice cores record crucial information about past climate. However, before ice core data can have scientific value, the chronology must be inferred by estimating the age as a function of depth. Under certain conditions, chemicals locked in the ice display quasi-periodic cycles that delineate annual layers. Manually counting these noisy seasonal patterns to infer the chronology can be an imperfect and time-consuming process, and does not capture uncertainty in a principled fashion. In addition, several ice cores may be collected from a region, introducing an aspect of spatial correlation between them. We present an exploration of the use of probabilistic models for automatic dating of ice cores, using probabilistic programming to showcase its use for prototyping, automatic inference and maintainability, and demonstrate common failure modes of these tools.
Aditya Ravuri · Tom Andersson · Ieva Kazlauskaite · William Tebbutt · Richard Turner · Scott Hosking · Neil Lawrence · Markus Kaiser

Scalable Gaussian Process Hyperparameter Optimization via Coverage Regularization (Poster)
Gaussian processes (GPs) are Bayesian non-parametric models popular in a variety of applications due to their accuracy and native uncertainty quantification (UQ). Tuning GP hyperparameters is critical to ensure the validity of prediction accuracy and uncertainty; uniquely estimating multiple hyperparameters in, e.g., the Matérn kernel can also be a significant challenge. Moreover, training GPs on large-scale datasets is a highly active area of research: traditional maximum likelihood hyperparameter training requires quadratic memory to form the covariance matrix and has cubic training complexity. To address the scalable hyperparameter tuning problem, we present a novel algorithm which estimates the smoothness and length-scale parameters in the Matérn kernel in order to improve the robustness of the resulting prediction uncertainties. Using novel loss functions similar to those in conformal prediction algorithms, in the computational framework provided by the hyperparameter estimation algorithm MuyGPs, we achieve improved UQ over leave-one-out likelihood maximization while maintaining a high degree of scalability, as demonstrated in numerical experiments.
Killian Wood · Alec Dunton · Amanda Muyskens · Benjamin Priest

Integrated Fourier Features for Fast Sparse Variational Gaussian Process Regression (Poster)
Sparse variational approximations are popular methods for scaling up inference in Gaussian processes to larger datasets. For $N$ training points, exact inference has $O(N^3)$ cost; with $M \ll N$ features, sparse variational methods have $O(NM^2)$ cost. Recently, methods have been proposed using harmonic features; when the domain is spherical, the resultant method has $O(M^3)$ cost, but in the common case of a Euclidean domain, previous methods do not avoid the $O(N)$ scaling and are generally limited to a fairly small class of kernels. In this work, we propose integrated Fourier features, with which we can obtain $O(M^3)$ cost, and the method can easily be applied to any covariance function for which we can easily evaluate the spectral density. We provide convergence results, and synthetic experiments showing practical performance gains.
Talay Cheema
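
To see where an $O(M^3)$ solve enters, the sketch below performs regression in a generic trigonometric feature basis; the feature construction is an illustrative stand-in (the paper's integrated Fourier features are built from the kernel's spectral density, and additionally avoid the $O(N)$ accumulation step shown here).

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 10_000, 50
X = rng.uniform(-3, 3, N)
y = np.sin(X) + 0.1 * rng.normal(size=N)

# Generic cosine/sine feature map on a frequency grid (illustrative only).
freqs = np.arange(1, M // 2 + 1)
Phi = np.concatenate([np.cos(np.outer(X, freqs)),
                      np.sin(np.outer(X, freqs))], axis=1)  # (N, M)

noise = 0.1 ** 2
A = Phi.T @ Phi + noise * np.eye(Phi.shape[1])  # M x M system
w = np.linalg.solve(A, Phi.T @ y)               # O(M^3) solve

x_test = np.linspace(-3, 3, 5)
Phi_t = np.concatenate([np.cos(np.outer(x_test, freqs)),
                        np.sin(np.outer(x_test, freqs))], axis=1)
print(Phi_t @ w)  # posterior-mean prediction in the feature basis
```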

Provably Reliable Large-Scale Sampling from Gaussian Processes (Poster)
When comparing approximate Gaussian process (GP) models, it can be helpful to be able to generate data from any GP. If we are interested in how approximate methods perform at scale, we may wish to generate very large synthetic datasets to evaluate them. Naïvely doing so would cost $\mathcal{O}(n^3)$ flops and $\mathcal{O}(n^2)$ memory to generate a size-$n$ sample. We demonstrate how to scale such data generation to large $n$ whilst still providing guarantees that, with high probability, the sample is indistinguishable from a sample from the desired GP.
Anthony Stephenson · Robert Allison

Uncovering the short-time dynamics of electricity day-ahead markets (Poster)
Obtaining a mathematical representation of electricity market prices is the cornerstone of the decision-making process in a liberalised landscape. Most of the existing models analyse the day-ahead electricity price as a univariate time series. This approach requires prior assumptions on the mathematical formulation to obtain an accurate representation of the time-evolution of electricity prices. We propose a new multivariate-stochastic-process model for the day-ahead prices, with each dimension of the process representing a single intraday time tick (ITT) auctioned in the day-ahead market. In this model, the electricity price at each ITT is the solution of a stochastic differential equation (the so-called general Langevin equation) whose drift and diffusion terms are learnt from historical data. The terms governing the stochastic differential equation are obtained by a kernel density estimation of the historical probability, to compute the expected values involved in the Kramers-Moyal definitions. The model is tested using data from the Spanish electricity day-ahead market, yielding a reliable representation of the electricity price structure. The price structure reveals the main features commonly discussed in electricity markets, such as mean-reversion or equilibrium prices. Our results help us to understand the underlying short-time dynamics that govern electricity day-ahead markets.
Antonio Malpica-Morales · S KALLIADASIS · Miguel Durán Olivencia

Distributionally Robust Bayesian Optimization with φ-divergences (Poster)
The study of robustness has received much attention due to its inevitability in data-driven settings where many systems face uncertainty. One such example of concern is Bayesian Optimization (BO), where uncertainty is multi-faceted, yet only a limited number of works have been dedicated to this direction. In particular, the work of Kirschner et al. (2020) bridges the existing literature on Distributionally Robust Optimization (DRO) by casting the BO problem through the lens of DRO. While this work is pioneering, it admittedly suffers from various practical shortcomings such as finite-context assumptions, leaving open the main question: can one devise a computationally tractable algorithm for solving this DRO-BO problem? In this work, we tackle this question to a large degree of generality by considering robustness against data-shift in $\varphi$-divergences, which subsumes many popular choices, such as the $\chi^2$-divergence, Total Variation, and the extant Kullback-Leibler (KL) divergence. We show that the DRO-BO problem in this setting is equivalent to a finite-dimensional optimization problem which, even in the continuous context setting, can be easily implemented with provable sublinear regret bounds. We then show experimentally that our method surpasses existing methods, attesting to the theoretical results.
Hisham Husain · Vu Nguyen · Anton van den Hengel
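
For reference, the $\varphi$-divergence family is defined via a convex generator $\varphi$, and different generators recover the divergences named in the abstract:

```latex
D_{\varphi}(P \,\|\, Q) = \int \varphi\!\left(\frac{\mathrm{d}P}{\mathrm{d}Q}\right) \mathrm{d}Q,
\qquad
\varphi(t) = \begin{cases}
  (t-1)^2 & \chi^2\text{-divergence},\\
  \tfrac{1}{2}\,|t-1| & \text{Total Variation},\\
  t \log t & \text{KL divergence}.
\end{cases}
```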

Towards Improved Learning in Gaussian Processes: The Best of Two Worlds (Poster)
Gaussian process training decomposes into inference of the (approximate) posterior and learning of the hyperparameters. For non-Gaussian (non-conjugate) likelihoods, two common choices for approximate inference are Expectation Propagation (EP) and Variational Inference (VI), which have complementary strengths and weaknesses. While VI's lower bound to the marginal likelihood is a suitable objective for inferring the approximate posterior, it does not automatically imply it is a good learning objective for hyperparameter optimization. We design a hybrid training procedure where the inference leverages conjugate-computation VI and the learning uses an EP-like marginal likelihood approximation. We empirically demonstrate on binary classification that this provides a good learning objective and generalizes better.
Rui Li · ST John · Arno Solin

HyperBO+: Pre-training a universal hierarchical Gaussian process prior for Bayesian optimization (Poster)
We present HyperBO+: a framework of pre-training a hierarchical Gaussian process that enables the same prior to work universally for Bayesian optimization on functions with different domains. We propose a two-step pre-training method and demonstrate its empirical success on challenging black-box function optimization problems with varied input dimensions and search spaces.
Zhou Fan · Xinran Han · Zi Wang

Sequential Gaussian Processes for Online Learning of Nonstationary Functions (Poster)
We propose a sequential Monte Carlo algorithm to fit infinite mixtures of GPs that capture non-stationary behavior while allowing for online, distributed inference. Our approach empirically improves performance over state-of-the-art methods for online GP estimation in the presence of non-stationarity in time-series data. To demonstrate the utility of our proposed online Gaussian process mixture-of-experts approach in applied settings, we show that we can successfully implement an optimization algorithm using online Gaussian process bandits.
Michael Minyi Zhang · Bianca Dumitrascu · Sinead Williamson · Barbara Engelhardt

Gaussian Process Thompson sampling for Bayesian optimization of dynamic masking-based language model pre-training (Poster)
We design and evaluate a Thompson sampling-based Bayesian optimization algorithm that leverages a Gaussian process reward model of the Masked Language Model (MLM) pre-training objective, for its sequential minimization. Transformer-based language model (TLM) pre-training requires large volumes of data and high computational resources, while introducing many unresolved design choices, such as hyperparameter selection of the pre-training procedure. We here fit TLM pre-training validation losses with a Gaussian process, and formulate a Thompson sampling bandit policy that maximizes its sequentially attained cumulative rewards. Instead of MLM pre-training with fixed masking probabilities, the proposed Gaussian process-based Thompson sampling (GP-TS) accelerates and improves MLM pre-training performance by sequentially selecting masking hyperparameters of the language model. GP-TS provides a fast and efficient framework for pre-training TLMs, as it attains better MLM pre-training loss in fewer epochs, avoiding costly hyperparameter selection techniques.
Iñigo Urteaga · Moulay Zaidane Draidia · Tomer Lancewicki · Shahram Khadivi
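
A generic GP Thompson sampling loop over a grid of masking probabilities might look as follows; the grid, the RBF kernel, and the stand-in pre-training function are hypothetical placeholders, not the paper's setup.

```python
import numpy as np

def rbf(a, b, ls=0.1):
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls**2)

grid = np.linspace(0.05, 0.5, 10)  # candidate masking probabilities
X_obs, y_obs = [], []
rng = np.random.default_rng(0)

def pretrain_epoch(p):
    """Stand-in for one MLM pre-training epoch with masking prob p,
    returning a (negated) validation loss as the bandit reward."""
    return -(p - 0.15) ** 2 + 0.01 * rng.normal()

for step in range(20):
    if X_obs:
        X, y = np.array(X_obs), np.array(y_obs)
        K = rbf(X, X) + 1e-4 * np.eye(len(X))
        Ks = rbf(X, grid)
        sol = np.linalg.solve(K, Ks)
        mu, cov = sol.T @ y, rbf(grid, grid) - Ks.T @ sol
    else:
        mu, cov = np.zeros_like(grid), rbf(grid, grid)
    # Thompson sampling: draw one posterior sample, act greedily on it.
    sample = rng.multivariate_normal(mu, cov + 1e-8 * np.eye(len(grid)))
    p = grid[np.argmax(sample)]
    X_obs.append(p); y_obs.append(pretrain_epoch(p))
```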

Gaussian Process Regression for In-vehicle Disconnect Clutch Transfer Function Development (Poster)
Advances in machine learning (ML) methods such as Gaussian process regression (GPR) have enabled the development and use of reduced order models for complex automotive dynamic systems, as alternatives to conventional parametric methods or multi-dimensional look-up tables. GPR provides a mathematical framework for probabilistic representation of complex non-linear systems. This paper discusses the use of GPR to characterize the nonlinear dynamic behavior of an engine disconnect clutch used in a P2 hybrid propulsion architecture for efficient in-vehicle deployment, under computational and memory resource constraints.
Huanyi Shui · Yijing Zhang · Deepthi Antony · devesh upadhyay · James McCallum · Yuji Fujii · Edward Dai

Variational Inference for Extreme Spatio-Temporal Matrix Completion (Poster)
Missing data is a common problem in real-world sensor data collection. The performance of various data imputation approaches degrades rapidly in the extreme scenarios of low or noisy sampling, a case present in many real-world problems such as traffic sensing and environment monitoring. However, jointly exploiting the spatiotemporal and periodic structure, which is generally not captured by classical matrix completion approaches, can improve the imputation performance of sensor data in such real-world conditions. We present a Bayesian approach to spatiotemporal matrix completion wherein we estimate the underlying temporally varying subspace using a variational Bayesian technique. We jointly couple the low-rank matrix completion with the state space autoregressive framework, along with a penalty function on the slowly varying subspace, to model the temporal and periodic evolution in the data. We also propose a robust version of the above formulation, which improves the performance of imputation in the presence of outliers. Results demonstrate that the proposed method outperforms recent state-of-the-art methods on real-world traffic and air pollution data. We demonstrate that fusing the subspace evolution over days can improve the imputation performance with as little as 15% of the data sampled.
Charul Charul · Pravesh Biyani

Preprocessing Data of Varying Trial Duration with Linear Time Warping to Extend on the Applicability of SNP-GPFA (Poster)
Signal-noise Poisson-spiking Gaussian Process Factor Analysis (SNP-GPFA) is a popular model for analyzing neuroscience data. However, a key limitation exists, in that it cannot be applied to data of varying trial duration, limiting the range of experiments that can be performed. This work proposes data preprocessing techniques to feature-align spike data of unequal trial length, as well as findings from the application of SNP-GPFA to transformed rodent V1 data. We find that stretching followed by linear time warping is sufficient to align rodent V1 data in time, and with respect to a paired visual stimulus and reward feature, for successful application of SNP-GPFA.
Arjan Dhesi · Arno Onken

Are All Training Data Useful? An Empirical Revisit of Subset Selection in Bayesian Optimization (Poster)
Bayesian optimization (BO) has been widely recognized as a powerful approach for black-box optimization problems with expensive objective function(s). The Gaussian process (GP), which has been widely used for surrogate modeling in BO, is notorious for its cubic computational complexity, which grows with the number of evaluated samples. This can lead to significantly increased computational time for BO due to its sequential decision-making nature. This paper revisits simple and effective subset selection methods that pick a small group of representative data from the entire dataset to carry out the training and inference of the GP in the context of BO. Empirical studies demonstrate that subset selection methods not only promote the performance of vanilla BO but also reduce the computational time by up to ≈98%.
Peili Mao · Ke Li

Non-Gaussian Process Regression (Poster)
Standard GPs offer a flexible modelling tool for well-behaved processes. However, deviations from Gaussianity are expected to appear in real world datasets, with structural outliers and shocks routinely observed. In these cases GPs can fail to model uncertainty adequately and may over-smooth inferences. Here we extend the GP framework into a new class of time-changed GPs that allow for straightforward modelling of heavy-tailed non-Gaussian behaviours, while retaining a tractable conditional GP structure through an infinite mixture of non-homogeneous GPs representation. The conditional GP structure is obtained by conditioning the observations on a latent transformed input space, and the random evolution of the latent transformation is modelled using a Lévy process, which allows Bayesian inference in both the posterior predictive density and the latent transformation function. We present Markov chain Monte Carlo inference procedures for this model and demonstrate the potential benefits compared to a standard GP.
Yaman Kindap · Simon Godsill

Bayesian Spatial Clustered Regression for Count Value Data (Poster)
Investigating relationships between response variables and covariates in environmental science, geoscience, and public health is an important endeavor. Based on a Bayesian mixture of finite mixtures model, we present a novel spatially clustered coefficients regression model for count-valued data. The proposed method detects the spatial homogeneity of the Poisson regression coefficients. A Markov random field constrained mixture of finite mixtures prior provides a regularized estimator of the number of clusters of regression coefficients with geographical neighborhood information. An efficient Markov chain Monte Carlo algorithm is developed using the multivariate log-gamma distribution as a base distribution. Simulation studies are carried out to examine the empirical performance of the proposed method. Finally, we analyze Georgia's premature death data as an illustration of the effectiveness of our approach.
Peng Zhao · Hou-Cheng Yang · Dipak Dey · Guanyu Hu

Efficient Variational Gaussian Processes Initialization via Kernel-based Least Squares Fitting (Poster)
Stochastic variational Gaussian processes (SVGP) scale Gaussian process inference up to large datasets through inducing points and stochastic training. However, the training process involves hard multimodal optimization, and often suffers from slow and suboptimal convergence when inducing points are initialized directly from the training data. We provide a better initialization of inducing points from kernel-based least squares fitting. We show empirically that our approach consistently reaches better prediction performance with far fewer training epochs. Our initialization saves up to 38% of the total time cost compared to standard SVGP training.
Xinran Zhu · David Bindel · Jacob Gardner

Variational Bayesian Inference and Learning for Continuous Switching Linear Dynamical Systems (Poster)
Linear-Gaussian dynamical systems (LDSs) are computationally tractable because all latents and observations are jointly Gaussian. However, these systems are too restrictive to satisfactorily model many dynamical systems of interest. One generalization, the switching linear dynamical system (SLDS), trades analytic tractability for a more expressive model, allowing a discrete set of different linear regimes to model the data. Here we introduce a switching linear dynamical system with a continuum of linear regimes that are traversed continuously in time. We call this model a continuous switching linear dynamical system (CSLDS) and derive efficient variational Bayesian methods for inference and model learning.
Jack Goffinet · David Carlson

Adaptive Experimentation at Scale (Poster)
In typical experimentation paradigms, reallocating measurement effort incurs high operational costs due to delayed feedback, and infrastructural and organizational difficulties. Challenges in reallocation lead practitioners to employ a few reallocation epochs in which outcomes are measured in large batches. Standard adaptive experimentation methods, however, do not scale to these regimes as they are tailored to perform well as the number of reallocation epochs grows. We develop a new adaptive experimentation framework that can flexibly handle any batch size and learns near-optimal designs when reallocation opportunities are few. By deriving an asymptotic sequential experiment based on normal approximations, we formulate a Bayesian dynamic program that can leverage prior information based on previous experiments. We propose policy gradient-based lookahead policies and find that despite relying on approximations, our methods greatly improve statistical power over uniform allocation and standard adaptive policies.
Ethan Che · Hongseok Namkoong

Preference-Aware Constrained Multi-Objective Bayesian Optimization (Poster)
Many analog circuit design optimization problems involve performing expensive simulations to evaluate circuit configurations in terms of multiple objectives and constraints; oftentimes, practitioners have preferences over objectives. We aim to approximate the optimal Pareto set over feasible circuit configurations while minimizing the number of simulations. We propose a novel and efficient preference-aware constrained multi-objective Bayesian optimization (PAC-MOO) approach that learns surrogate models for objectives and constraints and sequentially selects candidate circuits for simulation that maximize the information gained about the optimal constrained Pareto front, while factoring in the objective preferences. Our experiments on real-world problems demonstrate PAC-MOO's efficacy over prior methods.
Alaleh Ahmadianshalchi · Syrine Belakaria · Janardhan Rao Doppa

Imputation and forecasting for Multi-Output Gaussian Process in Smart Grid (Poster)
Data imputation and prediction are key components of the intelligent upgrading of power systems. Data obtained from the real world may have varying degrees of missingness, and these missing components have a significant impact on the outcome of prediction models. In addition, single-output methods lack the ability to establish correlation models between multiple datasets, which limits the accuracy of data imputation and forecasting. To handle multi-output imputation and forecasting problems, this paper proposes a novel kernel-based multi-output Gaussian process (MOGP) model to achieve data imputation and prediction simultaneously.
JIANGJIAO XU · Ke Li

Shaping of Magnetic Field Coils in Fusion Reactors using Bayesian Optimisation (Poster)
Nuclear fusion using magnetic confinement holds promise as a viable method for sustainable energy. However, most fusion devices have been experimental and, as we move towards energy reactors, we are entering a new paradigm of engineering. Curating a design for a fusion reactor is a high-dimensional multi-output optimisation process. Through this work we demonstrate a proof-of-concept of an AI-driven strategy to help explore the design search space and identify optimum parameters. By utilising a multi-output Bayesian optimisation scheme, our strategy is capable of identifying the Pareto front associated with the optimisation of the toroidal field coil shape of a tokamak. The optimisation helps to identify design parameters that would minimise the costs incurred while maximising the plasma stability by way of minimising magnetic ripples.
Timothy Nunn · Vignesh Gopakumar · Sebastien Kahn

Joint Point Process Model for Counterfactual Treatment–Outcome Trajectories Under Policy Interventions (Poster)
Policy makers need to predict the progression of an outcome before adopting a new treatment policy, which defines when and how a sequence of treatments affecting the outcome occurs in continuous time. Commonly, algorithms that predict interventional future outcome trajectories take a fixed sequence of future treatments as input. This excludes scenarios where the policy is unknown or a counterfactual analysis is needed. To handle these limitations, we develop a joint model for treatments and outcomes, which allows for the estimation of treatment policies and effects from sequential treatment–outcome data. It can answer interventional and counterfactual queries about interventions on treatment policies, as we show with a realistic semi-synthetic simulation study. This abstract is based on work that is currently under review (Anonymous).
Çağlar Hızlı · ST John · Anne Juuti · Tuure Saarinen · Kirsi Pietiläinen · Pekka Marttinen

PI is back! Switching Acquisition Functions in Bayesian Optimization (Poster)
Bayesian Optimization (BO) is a powerful, sample-efficient technique to optimize expensive-to-evaluate functions. Each of the BO components, such as the surrogate model, the acquisition function (AF), or the initial design, is subject to a wide range of design choices. Selecting the right components for a given optimization task is challenging and can have a significant impact on the quality of the obtained results. In this work, we initiate the analysis of which AF to favor for which optimization scenarios. To this end, we benchmark SMAC3 using Expected Improvement (EI) and Probability of Improvement (PI) as acquisition functions on the 24 BBOB functions of the COCO environment. We compare their results with those of dynamic schedules, which aim to use EI's explorative behavior in the early optimization steps and then switch to PI for better exploitation in the final steps. We also compare this to a random schedule and round-robin selection. We observe that dynamic schedules oftentimes outperform any single static one. Our results suggest that a schedule that allocates the first 25% of the optimization budget to EI and the last 75% to PI is a reliable default. However, we also observe considerable performance differences across the 24 functions, suggesting that a per-instance allocation, possibly learned on the fly, could offer significant improvement over state-of-the-art BO designs.
Carolin Benjamins · Elena Raponi · Anja Jankovic · Koen van der Blom · Maria Laura Santoni · Marius Lindauer · Carola Doerr
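
The two acquisition functions being scheduled, and the 25%/75% switch the abstract suggests as a default, can be sketched as follows (formulas for minimization; the schedule function is a minimal illustration).

```python
import numpy as np
from scipy.stats import norm

def ei(mu, sigma, best):
    """Expected Improvement for minimization."""
    z = (best - mu) / sigma
    return (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)

def pi(mu, sigma, best):
    """Probability of Improvement for minimization."""
    return norm.cdf((best - mu) / sigma)

def acquisition(step, budget, mu, sigma, best):
    # Schedule from the abstract: explore with EI for the first 25% of
    # the budget, then exploit with PI for the remaining 75%.
    return ei(mu, sigma, best) if step < 0.25 * budget else pi(mu, sigma, best)
```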

Actually Sparse Variational Gaussian Processes (Poster)
In this work we propose a new class of inter-domain variational Gaussian process, constructed by projecting onto a set of compactly supported B-spline basis functions. Our model is akin to variational Fourier features. However, due to the compact support of the B-spline basis, we produce sparse covariance matrices. This enables us to make use of sparse linear algebra to efficiently compute matrix operations. After a one-off pre-computation, we show that our method reduces both the memory requirement and the per-iteration computational complexity to linear in the number of inducing points.
Jake Cunningham · So Takao · Mark van der Wilk · Marc Deisenroth

Predicting Spatiotemporal Counts of Opioid-related Fatal Overdoses via Zero-Inflated Gaussian Processes (Poster)
Recently, zero-inflated Gaussian processes (GPs) have been proposed as probabilistic machine learning models for observed spatio-temporal data that contain many close-to-zero entries. In this work, we extend zero-inflated GPs to sparse count data via the zero-inflated Poisson likelihood. This change no longer admits a closed-form computation of the training objective, so we use automatic differentiation variational inference to perform approximate posterior estimation. Our motivating application is the prediction of the number of opioid-related overdose deaths that will occur in the next 3 months in each of 1620 census tracts across the state of Massachusetts, given historical decedent data and socio-economic covariates. We find zero-inflated GPs can prioritize regions in need of near-term public health interventions better than alternative models at finer spatial and temporal resolutions than most prior efforts. Surprisingly, we find that this model is successful even when using Normal likelihoods instead of the zero-inflated Poisson.
Kyle Heuton · Shikhar Shrestha · Thomas Stopka · Jennifer Pustz · · Michael Hughes
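
The zero-inflated Poisson likelihood referenced here mixes a point mass at zero with a Poisson count model; a minimal sketch follows, with the rate and zero-inflation probability as plain arrays (in the paper's setting they would be linked to GP outputs).

```python
import numpy as np
from scipy.special import gammaln

def zip_log_likelihood(y, rate, pi0):
    """Log-likelihood of the zero-inflated Poisson: with probability pi0
    the count is a structural zero, otherwise y ~ Poisson(rate)."""
    pois_logpmf = y * np.log(rate) - rate - gammaln(y + 1)
    ll_zero = np.log(pi0 + (1 - pi0) * np.exp(-rate))  # y == 0 branch
    ll_pos = np.log1p(-pi0) + pois_logpmf              # y > 0 branch
    return np.where(y == 0, ll_zero, ll_pos)
```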

Expert Selection in Distributed Gaussian Processes: A Multi-label Classification Approach (Poster)
By distributing the training process, local approximation reduces the cost of the standard Gaussian process. An ensemble technique combines local predictions from Gaussian experts trained on different partitions of the data by assuming perfect diversity of the local predictors. Although it keeps the aggregation tractable, this assumption is often violated in practice. Taking dependencies between experts into account enables ensemble methods to provide consistent results. However, they have a high computational cost, which is cubic in the number of experts involved. By implementing an expert selection strategy, the final aggregation step uses fewer experts and is more efficient. Indeed, a static selection approach that assigns a fixed set of experts to each new data point cannot encode the specific properties of each unique data point. This paper proposes a flexible expert selection approach based on the characteristics of entry data points. To this end, we investigate the selection task as a multi-label classification problem where the experts define labels and each entry point is assigned to some experts. The proposed solution's prediction quality, efficiency, and asymptotic properties are discussed in detail. We demonstrate the efficacy of our method through extensive numerical experiments using synthetic and real-world data sets.
Hamed Jalali · Gjergji Kasneci

Statistical Downscaling of Sea Surface Temperature Projections with a Multivariate Gaussian Process Model (Poster)
We developed a multivariate Gaussian process model to jointly analyze high-resolution remote sensing data and climate model output. With a basis function representation, the resulting model can achieve efficient computation and describe potentially non-stationary spatial dependence. The predictive distribution provides statistical downscaling from the coarse-resolution climate model output, borrowing strength spatially and across high-resolution remote sensing data. We implement the proposed method for downscaling Sea Surface Temperature (SST) over the Great Barrier Reef (GBR). Our method reduces the mean squared predictive error by about 20% compared with the state of the art and produces a predictive distribution enabling holistic uncertainty quantification analyses.
Ayesha Ekanayaka · Emily Kang · Peter Kalmus · Amy Braverman

Multi-Mean Gaussian Processes: A novel probabilistic framework for multi-correlated longitudinal data (Poster)
See the uploaded file.
Arthur Leroy · Mauricio A Álvarez

Spatiotemporal Residual Regularization with Kronecker Product Structure for Traffic Forecasting (Poster)
Existing deep learning-based traffic forecasting models are often trained with MSE as the loss function, which is equivalent to assuming the residual/error to follow an independent Gaussian distribution for simplicity of modeling. However, this assumption does not hold, especially in traffic forecasting tasks, where the residuals are correlated in both spatial and temporal dimensions. For a multistep forecasting model, we would also expect the variance to increase with the number of steps. In this study, we propose a SpatioTemporal Residual Regularization based on the assumption that the residuals follow a zero-mean multivariate Gaussian distribution with a learnable spatiotemporal covariance matrix. This approach benefits from directly considering correlated spatiotemporal residuals. However, it suffers from scalability issues, since the spatiotemporal covariance is often large. For model scalability, we model the spatiotemporal covariance as a sum of Kronecker products of spatial and temporal residual covariances, which significantly reduces the number of parameters and the computational complexity. The performance of the proposed method is tested on a traffic speed forecasting task, and the results show that the proposed method improves model performance by properly dealing with correlated residuals.
Seongjin Choi · Nicolas Saunier · Martin Trepanier · Lijun Sun
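
A toy illustration of the sum-of-Kronecker-products parameterization follows; the factor sizes and the number of terms are arbitrary assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
n_t, n_s, r = 4, 3, 2  # time steps, sensors, number of Kronecker terms

def random_spd(n):
    """Random symmetric positive-definite matrix."""
    A = rng.normal(size=(n, n))
    return A @ A.T + n * np.eye(n)

# Full spatiotemporal covariance built from small temporal and spatial
# factors; parameters scale with n_t^2 + n_s^2 instead of (n_t * n_s)^2.
Sigma = sum(np.kron(random_spd(n_t), random_spd(n_s)) for _ in range(r))
print(Sigma.shape)  # (n_t * n_s, n_t * n_s)
```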

Uncertainty Disentanglement with Non-stationary Heteroscedastic Gaussian Processes for Active Learning (Poster)
Gaussian processes are Bayesian non-parametric models used in many areas. In this work, we propose a Non-stationary Heteroscedastic Gaussian process model which can be learned with gradient-based techniques. We demonstrate the interpretability of the proposed model by separating the overall uncertainty into aleatoric (irreducible) and epistemic (model) uncertainty. We illustrate the usability of derived epistemic uncertainty on active learning problems. We demonstrate the efficacy of our model with various ablations on multiple datasets.
Zeel B Patel · Nipun Batra · Kevin Murphy

Active Learning with Convolutional Gaussian Neural Processes for Environmental Sensor Placement (Poster)
Deploying environmental measurement stations can be a costly and time-consuming procedure, especially in regions which are remote or otherwise difficult to access, such as Antarctica. Therefore, it is crucial that sensors are placed as efficiently as possible, maximising the informativeness of their measurements. Previous approaches for identifying salient placement locations typically model the data with a Gaussian process (GP; Williams and Rasmussen, 2006). However, designing a GP covariance which captures the complex behaviour of non-stationary spatiotemporal data is a difficult task. Further, the computational cost of these models makes them challenging to scale to large environmental datasets. In this work, we explore using convolutional Gaussian neural processes (ConvGNPs; Bruinsma et al., 2021; Markou et al., 2022) to address these issues. A ConvGNP is a meta-learning model which uses a neural network to parameterise a GP predictive. Our model is data-driven, flexible, efficient, and permits gridded or off-grid input data. Using simulated surface temperature fields over Antarctica as ground truth, we show that a ConvGNP outperforms a simple GP baseline in terms of predictive performance. We then use the ConvGNP in a temperature sensor placement toy experiment, yielding promising results.
Tom Andersson · Wessel Bruinsma · Efstratios Markou · Daniel C. Jones · Scott Hosking · James Requeima · Anna Vaughan · Anna-Louise Ellis · Matthew Lazzara · Richard Turner

Random Features Approximation for Fast Data-Driven Control (Poster)
The goal of data-driven nonlinear control problems is to guarantee stability or safety of an unknown system. We consider a method based on Control Certificate Functions (CCFs) that uses Gaussian process (GP) regression to learn unknown quantities for control-affine dynamics. Computing the GP estimator can become prohibitively expensive for large datasets, which is an issue since speed is critical in real-time control systems. We introduce a random feature approximation of the affine compound kernel to speed up training and prediction time. To ensure that the controller can be robust to these approximations, we provide an error analysis on the approximate mean and variance estimates. Finally, we propose a fast and robust convex optimization based min-norm controller using the error bounds, and present preliminary experiments comparing the random features approximation to kernel methods.
Kimia Kazemian · Sarah Dean
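
The approximation referenced here is in the spirit of random Fourier features (Rahimi & Recht, 2007); a minimal sketch for the plain RBF kernel is below (the paper's affine compound kernel is not reproduced in this toy).

```python
import numpy as np

def rff(X, n_features=100, lengthscale=1.0, seed=0):
    """Random Fourier features approximating the RBF kernel:
    z(x)^T z(x') ~= exp(-||x - x'||^2 / (2 * lengthscale^2))."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(scale=1.0 / lengthscale, size=(d, n_features))
    b = rng.uniform(0, 2 * np.pi, n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

X = np.random.default_rng(1).normal(size=(500, 2))
Z = rff(X)
K_approx = Z @ Z.T  # low-rank feature product replaces the exact kernel
```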

Deep Mahalanobis Gaussian Process (Poster)
We propose a class of hierarchical Gaussian process priors in which each layer of the hierarchy controls the lengthscales of the next. While this idea has been explored before, our proposal extends previous work on the Mahalanobis distance kernel, bringing an alternative construction to non-stationary RBF-style kernels. This alternative take has more desirable theoretical properties, restoring one of the interpretations for input-dependent lengthscales. More specifically, we interpret our model as a GP that performs locally linear non-linear dimensionality reduction. We directly compare it with the compositional deep Gaussian process, a popular model that uses successive mappings to latent spaces to alleviate the burden of choosing a kernel function. Our experiments show promising results on synthetic and empirical datasets.
Daniel Augusto de Souza · Diego Mesquita · César Lincoln Mattos · João Paulo Gomes

An Empirical Analysis of the Advantages of Finite vs. Infinite Width Bayesian Neural Networks (Poster)
Comparing Bayesian neural networks (BNNs) with different widths is challenging because, as the width increases, multiple model properties change simultaneously, and inference in the finite-width case is intractable. In this work, we empirically compare finite- and infinite-width BNNs, and provide quantitative and qualitative explanations for their performance difference. We find that under model mis-specification, increasing width can hurt BNN performance. In these cases, we provide evidence that finite BNNs generalize better partially due to the properties of their frequency spectrum that allow them to adapt under model mismatch.
Jiayu Yao · Yaniv Yacoby · Beau Coker · Weiwei Pan · Finale Doshi-Velez

Non-exchangeability in Infinite Switching Linear Dynamical Systems (Poster)
Complex nonlinear time-series data can be effectively modeled by switching linear dynamical system (SLDS) models. In trying to allow for unbounded complexity in the discrete modes, most approaches have focused on Dirichlet process mixture models. Such non-parametric Bayesian models restrict the distribution over dynamical modes to be exchangeable, making it difficult to capture important temporally and spatially sequential dependencies. In this work, we address these concerns by developing the non-exchangeable SLDS (neSLD) model class, effectively extending infinite-capacity SLDS models to capture non-exchangeable distributions over dynamical mode partitions. Importantly, from this non-exchangeability, we can learn transition probabilities with infinite capacity that depend on observations or on the continuous latent states. We leverage partial differential equations (PDEs) in the modeling of latent sufficient statistics to provide a Markovian formulation and support efficient dynamical mode updates. Finally, we demonstrate the flexibility and expressivity of our model class on synthetic data.
Victor Geadah · Jonathan Pillow

Posterior Consistency for Gaussian Process Surrogate Models with Generalized Observations (Poster)
Gaussian processes (GPs) are widely used as approximations to complex computational models. However, properties and implications of GP approximations on data analysis are not yet fully understood. In this work we study parameter inference in GP surrogate models that utilize generalized observations, and prove conditions and guarantees for the approximate parameter posterior to be consistent in terms of posterior expectations and KL-divergence.
Rujian Chen · John Fisher III

Recommendations for Baselines and Benchmarking Approximate Gaussian Processes (Poster)
We discuss the use of the sparse Gaussian process regression (SGPR) method introduced by Titsias (2009) as a baseline for approximate Gaussian processes. We make concrete recommendations to ensure that it is a strong baseline, so that meaningful comparisons can be made. In doing so, we provide recommendations for comparing Gaussian process approximations, designed to explore both the limitations of methods as well as understand their computation-accuracy tradeoffs. This is particularly important now that highly accurate GP approximations are available, so that the literature provides a clear picture of currently achievable results.
Sebastian Ober · David Burt · Artem Artemev · Mark van der Wilk

Challenges in Gaussian Processes for Non Intrusive Load Monitoring (Poster)
Non-intrusive load monitoring (NILM), or energy disaggregation, aims to break down total household energy consumption into constituent appliances. Prior work has shown that providing an energy breakdown can help people save up to 15% of energy. In recent years, deep neural networks (deep NNs) have made remarkable progress in the domain of NILM. In this paper, we study the performance and limitations of using Gaussian processes for solving NILM. We choose GPs for three main reasons: i) GPs inherently model uncertainty; ii) there is an equivalence between infinite NNs and GPs; iii) by appropriately designing the kernel we can incorporate domain expertise. We find that vanilla GPs are not well-suited for NILM.
Aadesh Desai · Gautam Vashishtha · Zeel B Patel · Nipun Batra

Multi-fidelity experimental design for ice-sheet simulation (Poster)
Computer simulations are becoming an essential tool in many scientific fields, from molecular dynamics to aeronautics. In glaciology, future predictions of sea level change require input from ice sheet models. Due to uncertainties in the forcings and the parameter choices for such models, many different realisations of the model are needed in order to produce probabilistic forecasts of sea level change. For these reasons, producing robust probabilistic forecasts from an ensemble of model simulations over regions of interest can be extremely expensive for many ice sheet models. Multi-fidelity experimental design (MFED) is a strategy that models the high-fidelity output of the simulator by combining information from various resolutions, in an attempt to minimize the computational costs of the process and maximize the accuracy of the posterior. In this paper, we present an application of MFED to an ice-sheet simulator and demonstrate potential computational savings by modelling the relationship between spatial resolutions. We also analyze the behavior of MFED strategies using theoretical results from submodular maximization.
Pierre Thodoroff · Markus Kaiser · Rosie Williams · Robert Arthern · Scott Hosking · Neil Lawrence · Ieva Kazlauskaite

Sparse Bayesian Optimization (Poster)
Bayesian optimization (BO) is a powerful approach to sample-efficient optimization of black-box objective functions. However, the application of BO to areas such as recommendation systems often requires taking the interpretability and simplicity of the configurations into consideration, a setting that has not been previously studied in the BO literature. To make BO applicable in this setting, we present several regularization-based approaches that allow us to discover sparse and more interpretable configurations. We propose a novel differentiable relaxation based on homotopy continuation that makes it possible to target sparsity by working directly with regularization. We identify failure modes for regularized BO and develop a hyperparameter-free method, sparsity exploring Bayesian optimization (SEBO), that seeks to simultaneously maximize a target objective and sparsity. SEBO and methods based on fixed regularization are evaluated on synthetic and real-world problems, and we show that we are able to efficiently optimize for sparsity.
Sulin Liu · Qing Feng · David Eriksson · Ben Letham · Eytan Bakshy

Gaussian processes at the Helm(holtz): A better way to model ocean currents (Poster)
Understanding the behavior of ocean currents has important practical applications. Since we expect current dynamics to be smooth but highly non-linear, Gaussian processes (GPs) offer an attractive model. In particular, one existing approach is to consider the velocities of the buoys as sparse observations of a vector field in two spatial dimensions and one time dimension. But we show that applying a GP, e.g., with a standard squared exponential kernel, directly to this data fails to capture real-life current structure, such as continuity of currents and the shape of vortices. By contrast, these physical properties are captured by the divergence-free and curl-free components of a vector field obtained through a Helmholtz decomposition. So we propose instead to model these components with a GP directly. We show that, because this decomposition relates to the original vector field just via mixed partial derivatives, we can still perform inference given the original data with only a small constant multiple of additional computational expense. We illustrate our method on real ocean data.
Renato Berlinghieri · Tamara Broderick · Ryan Giordano · Tamay Ozgokmen · Kaushik Srinivasan · Brian Trippe · Junfei Xia
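
For reference, the 2-D Helmholtz decomposition the abstract relies on writes the velocity field as the sum of a curl-free gradient of a scalar potential and a divergence-free rotated gradient of a stream function; our reading of the abstract is that each component receives its own GP prior.

```latex
F(x, y) \;=\; \underbrace{\nabla \Phi}_{\text{curl-free}}
        \;+\; \underbrace{\operatorname{rot} \Psi}_{\text{divergence-free}},
\qquad
\nabla \Phi = \begin{pmatrix}\partial_x \Phi \\ \partial_y \Phi\end{pmatrix},
\quad
\operatorname{rot} \Psi = \begin{pmatrix}\partial_y \Psi \\ -\partial_x \Psi\end{pmatrix}.
```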

Surrogate-Assisted Evolutionary Multi-Objective Optimization for Hardware Design Space Exploration (Poster)
Hardware design space exploration (DSE) aims to find a suitable micro-architecture for dedicated hardware accelerators. It is a computationally expensive black-box optimization problem with more than one conflicting performance indicator. Surrogate-assisted evolutionary algorithms are a promising framework for expensive multi-objective optimization problems, given their surrogate modeling for handling expensive objective functions and population-based characteristics that search for a set of trade-off solutions simultaneously. However, most, if not all, existing studies mainly focus on 'regular' Pareto-optimal fronts (PFs), whereas the PF is typically irregular in hardware DSE. Meanwhile, the gradient information of the differentiable surrogate model(s) is beneficial for navigating a more effective exploration of the search space, but it is not yet fully exploited. This paper proposes a surrogate-assisted evolutionary multi-objective optimization based on multiple gradient descent (MGD) for hardware DSE. Empirical results on both synthetic problems with irregular PFs and real-world hardware DSE cases fully demonstrate the effectiveness and outstanding performance of our proposed algorithm.
Renzhi Chen · Ke Li

Constraining Gaussian Processes to Systems of Linear Ordinary Differential Equations (Poster)
Data in many applications follow systems of ordinary differential equations (ODEs). This paper presents a novel algorithmic and symbolic construction for covariance functions of Gaussian processes (GPs) whose realizations strictly follow a system of linear homogeneous ODEs with constant coefficients, which we call LODE-GPs. Introducing this strong inductive bias into a GP improves modelling of such data. Using Smith normal form algorithms, a symbolic technique, we overcome two current restrictions in the state of the art: (1) the need for certain uniqueness conditions in the set of solutions, typically assumed in classical ODE solvers and their probabilistic counterparts, and (2) the restriction to controllable systems, typically assumed when encoding differential equations in covariance functions. We show the effectiveness of LODE-GPs in a number of experiments.
Andreas Besginow · Markus Lange-Hegermann