
Workshop: Adaptive Experimental Design and Active Learning in the Real World

Cross-Entropy Estimators for Sequential Experiment Design with Reinforcement Learning

Tom Blau · Iadine Chades · Amir Dezfouli · Daniel Steinberg · Edwin Bonilla


Reinforcement learning can be used to learn amortised policies for designing sequences of experiments. However, current methods rely on contrastive estimators of expected information gain, which require an exponential number of contrastive samples to achieve an unbiased estimate. We propose the use of an alternative lower bound estimator, based on the cross-entropy of the joint model distribution and a flexible proposal distribution. This proposal distribution approximates the true posterior of the model parameters given the experimental history and the design policy. Our method requires no contrastive samples, can achieve more accurate estimates of high information gains, allows learning of superior design policies, and is compatible with implicit probabilistic models. We assess our algorithm's performance in various tasks, including continuous and discrete designs and explicit and implicit likelihoods.
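To illustrate the general idea of a posterior-based (cross-entropy) lower bound on expected information gain (EIG), the sketch below estimates the bound by Monte Carlo in a toy linear-Gaussian model, where a Gaussian proposal stands in for the flexible learned proposal and the bound can be checked against the analytic EIG. This is only a minimal illustration under assumed names and a made-up model, not the authors' estimator or code.

```python
# Minimal sketch (not the authors' code): a posterior/cross-entropy style lower bound
# on expected information gain, EIG >= E_{p(theta) p(y|theta,design)}[log q(theta|y)] + H(p(theta)).
# All names (prior_std, noise_std, design, ...) are invented for this example.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Toy model: theta ~ N(0, prior_std^2),  y | theta, design ~ N(design * theta, noise_std^2)
prior_std, noise_std, design = 1.0, 0.5, 2.0
n_samples = 100_000

# Sample from the joint p(theta) p(y | theta, design)
theta = rng.normal(0.0, prior_std, size=n_samples)
y = rng.normal(design * theta, noise_std)

# Proposal q(theta | y): here the exact conjugate posterior, so the bound is tight.
# In general q would be a flexible learned density, and the estimate is a lower bound on EIG.
post_var = 1.0 / (1.0 / prior_std**2 + design**2 / noise_std**2)
post_mean = post_var * design * y / noise_std**2
log_q = norm.logpdf(theta, loc=post_mean, scale=np.sqrt(post_var))

prior_entropy = 0.5 * np.log(2 * np.pi * np.e * prior_std**2)
eig_lower_bound = log_q.mean() + prior_entropy  # E[log q(theta | y)] + H(p(theta))

# Analytic EIG for this linear-Gaussian model, for comparison
eig_exact = 0.5 * np.log(1.0 + design**2 * prior_std**2 / noise_std**2)
print(f"lower bound ~ {eig_lower_bound:.4f}, exact EIG = {eig_exact:.4f}")
```

With the exact posterior as the proposal, the Monte Carlo estimate matches the analytic EIG up to sampling noise; with an approximate proposal, the same expression gives a lower bound that requires no contrastive samples.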
