
Workshop: Adaptive Experimental Design and Active Learning in the Real World

Generalized Objectives in Adaptive Experimentation: The Frontier between Within- and Post-Experiment Objectives

Chao Qin · Daniel Russo


This paper formulates a generalized model of multi-armed bandit experiments that accommodates both cumulative regret minimization and best-arm identification objectives. We identify the optimal instance-dependent scaling of the cumulative cost across experimentation and deployment, which takes the familiar form uncovered by Lai and Robbins (1985). We show that the nature of asymptotically efficient algorithms is nearly independent of the cost functions, emphasizing a remarkable universality phenomenon. Balancing various cost considerations reduces to an appropriate choice of exploitation rate. Additionally, we explore the Pareto frontier between the length of the experiment and the cumulative regret across experimentation and deployment. A notable and universal feature is that reducing the exploitation rate even slightly below one substantially shortens the experiment while increasing the cumulative regret only minimally.
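
For orientation, the classical Lai and Robbins (1985) lower bound for pure cumulative-regret minimization can be sketched as below. The notation is an assumption made here for illustration and does not come from the abstract: $K$ arms with reward distributions $\nu_i$ and means $\mu_i$, best mean $\mu^* = \max_i \mu_i$ attained by an arm with distribution $\nu^*$, gaps $\Delta_i = \mu^* - \mu_i$, and cumulative regret $R_T$ after $T$ rounds. The bound applies to uniformly good policies under regularity conditions on the reward family; the paper's generalized cost result is not reproduced here.

```latex
% Classical Lai--Robbins lower bound (notation assumed, not from the abstract)
\[
  \liminf_{T \to \infty} \frac{\mathbb{E}[R_T]}{\log T}
  \;\ge\;
  \sum_{i \,:\, \Delta_i > 0} \frac{\Delta_i}{\mathrm{KL}(\nu_i, \nu^*)},
  \qquad
  R_T = \sum_{t=1}^{T} \bigl(\mu^* - \mu_{A_t}\bigr),
\]
% where A_t is the arm pulled at round t and KL is the
% Kullback--Leibler divergence between reward distributions.
```

Each suboptimal arm contributes a term that grows with its gap and shrinks as it becomes statistically easier to distinguish from the best arm, which is the instance-dependent logarithmic scaling the abstract refers to.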
