Abstract:

This paper studies an instance of the multi-armed bandit (MAB) problem in which several causal MABs operate chronologically in the same dynamical system. In practice, the reward distribution of each bandit is governed by the same non-trivial dependence structure, which is a dynamic causal model. It is dynamic because we allow each causal MAB to depend on the preceding MAB, and in doing so we are able to transfer information between agents. Our contribution, the Chronological Causal Bandit (CCB), is useful in discrete decision-making settings where the causal effects change over time and can be informed by earlier interventions in the same system. In this paper we present some early findings of the CCB as demonstrated on a toy problem.
