
Affinity Workshop: WiML Workshop 1

Model-Free Learning for Continuous Timing as an Action

Helen Zhou · David Childers · Zachary Lipton


In systems where reinforcement learning (RL) algorithms can be readily integrated (e.g., robotics, gaming, and finance), there is often little inherent cost to making numerous observations or taking numerous actions. However, in several real-world settings, constant observational or interventional access to the system cannot be taken for granted. In this work, we propose a new RL setting: the timing-as-an-action setting. Here, agents choose not only the action they normally would, but also a duration associated with that action. By augmenting existing policy gradient algorithms, we demonstrate how to jointly learn actions and their durations. Specifically, we create an additional policy network for the duration that takes both the action and the state observation as input. We consider several parameterizations of these durations, from discrete categorical distributions of varying granularity to different types of continuous distributions. We conduct experiments on OpenAI simulators modified for the timing-as-an-action setting. Overall, we find that certain continuous parameterizations have significant advantages over discrete parameterizations of durations, while others get stuck in local minima. More broadly, we note that the marginal benefit of learning durations likely depends on the nature of the environment and its sensitivity to small changes in timing.
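To make the two-network structure concrete, here is a minimal sketch in plain Python. It assumes linear policies, a vanilla REINFORCE-style update, and a fixed-variance log-normal as one example of a continuous duration distribution; the class and function names (`CategoricalPolicy`, `LogNormalDurationPolicy`, `duration_features`, `act`, `update`) are hypothetical and this is not the authors' implementation. The key idea it illustrates is that the duration policy is conditioned on the state and the chosen action, so the joint log-probability factors as log pi(a | s) + log pi(d | s, a), and a policy gradient step updates both networks with the same advantage.

```python
import math
import random

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

class CategoricalPolicy:
    """Linear-softmax policy over n_out choices (logits = W @ x). Hypothetical sketch."""

    def __init__(self, n_in, n_out, rng):
        self.W = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]

    def probs(self, x):
        return softmax([sum(w * xi for w, xi in zip(row, x)) for row in self.W])

    def sample(self, x, rng):
        p = self.probs(x)
        return rng.choices(range(len(p)), weights=p)[0]

    def log_prob(self, x, k):
        return math.log(self.probs(x)[k])

    def reinforce_step(self, x, k, advantage, lr):
        # d log softmax_k / d logit_j = 1[j == k] - p_j; logits are linear in x
        p = self.probs(x)
        for j, row in enumerate(self.W):
            g = (1.0 if j == k else 0.0) - p[j]
            for i in range(len(row)):
                row[i] += lr * advantage * g * x[i]

class LogNormalDurationPolicy:
    """Continuous duration d > 0 with log d ~ Normal(w @ x, sigma**2); sigma is fixed (assumption)."""

    def __init__(self, n_in, sigma=0.5):
        self.w = [0.0] * n_in
        self.sigma = sigma

    def mean_log(self, x):
        return sum(wi * xi for wi, xi in zip(self.w, x))

    def sample(self, x, rng):
        return math.exp(rng.gauss(self.mean_log(x), self.sigma))

    def log_prob(self, x, d):
        # Log-normal density, including the 1/d Jacobian of the log transform
        z = (math.log(d) - self.mean_log(x)) / self.sigma
        return -0.5 * z * z - math.log(d * self.sigma * math.sqrt(2.0 * math.pi))

    def reinforce_step(self, x, d, advantage, lr):
        # d log p / d mu = (log d - mu) / sigma**2, with mu = w @ x
        g = (math.log(d) - self.mean_log(x)) / self.sigma ** 2
        for i in range(len(self.w)):
            self.w[i] += lr * advantage * g * x[i]

def duration_features(state, action, n_actions):
    # The duration policy sees the state AND a one-hot encoding of the chosen action.
    return list(state) + [1.0 if a == action else 0.0 for a in range(n_actions)]

def act(state, action_pi, duration_pi, n_actions, rng):
    """Sample (action, duration) and return log pi(a | s) + log pi(d | s, a)."""
    a = action_pi.sample(state, rng)
    xd = duration_features(state, a, n_actions)
    d = duration_pi.sample(xd, rng)
    return a, d, action_pi.log_prob(state, a) + duration_pi.log_prob(xd, d)

def update(state, a, d, advantage, action_pi, duration_pi, n_actions, lr=0.01):
    # grad log pi(a, d | s) = grad log pi_a(a | s) + grad log pi_d(d | s, a)
    action_pi.reinforce_step(state, a, advantage, lr)
    duration_pi.reinforce_step(duration_features(state, a, n_actions), d, advantage, lr)
```

Because both classes expose the same `sample` / `log_prob` / `reinforce_step` interface, a discrete duration parameterization can be swapped in by passing `CategoricalPolicy(n_in, len(bins), rng)` as `duration_pi`, in which case `d` is a bin index that the environment maps to a duration from a hypothetical list `bins`; the number of bins sets the granularity.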
