Poster
Tempo Adaptation in Non-stationary Reinforcement Learning
Hyunin Lee · Yuhao Ding · Jongmin Lee · Ming Jin · Javad Lavaei · Somayeh Sojoudi
Great Hall & Hall B1+B2 (level 1) #725
Abstract:
We first raise and tackle a "time synchronization" issue between the agent and the environment in non-stationary reinforcement learning (RL), a crucial factor hindering its real-world applications. In reality, environmental changes occur over wall-clock time (t) rather than episode progress (k), where wall-clock time signifies the actual elapsed time within the fixed duration t ∈ [0, T]. In existing works, at episode k, the agent rolls out a trajectory and trains a policy before transitioning to episode k+1. In the context of the time-desynchronized environment, however, the agent at time t_k allocates Δt for trajectory generation and training, and subsequently moves to the next episode at t_{k+1} = t_k + Δt. Despite a fixed total number of episodes (K), the agent accumulates different trajectories influenced by the choice of interaction times (t_1, t_2, ..., t_K), significantly impacting the suboptimality gap of the policy. We propose a Proactively Synchronizing Tempo (ProST) framework that computes a suboptimal sequence {t_1, t_2, ..., t_K} (= {t}_{1:K}) by minimizing an upper bound on its performance measure, i.e., the dynamic regret. Our main contribution is to show that a suboptimal {t}_{1:K} trades off between the policy training time (agent tempo) and how fast the environment changes (environment tempo). Theoretically, this work develops a suboptimal {t}_{1:K} as a function of the degree of the environment's non-stationarity while also achieving a sublinear dynamic regret. Our experimental evaluation on various high-dimensional non-stationary environments shows that the ProST framework achieves a higher online return at suboptimal {t}_{1:K} than the existing methods.
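To make the time-desynchronized interaction loop concrete, the sketch below shows the episode schedule described in the abstract: at wall-clock time t_k the agent spends Δt_k on rollout and training, then advances to t_{k+1} = t_k + Δt_k. This is a minimal illustrative sketch, not the authors' ProST implementation; the helpers choose_interaction_time, rollout, and train are hypothetical placeholders.

```python
def run_desynchronized_episodes(env, policy, K, T,
                                choose_interaction_time, rollout, train):
    """Run up to K episodes within a wall-clock budget T.

    Each episode k consumes Delta_t_k of wall-clock time for trajectory
    generation plus training, while the environment keeps changing in t.
    All helper callables are hypothetical placeholders for illustration.
    """
    t = 0.0  # current wall-clock time t_k
    for k in range(K):
        # The scheduler picks Delta_t_k, trading off agent tempo
        # (longer training) against environment tempo (drift while training).
        delta_t = choose_interaction_time(k, t, T)

        # The trajectory reflects the environment's state at wall-clock time t.
        trajectory = rollout(env, policy, wall_clock_time=t)

        # Training is limited by the allocated wall-clock budget Delta_t_k.
        policy = train(policy, trajectory, time_budget=delta_t)

        # Advance the clock: t_{k+1} = t_k + Delta_t_k.
        t += delta_t
        if t >= T:
            break
    return policy
```

Under this view, a fixed episode count K can correspond to very different sequences of environment states depending on how {Δt_k} is chosen, which is the trade-off the ProST framework optimizes via an upper bound on the dynamic regret.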