Adapting the Function Approximation Architecture in Online Reinforcement Learning
Fatima Davelouis · John Martin · Joseph Modayil · Michael Bowling

The performance of a reinforcement learning (RL) system depends on the computational architecture used to approximate a value function. We propose an online RL algorithm for adapting a value function’s architecture and efficiently finding useful nonlinear features. The algorithm is evaluated in a spatial domain with high-dimensional, stochastic observations. Our method outperforms non-adaptive baseline architectures and approaches the performance of an architecture given side-channel information about observational structure. These results are a step towards scalable RL algorithms for more general problem settings, where observational structure is unavailable.
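The full paper is not reproduced on this page, so the sketch below is only an illustration of the general idea the abstract describes: learning a value function online while periodically adapting the nonlinear feature layer beneath it. It uses a generate-and-test style replacement of low-utility features on top of TD(0), a common approach in this line of work; it is not the authors' algorithm, and all names, dimensions, and the utility heuristic are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

OBS_DIM = 64          # dimension of the high-dimensional observation vector
NUM_FEATURES = 32     # number of nonlinear features maintained at any time
REPLACE_EVERY = 500   # steps between architecture-adaptation passes
STEP_SIZE = 0.05
GAMMA = 0.99

def random_feature():
    """Return weights and bias for one random linear-threshold unit."""
    w = rng.normal(size=OBS_DIM) / np.sqrt(OBS_DIM)
    b = rng.normal()
    return w, b

# Feature layer: a bank of random nonlinear (ReLU) units.
feat_w = np.stack([random_feature()[0] for _ in range(NUM_FEATURES)])
feat_b = rng.normal(size=NUM_FEATURES)

# Linear value weights on top of the features, learned online by TD(0).
v_w = np.zeros(NUM_FEATURES)

# Slowly tracked utility estimate per feature (magnitude of its value weight).
utility = np.zeros(NUM_FEATURES)

def features(obs):
    return np.maximum(0.0, feat_w @ obs + feat_b)

def td_step(obs, reward, next_obs):
    """One online TD(0) update of the value weights."""
    global v_w, utility
    phi, phi_next = features(obs), features(next_obs)
    delta = reward + GAMMA * v_w @ phi_next - v_w @ phi
    v_w += STEP_SIZE * delta * phi
    utility = 0.999 * utility + 0.001 * np.abs(v_w)

def adapt_architecture():
    """Replace the lowest-utility features with freshly generated ones."""
    global v_w
    k = NUM_FEATURES // 8
    for i in np.argsort(utility)[:k]:
        w, b = random_feature()
        feat_w[i], feat_b[i] = w, b
        v_w[i] = 0.0
        utility[i] = np.median(utility)  # let new features prove themselves

# Toy interaction loop with a synthetic, stochastic observation stream.
obs = rng.normal(size=OBS_DIM)
for t in range(1, 5001):
    next_obs = 0.9 * obs + 0.1 * rng.normal(size=OBS_DIM)
    reward = float(next_obs[0] > 0)
    td_step(obs, reward, next_obs)
    if t % REPLACE_EVERY == 0:
        adapt_architecture()
    obs = next_obs
```

The key design choice this sketch illustrates is that the feature layer itself is treated as mutable: features that contribute little to the value estimate are periodically discarded and regenerated, so the architecture is searched online rather than fixed in advance.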

Author Information

Fatima Davelouis (University of Alberta)

I am a Master's student at the University of Alberta, working on reinforcement learning with Professor Michael Bowling. In my current research, I aim to build representations from an agent's predictions in POMDP settings.

John Martin (University of Alberta)
Joseph Modayil (DeepMind)
Michael Bowling (DeepMind / University of Alberta)
