Avoiding Side Effects in Complex Environments
Alex Turner · Neale Ratzlaff · Prasad Tadepalli

Tue Dec 08 08:10 PM -- 08:20 PM (PST) @ Orals & Spotlights: Reinforcement Learning

Reward function specification can be difficult. Rewarding the agent for making a widget may be easy, but penalizing the multitude of possible negative side effects is hard. In toy environments, Attainable Utility Preservation (AUP) avoided side effects by penalizing shifts in the ability to achieve randomly generated goals. We scale this approach to large, randomly generated environments based on Conway's Game of Life. By preserving optimal value for a single randomly generated reward function, AUP incurs modest overhead while leading the agent to complete the specified task and avoid many side effects. Videos and code are available at https://avoiding-side-effects.github.io/.
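The penalty described above can be sketched in code. This is a hedged, minimal illustration of the AUP idea, not the authors' implementation: the agent's reward is the task reward minus a penalty on how much an action shifts its attainable value for an auxiliary (here, randomly generated) reward function, measured against a "do nothing" baseline. The function names, the no-op baseline argument, and the coefficient `lam` are illustrative assumptions.

```python
def aup_reward(q_aux, state, action, noop_action, task_reward, lam=0.1):
    """Illustrative AUP-style reward (assumed form, not the paper's exact code).

    q_aux       : callable (state, action) -> estimated auxiliary Q-value
    noop_action : the "do nothing" action used as the baseline
    lam         : penalty coefficient trading off task reward vs. side effects
    """
    # Penalize shifts in the ability to achieve the auxiliary goal.
    penalty = abs(q_aux(state, action) - q_aux(state, noop_action))
    return task_reward - lam * penalty


# Toy auxiliary Q-function: acting changes attainable auxiliary value,
# while doing nothing preserves it.
toy_q_aux = lambda state, action: {"act": 5.0, "noop": 3.0}[action]

r = aup_reward(toy_q_aux, state=None, action="act",
               noop_action="noop", task_reward=1.0, lam=0.1)
```

Here the action shifts the auxiliary Q-value by 2.0, so the agent receives the task reward of 1.0 minus a penalty of 0.1 × 2.0, i.e. 0.8.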

Author Information

Alex Turner (Oregon State University)
Neale Ratzlaff (Oregon State University)

## Hi, I'm Neale

I'm a 4th-year Ph.D. candidate at Oregon State University studying machine (deep) learning. I'm interested in _Bayesian deep learning_, _reinforcement learning_, and _generative models_. Specifically, I study how best to fit a posterior over model parameters in high-dimensional settings. I want to address the shortcomings of current Bayesian deep learning approaches and apply the improved methods to exploration in reinforcement learning and to anomaly detection in vision tasks. I'm open to both internships and possible full-time positions.

Prasad Tadepalli (Oregon State University)
