Agents trained via deep reinforcement learning (RL) routinely fail to generalize to unseen environments, even when these share the same underlying dynamics as the training levels. Understanding the generalization properties of RL is one of the challenges of modern machine learning. Towards this goal, we analyze policy learning in the context of Partially Observable Markov Decision Processes (POMDPs) and formalize the dynamics of training levels as instances. We prove that, independently of the exploration strategy, reusing instances introduces significant changes to the effective Markov dynamics the agent observes during training. Maximizing expected rewards affects the agent's learned belief state by inducing undesired, instance-specific speed-running policies rather than generalizable policies, which are sub-optimal on the training set. We provide generalization bounds on the value gap between train and test environments that depend on the number of training instances, and use insights from these bounds to improve performance on unseen levels. We propose training a shared belief representation over an ensemble of specialized policies, from which we compute a consensus policy that is used for data collection, disallowing instance-specific exploitation. We experimentally validate our theory, observations, and proposed computational solution on the CoinRun benchmark.
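The abstract does not specify how the consensus policy is computed from the ensemble. Purely as an illustration, the sketch below assumes each specialized policy outputs logits over a discrete action set (CoinRun uses discrete actions) and takes the consensus to be the arithmetic mean of the members' softmax distributions; data-collection actions are then sampled from that consensus rather than from any single member. The names `consensus_policy` and `sample_action` and the averaging rule are hypothetical, not the authors' method, and the shared belief representation is omitted entirely.

```python
import numpy as np


def consensus_policy(member_logits: np.ndarray) -> np.ndarray:
    """Combine an ensemble of policies into a single action distribution.

    ASSUMPTION (illustrative only): the consensus is the mean of the
    members' softmax distributions. `member_logits` has shape
    (n_members, n_actions).
    """
    # Numerically stable softmax for each ensemble member.
    shifted = member_logits - member_logits.max(axis=1, keepdims=True)
    probs = np.exp(shifted)
    probs /= probs.sum(axis=1, keepdims=True)
    # Averaging valid distributions yields a valid distribution.
    return probs.mean(axis=0)


def sample_action(member_logits: np.ndarray, rng: np.random.Generator) -> int:
    """Draw a data-collection action from the consensus, not from one member."""
    p = consensus_policy(member_logits)
    return int(rng.choice(len(p), p=p))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # One member strongly prefers action 0 (e.g. a memorized shortcut on
    # its own instances); the other two members are indifferent.
    logits = np.array([[5.0, 0.0, 0.0],
                       [0.0, 0.0, 0.0],
                       [0.0, 0.0, 0.0]])
    print(consensus_policy(logits))
    print(sample_action(logits, rng))
```

Under this averaging rule, the confident member's near-deterministic preference for action 0 is diluted to a probability of about 0.55 in the consensus, which illustrates how collecting data with a consensus can limit a single member's instance-specific exploitation.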
Author Information
Martin Bertran (Duke University)
I am a PhD student at Duke University. My main research interests are robustness, generalization, and representation learning. My work has focused on robustness in supervised learning in the context of fairness and Pareto efficiency, and on studying the characteristics of good representations for generalization in the context of reinforcement learning.
Natalia Martinez (Duke University)
Mariano Phielipp (Intel AI Labs)
Dr. Mariano Phielipp works at the Intel AI Lab inside the Intel Artificial Intelligence Products Group. His work includes research and development in deep learning, deep reinforcement learning, machine learning, and artificial intelligence. Since joining Intel, Dr. Phielipp has developed and worked on computer vision, face recognition, face detection, object categorization, recommendation systems, online learning, automatic rule learning, natural language processing, knowledge representation, energy-based algorithms, and other machine learning and AI-related efforts. Dr. Phielipp has also contributed to various disclosure committees, won an Intel division award related to robotics, and holds a large number of patents and pending patents. He has published in NeurIPS, ICML, ICLR, AAAI, IROS, IEEE, SPIE, IASTED, and EUROGRAPHICS-IEEE conferences and workshops.
Guillermo Sapiro (Duke University)
More from the Same Authors
2021 : Federating for Learning Group Fair Models
Afroditi Papadaki · Natalia Martinez · Martin Bertran · Guillermo Sapiro · Miguel Rodrigues
2021 : Distributionally Robust Group Backwards Compatibility
Martin Bertran · Natalia Martinez · Guillermo Sapiro
2021 : The Reflective Explorer: Online Meta-Exploration from Offline Data in Realistic Robotic Tasks
Rafael Rafailov · · Tianhe Yu · Avi Singh · Mariano Phielipp · Chelsea Finn
2021 : Computer Vision Analysis of Caregiver-Child Interactions in Children with Neurodevelopmental Disorders
Dmitry Isaev · J. Matias Di Martino · Kimberley Carpenter · Guillermo Sapiro · geraldine Dawson
2022 : Federated Fairness without Access to Demographics
Afroditi Papadaki · Natalia Martinez · Martin Bertran · Guillermo Sapiro · Miguel Rodrigues
2022 : Offline Policy Comparison with Confidence: Benchmarks and Baselines
Anurag Koul · Mariano Phielipp · Alan Fern
2022 : Group SELFIES: A Robust Fragment-Based Molecular String Representation
Austin Cheng · Andy Cai · Santiago Miret · Gustavo Malkomes · Mariano Phielipp · Alan Aspuru-Guzik
2022 : Conformer Search Using SE3-Transformers and Imitation Learning
Luca Thiede · Santiago Miret · Krzysztof Sadowski · Haoping Xu · Mariano Phielipp · Alan Aspuru-Guzik
2021 : Neuroevolution-Enhanced Multi-Objective Optimization for Mixed-Precision Quantization
Santiago Miret · Vui Seng Chua · Mattias Marder · Mariano Phielipp · Nilesh Jain · Somdeb Majumdar
2020 : Lightning Talk 2: Pareto Robustness for Fairness Beyond Demographics
Natalia Martinez · Martin Bertran · Afroditi Papadaki · Miguel Rodrigues · Guillermo Sapiro
2020 Poster: Language-Conditioned Imitation Learning for Robot Manipulation Tasks
Simon Stepputtis · Joseph Campbell · Mariano Phielipp · Stefan Lee · Chitta Baral · Heni Ben Amor
2020 Spotlight: Language-Conditioned Imitation Learning for Robot Manipulation Tasks
Simon Stepputtis · Joseph Campbell · Mariano Phielipp · Stefan Lee · Chitta Baral · Heni Ben Amor
2020 Poster: A Dictionary Approach to Domain-Invariant Learning in Deep Networks
Ze Wang · Xiuyuan Cheng · Guillermo Sapiro · Qiang Qiu
2019 : Poster Session I
Shuangjia Zheng · Arnav Kapur · Umar Asif · Eyal Rozenberg · Cyprien Gilet · Oleksii Sidorov · Yogesh Kumar · Tom Van Steenkiste · William Boag · David Ouyang · Paul Jaeger · Sheng Liu · Aparna Balagopalan · Deepta Rajan · Marta Skreta · Nikhil Pattisapu · Jann Goschenhofer · Viraj Prabhu · Di Jin · Laura-Jayne Gardiner · Irene Li · sriram kumar · Qiyuan Hu · Mehul Motani · Justin Lovelace · Usman Roshan · Lucy Lu Wang · Ilya Valmianski · Hyeonwoo Lee · Sunil Mallya · Elias Chaibub Neto · Jonas Kemp · Marie Charpignon · Amber Nigam · Wei-Hung Weng · Sabri Boughorbel · Alexis Bellot · Lovedeep Gondara · Haoran Zhang · Taha Bahadori · John Zech · Rulin Shao · Edward Choi · Laleh Seyyed-Kalantari · Emily Aiken · Ioana Bica · Yiqiu Shen · Kieran Chin-Cheong · Subhrajit Roy · Ioana Baldini · So Yeon Min · Dirk Deschrijver · Pekka Marttinen · Damian Pascual Ortiz · Supriya Nagesh · Niklas Rindtorff · Andriy Mulyar · Katharina Hoebel · Martha Shaka · Pierre Machart · Leon Gatys · Nathan Ng · Matthias Hüser · Devin Taylor · Dennis Barbour · Natalia Martinez · Clara McCreery · Benjamin Eyre · Vivek Natarajan · Ren Yi · Ruibin Ma · Chirag Nagpal · Nan Du · Chufan Gao · Anup Tuladhar · Sam Shleifer · Jason Ren · Pouria Mashouri · Ming Yang Lu · Farideh Bagherzadeh-Khiabani · Olivia Choudhury · Maithra Raghu · Scott Fleming · Mika Jain · GUO YANG · Alena Harley · Stephen Pfohl · Elisabeth Rumetshofer · Alex Fedorov · Saloni Dash · Jacob Pfau · Sabina Tomkins · Colin Targonski · Michael Brudno · Xinyu Li · Yiyang Yu · Nisarg Patel
2019 Poster: Goal-conditioned Imitation Learning
Yiming Ding · Carlos Florensa · Pieter Abbeel · Mariano Phielipp
2018 : Poster Session
Phillipp Schoppmann · Patrick Yu · Valerie Chen · Travis Dick · Marc Joye · Ningshan Zhang · Frederik Harder · Olli Saarikivi · Théo Ryffel · Yunhui Long · Théo JOURDAN · Di Wang · Antonio Marcedone · Negev Shekel Nosatzki · Yatharth A Dubey · Antti Koskela · Peter Bloem · Aleksandra Korolova · Martin Bertran · Hao Chen · Galen Andrew · Natalia Martinez · Janardhan Kulkarni · Jonathan Passerat-Palmbach · Guillermo Sapiro · Amrita Roy Chowdhury