Timezone: »
We study the role of the representation of state-action value functions in regret minimization in finite-horizon Markov Decision Processes (MDPs) with linear structure. We first derive a necessary condition on the representation, called universally spanning optimal features (UNISOFT), to achieve constant regret in any MDP with linear reward function. This result encompasses the well-known settings of low-rank MDPs and, more generally, zero inherent Bellman error (also known as the Bellman closure assumption). We then demonstrate that this condition is also sufficient for these classes of problems by deriving a constant regret bound for two optimistic algorithms (LSVI-UCB and ELEANOR). Finally, we propose an algorithm for representation selection and we prove that it achieves constant regret when one of the given representations, or a suitable combination of them, satisfies the UNISOFT condition.
Author Information
Matteo Papini (Politecnico di Milano)
Matteo Papini was born in Sondrio, Italy, on 5th July 1993. In 2015 he obtained the Bachelor Degree in Ingegneria Informatica (Computer Engineering) cum laude at Politecnico di Milano. In 2017 he obtained the Master Degree in Computer Science and Engineering - Ingegneria Informatica cum laude at Politecnico di Milano. From November 2017 he is a Ph.D. student at Dipartimento di Elettronica, Informazione e Bioingegneria (DEIB) at Politecnico di Milano. His research interests include artificial intelligence, robotics, and machine learning, with a focus on reinforcement learning.
Andrea Tirinzoni (Inria Lille)
Aldo Pacchiano (Microsoft Research)
Marcello Restelli (Politecnico di Milano)
Alessandro Lazaric (INRIA)
Matteo Pirotta (Facebook AI Research)
More from the Same Authors
-
2021 Spotlight: Stochastic Shortest Path: Minimax, Parameter-Free and Towards Horizon-Free Regret »
Jean Tarbouriech · Runlong Zhou · Simon Du · Matteo Pirotta · Michal Valko · Alessandro Lazaric -
2021 Spotlight: A Provably Efficient Sample Collection Strategy for Reinforcement Learning »
Jean Tarbouriech · Matteo Pirotta · Michal Valko · Alessandro Lazaric -
2021 Spotlight: Subgaussian and Differentiable Importance Sampling for Off-Policy Evaluation and Learning »
Alberto Maria Metelli · Alessio Russo · Marcello Restelli -
2021 : Mastering Visual Continuous Control: Improved Data-Augmented Reinforcement Learning »
Denis Yarats · Rob Fergus · Alessandro Lazaric · Lerrel Pinto -
2021 : Policy Optimization via Optimal Policy Evaluation »
Alberto Maria Metelli · Samuele Meta · Marcello Restelli -
2022 : Multi-Armed Bandit Problem with Temporally-Partitioned Rewards »
Giulia Romano · Andrea Agostini · Francesco Trovò · Nicola Gatti · Marcello Restelli -
2022 : Provably Efficient Causal Model-Based Reinforcement Learning for Environment-Agnostic Generalization »
Mirco Mutti · Riccardo De Santi · Emanuele Rossi · Juan Calderon · Michael Bronstein · Marcello Restelli -
2023 Poster: Distributional Policy Evaluation: a Maximum Entropy approach to Representation Learning »
Riccardo Zamboni · Alberto Maria Metelli · Marcello Restelli -
2023 Poster: A Unified Model and Dimension for Interactive Estimation »
Nataly Brukhim · Miro Dudik · Aldo Pacchiano · Robert Schapire -
2023 Poster: In-Context Decision-Making from Supervised Pretraining »
Jonathan N Lee · Annie Xie · Aldo Pacchiano · Yash Chandak · Chelsea Finn · Ofir Nachum · Emma Brunskill -
2023 Poster: Experiment Planning with Function Approximation »
Aldo Pacchiano · Jonathan N Lee · Emma Brunskill -
2023 Poster: Anytime Model Selection in Linear Bandits »
Parnian Kassraie · Aldo Pacchiano · Nicolas Emmenegger · Andreas Krause -
2023 Poster: Truncating Trajectories in Monte Carlo Policy Evaluation: an Adaptive Approach »
Riccardo Poiani · Nicole Nobili · Alberto Maria Metelli · Marcello Restelli -
2022 Poster: Learning General World Models in a Handful of Reward-Free Deployments »
Yingchen Xu · Jack Parker-Holder · Aldo Pacchiano · Philip Ball · Oleh Rybkin · S Roberts · Tim Rocktäschel · Edward Grefenstette -
2022 Poster: Multi-Fidelity Best-Arm Identification »
Riccardo Poiani · Alberto Maria Metelli · Marcello Restelli -
2022 Poster: Best of Both Worlds Model Selection »
Aldo Pacchiano · Christoph Dann · Claudio Gentile -
2022 Poster: Unpacking Reward Shaping: Understanding the Benefits of Reward Engineering on Sample Complexity »
Abhishek Gupta · Aldo Pacchiano · Yuexiang Zhai · Sham Kakade · Sergey Levine -
2022 Poster: Challenging Common Assumptions in Convex Reinforcement Learning »
Mirco Mutti · Riccardo De Santi · Piersilvio De Bartolomeis · Marcello Restelli -
2022 Poster: Off-Policy Evaluation with Deficient Support Using Side Information »
Nicolò Felicioni · Maurizio Ferrari Dacrema · Marcello Restelli · Paolo Cremonesi -
2022 Poster: Scalable Representation Learning in Linear Contextual Bandits with Constant Regret Guarantees »
Andrea Tirinzoni · Matteo Papini · Ahmed Touati · Alessandro Lazaric · Matteo Pirotta -
2021 Poster: Near Optimal Policy Optimization via REPS »
Aldo Pacchiano · Jonathan N Lee · Peter Bartlett · Ofir Nachum -
2021 Poster: Local Differential Privacy for Regret Minimization in Reinforcement Learning »
Evrard Garcelon · Vianney Perchet · Ciara Pike-Burke · Matteo Pirotta -
2021 Poster: Learning in Non-Cooperative Configurable Markov Decision Processes »
Giorgia Ramponi · Alberto Maria Metelli · Alessandro Concetti · Marcello Restelli -
2021 Poster: On the Theory of Reinforcement Learning with Once-per-Episode Feedback »
Niladri Chatterji · Aldo Pacchiano · Peter Bartlett · Michael Jordan -
2021 Poster: Stochastic Shortest Path: Minimax, Parameter-Free and Towards Horizon-Free Regret »
Jean Tarbouriech · Runlong Zhou · Simon Du · Matteo Pirotta · Michal Valko · Alessandro Lazaric -
2021 Poster: A Provably Efficient Sample Collection Strategy for Reinforcement Learning »
Jean Tarbouriech · Matteo Pirotta · Michal Valko · Alessandro Lazaric -
2021 Poster: Tactical Optimism and Pessimism for Deep Reinforcement Learning »
Ted Moskovitz · Jack Parker-Holder · Aldo Pacchiano · Michael Arbel · Michael Jordan -
2021 Poster: Neural Pseudo-Label Optimism for the Bank Loan Problem »
Aldo Pacchiano · Shaun Singh · Edward Chou · Alex Berg · Jakob Foerster -
2021 Poster: Dealing With Misspecification In Fixed-Confidence Linear Top-m Identification »
Clémence Réda · Andrea Tirinzoni · Rémy Degenne -
2021 Poster: Subgaussian and Differentiable Importance Sampling for Off-Policy Evaluation and Learning »
Alberto Maria Metelli · Alessio Russo · Marcello Restelli -
2020 Poster: Ridge Rider: Finding Diverse Solutions by Following Eigenvectors of the Hessian »
Jack Parker-Holder · Luke Metz · Cinjon Resnick · Hengyuan Hu · Adam Lerer · Alistair Letcher · Alexander Peysakhovich · Aldo Pacchiano · Jakob Foerster -
2020 Poster: Effective Diversity in Population Based Reinforcement Learning »
Jack Parker-Holder · Aldo Pacchiano · Krzysztof M Choromanski · Stephen J Roberts -
2020 Poster: Model Selection in Contextual Stochastic Bandit Problems »
Aldo Pacchiano · My Phan · Yasin Abbasi Yadkori · Anup Rao · Julian Zimmert · Tor Lattimore · Csaba Szepesvari -
2020 Poster: An Asymptotically Optimal Primal-Dual Incremental Algorithm for Contextual Linear Bandits »
Andrea Tirinzoni · Matteo Pirotta · Marcello Restelli · Alessandro Lazaric -
2020 Poster: Inverse Reinforcement Learning from a Gradient-based Learner »
Giorgia Ramponi · Gianluca Drappo · Marcello Restelli -
2020 Spotlight: Effective Diversity in Population Based Reinforcement Learning »
Jack Parker-Holder · Aldo Pacchiano · Krzysztof M Choromanski · Stephen J Roberts -
2020 Session: Orals & Spotlights Track 31: Reinforcement Learning »
Dotan Di Castro · Marcello Restelli -
2020 Poster: Adversarial Attacks on Linear Contextual Bandits »
Evrard Garcelon · Baptiste Roziere · Laurent Meunier · Jean Tarbouriech · Olivier Teytaud · Alessandro Lazaric · Matteo Pirotta -
2020 Poster: Improved Sample Complexity for Incremental Autonomous Exploration in MDPs »
Jean Tarbouriech · Matteo Pirotta · Michal Valko · Alessandro Lazaric -
2020 Oral: Improved Sample Complexity for Incremental Autonomous Exploration in MDPs »
Jean Tarbouriech · Matteo Pirotta · Michal Valko · Alessandro Lazaric -
2019 : Poster Session »
Ahana Ghosh · Javad Shafiee · Akhilan Boopathy · Alex Tamkin · Theodoros Vasiloudis · Vedant Nanda · Ali Baheri · Paul Fieguth · Andrew Bennett · Guanya Shi · Hao Liu · Arushi Jain · Jacob Tyo · Benjie Wang · Boxiao Chen · Carroll Wainwright · Chandramouli Shama Sastry · Chao Tang · Daniel S. Brown · David Inouye · David Venuto · Dhruv Ramani · Dimitrios Diochnos · Divyam Madaan · Dmitrii Krashenikov · Joel Oren · Doyup Lee · Eleanor Quint · elmira amirloo · Matteo Pirotta · Gavin Hartnett · Geoffroy Dubourg-Felonneau · Gokul Swamy · Pin-Yu Chen · Ilija Bogunovic · Jason Carter · Javier Garcia-Barcos · Jeet Mohapatra · Jesse Zhang · Jian Qian · John Martin · Oliver Richter · Federico Zaiter · Tsui-Wei Weng · Karthik Abinav Sankararaman · Kyriakos Polymenakos · Lan Hoang · mahdieh abbasi · Marco Gallieri · Mathieu Seurin · Matteo Papini · Matteo Turchetta · Matthew Sotoudeh · Mehrdad Hosseinzadeh · Nathan Fulton · Masatoshi Uehara · Niranjani Prasad · Oana-Maria Camburu · Patrik Kolaric · Philipp Renz · Prateek Jaiswal · Reazul Hasan Russel · Riashat Islam · Rishabh Agarwal · Alexander Aldrick · Sachin Vernekar · Sahin Lale · Sai Kiran Narayanaswami · Samuel Daulton · Sanjam Garg · Sebastian East · Shun Zhang · Soheil Dsidbari · Justin Goodwin · Victoria Krakovna · Wenhao Luo · Wesley Chung · Yuanyuan Shi · Yuh-Shyang Wang · Hongwei Jin · Ziping Xu -
2019 Poster: From Complexity to Simplicity: Adaptive ES-Active Subspaces for Blackbox Optimization »
Krzysztof M Choromanski · Aldo Pacchiano · Jack Parker-Holder · Yunhao Tang · Vikas Sindhwani -
2019 Poster: Exploration Bonus for Regret Minimization in Discrete and Continuous Average Reward MDPs »
Jian QIAN · Ronan Fruit · Matteo Pirotta · Alessandro Lazaric -
2019 Poster: Propagating Uncertainty in Reinforcement Learning via Wasserstein Barycenters »
Alberto Maria Metelli · Amarildo Likmeta · Marcello Restelli -
2019 Poster: Regret Bounds for Learning State Representations in Reinforcement Learning »
Ronald Ortner · Matteo Pirotta · Alessandro Lazaric · Ronan Fruit · Odalric-Ambrym Maillard -
2018 Poster: Gen-Oja: Simple & Efficient Algorithm for Streaming Generalized Eigenvector Computation »
Kush Bhatia · Aldo Pacchiano · Nicolas Flammarion · Peter Bartlett · Michael Jordan -
2018 Poster: Policy-Conditioned Uncertainty Sets for Robust Markov Decision Processes »
Andrea Tirinzoni · Marek Petrik · Xiangli Chen · Brian Ziebart -
2018 Spotlight: Policy-Conditioned Uncertainty Sets for Robust Markov Decision Processes »
Andrea Tirinzoni · Marek Petrik · Xiangli Chen · Brian Ziebart -
2018 Poster: Geometrically Coupled Monte Carlo Sampling »
Mark Rowland · Krzysztof Choromanski · François Chalus · Aldo Pacchiano · Tamas Sarlos · Richard Turner · Adrian Weller -
2018 Poster: Fighting Boredom in Recommender Systems with Linear Reinforcement Learning »
Romain WARLOP · Alessandro Lazaric · Jérémie Mary -
2018 Poster: Policy Optimization via Importance Sampling »
Alberto Maria Metelli · Matteo Papini · Francesco Faccio · Marcello Restelli -
2018 Poster: Transfer of Value Functions via Variational Methods »
Andrea Tirinzoni · Rafael Rodriguez Sanchez · Marcello Restelli -
2018 Spotlight: Geometrically Coupled Monte Carlo Sampling »
Mark Rowland · Krzysztof Choromanski · François Chalus · Aldo Pacchiano · Tamas Sarlos · Richard Turner · Adrian Weller -
2018 Oral: Policy Optimization via Importance Sampling »
Alberto Maria Metelli · Matteo Papini · Francesco Faccio · Marcello Restelli -
2018 Poster: Near Optimal Exploration-Exploitation in Non-Communicating Markov Decision Processes »
Ronan Fruit · Matteo Pirotta · Alessandro Lazaric -
2018 Spotlight: Near Optimal Exploration-Exploitation in Non-Communicating Markov Decision Processes »
Ronan Fruit · Matteo Pirotta · Alessandro Lazaric -
2017 Poster: Compatible Reward Inverse Reinforcement Learning »
Alberto Maria Metelli · Matteo Pirotta · Marcello Restelli -
2017 Poster: Regret Minimization in MDPs with Options without Prior Knowledge »
Ronan Fruit · Matteo Pirotta · Alessandro Lazaric · Emma Brunskill -
2017 Poster: Adaptive Batch Size for Safe Policy Gradients »
Matteo Papini · Matteo Pirotta · Marcello Restelli -
2017 Spotlight: Regret Minimization in MDPs with Options without Prior Knowledge »
Ronan Fruit · Matteo Pirotta · Alessandro Lazaric · Emma Brunskill -
2014 Poster: Sparse Multi-Task Reinforcement Learning »
Daniele Calandriello · Alessandro Lazaric · Marcello Restelli -
2013 Poster: Adaptive Step-Size for Policy Gradient Methods »
Matteo Pirotta · Marcello Restelli · Luca Bascetta -
2013 Poster: Sequential Transfer in Multi-armed Bandit with Finite Set of Models »
Mohammad Gheshlaghi azar · Alessandro Lazaric · Emma Brunskill -
2011 Poster: Transfer from Multiple MDPs »
Alessandro Lazaric · Marcello Restelli -
2007 Spotlight: Reinforcement Learning in Continuous Action Spaces through Sequential Monte Carlo Methods »
Alessandro Lazaric · Marcello Restelli · Andrea Bonarini -
2007 Poster: Reinforcement Learning in Continuous Action Spaces through Sequential Monte Carlo Methods »
Alessandro Lazaric · Marcello Restelli · Andrea Bonarini