Timezone: »
Methods for sequential decision-making are often built upon a foundational assumption that the underlying decision process is stationary. This limits the application of such methods because real-world problems are often subject to changes due to external factors (\textit{passive} non-stationarity), changes induced by interactions with the system itself (\textit{active} non-stationarity), or both (\textit{hybrid} non-stationarity). In this work, we take the first steps towards the fundamental challenge of on-policy and off-policy evaluation amidst structured changes due to active, passive, or hybrid non-stationarity. Towards this goal, we make a \textit{higher-order stationarity} assumption such that non-stationarity results in changes over time, but the way changes happen is fixed. We propose, OPEN, an algorithm that uses a double application of counterfactual reasoning and a novel importance-weighted instrument-variable regression to obtain both a lower bias and a lower variance estimate of the structure in the changes of a policy's past performances. Finally, we show promising results on how OPEN can be used to predict future performances for several domains inspired by real-world applications that exhibit non-stationarity.
Author Information
Yash Chandak (Stanford University)
Shiv Shankar (IIT Bombay)
Nathaniel Bastian (United States Military Academy)
A science, technology, engineering and mathematics (STEM) leader, researcher, and educator with specialization in algorithms, techniques, tools and technologies from operations research, data science, artificial intelligence, systems engineering, and applied economics to research, design, develop, and deploy solutions to enable the improvement and enhancement of decision-making in the national security (military, cyber and defense) domain.
Bruno da Silva (University of Massachusetts)
Emma Brunskill (Stanford University)
Philip Thomas (University of Massachusetts Amherst)
More from the Same Authors
-
2021 : Identification of Subgroups With Similar Benefits in Off-Policy Policy Evaluation »
Ramtin Keramati · Omer Gottesman · Leo Celi · Finale Doshi-Velez · Emma Brunskill -
2022 : Optimization using Parallel Gradient Evaluations on Multiple Parameters »
Yash Chandak · Shiv Shankar · Venkata Gandikota · Philip Thomas · Arya Mazumdar -
2022 Workshop: Reinforcement Learning for Real Life (RL4RealLife) Workshop »
Yuxi Li · Emma Brunskill · MINMIN CHEN · Omer Gottesman · Lihong Li · Yao Liu · Zhiwei Tony Qin · Matthew Taylor -
2022 Poster: Oracle Inequalities for Model Selection in Offline Reinforcement Learning »
Jonathan N Lee · George Tucker · Ofir Nachum · Bo Dai · Emma Brunskill -
2022 Poster: Factored DRO: Factored Distributionally Robust Policies for Contextual Bandits »
Tong Mu · Yash Chandak · Tatsunori Hashimoto · Emma Brunskill -
2022 Poster: Data-Efficient Pipeline for Offline Reinforcement Learning with Limited Data »
Allen Nie · Yannis Flet-Berliac · Deon Jordan · William Steenbergen · Emma Brunskill -
2022 Poster: Giving Feedback on Interactive Student Programs with Meta-Exploration »
Evan Liu · Moritz Stephan · Allen Nie · Chris Piech · Emma Brunskill · Chelsea Finn -
2021 : Q&A for Philip Thomas »
Philip Thomas -
2021 : Advances in (High-Confidence) Off-Policy Evaluation »
Philip Thomas -
2021 : Retrospective Panel »
Sergey Levine · Nando de Freitas · Emma Brunskill · Finale Doshi-Velez · Nan Jiang · Rishabh Agarwal -
2021 : Invited Speaker Panel »
Sham Kakade · Minmin Chen · Philip Thomas · Angela Schoellig · Barbara Engelhardt · Doina Precup · George Tucker -
2021 : Safe RL Debate »
Sylvia Herbert · Animesh Garg · Emma Brunskill · Aleksandra Faust · Dylan Hadfield-Menell -
2021 Poster: Play to Grade: Testing Coding Games as Classifying Markov Decision Process »
Allen Nie · Emma Brunskill · Chris Piech -
2021 Poster: Reinforcement Learning with State Observation Costs in Action-Contingent Noiselessly Observable Markov Decision Processes »
HyunJi Alex Nam · Scott Fleming · Emma Brunskill -
2021 Poster: SOPE: Spectrum of Off-Policy Estimators »
Christina Yuan · Yash Chandak · Stephen Giguere · Philip Thomas · Scott Niekum -
2021 Poster: Multi-Objective SPIBB: Seldonian Offline Policy Improvement with Safety Constraints in Finite MDPs »
harsh satija · Philip Thomas · Joelle Pineau · Romain Laroche -
2021 Poster: Provable Benefits of Actor-Critic Methods for Offline Reinforcement Learning »
Andrea Zanette · Martin J Wainwright · Emma Brunskill -
2021 Poster: Universal Off-Policy Evaluation »
Yash Chandak · Scott Niekum · Bruno da Silva · Erik Learned-Miller · Emma Brunskill · Philip Thomas -
2021 Poster: Structural Credit Assignment in Neural Networks using Reinforcement Learning »
Dhawal Gupta · Gabor Mihucz · Matthew Schlegel · James Kostas · Philip Thomas · Martha White -
2021 Poster: Design of Experiments for Stochastic Contextual Linear Bandits »
Andrea Zanette · Kefan Dong · Jonathan N Lee · Emma Brunskill -
2020 : Counterfactuals and Offline RL »
Emma Brunskill -
2020 : Q & A and Panel Session with Dan Weld, Kristen Grauman, Scott Yih, Emma Brunskill, and Alex Ratner »
Kristen Grauman · Wen-tau Yih · Alexander Ratner · Emma Brunskill · Douwe Kiela · Daniel S. Weld -
2020 : Panel »
Emma Brunskill · Nan Jiang · Nando de Freitas · Finale Doshi-Velez · Sergey Levine · John Langford · Lihong Li · George Tucker · Rishabh Agarwal · Aviral Kumar -
2020 : Mini-panel discussion 1 - Bridging the gap between theory and practice »
Aviv Tamar · Emma Brunskill · Jost Tobias Springenberg · Omer Gottesman · Daniel Mankowitz -
2020 : Keynote: Emma Brunskill »
Emma Brunskill -
2020 : Panel discussion on minimizing bias in machine learning in education »
Neil Heffernan · Osonde A. Osoba · Emma Brunskill · Kathi Fisler -
2020 Poster: Towards Safe Policy Improvement for Non-Stationary MDPs »
Yash Chandak · Scott Jordan · Georgios Theocharous · Martha White · Philip Thomas -
2020 Spotlight: Towards Safe Policy Improvement for Non-Stationary MDPs »
Yash Chandak · Scott Jordan · Georgios Theocharous · Martha White · Philip Thomas -
2020 Poster: Off-policy Policy Evaluation For Sequential Decisions Under Unobserved Confounding »
Hongseok Namkoong · Ramtin Keramati · Steve Yadlowsky · Emma Brunskill -
2020 Poster: Security Analysis of Safe and Seldonian Reinforcement Learning Algorithms »
Pinar Ozisik · Philip Thomas -
2020 Poster: Provably Efficient Reward-Agnostic Navigation with Linear Value Iteration »
Andrea Zanette · Alessandro Lazaric · Mykel J Kochenderfer · Emma Brunskill -
2020 Poster: Provably Good Batch Reinforcement Learning Without Great Exploration »
Yao Liu · Adith Swaminathan · Alekh Agarwal · Emma Brunskill -
2019 : Emma Brünskill, "Some Theory RL Challenges Inspired by Education" »
Emma Brunskill -
2019 : Invited Talk »
Emma Brunskill -
2019 : Poster and Coffee Break 1 »
Aaron Sidford · Aditya Mahajan · Alejandro Ribeiro · Alex Lewandowski · Ali H Sayed · Ambuj Tewari · Angelika Steger · Anima Anandkumar · Asier Mujika · Hilbert J Kappen · Bolei Zhou · Byron Boots · Chelsea Finn · Chen-Yu Wei · Chi Jin · Ching-An Cheng · Christina Yu · Clement Gehring · Craig Boutilier · Dahua Lin · Daniel McNamee · Daniel Russo · David Brandfonbrener · Denny Zhou · Devesh Jha · Diego Romeres · Doina Precup · Dominik Thalmeier · Eduard Gorbunov · Elad Hazan · Elena Smirnova · Elvis Dohmatob · Emma Brunskill · Enrique Munoz de Cote · Ethan Waldie · Florian Meier · Florian Schaefer · Ge Liu · Gergely Neu · Haim Kaplan · Hao Sun · Hengshuai Yao · Jalaj Bhandari · James A Preiss · Jayakumar Subramanian · Jiajin Li · Jieping Ye · Jimmy Smith · Joan Bas Serrano · Joan Bruna · John Langford · Jonathan Lee · Jose A. Arjona-Medina · Kaiqing Zhang · Karan Singh · Yuping Luo · Zafarali Ahmed · Zaiwei Chen · Zhaoran Wang · Zhizhong Li · Zhuoran Yang · Ziping Xu · Ziyang Tang · Yi Mao · David Brandfonbrener · Shirli Di-Castro · Riashat Islam · Zuyue Fu · Abhishek Naik · Saurabh Kumar · Benjamin Petit · Angeliki Kamoutsi · Simone Totaro · Arvind Raghunathan · Rui Wu · Donghwan Lee · Dongsheng Ding · Alec Koppel · Hao Sun · Christian Tjandraatmadja · Mahdi Karami · Jincheng Mei · Chenjun Xiao · Junfeng Wen · Zichen Zhang · Ross Goroshin · Mohammad Pezeshki · Jiaqi Zhai · Philip Amortila · Shuo Huang · Mariya Vasileva · El houcine Bergou · Adel Ahmadyan · Haoran Sun · Sheng Zhang · Lukas Gruber · Yuanhao Wang · Tetiana Parshakova -
2019 Poster: Offline Contextual Bandits with High Probability Fairness Guarantees »
Blossom Metevier · Stephen Giguere · Sarah Brockman · Ari Kobren · Yuriy Brun · Emma Brunskill · Philip Thomas -
2019 Poster: Almost Horizon-Free Structure-Aware Best Policy Identification with a Generative Model »
Andrea Zanette · Mykel J Kochenderfer · Emma Brunskill -
2019 Poster: Limiting Extrapolation in Linear Approximate Value Iteration »
Andrea Zanette · Alessandro Lazaric · Mykel J Kochenderfer · Emma Brunskill -
2019 Poster: A Meta-MDP Approach to Exploration for Lifelong Reinforcement Learning »
Francisco Garcia · Philip Thomas -
2018 : Lunch & Posters »
Haytham Fayek · German Parisi · Brian Xu · Pramod Kaushik Mudrakarta · Sophie Cerf · Sarah Wassermann · Davit Soselia · Rahaf Aljundi · Mohamed Elhoseiny · Frantzeska Lavda · Kevin J Liang · Arslan Chaudhry · Sanmit Narvekar · Vincenzo Lomonaco · Wesley Chung · Michael Chang · Ying Zhao · Zsolt Kira · Pouya Bashivan · Banafsheh Rafiee · Oleksiy Ostapenko · Andrew Jones · Christos Kaplanis · Sinan Kalkan · Dan Teng · Xu He · Vincent Liu · Somjit Nath · Sungsoo Ahn · Ting Chen · Shenyang Huang · Yash Chandak · Nathan Sprague · Martin Schrimpf · Tony Kendall · Jonathan Richard Schwarz · Michael Li · Yunshu Du · Yen-Chang Hsu · Samira Abnar · Bo Wang -
2018 Poster: Representation Balancing MDPs for Off-policy Policy Evaluation »
Yao Liu · Omer Gottesman · Aniruddh Raghu · Matthieu Komorowski · Aldo Faisal · Finale Doshi-Velez · Emma Brunskill -
2018 Demonstration: Automatic Curriculum Generation Applied to Teaching Novices a Short Bach Piano Segment »
Emma Brunskill · Tong Mu · Karan Goel · Jonathan Bragg -
2017 : Panel Discussion »
Matt Botvinick · Emma Brunskill · Marcos Campos · Jan Peters · Doina Precup · David Silver · Josh Tenenbaum · Roy Fox -
2017 : Sample efficiency and off policy hierarchical RL (Emma Brunskill) »
Emma Brunskill -
2017 : Emma Brunskill (Stanford) »
Emma Brunskill -
2017 : Invited Talk »
Emma Brunskill -
2017 Poster: Using Options and Covariance Testing for Long Horizon Off-Policy Policy Evaluation »
Zhaohan Guo · Philip S. Thomas · Emma Brunskill -
2017 Poster: Unifying PAC and Regret: Uniform PAC Bounds for Episodic Reinforcement Learning »
Christoph Dann · Tor Lattimore · Emma Brunskill -
2017 Spotlight: Unifying PAC and Regret: Uniform PAC Bounds for Episodic Reinforcement Learning »
Christoph Dann · Tor Lattimore · Emma Brunskill -
2017 Tutorial: Reinforcement Learning with People »
Emma Brunskill -
2015 Poster: Policy Evaluation Using the Ω-Return »
Philip Thomas · Scott Niekum · Georgios Theocharous · George Konidaris -
2013 Poster: Projected Natural Actor-Critic »
Philip Thomas · William C Dabney · Stephen Giguere · Sridhar Mahadevan -
2011 Poster: TD_gamma: Re-evaluating Complex Backups in Temporal Difference Learning »
George Konidaris · Scott Niekum · Philip Thomas -
2011 Poster: Policy Gradient Coagent Networks »
Philip Thomas