Timezone: »
While there has been extensive work on learning from offline data for contextual multi-armed bandit settings, existing methods typically assume there is no environment shift: that the learned policy will operate in the same environmental process as that of data collection. However, this assumption may limit the use of these methods for many practical situations where there may be distribution shifts. In this work we propose Factored Distributionally Robust Optimization (Factored-DRO), which is able to separately handle distribution shifts in the context distribution and shifts in the reward generating process. Prior work that either ignores potential shifts in the context, or considers them jointly, can lead to performance that is too conservative, especially under certain forms of reward feedback. Our Factored-DRO objective mitigates this by considering the shifts separately, and our proposed estimators are consistent and converge asymptotically. We also introduce a practical algorithm and demonstrate promising empirical results in environments based on real-world datasets, such as voting outcomes and scene classification.
Author Information
Tong Mu (OpenAI)
Yash Chandak (Stanford University)
Tatsunori Hashimoto (Stanford)
Emma Brunskill (Stanford University)
More from the Same Authors
-
2021 : Identification of Subgroups With Similar Benefits in Off-Policy Policy Evaluation »
Ramtin Keramati · Omer Gottesman · Leo Celi · Finale Doshi-Velez · Emma Brunskill -
2022 : Optimization using Parallel Gradient Evaluations on Multiple Parameters »
Yash Chandak · Shiv Shankar · Venkata Gandikota · Philip Thomas · Arya Mazumdar -
2022 : A Closer Look at the Calibration of Differential Private Learners »
Hanlin Zhang · Xuechen (Chen) Li · Prithviraj Sen · Salim Roukos · Tatsunori Hashimoto -
2022 : Out-of-Distribution Robustness via Targeted Augmentations »
Irena Gao · Shiori Sagawa · Pang Wei Koh · Tatsunori Hashimoto · Percy Liang -
2022 : Data Feedback Loops: Model-driven Amplification of Dataset Biases »
Rohan Taori · Tatsunori Hashimoto -
2022 : Undersampling is a Minimax Optimal Robustness Intervention in Nonparametric Classification »
Niladri S. Chatterji · Saminul Haque · Tatsunori Hashimoto -
2023 Poster: Likelihood-Based Diffusion Language Models »
Ishaan Gulrajani · Tatsunori Hashimoto -
2023 Poster: In-Context Decision-Making from Supervised Pretraining »
Jonathan N Lee · Annie Xie · Aldo Pacchiano · Yash Chandak · Chelsea Finn · Ofir Nachum · Emma Brunskill -
2023 Poster: Experiment Planning with Function Approximation »
Aldo Pacchiano · Jonathan N Lee · Emma Brunskill -
2023 Poster: MoCa: Measuring Human-Language Model Alignment on Causal and Moral Judgment Tasks »
Allen Nie · Yuhui Zhang · Atharva Shailesh Amdekar · Chris Piech · Tatsunori Hashimoto · Tobias Gerstenberg -
2023 Poster: Behavior Alignment via Reward Function Optimization »
Dhawal Gupta · Yash Chandak · Scott Jordan · Philip Thomas · Bruno da Silva -
2023 Poster: Proportional Response: Contextual Bandits for Simple and Cumulative Regret Minimization »
Sanath Kumar Krishnamurthy · Ruohan Zhan · Susan Athey · Emma Brunskill -
2023 Poster: Waypoint Transformer: Reinforcement Learning via Supervised Learning with Intermediate Targets »
Anirudhan Badrinath · Yannis Flet-Berliac · Allen Nie · Emma Brunskill -
2023 Poster: AlpacaFarm: A Simulation Framework for Methods that Learn from Human Feedback »
Yann Dubois · Xuechen Li · Rohan Taori · Tianyi Zhang · Ishaan Gulrajani · Jimmy Ba · Carlos Guestrin · Percy Liang · Tatsunori Hashimoto -
2023 Workshop: Workshop on Distribution Shifts: New Frontiers with Foundation Models »
Rebecca Roelofs · Fanny Yang · Hongseok Namkoong · Masashi Sugiyama · Jacob Eisenstein · Pang Wei Koh · Shiori Sagawa · Tatsunori Hashimoto · Yoonho Lee -
2022 : Data Feedback Loops: Model-driven Amplification of Dataset Biases »
Rohan Taori · Tatsunori Hashimoto -
2022 Workshop: Reinforcement Learning for Real Life (RL4RealLife) Workshop »
Yuxi Li · Emma Brunskill · MINMIN CHEN · Omer Gottesman · Lihong Li · Yao Liu · Zhiwei Tony Qin · Matthew Taylor -
2022 Poster: Oracle Inequalities for Model Selection in Offline Reinforcement Learning »
Jonathan N Lee · George Tucker · Ofir Nachum · Bo Dai · Emma Brunskill -
2022 Poster: When Does Differentially Private Learning Not Suffer in High Dimensions? »
Xuechen Li · Daogao Liu · Tatsunori Hashimoto · Huseyin A. Inan · Janardhan Kulkarni · Yin-Tat Lee · Abhradeep Guha Thakurta -
2022 Poster: Diffusion-LM Improves Controllable Text Generation »
Xiang Li · John Thickstun · Ishaan Gulrajani · Percy Liang · Tatsunori Hashimoto -
2022 Poster: Off-Policy Evaluation for Action-Dependent Non-stationary Environments »
Yash Chandak · Shiv Shankar · Nathaniel Bastian · Bruno da Silva · Emma Brunskill · Philip Thomas -
2022 Poster: Improving Self-Supervised Learning by Characterizing Idealized Representations »
Yann Dubois · Stefano Ermon · Tatsunori Hashimoto · Percy Liang -
2022 Poster: Data-Efficient Pipeline for Offline Reinforcement Learning with Limited Data »
Allen Nie · Yannis Flet-Berliac · Deon Jordan · William Steenbergen · Emma Brunskill -
2022 Poster: Giving Feedback on Interactive Student Programs with Meta-Exploration »
Evan Liu · Moritz Stephan · Allen Nie · Chris Piech · Emma Brunskill · Chelsea Finn -
2021 : Retrospective Panel »
Sergey Levine · Nando de Freitas · Emma Brunskill · Finale Doshi-Velez · Nan Jiang · Rishabh Agarwal -
2021 : Panel: Future directions for tackling distribution shifts »
Tatsunori Hashimoto · Jamie Morgenstern · Judy Hoffman · Andrew Beck -
2021 : Safe RL Debate »
Sylvia Herbert · Animesh Garg · Emma Brunskill · Aleksandra Faust · Dylan Hadfield-Menell -
2021 Workshop: CtrlGen: Controllable Generative Modeling in Language and Vision »
Steven Y. Feng · Dor Arad Hudson · Tatsunori Hashimoto · DONGYEOP Kang · Varun Prashant Gangal · Anusha Balakrishnan · Joel Tetreault -
2021 Poster: Play to Grade: Testing Coding Games as Classifying Markov Decision Process »
Allen Nie · Emma Brunskill · Chris Piech -
2021 Poster: Reinforcement Learning with State Observation Costs in Action-Contingent Noiselessly Observable Markov Decision Processes »
HyunJi Alex Nam · Scott Fleming · Emma Brunskill -
2021 Poster: SOPE: Spectrum of Off-Policy Estimators »
Christina Yuan · Yash Chandak · Stephen Giguere · Philip Thomas · Scott Niekum -
2021 Poster: Provable Benefits of Actor-Critic Methods for Offline Reinforcement Learning »
Andrea Zanette · Martin J Wainwright · Emma Brunskill -
2021 Poster: Universal Off-Policy Evaluation »
Yash Chandak · Scott Niekum · Bruno da Silva · Erik Learned-Miller · Emma Brunskill · Philip Thomas -
2021 Poster: Design of Experiments for Stochastic Contextual Linear Bandits »
Andrea Zanette · Kefan Dong · Jonathan N Lee · Emma Brunskill -
2020 : Counterfactuals and Offline RL »
Emma Brunskill -
2020 : Q & A and Panel Session with Dan Weld, Kristen Grauman, Scott Yih, Emma Brunskill, and Alex Ratner »
Kristen Grauman · Wen-tau Yih · Alexander Ratner · Emma Brunskill · Douwe Kiela · Daniel S. Weld -
2020 : Panel »
Emma Brunskill · Nan Jiang · Nando de Freitas · Finale Doshi-Velez · Sergey Levine · John Langford · Lihong Li · George Tucker · Rishabh Agarwal · Aviral Kumar -
2020 : Mini-panel discussion 1 - Bridging the gap between theory and practice »
Aviv Tamar · Emma Brunskill · Jost Tobias Springenberg · Omer Gottesman · Daniel Mankowitz -
2020 : Keynote: Emma Brunskill »
Emma Brunskill -
2020 : Panel discussion on minimizing bias in machine learning in education »
Neil Heffernan · Osonde A. Osoba · Emma Brunskill · Kathi Fisler -
2020 Poster: Towards Safe Policy Improvement for Non-Stationary MDPs »
Yash Chandak · Scott Jordan · Georgios Theocharous · Martha White · Philip Thomas -
2020 Spotlight: Towards Safe Policy Improvement for Non-Stationary MDPs »
Yash Chandak · Scott Jordan · Georgios Theocharous · Martha White · Philip Thomas -
2020 Poster: Off-policy Policy Evaluation For Sequential Decisions Under Unobserved Confounding »
Hongseok Namkoong · Ramtin Keramati · Steve Yadlowsky · Emma Brunskill -
2020 Poster: Provably Efficient Reward-Agnostic Navigation with Linear Value Iteration »
Andrea Zanette · Alessandro Lazaric · Mykel J Kochenderfer · Emma Brunskill -
2020 Poster: Provably Good Batch Reinforcement Learning Without Great Exploration »
Yao Liu · Adith Swaminathan · Alekh Agarwal · Emma Brunskill -
2019 : Emma Brünskill, "Some Theory RL Challenges Inspired by Education" »
Emma Brunskill -
2019 : Invited Talk »
Emma Brunskill -
2019 : Extended Poster Session »
Travis LaCroix · Marie Ossenkopf · Mina Lee · Nicole Fitzgerald · Daniela Mihai · Jonathon Hare · Ali Zaidi · Alexander Cowen-Rivers · Alana Marzoev · Eugene Kharitonov · Luyao Yuan · Tomasz Korbak · Paul Pu Liang · Yi Ren · Roberto Dessì · Peter Potash · Shangmin Guo · Tatsunori Hashimoto · Percy Liang · Julian Zubek · Zipeng Fu · Song-Chun Zhu · Adam Lerer -
2019 : Poster and Coffee Break 1 »
Aaron Sidford · Aditya Mahajan · Alejandro Ribeiro · Alex Lewandowski · Ali H Sayed · Ambuj Tewari · Angelika Steger · Anima Anandkumar · Asier Mujika · Hilbert J Kappen · Bolei Zhou · Byron Boots · Chelsea Finn · Chen-Yu Wei · Chi Jin · Ching-An Cheng · Christina Yu · Clement Gehring · Craig Boutilier · Dahua Lin · Daniel McNamee · Daniel Russo · David Brandfonbrener · Denny Zhou · Devesh Jha · Diego Romeres · Doina Precup · Dominik Thalmeier · Eduard Gorbunov · Elad Hazan · Elena Smirnova · Elvis Dohmatob · Emma Brunskill · Enrique Munoz de Cote · Ethan Waldie · Florian Meier · Florian Schaefer · Ge Liu · Gergely Neu · Haim Kaplan · Hao Sun · Hengshuai Yao · Jalaj Bhandari · James A Preiss · Jayakumar Subramanian · Jiajin Li · Jieping Ye · Jimmy Smith · Joan Bas Serrano · Joan Bruna · John Langford · Jonathan Lee · Jose A. Arjona-Medina · Kaiqing Zhang · Karan Singh · Yuping Luo · Zafarali Ahmed · Zaiwei Chen · Zhaoran Wang · Zhizhong Li · Zhuoran Yang · Ziping Xu · Ziyang Tang · Yi Mao · David Brandfonbrener · Shirli Di-Castro · Riashat Islam · Zuyue Fu · Abhishek Naik · Saurabh Kumar · Benjamin Petit · Angeliki Kamoutsi · Simone Totaro · Arvind Raghunathan · Rui Wu · Donghwan Lee · Dongsheng Ding · Alec Koppel · Hao Sun · Christian Tjandraatmadja · Mahdi Karami · Jincheng Mei · Chenjun Xiao · Junfeng Wen · Zichen Zhang · Ross Goroshin · Mohammad Pezeshki · Jiaqi Zhai · Philip Amortila · Shuo Huang · Mariya Vasileva · El houcine Bergou · Adel Ahmadyan · Haoran Sun · Sheng Zhang · Lukas Gruber · Yuanhao Wang · Tetiana Parshakova -
2019 Poster: Offline Contextual Bandits with High Probability Fairness Guarantees »
Blossom Metevier · Stephen Giguere · Sarah Brockman · Ari Kobren · Yuriy Brun · Emma Brunskill · Philip Thomas -
2019 Poster: Almost Horizon-Free Structure-Aware Best Policy Identification with a Generative Model »
Andrea Zanette · Mykel J Kochenderfer · Emma Brunskill -
2019 Poster: Limiting Extrapolation in Linear Approximate Value Iteration »
Andrea Zanette · Alessandro Lazaric · Mykel J Kochenderfer · Emma Brunskill -
2018 : Lunch & Posters »
Haytham Fayek · German Parisi · Brian Xu · Pramod Kaushik Mudrakarta · Sophie Cerf · Sarah Wassermann · Davit Soselia · Rahaf Aljundi · Mohamed Elhoseiny · Frantzeska Lavda · Kevin J Liang · Arslan Chaudhry · Sanmit Narvekar · Vincenzo Lomonaco · Wesley Chung · Michael Chang · Ying Zhao · Zsolt Kira · Pouya Bashivan · Banafsheh Rafiee · Oleksiy Ostapenko · Andrew Jones · Christos Kaplanis · Sinan Kalkan · Dan Teng · Xu He · Vincent Liu · Somjit Nath · Sungsoo Ahn · Ting Chen · Shenyang Huang · Yash Chandak · Nathan Sprague · Martin Schrimpf · Tony Kendall · Jonathan Richard Schwarz · Michael Li · Yunshu Du · Yen-Chang Hsu · Samira Abnar · Bo Wang -
2018 Poster: Representation Balancing MDPs for Off-policy Policy Evaluation »
Yao Liu · Omer Gottesman · Aniruddh Raghu · Matthieu Komorowski · Aldo Faisal · Finale Doshi-Velez · Emma Brunskill -
2018 Demonstration: Automatic Curriculum Generation Applied to Teaching Novices a Short Bach Piano Segment »
Emma Brunskill · Tong Mu · Karan Goel · Jonathan Bragg -
2017 : Panel Discussion »
Matt Botvinick · Emma Brunskill · Marcos Campos · Jan Peters · Doina Precup · David Silver · Josh Tenenbaum · Roy Fox -
2017 : Sample efficiency and off policy hierarchical RL (Emma Brunskill) »
Emma Brunskill -
2017 : Emma Brunskill (Stanford) »
Emma Brunskill -
2017 : Invited Talk »
Emma Brunskill -
2017 Poster: Using Options and Covariance Testing for Long Horizon Off-Policy Policy Evaluation »
Zhaohan Guo · Philip S. Thomas · Emma Brunskill -
2017 Poster: Unifying PAC and Regret: Uniform PAC Bounds for Episodic Reinforcement Learning »
Christoph Dann · Tor Lattimore · Emma Brunskill -
2017 Spotlight: Unifying PAC and Regret: Uniform PAC Bounds for Episodic Reinforcement Learning »
Christoph Dann · Tor Lattimore · Emma Brunskill -
2017 Tutorial: Reinforcement Learning with People »
Emma Brunskill