In the predict-then-optimize framework, the objective is to train a predictive model, mapping from environment features to parameters of an optimization problem, which maximizes decision quality when the optimization is subsequently solved. Recent work on decision-focused learning shows that embedding the optimization problem in the training pipeline can improve decision quality and help generalize better to unseen tasks compared to relying on an intermediate loss function for evaluating prediction quality. We study the predict-then-optimize framework in the context of sequential decision problems (formulated as MDPs) that are solved via reinforcement learning. In particular, we are given environment features and a set of trajectories from training MDPs, which we use to train a predictive model that generalizes to unseen test MDPs without trajectories. Two significant computational challenges arise in applying decision-focused learning to MDPs: (i) large state and action spaces make it infeasible for existing techniques to differentiate through MDP problems, and (ii) the high-dimensional policy space, as parameterized by a neural network, makes differentiating through a policy expensive. We resolve the first challenge by sampling provably unbiased derivatives to approximate and differentiate through optimality conditions, and the second challenge by using a low-rank approximation to the high-dimensional sample-based derivatives. We implement both Bellman-based and policy gradient-based decision-focused learning on three different MDP problems with missing parameters, and show that decision-focused learning performs better in generalization to unseen tasks.
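To make the notion of decision quality concrete, here is a minimal sketch of the predict-then-optimize evaluation loop on a toy tabular MDP. It is not the authors' method: it assumes a small random MDP with known transitions and a missing state-reward vector, solves the MDP by plain value iteration rather than reinforcement learning, and uses hand-made "predictions" instead of a trained model. The names `value_iteration`, `decision_quality`, and `predictions` are hypothetical.

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, iters=500):
    """Return the greedy policy for transitions P (A, S, S) and state rewards R (S,)."""
    V = np.zeros(R.shape[0])
    for _ in range(iters):
        Q = R[None, :] + gamma * (P @ V)  # Q[a, s]: reward now plus discounted next value
        V = Q.max(axis=0)
    return Q.argmax(axis=0)

def decision_quality(P, R_true, policy, gamma=0.9):
    """Average true discounted return of `policy`, via exact policy evaluation."""
    S = R_true.shape[0]
    P_pi = P[policy, np.arange(S), :]  # (S, S) transitions under the chosen actions
    V = np.linalg.solve(np.eye(S) - gamma * P_pi, R_true)
    return float(V.mean())

# Toy MDP (assumption): 3 actions, 5 states, random transitions and rewards.
rng = np.random.default_rng(0)
A, S = 3, 5
P = rng.dirichlet(np.ones(S), size=(A, S))  # each P[a, s] is a distribution over next states
R_true = rng.normal(size=S)

# Hand-made stand-ins for a predictive model's output (assumption).
predictions = {
    "exact": R_true,
    "shifted": R_true + 5.0,  # large prediction error, same induced policy
    "negated": -R_true,       # prediction that reverses the reward ordering
}

# Predict, then optimize on the prediction, then score under the true rewards.
quality = {
    name: decision_quality(P, R_true, value_iteration(P, R_pred))
    for name, R_pred in predictions.items()
}
print(quality)
```

The "shifted" prediction has a large error in reward space yet induces the same policy as the exact rewards, which illustrates why an intermediate prediction loss can be a poor proxy for decision quality.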
Author Information
Kai Wang (Harvard University)
Sanket Shah (Harvard University)

I am a third-year PhD student at Harvard University advised by Prof. Milind Tambe. My current work focuses on Decision-Focused Learning, a paradigm for tailoring a predictive model for a downstream optimization task that uses its predictions.
Haipeng Chen (Dartmouth College)
Andrew Perrault (Harvard University)
Finale Doshi-Velez (Harvard)
Milind Tambe (Harvard University/Google Research India)
Related Events (a corresponding poster, oral, or spotlight)
2021 Poster: Learning MDPs from Features: Predict-Then-Optimize for Sequential Decision Making by Reinforcement Learning »
Tue. Dec 7th 04:30 -- 06:00 PM Room
More from the Same Authors
2021 : Identification of Subgroups With Similar Benefits in Off-Policy Policy Evaluation »
Ramtin Keramati · Omer Gottesman · Leo Celi · Finale Doshi-Velez · Emma Brunskill -
2021 : Your Bandit Model is Not Perfect: Introducing Robustness to Restless Bandits Enabled by Deep Reinforcement Learning »
Jackson Killian · Lily Xu · Arpita Biswas · Milind Tambe -
2022 : An Empirical Analysis of the Advantages of Finite vs. Infinite Width Bayesian Neural Networks »
Jiayu Yao · Yaniv Yacoby · Beau Coker · Weiwei Pan · Finale Doshi-Velez -
2022 : Feature-Level Synthesis of Human and ML Insights »
Isaac Lage · Sonali Parbhoo · Finale Doshi-Velez -
2022 : What Makes a Good Explanation?: A Unified View of Properties of Interpretable ML »
Varshini Subhash · Zixi Chen · Marton Havasi · Weiwei Pan · Finale Doshi-Velez -
2022 : Case Study: Applying Decision Focused Learning in the Real World »
Shresth Verma · Aditya Mate · Kai Wang · Aparna Taneja · Milind Tambe -
2022 : What Makes a Good Explanation?: A Unified View of Properties of Interpretable ML »
Zixi Chen · Varshini Subhash · Marton Havasi · Weiwei Pan · Finale Doshi-Velez -
2022 : (When) Are Contrastive Explanations of Reinforcement Learning Helpful? »
Sanjana Narayanan · Isaac Lage · Finale Doshi-Velez -
2022 : Leveraging Human Features at Test-Time »
Isaac Lage · Sonali Parbhoo · Finale Doshi-Velez -
2022 : An Empirical Analysis of the Advantages of Finite vs. Infinite Width Bayesian Neural Networks »
Jiayu Yao · Yaniv Yacoby · Beau Coker · Weiwei Pan · Finale Doshi-Velez -
2022 : Invited Talk: Milind Tambe »
Milind Tambe -
2022 : What Makes a Good Explanation?: A Unified View of Properties of Interpretable ML »
Varshini Subhash · Zixi Chen · Marton Havasi · Weiwei Pan · Finale Doshi-Velez -
2022 Poster: Addressing Leakage in Concept Bottleneck Models »
Marton Havasi · Sonali Parbhoo · Finale Doshi-Velez -
2022 Poster: Leveraging Factored Action Spaces for Efficient Offline Reinforcement Learning in Healthcare »
Shengpu Tang · Maggie Makar · Michael Sjoding · Finale Doshi-Velez · Jenna Wiens -
2022 Poster: Decision-Focused Learning without Decision-Making: Learning Locally Optimized Decision Losses »
Sanket Shah · Kai Wang · Bryan Wilder · Andrew Perrault · Milind Tambe -
2021 : Retrospective Panel »
Sergey Levine · Nando de Freitas · Emma Brunskill · Finale Doshi-Velez · Nan Jiang · Rishabh Agarwal -
2021 : LAF | Panel discussion »
Aaron Snoswell · Jake Goldenfein · Finale Doshi-Velez · Evi Micha · Ivana Dusparic · Jonathan Stray -
2021 : LAF | The Role of Explanation in RL Legitimacy, Accountability, and Feedback »
Finale Doshi-Velez -
2021 : Invite Talk Q&A »
Milind Tambe · Tejumade Afonja · Paula Rodriguez Diaz -
2021 : Invited talk #2: Finale Doshi-Velez »
Finale Doshi-Velez -
2021 : Invited Talk: AI for Social Impact: Results from Deployments for Public Health »
Milind Tambe -
2020 : Q/A and Panel Discussion for People-Earth with Dan Kammen and Milind Tambe »
Daniel Kammen · Milind Tambe · Giulio De Leo · Mayur Mudigonda · Surya Karthik Mukkavilli -
2020 : Batch RL Models Built for Validation »
Finale Doshi-Velez -
2020 : Q/A and Discussion »
Surya Karthik Mukkavilli · Mayur Mudigonda · Milind Tambe -
2020 : Milind Tambe »
Milind Tambe -
2020 : Panel »
Emma Brunskill · Nan Jiang · Nando de Freitas · Finale Doshi-Velez · Sergey Levine · John Langford · Lihong Li · George Tucker · Rishabh Agarwal · Aviral Kumar -
2020 : Q & A and Panel Session with Tom Mitchell, Jenn Wortman Vaughan, Sanjoy Dasgupta, and Finale Doshi-Velez »
Tom Mitchell · Jennifer Wortman Vaughan · Sanjoy Dasgupta · Finale Doshi-Velez · Zachary Lipton -
2020 Workshop: I Can’t Believe It’s Not Better! Bridging the gap between theory and empiricism in probabilistic machine learning »
Jessica Forde · Francisco Ruiz · Melanie Fernandez Pradier · Aaron Schein · Finale Doshi-Velez · Isabel Valera · David Blei · Hanna Wallach -
2020 Poster: Incorporating Interpretable Output Constraints in Bayesian Neural Networks »
Wanqian Yang · Lars Lorch · Moritz Graule · Himabindu Lakkaraju · Finale Doshi-Velez -
2020 Spotlight: Incorporating Interpretable Output Constraints in Bayesian Neural Networks »
Wanqian Yang · Lars Lorch · Moritz Graule · Himabindu Lakkaraju · Finale Doshi-Velez -
2020 Poster: Automatically Learning Compact Quality-aware Surrogates for Optimization Problems »
Kai Wang · Bryan Wilder · Andrew Perrault · Milind Tambe -
2020 Spotlight: Automatically Learning Compact Quality-aware Surrogates for Optimization Problems »
Kai Wang · Bryan Wilder · Andrew Perrault · Milind Tambe -
2020 Poster: Collapsing Bandits and Their Application to Public Health Intervention »
Aditya Mate · Jackson Killian · Haifeng Xu · Andrew Perrault · Milind Tambe -
2020 Poster: Model-based Reinforcement Learning for Semi-Markov Decision Processes with Neural ODEs »
Jianzhun Du · Joseph Futoma · Finale Doshi-Velez -
2020 : Discussion Panel: Hugo Larochelle, Finale Doshi-Velez, Devi Parikh, Marc Deisenroth, Julien Mairal, Katja Hofmann, Phillip Isola, and Michael Bowling »
Hugo Larochelle · Finale Doshi-Velez · Marc Deisenroth · Devi Parikh · Julien Mairal · Katja Hofmann · Phillip Isola · Michael Bowling -
2019 : Panel - The Role of Communication at Large: Aparna Lakshmiratan, Jason Yosinski, Been Kim, Surya Ganguli, Finale Doshi-Velez »
Aparna Lakshmiratan · Finale Doshi-Velez · Surya Ganguli · Zachary Lipton · Michela Paganini · Anima Anandkumar · Jason Yosinski -
2019 : Invited talk #4 »
Finale Doshi-Velez -
2019 : Finale Doshi-Velez: Combining Statistical methods with Human Input for Evaluation and Optimization in Batch Settings »
Finale Doshi-Velez -
2018 : Finale Doshi-Velez »
Finale Doshi-Velez -
2018 : Panel on research process »
Zachary Lipton · Charles Sutton · Finale Doshi-Velez · Hanna Wallach · Suchi Saria · Rich Caruana · Thomas Rainforth -
2018 : Finale Doshi-Velez »
Finale Doshi-Velez -
2018 Poster: Human-in-the-Loop Interpretability Prior »
Isaac Lage · Andrew Ross · Samuel J Gershman · Been Kim · Finale Doshi-Velez -
2018 Spotlight: Human-in-the-Loop Interpretability Prior »
Isaac Lage · Andrew Ross · Samuel J Gershman · Been Kim · Finale Doshi-Velez -
2018 Poster: Representation Balancing MDPs for Off-policy Policy Evaluation »
Yao Liu · Omer Gottesman · Aniruddh Raghu · Matthieu Komorowski · Aldo Faisal · Finale Doshi-Velez · Emma Brunskill -
2017 : Panel Session »
Neil Lawrence · Finale Doshi-Velez · Zoubin Ghahramani · Yann LeCun · Max Welling · Yee Whye Teh · Ole Winther -
2017 : Finale Doshi-Velez »
Finale Doshi-Velez -
2017 : Automatic Model Selection in BNNs with Horseshoe Priors »
Finale Doshi-Velez -
2017 : Coffee break and Poster Session I »
Nishith Khandwala · Steve Gallant · Gregory Way · Aniruddh Raghu · Li Shen · Aydan Gasimova · Alican Bozkurt · William Boag · Daniel Lopez-Martinez · Ulrich Bodenhofer · Samaneh Nasiri GhoshehBolagh · Michelle Guo · Christoph Kurz · Kirubin Pillay · Kimis Perros · George H Chen · Alexandre Yahi · Madhumita Sushil · Sanjay Purushotham · Elena Tutubalina · Tejpal Virdi · Marc-Andre Schulz · Samuel Weisenthal · Bharat Srikishan · Petar Veličković · Kartik Ahuja · Andrew Miller · Erin Craig · Disi Ji · Filip Dabek · Chloé Pou-Prom · Hejia Zhang · Janani Kalyanam · Wei-Hung Weng · Harish Bhat · Hugh Chen · Simon Kohl · Mingwu Gao · Tingting Zhu · Ming-Zher Poh · Iñigo Urteaga · Antoine Honoré · Alessandro De Palma · Maruan Al-Shedivat · Pranav Rajpurkar · Matthew McDermott · Vincent Chen · Yanan Sui · Yun-Geun Lee · Li-Fang Cheng · Chen Fang · Sibt ul Hussain · Cesare Furlanello · Zeev Waks · Hiba Chougrad · Hedvig Kjellstrom · Finale Doshi-Velez · Wolfgang Fruehwirt · Yanqing Zhang · Lily Hu · Junfang Chen · Sunho Park · Gatis Mikelsons · Jumana Dakka · Stephanie Hyland · yann chevaleyre · Hyunwoo Lee · Xavier Giro-i-Nieto · David Kale · Michael Hughes · Gabriel Erion · Rishab Mehra · William Zame · Stojan Trajanovski · Prithwish Chakraborty · Kelly Peterson · Muktabh Mayank Srivastava · Amy Jin · Heliodoro Tejeda Lemus · Priyadip Ray · Tamas Madl · Joseph Futoma · Enhao Gong · Syed Rameel Ahmad · Eric Lei · Ferdinand Legros -
2017 : Contributed talk: Beyond Sparsity: Tree-based Regularization of Deep Models for Interpretability »
Mike Wu · Sonali Parbhoo · Finale Doshi-Velez -
2017 : Invited talk: The Role of Explanation in Holding AIs Accountable »
Finale Doshi-Velez -
2017 Poster: Robust and Efficient Transfer Learning with Hidden Parameter Markov Decision Processes »
Taylor Killian · Samuel Daulton · Finale Doshi-Velez · George Konidaris -
2017 Oral: Robust and Efficient Transfer Learning with Hidden Parameter Markov Decision Processes »
Taylor Killian · Samuel Daulton · Finale Doshi-Velez · George Konidaris -
2016 : BNNs for RL: A Success Story and Open Questions »
Finale Doshi-Velez -
2015 Workshop: Machine Learning From and For Adaptive User Technologies: From Active Learning & Experimentation to Optimization & Personalization »
Joseph Jay Williams · Yasin Abbasi Yadkori · Finale Doshi-Velez -
2015 : Data Driven Phenotyping for Diseases »
Finale Doshi-Velez -
2015 Poster: Mind the Gap: A Generative Approach to Interpretable Feature Selection and Extraction »
Been Kim · Julie A Shah · Finale Doshi-Velez