Reinforcement Learning (RL) has led to considerable breakthroughs in diverse areas such as robotics, games and many others. But the application of RL to complex real-world decision-making problems remains limited. Many problems in Operations Management (inventory and revenue management, for example) are characterized by large action spaces and stochastic system dynamics. These characteristics make the problem considerably harder to solve for existing RL methods that rely on enumeration techniques to solve per-step action problems. To resolve these issues, we develop Programmable Actor Reinforcement Learning (PARL), a policy iteration method that uses techniques from integer programming and sample average approximation. Analytically, we show that, for a given critic, the learned policy in each iteration converges to the optimal policy as the number of underlying samples of the uncertainty goes to infinity. Practically, we show that a properly selected discretization of the underlying uncertain distribution can yield a near-optimal actor policy even with very few samples from the underlying uncertainty. We then apply our algorithm to real-world inventory management problems with complex supply chain structures and show that PARL outperforms state-of-the-art RL and inventory optimization methods in these settings. We find that PARL outperforms the commonly used base-stock heuristic by 51.3% and RL-based methods by up to 9.58% on average across different supply chain environments.
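To make the sample average approximation idea concrete, here is a minimal sketch of a per-step action problem for a single-product inventory system. It is not the PARL implementation: the reward structure, the parameters (`price`, `cost`, `holding`, `gamma`), the function names, and the linear `toy_critic` are all assumptions chosen for illustration. It also enumerates a small action set purely to show the sample-average objective, whereas the abstract describes solving this step with integer programming precisely to avoid enumeration.

```python
import random


def saa_action(state, demand_samples, critic, max_order,
               price=5.0, cost=3.0, holding=1.0, gamma=0.9):
    """Return the order quantity with the best sample-average one-step value.

    All economic parameters are hypothetical; the critic is any function
    mapping next-period inventory to an estimated value.
    """
    best_action, best_value = 0, float("-inf")
    for order in range(max_order + 1):
        total = 0.0
        for demand in demand_samples:
            available = state + order
            sales = min(available, demand)
            next_state = available - sales
            reward = price * sales - cost * order - holding * next_state
            # One-step lookahead: immediate reward plus discounted critic value.
            total += reward + gamma * critic(next_state)
        value = total / len(demand_samples)  # sample average over demand draws
        if value > best_value:
            best_action, best_value = order, value
    return best_action, best_value


def toy_critic(inventory):
    # Hypothetical critic: each leftover unit is worth 2.0 in the next period.
    return 2.0 * inventory


rng = random.Random(0)
samples = [rng.randint(0, 10) for _ in range(200)]
action, value = saa_action(state=2, demand_samples=samples,
                           critic=toy_critic, max_order=15)
```

With more demand samples the sample-average objective approaches the true expected one-step value, which is the intuition behind the convergence statement in the abstract. For large or combinatorial action spaces, the enumeration loop is what an integer programming formulation would replace.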
Author Information
Pavithra Harsha (IBM, International Business Machines)
Ashish Jagmohan (IBM Research)
Jayant Kalagnanam (IBM Research)
Brian Quanz (IBM Research)
Divya Singhvi (International Business Machines)
More from the Same Authors
- 2021 Poster: Predicting Deep Neural Network Generalization with Perturbation Response Curves
  Yair Schiff · Brian Quanz · Payel Das · Pin-Yu Chen
- 2020 Poster: A Scalable MIP-based Method for Learning Optimal Multivariate Decision Trees
  Haoran Zhu · Pavankumar Murali · Dzung Phan · Lam Nguyen · Jayant Kalagnanam
- 2019: Poster Session 2
  Mayur Saxena · Nicholas Frosst · Vivien Cabannes · Gene Kogan · Austin Dill · Anurag Sarkar · Joel Ruben Antony Moniz · Vibert Thio · Scott Sievert · Lia Coleman · Frederik De Bleser · Brian Quanz · Jonathon Kereliuk · Panos Achlioptas · Mohamed Elhoseiny · Songwei Ge · Aidan Gomez · Jamie Brew
- 2019 Poster: Differentially Private Distributed Data Summarization under Covariate Shift
  Kanthi Sarpatwar · Karthikeyan Shanmugam · Venkata Sitaramagiridharganesh Ganapavarapu · Ashish Jagmohan · Roman Vaculin