
Math Programming based Reinforcement Learning for Multi-Echelon Inventory Management
Pavithra Harsha · Ashish Jagmohan · Jayant Kalagnanam · Brian Quanz · Divya Singhvi
Event URL: https://openreview.net/forum?id=vKbY_WHriDA

Reinforcement Learning (RL) has led to considerable breakthroughs in diverse areas such as robotics, games and many others. But the application of RL to complex real-world decision-making problems remains limited. Many problems in Operations Management (inventory and revenue management, for example) are characterized by large action spaces and stochastic system dynamics. These characteristics make the problem considerably harder to solve for existing RL methods that rely on enumeration techniques to solve per-step action problems. To resolve these issues, we develop Programmable Actor Reinforcement Learning (PARL), a policy iteration method that uses techniques from integer programming and sample average approximation. Analytically, we show that for a given critic, the learned policy in each iteration converges to the optimal policy as the number of samples of the underlying uncertainty goes to infinity. Practically, we show that a properly selected discretization of the underlying uncertain distribution can yield a near-optimal actor policy even with very few samples from the underlying uncertainty. We then apply our algorithm to real-world inventory management problems with complex supply chain structures and show that PARL outperforms state-of-the-art RL and inventory optimization methods in these settings. We find that PARL outperforms the commonly used base stock heuristic by 51.3% and RL-based methods by up to 9.58% on average across different supply chain environments.
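As a rough illustration of the sample average approximation idea mentioned in the abstract, the sketch below picks an order quantity for a single inventory node by averaging the one-step cost plus a discounted critic value over sampled demands. It is not the authors' implementation: the function name `saa_best_order`, the cost parameters, the discount factor and the placeholder `zero_critic` are all assumptions made for this example, and the per-step problem is solved by plain enumeration over a small action set rather than by the integer program that PARL uses.

```python
import random


def saa_best_order(inventory, demand_samples, max_order, critic,
                   holding_cost=1.0, backlog_cost=4.0, discount=0.95):
    """Return (order, value) minimising the sample-average one-step cost.

    All names and default costs here are hypothetical. The search is a
    simple enumeration over 0..max_order, used only because the toy
    action space is tiny; PARL replaces this step with an integer program.
    """
    best_order, best_value = None, float("inf")
    for order in range(max_order + 1):
        total = 0.0
        for demand in demand_samples:
            # Next state: inventory after ordering and serving demand
            # (negative values represent backlogged demand).
            next_inventory = inventory + order - demand
            step_cost = (holding_cost * max(next_inventory, 0)
                         + backlog_cost * max(-next_inventory, 0))
            # Critic estimates the cost-to-go from the next state.
            total += step_cost + discount * critic(next_inventory)
        value = total / len(demand_samples)
        if value < best_value:
            best_order, best_value = order, value
    return best_order, best_value


def zero_critic(state):
    """Placeholder critic (assumption): ignores all future cost."""
    return 0.0


# Usage: draw a few demand samples and choose an order quantity.
rng = random.Random(0)
samples = [rng.randint(0, 10) for _ in range(200)]
order, value = saa_best_order(inventory=2, demand_samples=samples,
                              max_order=15, critic=zero_critic)
```

With a learned critic in place of `zero_critic`, the same averaging defines the objective of the per-step action problem; the enumeration loop is the part that becomes impractical when the action space is large.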

Author Information

Pavithra Harsha (IBM, International Business Machines)
Ashish Jagmohan (IBM Research)
Jayant Kalagnanam (IBM Research)
Brian Quanz (IBM Research)
Divya Singhvi (International Business Machines)
