
Workshop: 3rd Offline Reinforcement Learning Workshop: Offline RL as a "Launchpad"

Balanced Off-Policy Evaluation for Personalized Pricing

Adam N. Elmachtoub · Vishal Gupta · Yunfan Zhao


We consider a feature-based pricing problem, where we have data consisting of feature information, historical pricing decisions, and binary realized demand. We wish to evaluate a new personalized pricing policy that maps features to prices. This problem is known as off-policy evaluation, and there is an extensive literature on estimating the expected performance of the new policy. However, existing methods perform poorly when the logging policy has little exploration, which is common in pricing. We propose a novel method that exploits the special structure of pricing problems and incorporates downstream optimization problems when evaluating the new policy. We establish theoretical convergence guarantees, and we empirically demonstrate the advantage of our method using a real-world pricing dataset.
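For context on the off-policy evaluation setup described above, here is a minimal sketch of a standard inverse propensity scoring (IPS) baseline for estimating expected revenue of a new pricing policy from logged data. This is not the paper's proposed method; all names (`ips_revenue_estimate`, the log-entry layout) are illustrative assumptions. Note how the estimate relies on dividing by the logging propensity: when the logging policy rarely explores, propensities for the new policy's prices approach zero and the estimator becomes unreliable, which is exactly the failure mode the abstract highlights.

```python
def ips_revenue_estimate(logs, new_policy):
    """IPS estimate of expected revenue under a new pricing policy.

    Each log entry is (features, logged_price, demand, propensity):
      - logged_price: price charged by the historical (logging) policy
      - demand: binary realized demand in {0, 1}
      - propensity: probability the logging policy charged logged_price
        given the features (assumed known or estimated)
    Revenue per transaction is price * demand.
    """
    total = 0.0
    for features, logged_price, demand, propensity in logs:
        # The new policy is deterministic: one price per feature vector.
        if new_policy(features) == logged_price:
            # Reweight matching log entries by the inverse propensity.
            total += (logged_price * demand) / propensity
        # Non-matching entries contribute zero to the estimate.
    return total / len(logs)


# Toy example: the logging policy charged 1.0 or 2.0 with probability
# 0.5 each, and demand materialized only at the lower price.
logs = [
    ((0,), 1.0, 1, 0.5),
    ((1,), 2.0, 0, 0.5),
    ((2,), 1.0, 1, 0.5),
    ((3,), 2.0, 0, 0.5),
]
estimate = ips_revenue_estimate(logs, new_policy=lambda x: 1.0)
print(estimate)  # 1.0: matches the true revenue of always charging 1.0
```

In this toy case the estimate recovers the true expected revenue because the logging policy explored both prices with propensity 0.5; shrinking the exploration probability inflates the inverse weights and the estimator's variance.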
