Expo Workshop: Real World RL with Vowpal Wabbit: Beyond Contextual Bandits

John Langford · Marek Wydmuch · Maryam Majzoubi · Adith Swaminathan · Dylan Foster · Paul Mineiro


In recent years, breakthroughs in sample-efficient RL algorithms such as Contextual Bandits have enabled new solutions to personalization and optimization scenarios. Unbiased off-policy evaluation has given data scientists superpowers on real-world data volumes, and with them the confidence to put machine learning into production. Vowpal Wabbit (https://vowpalwabbit.org) is an open source machine learning toolkit and research platform, used extensively across the industry, that provides fast, scalable machine learning.

Dive beyond Contextual Bandits in the Real World:
* Build Extreme Multilabel Classifiers with the Probabilistic Label Tree learner.
* Solve multi-slot scenarios with Conditional Contextual Bandits and Slates, and optimize systems with Continuous Action-Space CB.
* Learn about advanced off-policy evaluation and introspection options with new estimators and visualizations.