San Diego Spotlight Poster
Imitation Beyond Expectation Using Pluralistic Stochastic Dominance
Ali Farajzadeh · Danyal Saeed · Syed M Abbas · Rushit Shah · Aadirupa Saha · Brian Ziebart
Exhibit Hall C,D,E #515
Imitation learning seeks policies reflecting the values of demonstrated behaviors. Prevalent approaches learn to match or exceed the demonstrator's performance in expectation without knowing the demonstrator’s reward function. Unfortunately, this does not induce pluralistic imitators that learn to support qualitatively distinct demonstrations. We reformulate imitation learning using stochastic dominance over the demonstrations' reward distribution across a range of reward functions as our foundational aim. Our approach matches imitator policy samples (or support) with demonstrations using optimal transport theory to define an imitation learning objective over trajectory pairs. We demonstrate the benefits of pluralistic stochastic dominance (PSD) for imitation in both theory and practice.
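As a rough illustration of why matching performance in expectation need not yield pluralistic behavior, the sketch below checks first-order stochastic dominance on empirical return samples across several reward functions. It is not the authors' PSD objective. The linear reward form, the feature vectors, and the function names `empirically_dominates` and `dominates_for_all_rewards` are assumptions made for this example. With equal sample counts, comparing sorted returns pairs each imitator sample with one demonstration; in one dimension this sorted pairing is also the standard optimal transport matching under a convex cost.

```python
import numpy as np


def empirically_dominates(imitator_returns, demo_returns):
    """Check first-order stochastic dominance between two empirical samples.

    Assumes equal sample counts (a simplification for this sketch): the
    imitator dominates iff its k-th smallest return is at least the
    demonstrator's k-th smallest return, for every k.
    """
    a = np.sort(np.asarray(imitator_returns, dtype=float))
    b = np.sort(np.asarray(demo_returns, dtype=float))
    if a.shape != b.shape:
        raise ValueError("this sketch assumes equal sample counts")
    return bool(np.all(a >= b))


def dominates_for_all_rewards(imitator_feats, demo_feats, reward_weights):
    """Require dominance under every reward in a (hypothetical) linear family.

    imitator_feats, demo_feats: arrays of shape (n_trajectories, n_features)
    reward_weights: array of shape (n_rewards, n_features)
    """
    imitator_feats = np.asarray(imitator_feats, dtype=float)
    demo_feats = np.asarray(demo_feats, dtype=float)
    for w in np.asarray(reward_weights, dtype=float):
        if not empirically_dominates(imitator_feats @ w, demo_feats @ w):
            return False
    return True


# Two qualitatively distinct demonstrations (illustrative trajectory features).
demos = np.array([[1.0, 0.0], [0.0, 1.0]])
# Candidate rewards: each one values a single behavior.
rewards = np.array([[1.0, 0.0], [0.0, 1.0]])

# An imitator that averages the two behaviors matches the demonstrator in
# expectation under both rewards, but does not dominate in distribution.
averaging_imitator = np.array([[0.5, 0.5], [0.5, 0.5]])
# An imitator that reproduces both behaviors does dominate.
plural_imitator = np.array([[1.0, 0.0], [0.0, 1.0]])

print(dominates_for_all_rewards(averaging_imitator, demos, rewards))
print(dominates_for_all_rewards(plural_imitator, demos, rewards))
```

In this toy setting both imitators have the same expected return as the demonstrations under each reward, yet only the one that covers both behaviors passes the dominance check.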