Evaluating End-to-End Goal Oriented Dialog Systems
in Workshop: Let's Discuss: Learning Methods for Dialogue
Abstract
Traditional dialog systems used in goal-oriented applications require a lot of domain-specific handcrafting, which hinders scaling up to new domains. End-to-end dialog systems, in which all components are trained from the dialogs themselves, escape this limitation. But the encouraging successes recently obtained in chit-chat dialog may not carry over to goal-oriented settings. In this talk, we will discuss how to evaluate end-to-end goal-oriented dialog systems in a robust and reproducible manner. We will also present a new testbed designed for that purpose. On this new dataset, we show that an end-to-end dialog system based on Memory Networks can reach promising, yet imperfect, performance and learn to perform non-trivial operations. We confirm those results by comparing our system to a hand-crafted slot-filling baseline on data from the second Dialog State Tracking Challenge (Henderson et al., 2014a) and show similar result patterns on data extracted from an online concierge service.