Affinity Workshop: WiML Workshop 1

Combining semantic search and twin product classification for recognition of purchasable items in voice shopping

Dieu Thu Le · Anna Weber


For virtual assistants such as Alexa and Google Home, the accuracy of online shopping via voice commands is particularly important and can strongly affect customer trust. To ensure a good customer experience, our work focuses on detecting whether an utterance contains an actual, purchasable product and therefore expresses a shopping-related intent. A typical Spoken Language Understanding [4] architecture consists of an intent classifier and a slot detector. Intent classification identifies the user’s intent from a set of predefined intents, and slot labeling extracts the token sequences that are relevant for fulfilling the user’s request. For example, if the user says ‘Buy toilet paper’, the intent is BuyItem and the item slot is ‘toilet paper’; ‘Buy’ is not needed to fulfill the request and is therefore not part of the slot. To determine whether an item is purchasable on the connected e-commerce platform, one needs to check whether it is part of the platform’s product catalog. Searching through billions of products to check whether a detected slot is a purchasable item is prohibitively expensive. To overcome this problem, we present a framework that (1) uses a retrieval module [3] to return the products most relevant to the detected slot, and (2) combines it with a twin network [1] [2] that decides whether the detected slot is indeed a purchasable item. Figure 1 shows the architecture of the classifier. We show that the classifier outperforms a typical slot detector approach, with a gain of +81% in accuracy and +41% in F1 score. The best performance was obtained by passing the whole utterance, rather than only the ItemName candidate, to the left branch of the twin network and by training with an online contrastive loss function. For the retrieval module, we experimented with different numbers of matching products returned by semantic search and show that using the top five most relevant product names yields the best results.
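The two-stage idea described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not the authors' implementation: the six-item `CATALOG`, the bag-of-words `embed` function standing in for a learned encoder, cosine ranking standing in for semantic search, and the function names `retrieve`, `purchasable_score` and `contrastive_loss`. The loss shown is the plain contrastive loss over all pairs; the online variant mentioned in the abstract is not reproduced here.

```python
import math
from collections import Counter

# Hypothetical toy catalog; a real catalog would be far too large to scan.
CATALOG = [
    "toilet paper 24 rolls",
    "paper towels 6 pack",
    "aa batteries 8 pack",
    "dog food chicken 5 kg",
    "toilet brush with holder",
    "printer paper a4 500 sheets",
]


def embed(text):
    """Toy shared encoder (bag of words), used by both twin branches."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(slot, k=5):
    """Stand-in for the retrieval module: top-k product names for the slot."""
    query = embed(slot)
    ranked = sorted(CATALOG, key=lambda name: cosine(query, embed(name)), reverse=True)
    return ranked[:k]


def purchasable_score(utterance, slot, k=5):
    """Left branch: whole utterance. Right branch: retrieved product names.

    Joining the retrieved names into one string is an assumption of this sketch.
    """
    left = embed(utterance)
    right = embed(" ".join(retrieve(slot, k)))
    return cosine(left, right)


def contrastive_loss(distances, labels, margin=0.5):
    """Plain contrastive loss: label 1 = matching pair, 0 = non-matching pair.

    Matching pairs are pulled together (d^2); non-matching pairs are pushed
    apart until their distance exceeds the margin.
    """
    total = 0.0
    for d, y in zip(distances, labels):
        total += y * d ** 2 + (1 - y) * max(0.0, margin - d) ** 2
    return total / len(distances)
```

In this toy setup, `purchasable_score("buy toilet paper", "toilet paper")` is higher than `purchasable_score("play some jazz music", "jazz music")`, because only the first utterance shares tokens with the retrieved product names. A trained twin network would replace the token overlap with learned embeddings and a decision threshold.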
