Machine Learning for Spoken Language Understanding and Interactions
Asli Celikyilmaz · Milica Gasic · Dilek Hakkani-Tur

Fri Dec 11 05:30 AM -- 03:30 PM (PST) @ 511 b
Event URL: http://slunips2015.wix.com/slunips2015

The emergence of virtual personal assistants such as Siri, Cortana, Echo, and Google Now is generating increasing interest in research on speech understanding and spoken interaction. However, whilst the ability of these agents to recognise conversational speech is maturing rapidly, their ability to understand and interact is still limited to a few specific domains, such as weather information, local businesses, and some simple chit-chat. Their conversational capabilities are not necessarily apparent to users. Interaction typically depends on handcrafted scripts and is often guided by simple commands. Deployed dialogue models do not make full use of the large amount of data that these agents generate. Promising approaches involving statistical models, big data analysis, knowledge representation (hierarchical, relational, etc.), the use and enrichment of semantic graphs with natural language components, and multi-modality are being explored in multiple communities, such as natural language processing (NLP), speech processing, machine learning (ML), and information retrieval. However, we are still only scratching the surface in this field. The aim of this workshop, therefore, is to bring together researchers interested in understanding and interaction in conversational agents to discuss the challenges and the new and emerging topics in machine learning that might lead to richer and more natural human-computer interaction.

Obtaining meaning from human natural language is a complex process. The potential range of topics is vast, and even well-formed utterances can be syntactically and semantically ambiguous. Spontaneous conversational speech naturally contains grammatical errors, repetitions, disfluencies, partial words, and out-of-vocabulary words. Conducting intelligent conversation over multiple turns requires maintaining the dialogue state over time, dealing with errors that arise from the speech recogniser, determining an adequate dialogue strategy, estimating the quality of that strategy, and generating natural language responses.

Over the years, many different approaches and models have been proposed (e.g. syntactic and semantic analysis of spoken text, hybrid models that use speech processing components as features for semantic analysis, learning representations for spoken text, contextual models, and statistical models of dialogue). These methods have drawn inspiration from machine learning solutions to tasks such as sequence tagging, syntactic parsing, and language modelling, primarily because these tasks can easily be abstracted into machine learning formulations (e.g. structured prediction, dimensionality reduction, regression, classification, and supervised or reinforcement learning). These representations have evolved into novel understanding models based on discriminative methods, Bayesian nonparametrics, neural networks, low-rank/spectral techniques, and word-, phrase-, and sentence-level embeddings learned with deep learning methods.

In dialogue modelling, methods based on partially observable Markov decision processes (POMDPs) and reinforcement learning have made it possible to build limited-domain dialogue models that are trainable from data, robust to noise, and adaptable to changes in the user or domain. Following their success in other areas, neural networks have also been applied to different aspects of dialogue modelling, yielding significant improvements. The problem remains, however, of how to extend these models to exploit the huge datasets that users of virtual personal assistants generate, and thereby enable the richer and more reliable conversation that users expect. Problems in spoken language understanding and dialogue modelling are particularly appealing to those doing core ML research due to the high-dimensional nature of the spaces involved (both the data and the label spaces), the need to handle noise robustly, and the availability of large amounts of unstructured data. However, there are many other areas within spoken language understanding and dialogue modelling for conversational systems where the ML community is less involved and which remain relatively unexplored, such as semantics, open-domain dialogue models, multi-modal dialogue input and output, emotion recognition, finding relational structures, discourse and pragmatics analysis, multi-human understanding (meetings) and summarization, and cross-lingual understanding. These areas continue to rely on linguistically motivated but imprecise heuristics that may benefit from new machine learning approaches.

The goal of this workshop is to bring together both applied and theoretical researchers in spoken/natural language processing and machine learning to facilitate the discussion of new frameworks that can help advance modern conversational systems. Some key questions we will address include (but are not limited to):

* Representation/Optimization
How can ML help provide novel representations and models to capture the structure of spoken natural language, especially spontaneous conversational speech?
What speech and NLP problems could benefit from new inference/optimization techniques?
* Data
In speech and NLP we typically have large amounts of less useful background data and small amounts of very useful in-domain data. Are current ML algorithms sufficient to gracefully deal with this problem? For example, can we harness non-dialogue data to build dialogue models?
While many speech and NLP problems depend mainly on static speech or text corpora, dialogue is unique in that the user provides an opportunity for online learning. Which non-intrusive methods can we use to engage the user in such a way that it leads to improvements in the dialogue models?
How can we design new ML paradigms (e.g., bootstrapping, semi-supervised learning) to address the lack of annotated data in complex structured prediction problems such as knowledge extraction and semantics?

* Scalability
So far, ML-based dialogue systems have only tackled limited domains. How can we scale them to large open domains by leveraging the semantic web?
How can we tackle "scalability bottlenecks" unique to natural language?

* Multi-lingual/Multi-human/Multi-modal conversation
Can adaptation methods be developed to build conversational understanding systems for low resource languages without going through rigorous annotation processes?
What technical challenges posed by multilinguality, lexical variation in social media, and nonstandard dialects are under-researched in ML?
What ML methods are needed for structural understanding of multi-human conversations?
What ML methods can we deploy to support multi-modal conversation?

Author Information

Asli Celikyilmaz (Microsoft)
Milica Gasic (University of Cambridge)
Dilek Hakkani-Tur (Microsoft Research)

More from the Same Authors