Affinity Workshop: Women in Machine Learning

Biomedical Word Sense Disambiguation with Contextualized Representation Learning

Mozhgan Saeidi


Contextualized word embeddings have been shown to carry useful semantic information that improves results on various Natural Language Processing (NLP) tasks. However, integrating these embeddings with the information in a knowledge base remains challenging. Such integration is helpful in NLP tasks, particularly for lexical ambiguity. Word Sense Disambiguation (WSD) is one of the core problems of the NLP domain. Text representation is a critical component of every WSD model: it encodes the text and related information so that the model can select the most appropriate meaning for an ambiguous word. Contextual embedding representations of words have been shown to successfully encode the different meanings of a word. In this work, we propose a new embedding approach that considers not only the information from the context, but also the information from the knowledge base. We present C-KASE (Contextualized Knowledge base Aware Sense Embedding), a novel approach to producing sense embeddings for the lexical meanings within a lexical knowledge base, which lie in a space comparable to that of contextualized word vectors. C-KASE representations enable a simple 1-Nearest-Neighbor algorithm to outperform state-of-the-art models in the English Word Sense Disambiguation task. Because the embedding is specific to each knowledge base, it also outperforms competing approaches on similar tasks, e.g., Wikification and Named Entity Recognition. In our experiments, we provide settings that make the C-KASE representation comparable with both supervised and knowledge-based approaches. Comparisons with current state-of-the-art methods show the effectiveness of our method.
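
The 1-Nearest-Neighbor step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the sense labels, the three-dimensional vectors, the function names, and the choice of cosine similarity as the nearness measure are all assumptions made for illustration. In practice, the context vector would come from a contextualized encoder and the sense vectors from the sense embeddings built over the knowledge base.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nearest_sense(context_vec, sense_vecs):
    """1-Nearest-Neighbor: return the sense label whose embedding is
    most similar to the contextual vector of the ambiguous word."""
    return max(sense_vecs, key=lambda label: cosine(context_vec, sense_vecs[label]))

# Hypothetical toy data: two candidate senses of the ambiguous term "cold".
senses = {
    "cold (common cold)": [0.9, 0.1, 0.0],
    "cold (low temperature)": [0.1, 0.9, 0.2],
}

# Hypothetical contextual vector for "cold" in a sentence about a viral infection.
context = [0.8, 0.2, 0.1]

predicted = nearest_sense(context, senses)
print(predicted)
```

Because the sense vectors and the word vectors are assumed to share one space, disambiguation reduces to a similarity lookup with no task-specific classifier on top.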
