

Poster in Affinity Workshop: Black in AI

DistillEmb: Distilling Word Embeddings via Contrastive Learning

Amanuel Mersha · Stephen Wu

Keywords: [ Natural Language Processing ]


Abstract:

Word embeddings powered the early days of neural network-based NLP research, and their effectiveness in small-data regimes keeps them relevant in low-resource environments. However, they are limited in two critical ways: their memory footprint grows linearly with vocabulary size, and they cannot handle out-of-vocabulary tokens. In this work, we present a technique for distilling word embeddings into a convolutional network (CNN) using contrastive learning. The trained network regresses a token's embedding from its characters and is then used as a pretrained layer in place of a word-embedding table. Low-resource languages are the primary beneficiaries of this method, so we demonstrate its effectiveness on two morphologically rich Semitic languages and on a multilingual NER task covering 10 African languages. The resulting model is data efficient, improves performance, lowers memory requirements, and supports transfer of word representations out of the box.
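
Below is a minimal sketch of the distillation idea described in the abstract: a character-level CNN regresses a token's embedding from its characters, trained with a contrastive loss against a pretrained word-embedding table. The architecture details, dimensions, and the InfoNCE-style loss here are illustrative assumptions, not the authors' exact DistillEmb setup.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CharCNNEmbedder(nn.Module):
    """Maps a token's character IDs to a vector the same size as the target embeddings."""
    def __init__(self, n_chars=256, char_dim=64, emb_dim=300,
                 kernel_sizes=(2, 3, 4), n_filters=128):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, char_dim, padding_idx=0)
        self.convs = nn.ModuleList(
            nn.Conv1d(char_dim, n_filters, k, padding=k // 2) for k in kernel_sizes
        )
        self.proj = nn.Linear(n_filters * len(kernel_sizes), emb_dim)

    def forward(self, char_ids):                      # char_ids: (batch, max_token_len)
        x = self.char_emb(char_ids).transpose(1, 2)   # (batch, char_dim, len)
        pooled = [conv(x).max(dim=2).values for conv in self.convs]  # max-pool over chars
        return self.proj(torch.cat(pooled, dim=1))    # (batch, emb_dim)

def contrastive_distill_loss(pred, target, temperature=0.1):
    """InfoNCE-style objective: each predicted vector should match its own target
    embedding and be dissimilar to the other targets in the batch."""
    pred = F.normalize(pred, dim=1)
    target = F.normalize(target, dim=1)
    logits = pred @ target.t() / temperature          # (batch, batch) similarity matrix
    labels = torch.arange(pred.size(0), device=pred.device)
    return F.cross_entropy(logits, labels)

# Toy training step against a stand-in for a pretrained embedding table (dummy data).
vocab_size, emb_dim = 1000, 300
teacher = nn.Embedding(vocab_size, emb_dim)           # placeholder "teacher" embeddings
student = CharCNNEmbedder(emb_dim=emb_dim)
opt = torch.optim.Adam(student.parameters(), lr=1e-3)

token_ids = torch.randint(0, vocab_size, (32,))       # a batch of word IDs
char_ids = torch.randint(1, 256, (32, 12))            # their character IDs (dummy here)
loss = contrastive_distill_loss(student(char_ids), teacher(token_ids).detach())
opt.zero_grad(); loss.backward(); opt.step()
```

Once trained, such a student network can replace the embedding table: it composes a representation for any string from its characters, so memory no longer grows with vocabulary size and out-of-vocabulary tokens still receive embeddings.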
