Affinity Workshop: Global South in AI

DistillEmb: Distilling Word Embeddings via Contrastive Learning

Amanuel Mersha

Keywords: [ Distillation ] [ word embedding ]

presentation: Global South in AI
Mon 28 Nov 12:30 p.m. PST — 4 p.m. PST


Word embeddings powered the early days of neural network-based NLP research. Their effectiveness in small-data regimes keeps them relevant in low-resource environments. However, they are limited in two critical ways: their memory requirement grows linearly with the number of tokens in the vocabulary, and they handle out-of-vocabulary tokens poorly. In this work, we present a technique for distilling word embeddings into a CNN using contrastive learning. The method allows a token's embedding to be regressed from its characters. Low-resource languages are the primary beneficiaries of this method; hence, we show the effectiveness of such a model on two morphologically complex Semitic languages and in a multilingual setting of 10 African languages. The resulting model uses drastically less memory and handles out-of-vocabulary tokens adequately.
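
The idea in the abstract can be illustrated with a small, dependency-free sketch. It is not the author's architecture or training code. The toy alphabet, the layer sizes, the single convolution layer with max-over-time pooling, the InfoNCE form of the contrastive loss, the temperature, the example tokens, and the random "teacher" vectors standing in for pretrained embeddings are all assumptions made for illustration. Only the forward pass and the loss are shown, with no training loop.

```python
import math
import random

random.seed(0)

# Assumptions for illustration only: toy alphabet and layer sizes.
ALPHABET = "abcdefghijklmnopqrstuvwxyz"
CHAR_DIM, EMB_DIM, KERNEL = 8, 16, 3

# Student parameters: one vector per character plus one bank of conv filters.
char_table = {c: [random.gauss(0.0, 0.1) for _ in range(CHAR_DIM)]
              for c in ALPHABET}
conv_weights = [[random.gauss(0.0, 0.1) for _ in range(KERNEL * CHAR_DIM)]
                for _ in range(EMB_DIM)]


def encode(token):
    """Map a token's characters to a vector: 1D convolution, then max pooling."""
    pad = [[0.0] * CHAR_DIM] * (KERNEL - 1)
    # Characters outside the toy alphabet fall back to a zero vector.
    rows = pad + [char_table.get(c, [0.0] * CHAR_DIM) for c in token] + pad
    # Each window is KERNEL consecutive character vectors, flattened.
    windows = [sum(rows[i:i + KERNEL], [])
               for i in range(len(rows) - KERNEL + 1)]
    return [max(sum(w * x for w, x in zip(filt, win)) for win in windows)
            for filt in conv_weights]


def cosine(u, v):
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return sum(a * b for a, b in zip(u, v)) / norm if norm else 0.0


def contrastive_loss(students, teachers, temperature=0.1):
    """InfoNCE (assumed loss): each student vector should be closer to its own
    teacher vector than to the other teacher vectors in the batch."""
    total = 0.0
    for i, s in enumerate(students):
        logits = [cosine(s, t) / temperature for t in teachers]
        peak = max(logits)  # subtract the max for numerical stability
        log_norm = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_norm - logits[i]
    return total / len(students)


# Hypothetical tokens; random vectors stand in for pretrained embeddings.
tokens = ["selam", "buna", "injera"]
teachers = [[random.gauss(0.0, 1.0) for _ in range(EMB_DIM)] for _ in tokens]
students = [encode(t) for t in tokens]

loss = contrastive_loss(students, teachers)  # quantity a trainer would minimize
oov_vector = encode("selamta")  # an unseen token still receives a vector
```

In this toy setup the student holds 26 × 8 character weights and 16 × 24 filter weights regardless of how many tokens it encodes, whereas an embedding table stores one row per vocabulary entry. Because `encode` reads only characters, a token absent from the teacher vocabulary still maps to a vector.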
