Skip to yearly menu bar Skip to main content

Keynote Talk
Workshop: Efficient Natural Language and Speech Processing (Models, Training, and Inference)

Data-Efficient Cross-Lingual Natural Language Processing

Barbara Plank


NLP today depends on huge amounts of unlabeled data, however, for many scenarios including low-resource languages and language varieties we do not have access to labeled resources and even unlabeled data might be scarce. In this talk, I will focus on data-efficient cross-lingual NLP. On the one side, I will outline methods on how to transfer models to low-resource languages. On the other side, I will argue for broader evaluation in cross-lingual learning to include dimensions of variation of language [1]. I'll showcase this on some of our recent work which includes NLP for Danish [4,2], cross-lingual task-oriented dialogues [2] and exploring genre as weak supervision signal for cross-lingual dependency parsing [3].
[1] Barbara Plank. What to do about non-standard (or non-canonical) language in NLP. In KONVENS 2016. Bochum, Germany.
[2] Rob van der Goot, Ibrahim Sharaf, Aizhan Imankulova, Ahmet Üstün, Marija Stepanović, Alan Ramponi, Siti Oryza Khairunnisa, Mamoru Komachi and Barbara\ Plank. From Masked-Language Modeling to Translation: Non-English Auxiliary Tasks Improve Zero-shot Spoken Language Understanding. In NAACL 2021.
[3] Max Müller-Eberstein, Rob van der Goot and Barbara Plank. Genre as Weak Supervision for Cross-lingual Dependency Parsing. In EMNLP 2021.
[4] Barbara Plank, Kristian Nørgaard Jensen and Rob van der Goot. DaN+ - Danish Nested Named Entities and Lexical Normalization. In COLING 2020.