LiRo: Benchmark and leaderboard for Romanian language tasks
Stefan Dumitrescu · Petru Rebeja · Beata Lorincz · Mihaela Gaman · Andrei Avram · Mihai Ilie · Andrei Pruteanu · Adriana Stan · Lorena Rosia · Cristina Iacobescu · Luciana Morogan · George Dima · Gabriel Marchidan · Traian Rebedea · Madalina Chitez · Dani Yogatama · Sebastian Ruder · Radu Tudor Ionescu · Razvan Pascanu · Viorica Patraucean

Recent advances in NLP have been sustained by the availability of large amounts of data and standardized benchmarks, which are not available for many languages. As a small step towards addressing this gap, we propose LiRo, a platform for benchmarking models on nine standard Romanian-language tasks: text classification, named entity recognition, machine translation, sentiment analysis, POS tagging, dependency parsing, language modelling, question answering, and semantic textual similarity. We also include a less standard task, embedding debiasing, to address growing concerns about gender bias in language models. The platform exposes per-task leaderboards populated with baseline results. In addition, we create three new datasets: one from Romanian Wikipedia and two by translating the Semantic Textual Similarity (STS) benchmark and the Cross-lingual Question Answering Dataset (XQuAD) into Romanian. We believe LiRo will not only add to the growing body of benchmarks covering various languages but also enable multilingual research by augmenting parallel corpora, and hence that it is of interest to the wider NLP community. LiRo is available at https://lirobenchmark.github.io/.

Author Information

Stefan Dumitrescu (Adobe Systems)

NLP

Petru Rebeja (Alexandru Ioan Cuza University of Iași)
Beata Lorincz (Babes-Bolyai University)
Mihaela Gaman
Andrei Avram
Mihai Ilie (Sustainalytics.com)
Andrei Pruteanu
Adriana Stan
Lorena Rosia
Cristina Iacobescu
Luciana Morogan
George Dima
Gabriel Marchidan (Technical University of Iasi)
Traian Rebedea (University Politehnica of Bucharest)
Madalina Chitez
Dani Yogatama (Google DeepMind)
Sebastian Ruder (DeepMind)
Radu Tudor Ionescu (University of Bucharest)
Razvan Pascanu (Google DeepMind)
Viorica Patraucean (DeepMind)
