Affinity Workshop: Global South in AI

Automated Misinformation: Mistranslation of news feed using multi-lingual translation systems in Facebook

Sundaraparipurnan Narayanan

Keywords: [ NLP ] [ multilingual translation ] [ mistranslation ] [ misinformation ]

presentation: Global South in AI
Mon 28 Nov 12:30 p.m. PST — 4 p.m. PST


Machine translation has evolved over the past decade and is used in a growing number of applications. Translation models increasingly aim to be multilingual, enabling translation across hundreds of languages, including many low-resource languages (e.g. Facebook's No Language Left Behind, NLLB, can translate text from and to 200 languages). Facebook, in its announcement, also mentioned that NLLB would support 25 billion translations served daily on Facebook News Feed, Instagram, and other platforms. A Facebook user receives auto-translated (machine-translated) content on their news feed based on the language setting and translation preferences updated on the platform. Multilingual translation models are not free from errors. These errors are typically caused by a lack of adequate context or domain-specific words, ambiguity or sarcasm in the text, incorrect dialect, missing words, transliteration instead of translation, incorrect lexical choice, and differences in grammatical properties between languages. Such errors may, on some occasions, lead to misinformation about the translated text. This paper examines instances of misinformation caused by mistranslations from English to Tamil in the Facebook news feed. For this research, categories of news headlines were collected from multiple sources, including (a) general news headlines from a Kaggle dataset (30 samples), (b) sarcastic news headlines from Kaggle (10 samples), (c) domain-specific news headlines from Wired (10 samples), and (d) ambiguous headlines from a linguistics page (15 samples). The headlines in each of these datasets were filtered for politics as a topic, given the potential impact of misinformation in that area. Samples were then drawn at random from the filtered headlines and translated using NLLB.
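The filtering and random sampling step described above can be sketched roughly as follows. This is only an illustration: the sample sizes come from the abstract, but the keyword list, the function names, and the fixed seed are hypothetical, since the abstract does not say how political headlines were identified or how the random draw was made.

```python
import random

# Sample sizes per category, as stated in the abstract.
SAMPLE_SIZES = {
    "general": 30,
    "sarcastic": 10,
    "domain_specific": 10,
    "ambiguous": 15,
}

# Hypothetical keyword list: a crude stand-in for whatever topic filter
# the author actually used.
POLITICS_KEYWORDS = ("election", "senate", "president", "congress",
                     "minister", "parliament")


def is_political(headline):
    """Return True if the headline contains any politics keyword."""
    text = headline.lower()
    return any(keyword in text for keyword in POLITICS_KEYWORDS)


def sample_headlines(headlines_by_category, sizes=SAMPLE_SIZES, seed=0):
    """Filter each category for politics, then draw a random sample.

    If a category has fewer political headlines than requested,
    all of them are returned.
    """
    rng = random.Random(seed)  # fixed seed so the draw is repeatable
    samples = {}
    for category, headlines in headlines_by_category.items():
        political = [h for h in headlines if is_political(h)]
        k = min(sizes.get(category, 0), len(political))
        samples[category] = rng.sample(political, k)
    return samples
```

Calling `sample_headlines` with a dictionary that maps each category name to its list of collected headlines returns the per-category samples to be translated.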
Test code was written in Google Colab using the pre-trained NLLB model (available on Hugging Face). The translations were evaluated for mistranslations. Incomplete translations were eliminated (~27%), and translations that conveyed a complete meaning (~73%) were examined for misinformation. A translation was classified as misinformation if it gave false information in the whole or in part of the news headline. For instance, “Trump For Rushing To Defend Tomi Lahren While Ignoring Real Victims” was translated as “உண்மையான பாதிக்கப்பட்டவர்களை புறக்கணித்த போதே டோமி லஹ்ரனை பாதுகாக்க துரிதமாக வந்ததற்காக டிரம்ப் <சுடப்பட்டார்>” (English meaning: Trump was <shot> for coming early to protect Tomi Lahren while ignoring the real victims). This project assessed misinformation only from the perspective of the headline and not the whole article or news item that appears on the news feed.

Conclusion: The results revealed that 20% of general and ambiguous headlines, and 30% of sarcastic and domain-specific headlines, contained misinformation caused by mistranslation. Languages are not equivalent to one another, so errors, mistranslation, and the misinformation they cause will remain even with extensive efforts to develop better models. However, the proportion of such mistakes should drop over time with the responsible implementation of auto-translation. An option to validate and give feedback on the appropriateness of a translation in the feed, and a mechanism to cross-validate models for varied uses of translation, are examples of responsible implementation efforts. While benchmarks and evaluation methods (Translation Error Rate) help in understanding the extent of errors, they do not necessarily examine errors from the perspective of misinformation. Prior research has examined misinformation caused by mistranslation by evaluating the ability of post-editors to identify such instances.
Auto-translation, on the other hand, offers no such post-editing opportunity.

Recommendation: This paper highlights the need for a responsible implementation of auto-translation on social media, reflecting on the misinformation it may cause. Hence, platforms like Facebook that provide such translations should establish responsible practices for implementing auto-translation at scale, including:
a. Giving the user an opportunity to provide feedback at every instance of such translation.
b. Having a mechanism to precisely track posts containing such mistranslation, in order to notify users of the misinformation.
c. Establishing a robust cross-validation mechanism between translations in multiple languages to assess the potential for misinformation.
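The abstract mentions test code that runs the pre-trained NLLB model from Hugging Face. A minimal sketch of such an English-to-Tamil translation call is given below. It is not the author's code: the checkpoint name, the generation length limit, and the function names are assumptions, and the language codes are the FLORES-200 codes used by the NLLB checkpoints. The `transformers` library (with PyTorch) is imported lazily, so the definitions load even where it is not installed.

```python
# Assumed checkpoint; the abstract does not say which NLLB variant was used.
MODEL_NAME = "facebook/nllb-200-distilled-600M"
SRC_LANG = "eng_Latn"  # English, FLORES-200 code
TGT_LANG = "tam_Taml"  # Tamil, FLORES-200 code


def load_translator(model_name=MODEL_NAME):
    """Download and return the (tokenizer, model) pair for NLLB."""
    # Imported here so that the rest of the sketch has no hard dependency.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name, src_lang=SRC_LANG)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
    return tokenizer, model


def translate(headline, tokenizer, model, max_length=64):
    """Translate one English headline into Tamil."""
    inputs = tokenizer(headline, return_tensors="pt")
    output_ids = model.generate(
        **inputs,
        # Forcing the first generated token to the target-language tag
        # is how NLLB is told which language to produce.
        forced_bos_token_id=tokenizer.convert_tokens_to_ids(TGT_LANG),
        max_length=max_length,  # assumed limit, adequate for headlines
    )
    return tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0]
```

In a notebook, `tokenizer, model = load_translator()` would be called once, followed by `translate(headline, tokenizer, model)` for each sampled headline. The outputs would then be reviewed manually for incomplete translations and misinformation, as described in the abstract.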
