Centering Low-Resource Languages and Cultures in the Age of Large Language Models
Abstract
Large Language Models (LLMs) have transformed NLP research and applications, yet they remain predominantly trained on high-resource, globally dominant languages. This imbalance leads to poor performance and limited applicability for low-resource languages, many of which exhibit rich tonal systems, complex morphology, and deep cultural meaning. As a result, current AI systems risk reinforcing linguistic inequality, cultural erasure, and inaccessibility in critical domains such as education and healthcare.

This workshop aims to reframe language technology by centering low-resource languages, cultures, and epistemologies in the age of LLMs. We seek to bring together researchers, linguists, developers, healthcare professionals, and technologists to share insights and develop strategies for building inclusive, culturally grounded, and linguistically robust language models. The workshop emphasizes collaboration across disciplines and regions to ensure both technical advancement and social relevance.

Key areas of focus include LLM architectures tailored to low-resource linguistic features, ethical and community-centered dataset collection, and multilingual benchmarks designed specifically for underrepresented languages. We also highlight the importance of healthcare and medical machine translation in supporting equitable access to information and improving public health outcomes. Ultimately, we aim to advance responsible AI innovation that empowers low-resource language communities and shapes a more inclusive future for global language technologies.