Workshop
Indigenous in AI/ML
Mason Grimshaw · Andrea M. Delgado-Olson · Michael Running Wolf
Room 214
Indigenous in AI’s vision is to build an international community of Native, Aboriginal, and First Nations peoples who will collectively transform their home communities with advanced technology. By elevating the voices of Indigenous ML researchers, we will inspire future impactful work and break stereotypes. Additionally, this group will strive to educate the broader NeurIPS community on contemporary Indigenous issues relevant to information technology and practices.
Schedule
Mon 7:00 a.m. - 7:15 a.m.
Welcome and Introductions (Welcome and Introduction)
Join us as we kick off the Indigenous in AI/ML Affinity Workshop!
Andrea M. Delgado-Olson · Michael Running Wolf
Mon 7:15 a.m. - 7:30 a.m.
Lakota AI Code Camp (Presentation)
An update on our annual Lakota AI Code Camp. The students continue to amaze us!
Andrea Delgado-Olson
Mon 7:30 a.m. - 7:45 a.m.
Teach the Teacher Professional Development (T3PD) (Presentation)
We're scaling AI/ML education to Native American students across North America. Come learn how!
Andrea Delgado-Olson
Mon 7:45 a.m. - 8:00 a.m.
First Languages AI Initiative (Presentation)
Join us as we announce the creation of the First Languages AI Reality Initiative.
Michael Running Wolf
Mon 8:30 a.m. - 8:50 a.m.
An (unhelpful) guide to selecting the right ASR architecture for your under-resourced language (Talk)
Advances in deep neural models for automatic speech recognition (ASR) have yielded dramatic improvements in ASR quality for resource-rich languages, with English ASR now achieving word error rates comparable to those of human transcribers. The vast majority of the world’s languages, however, lack the quantity of data necessary to approach this level of accuracy. In this paper we use four of the most popular ASR toolkits to train ASR models for eleven languages with limited ASR training resources: seven widely spoken languages of Africa, Asia, and South America, one endangered language of Central America, and three critically endangered Indigenous languages of North America. We find that no single architecture consistently outperforms the others. The differences in performance that do exist so far do not appear to be related to any particular feature of the datasets or characteristics of the languages. These findings have important implications for future research in ASR for under-resourced languages. ASR systems for languages with abundant existing media and available speakers may benefit most simply from collecting large amounts of additional acoustic and textual training data. Communities using ASR to support endangered language documentation, who cannot easily collect more data, might instead focus on exploring multiple architectures and hyperparameter settings to optimize performance within the constraints of their available data and resources.
Robbie Jimerson
Mon 8:50 a.m. - 9:10 a.m.
An update on Automatic Speech Recognition in Hawaiian (Talk)
Hawaiian is a low-resource language from the perspective of automatic speech recognition (ASR). This talk will present efforts within the Hawaiian community to make efficient and responsible use of existing data and to expand ASR resources through crowdsourcing. It will also discuss current thinking in Hawaiʻi on data sovereignty: keeping a significant proportion of the training data for Hawaiian ASR systems kapu, that is, available only to people within the Hawaiian community.
Oiwi Parker Jones
Mon 9:10 a.m. - 9:30 a.m.
Eating the Buffalo: Language Revitalization using Automatic Speech Recognition (ASR) (Talk)
Of the approximately 7,000 languages spoken today, nearly half are endangered. Automatic speech recognition (ASR) has the potential to mitigate endangerment, but it has high technical and resource requirements. A new organization dedicated to AI for language revitalization can help accelerate the adoption of ASR for Indigenous language revitalization.
Shawn Tsosie
Mon 9:30 a.m. - 10:00 a.m.
Language Revitalization Panel (Presentation)
Join us as we explore Indigenous perspectives on language revitalization with our esteemed panelists. Moderated by Caroline Running Wolf.
Caroline Running Wolf · Shawn Tsosie · Oiwi Parker Jones · Robbie Jimerson
Mon 12:15 p.m. - 1:00 p.m.
Reclaiming Indigenous Voices: An ASR-based Recording App for Indigenous Language Revitalization (Presentation)
Nearly 50% of the world’s Indigenous languages and about 90% of North American languages are endangered, according to UNESCO. To stem this loss, Indigenous communities need a game-changing approach to language education. The First Languages AI Reality (FLAIR) initiative is being created to open the next chapter in Indigenous language reclamation through advanced immersive AI technology. A significant obstacle to Indigenous language reclamation and revitalization is the large amount of time and labor required to transcribe speech into text. This bottleneck exists in part because of the scarcity of high-quality, ethically obtained data for building ASR models. For this reason, a primary step in the FLAIR initiative is the creation of a user-friendly application, RAIV (Recording App for Indigenous Voices), that prioritizes data sovereignty and empowers Indigenous communities to record speech in their ancestral languages. RAIV increases the amount of high-quality speech data available for ASR models and also serves as a platform through which Indigenous communities can reclaim their languages through education and speech. In this talk, Indigenous students from top universities who have helped build the application will discuss their work on RAIV and other aspects of the FLAIR initiative, what they have learned in the process, and where the application is headed.
Ryan M Conti · Faith Baca
Mon 1:00 p.m. - 1:15 p.m.
Closing Remarks (Closing)
Michael Running Wolf