Nearly 50% of the world’s Indigenous languages and about 90% of North America’s Indigenous languages are endangered, according to UNESCO. To stem this loss, Indigenous communities need a game-changing approach to language education. The First Languages AI Reality (FLAIR) initiative aims to enable the next chapter in Indigenous language reclamation through advanced immersive AI technology.
Imagine an inclusive Metaverse where Indigenous youth across North America reconnect with their heritage. In this Metaverse, Lakota hunters use their mother tongue to coordinate a bison hunt in the Great Plains. To the west, Makah canoes on their Canoe Journey cross the Salish Sea to a community reunion with Kwak̓wala speakers and ask permission to come ashore. We are convinced that Voice AI will be intrinsic to the Metaverse experience, which in turn can facilitate intergenerational transmission of Indigenous languages.
To achieve this vision, we must first develop the foundations for Indigenous Voice AI. This foundational automatic speech recognition (ASR) research will initially focus on the Wakashan language family, which spans British Columbia, Canada, and Washington State, USA. The long-term objective is to expand to other Indigenous languages in North America, such as the Algonquian languages of the Northeast, eventually bringing Voice AI to Indigenous communities worldwide and positioning them to participate in the Metaverse through Voice AI in their heritage languages.
FLAIR’s goal is to develop a method for the rapid creation of custom ASR models for Indigenous languages. Developing ASR for a new language or domain typically requires collecting hundreds of hours of transcribed speech. For many Indigenous languages this is infeasible, because there are at most a few dozen living first-language speakers. Further, what recorded audio does exist is often untranscribed or inaccessible. A method that sharply reduces the hours of data required is therefore essential, and we are proposing a multifaceted approach that could reduce those data requirements drastically. The immediate focus of this project is to identify solutions for a specific set of Indigenous languages in North America, but we anticipate that the resulting system for rapid ASR development will solve similar problems for the thousands of languages used by other under-resourced and underserved speech communities around the world.
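A back-of-envelope calculation makes the data gap concrete. The figures below are illustrative assumptions consistent with the text ("hundreds of hours" of training data, "a few dozen" living first-language speakers), not numbers from the FLAIR team:

```python
# Illustrative estimate of why conventional ASR data collection is
# infeasible for many Indigenous languages. All constants are assumptions.

TYPICAL_ASR_HOURS = 300   # assumption: "hundreds of hours" for a new language
SPEAKERS = 24             # assumption: "a few dozen" first-language speakers
HOURS_PER_SPEAKER = 2     # assumption: transcribed audio obtainable per speaker

def collectable_hours(speakers: int, hours_each: float) -> float:
    """Total transcribed audio a community could realistically gather."""
    return speakers * hours_each

available = collectable_hours(SPEAKERS, HOURS_PER_SPEAKER)
gap = TYPICAL_ASR_HOURS / available
print(f"Collectable: {available} h vs. conventional need: {TYPICAL_ASR_HOURS} h "
      f"(~{gap:.1f}x shortfall)")
```

Even with generous assumptions, the collectable data falls short of conventional requirements by several multiples, which is why FLAIR targets methods that drastically reduce the hours of data needed rather than methods that assume more data can be gathered.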
The technical lead of the FLAIR initiative is Michael Running Wolf, a PhD student in computer science at McGill University. The initiative is also made possible by the collaboration of multiple AI students and language and culture consultants.
Michael Running Wolf (Northern Cheyenne/Lakota/Blackfeet) was raised in a rural prairie village in Montana with intermittent water and electricity; naturally, he holds a Master of Science in Computer Science, is a former engineer for Amazon’s Alexa, a former faculty member at Northeastern University, and is pursuing a PhD in computer science at McGill University. Michael researches Indigenous language and culture reclamation using immersive technologies (AR/VR) and artificial intelligence. His work has been recognized with an MIT Solve Fellowship, an Alfred P. Sloan Fellowship, and the Patrick McGovern AI for Humanity Prize. Through the ethical application of AI and advanced technology respecting Indigenous ways of knowing, he is contributing to the ecology of thought represented by Indigenous peoples.
For any inquiries about the project, please contact Benjamin Prud’homme, Executive Director of Mila’s AI for Humanity team, at firstname.lastname@example.org.