Nearly 50% of the world’s Indigenous languages and about 90% of North America’s Indigenous languages are endangered, according to UNESCO. To stem this loss, Indigenous communities need a game-changing approach to language education. The First Languages AI Reality (FLAIR) initiative aims to enable the next chapter in Indigenous language reclamation through advanced immersive AI technology.
Imagine an inclusive Metaverse where Indigenous youth across North America reconnect with their heritage. In this Metaverse, Lakota hunters use their mother tongue to coordinate a bison hunt in the Great Plains. To the west, Makah canoes on their Canoe Journey cross the Salish Sea to a community reunion with Kwak̓wala speakers and ask permission to come ashore. We are convinced that Voice AI will be intrinsic to the Metaverse experience, which in turn can facilitate intergenerational transmission of Indigenous languages.
To achieve this vision, we must first develop the foundations for Indigenous Voice AI. This foundational automatic speech recognition (ASR) research will initially focus on the Wakashan language family, which spans British Columbia, Canada, and Washington State, USA. The long-term objective is to expand to other Indigenous languages in North America, such as the Algonquian languages of the Northeast, eventually bringing Voice AI to Indigenous communities worldwide and positioning them to participate in the Metaverse through Voice AI in their heritage languages.
FLAIR’s goal is to develop a method for the rapid creation of custom ASR models for Indigenous languages. Developing ASR for a new language or domain typically requires collecting hundreds of hours of transcribed speech. For many Indigenous languages this is infeasible, because there are at most a few dozen living first-language speakers. Further, what recorded audio does exist is often untranscribed or inaccessible. A method that sharply reduces the hours of data required is therefore essential, and we are proposing a multifaceted approach that could reduce those data requirements drastically. The immediate focus of this project is to identify solutions for a specific set of Indigenous languages in North America, but we anticipate that the resulting system for rapid ASR development will solve similar problems for the thousands of languages used by other under-resourced and underserved speech communities around the world.
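A back-of-envelope calculation makes the data gap concrete. The figures below are illustrative assumptions consistent with the text ("hundreds of hours" of training data, "a few dozen" living first-language speakers), not numbers from the FLAIR team:

```python
# Illustrative estimate of why conventional ASR data collection is
# infeasible for many Indigenous languages. All constants are assumptions.

TYPICAL_ASR_HOURS = 300   # assumption: "hundreds of hours" for a new language
SPEAKERS = 24             # assumption: "a few dozen" first-language speakers
HOURS_PER_SPEAKER = 2     # assumption: transcribed audio obtainable per speaker

def collectable_hours(speakers: int, hours_each: float) -> float:
    """Total transcribed audio a community could realistically gather."""
    return speakers * hours_each

available = collectable_hours(SPEAKERS, HOURS_PER_SPEAKER)
gap = TYPICAL_ASR_HOURS / available
print(f"Collectable: {available} h vs. conventional need: {TYPICAL_ASR_HOURS} h "
      f"(~{gap:.1f}x shortfall)")
```

Even with generous assumptions, the collectable data falls short of conventional requirements by several multiples, which is why FLAIR targets methods that drastically reduce the hours of data needed rather than methods that assume more data can be gathered.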
The technical lead of the FLAIR initiative is Michael Running Wolf, a PhD student in computer science at McGill University. The initiative is also made possible by the collaboration of multiple AI students and language and culture consultants.
Michael Running Wolf (Northern Cheyenne/Lakota/Blackfeet) was raised in a rural prairie village in Montana with intermittent water and electricity; naturally, he holds a Master of Science in Computer Science, is a former engineer for Amazon’s Alexa, a former faculty member at Northeastern University, and is pursuing a PhD in computer science at McGill University. Michael researches Indigenous language and culture reclamation using immersive technologies (AR/VR) and artificial intelligence. His work has been recognized with an MIT Solve Fellowship, an Alfred P. Sloan Fellowship, and the Patrick McGovern AI for Humanity Prize. Through the ethical application of AI and advanced technology respecting Indigenous ways of knowing, he is contributing to the ecology of thought represented by Indigenous peoples.
For any inquiries about the project, please contact Benjamin Prud’homme, Executive Director of Mila’s AI for Humanity team, at firstname.lastname@example.org.