Portrait of David Ifeoluwa Adelani

David Ifeoluwa Adelani

Core Academic Member
Canada CIFAR AI Chair
McGill University

Biography

David Adelani is an incoming assistant professor in computer science and fighting inequities at McGill University, and a Core Academic Member at Mila – Quebec Artificial Intelligence Institute. His research focuses on multilingual natural language processing, with a particular emphasis on under-resourced languages.

Publications

Few-Shot Pidgin Text Adaptation via Contrastive Fine-Tuning
Ernie Chang
Jesujoba Oluwadara Alabi
Vera Demberg
The surging demand for multilingual dialogue systems often requires a costly labeling process for each language addition. For low-resource languages, human annotators are continuously tasked with adapting resource-rich language utterances for each new domain. However, this prohibitive and impractical process can often be a bottleneck for low-resource languages that still lack proper translation systems and parallel corpora. In particular, it is difficult to obtain task-specific low-resource language annotations for the English-derived creoles (e.g. Nigerian and Cameroonian Pidgin). To address this issue, we utilize pretrained language models, i.e. BART, which have shown great potential in language generation and understanding: we propose to fine-tune the BART model to generate utterances in Pidgin by leveraging the proximity of the source and target languages, and utilizing positive and negative examples in contrastive training objectives. We collected and released the first parallel Pidgin-English conversation corpus in two dialogue domains and showed that this simple and effective technique suffices to yield impressive results for generation between English and Pidgin, two closely related languages.
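As a rough illustration of the approach this abstract describes, the sketch below fine-tunes BART on a single English-Pidgin pair with a margin-based contrastive term over a positive and a negative target. The model name, the example sentences, the margin value, and the exact form of the contrastive loss are assumptions made for illustration, not the paper's released setup.

```python
# Toy sketch of contrastive fine-tuning for English-to-Pidgin generation.
# Model choice, example sentences, and the margin are illustrative assumptions.
import torch
from transformers import BartForConditionalGeneration, BartTokenizerFast

model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")
tokenizer = BartTokenizerFast.from_pretrained("facebook/bart-base")
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)

def seq2seq_loss(source: str, target: str) -> torch.Tensor:
    """Cross-entropy of generating `target` from `source`."""
    batch = tokenizer(source, return_tensors="pt")
    labels = tokenizer(text_target=target, return_tensors="pt").input_ids
    return model(**batch, labels=labels).loss

# One hypothetical training triple: an English utterance, a correct Pidgin
# adaptation (positive), and an unrelated utterance (negative).
english = "How are you doing today?"
pidgin_pos = "How you dey today?"
pidgin_neg = "Di rain don fall for Lagos."

margin = 1.0
loss_pos = seq2seq_loss(english, pidgin_pos)
loss_neg = seq2seq_loss(english, pidgin_neg)
# Standard generation loss, plus a hinge that pushes the positive target to
# score better than the negative one by at least `margin`.
loss = loss_pos + torch.clamp(margin + loss_pos - loss_neg, min=0.0)

optimizer.zero_grad()
loss.backward()
optimizer.step()
```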
Findings of the WMT’22 Shared Task on Large-Scale Machine Translation Evaluation for African Languages
Md Mahfuz Ibn Alam
Antonios Anastasopoulos
Akshita Bhagia
Marta R. Costa-jussà
Jesse Dodge
Fahim Faisal
Christian Federmann
Natalia N. Fedorova
Francisco S. Guzmán
Sergey Koshelev
Jean Maillard
Vukosi Marivate
Jonathan Mbuya
Alexandre Mourachko
Safiyyah Saleem
Holger Schwenk
Guillaume Wenzek
We present the results of the WMT’22 Shared Task on Large-Scale Machine Translation Evaluation for African Languages. The shared task included both a data and a systems track, along with additional innovations, such as a focus on African languages and extensive human evaluation of submitted systems. We received 14 system submissions from 8 teams, as well as 6 data track contributions. We report large progress in the quality of translation for African languages since the last iteration of this shared task: there is an increase of about 7.5 BLEU points across 72 language pairs, and the average BLEU scores went from 15.09 to 22.60.
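The quoted gain is plain arithmetic over corpus-level BLEU averages: 22.60 - 15.09 = 7.51, i.e. about 7.5 points. For readers who want to compute this kind of score themselves, a minimal sacreBLEU invocation looks like the sketch below; the hypothesis and reference sentences are made-up placeholders, not shared-task data.

```python
# Minimal corpus-level BLEU scoring with sacrebleu; sentences are placeholders.
import sacrebleu

hypotheses = ["di rain don fall for Lagos"]
references = [["di rain don fall for Lagos"]]  # one inner list per reference stream

bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(f"BLEU = {bleu.score:.2f}")

# The reported average gain: 22.60 - 15.09 = 7.51, roughly 7.5 points.
print(round(22.60 - 15.09, 2))
```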
Multilingual Language Model Adaptive Fine-Tuning: A Study on African Languages
Jesujoba Oluwadara Alabi
Marius Mosbach
Dietrich Klakow
… and XLM-R) and three NLP tasks (NER, news topic classification, and sentiment classification) shows that our approach is competitive to applying LAFT on individual languages while requiring significantly less disk space. Finally, we show that our adapted PLM also improves the zero-shot cross-lingual transfer abilities of parameter-efficient fine-tuning methods.
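A minimal sketch of the core operation behind language adaptive fine-tuning (LAFT), assuming the HuggingFace transformers API: continue masked-language-model pre-training of a multilingual PLM (XLM-R here) on monolingual text in a target language. The corpus, batch construction, and hyperparameters below are placeholder assumptions, not the paper's released configuration.

```python
# Sketch of LAFT: continued MLM pre-training of XLM-R on monolingual text.
import torch
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
)

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModelForMaskedLM.from_pretrained("xlm-roberta-base")
collator = DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15)
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

# Hypothetical monolingual sentences in the target language.
corpus = ["sentence one in the target language", "sentence two ..."]

encodings = tokenizer(corpus, truncation=True, max_length=128)
# The collator pads the batch and randomly masks 15% of tokens,
# producing `labels` for the standard MLM objective.
batch = collator([{"input_ids": ids} for ids in encodings["input_ids"]])

loss = model(**batch).loss
optimizer.zero_grad()
loss.backward()
optimizer.step()
```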
The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset
Hugo Laurençon
Lucile Saulnier
Thomas Wang
Christopher Akiki
Albert Villanova del Moral
Teven Le Scao
Leandro Von Werra
Chenghao Mou
Eduardo González Ponferrada
Huu Nguyen
Jörg Frohberg
Mario Šaško
Quentin Lhoest
Angelina McMillan-Major
Gérard Dupont
Stella Biderman
Anna Rogers
Loubna Ben Allal
Francesco De Toni
Giada Pistilli … (34 more authors)
Olivier Nguyen
Somaieh Nikpoor
Maraim Masoud
Pierre Colombo
Javier de la Rosa
Paulo Villegas
Tristan Thrush
Shayne Longpre
Sebastian Nagel
Leon Weber
Manuel Romero Muñoz
Jian Zhu
Daniel Van Strien
Zaid Alyafeai
Khalid Almubarak
Vu Minh Chien
Itziar Gonzalez-Dios
Aitor Soroa
Kyle Lo
Manan Dey
Pedro Ortiz Suarez
Aaron Gokaslan
Shamik Bose
Long Phan
Hieu Tran
Ian Yu
Suhas Pai
Jenny Chim
Violette Lepercq
Suzana Ilic
Margaret Mitchell
Sasha Luccioni
Yacine Jernite
As language models grow ever larger, the need for large-scale high-quality text datasets has never been more pressing, especially in multilingual settings. The BigScience workshop, a 1-year international and multidisciplinary initiative, was formed with the goal of researching and training large language models as a values-driven undertaking, putting issues of ethics, harm, and governance in the foreground. This paper documents the data creation and curation efforts undertaken by BigScience to assemble the Responsible Open-science Open-collaboration Text Sources (ROOTS) corpus, a 1.6TB dataset spanning 59 languages that was used to train the 176-billion-parameter BigScience Large Open-science Open-access Multilingual (BLOOM) language model. We further release a large initial subset of the corpus and analyses thereof, and hope to empower large-scale monolingual and multilingual modeling projects with both the data and the processing tools, as well as stimulate research around this large multilingual corpus.
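The curation work described above combines many filtering and deduplication passes over raw text sources. As a generic, self-contained illustration of one such step (not the BigScience tooling itself), the sketch below removes exact duplicates by hashing normalized document text.

```python
# Toy exact-deduplication pass over a document collection.
import hashlib

def deduplicate(documents: list[str]) -> list[str]:
    """Keep the first occurrence of each distinct document."""
    seen: set[str] = set()
    unique = []
    for doc in documents:
        # Normalize lightly before hashing so trivial variants collapse.
        digest = hashlib.sha256(doc.strip().lower().encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)
    return unique

docs = ["Hello world.", "hello world.", "Bonjour le monde."]
print(deduplicate(docs))  # ['Hello world.', 'Bonjour le monde.']
```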