David Ifeoluwa Adelani

Core Academic Member
Canada CIFAR AI Chair
McGill University
Research Topics
Deep Learning
Natural Language Processing
Representation Learning
Speech Processing

Biography

David Adelani is an assistant professor at McGill University’s School of Computer Science under the Fighting Inequities initiative, and a core academic member of Mila – Quebec Artificial Intelligence Institute.

Adelani’s research focuses on multilingual natural language processing with special attention to under-resourced languages.

Current Students

Research Intern - McGill University
PhD - McGill University
Research Intern - McGill University
Master's Research - McGill University
Collaborating Alumni - McGill University
Research Intern - McGill University
Professional Master's - Université de Montréal
Research Intern - McGill University
Master's Research - McGill University

Publications

McGill NLP Group Submission to the MRL 2024 Shared Task: Ensembling Enhances Effectiveness of Multilingual Small LMs
We present our systems for the three tasks and five languages included in the MRL 2024 Shared Task on Multilingual Multi-task Information Retrieval: (1) Named Entity Recognition, (2) Free-form Question Answering, and (3) Multiple-choice Question Answering. For each task, we explored the impact of selecting different multilingual language models for fine-tuning across various target languages, and implemented an ensemble system that generates final outputs based on predictions from multiple fine-tuned models. All models are large language models fine-tuned on task-specific data. Our experimental results show that a more balanced dataset would yield better results. However, when training data for certain languages are scarce, fine-tuning on a large amount of English data supplemented by a small amount of “triggering data” in the target language can produce decent results.
Mitigating Translationese in Low-resource Languages: The Storyboard Approach
Garry Kuwanto
Eno-Abasi Urua
Priscilla A. Amuok
Shamsuddeen Hassan Muhammad
Aremu Anuoluwapo
Verrah Akinyi Otiende
Loice Emma Nanyanga
T. Nyoike
A. D. Akpan
Nsima Ab Udouboh
Idongesit Udeme Archibong
Idara Effiong Moses
Ifeoluwatayo A. Ige
Benjamin A. Ajibade
Olumide Benjamin Awokoya
Idris Abdulmumin
Saminu Mohammad Aliyu
Ruqayya Nasir Iro
Ibrahim Ahmad
Deontae Smith
Praise-EL Michaels
Derry Tanti Wijaya
Anietie U Andy
Low-resource languages often face challenges in acquiring high-quality language data due to the reliance on translation-based methods, which can introduce the translationese effect. This phenomenon results in translated sentences that lack fluency and naturalness in the target language. In this paper, we propose a novel approach for data collection by leveraging storyboards to elicit more fluent and natural sentences. Our method involves presenting native speakers with visual stimuli in the form of storyboards and collecting their descriptions without direct exposure to the source text. We conducted a comprehensive evaluation comparing our storyboard-based approach with traditional text translation-based methods in terms of accuracy and fluency. Human annotators and quantitative metrics were used to assess translation quality. The results indicate a preference for text translation in terms of accuracy, while our method demonstrates worse accuracy but better fluency in the language focused.
SIB-200: A Simple, Inclusive, and Big Evaluation Dataset for Topic Classification in 200+ Languages and Dialects
Hannah Liu
Xiaoyu Shen
Nikita Vassilyev
Jesujoba Oluwadara Alabi
Yanke Mao
Haonan Gao
Annie En-Shiun Lee
Voices Unheard: NLP Resources and Models for Yorùbá Regional Dialects
Orevaoghene Ahia
Aremu Anuoluwapo
Diana Abagyan
Hila Gonen
Daud Abolade
Noah A. Smith
Yulia Tsvetkov
Cross-lingual Open-Retrieval Question Answering for African Languages
Odunayo Ogundepo
Tajuddeen Gwadabe
Clara E. Rivera
Jonathan H. Clark
Sebastian Ruder
Bonaventure F. P. Dossou
Abdou Aziz DIOP
Claytone Sikasote
Gilles Q. Hacheme
Happy Buzaaba
Ignatius Majesty Ezeani
Rooweither Mabuya
Salomey Osei
Albert Njoroge Kahira
Shamsuddeen Hassan Muhammad
Akintunde Oladipo
Abraham Toluwase Owodunni
Atnafu Lambebo Tonja
Iyanuoluwa Shode
Akari Asai
Aremu Anuoluwapo
Ayodele Awokoya
Bernard Opoku
Chiamaka Ijeoma Chukwuneke
Christine Mwase
Clemencia Siro
Stephen Arthur
Tunde Oluwaseyi Ajayi
V. Otiende
Andre Niyongabo Rubungo
B. Sinkala
Daniel A. Ajisafe
Emeka Onwuegbuzia
Falalu Lawan
Ibrahim Ahmad
Jesujoba Alabi
CHINEDU EMMANUEL MBONU
Mofetoluwa Adeyemi
Mofya Phiri
Orevaoghene Ahia
Ruqayya Nasir Iro
Sonia Adhiambo
Cross-lingual Open-Retrieval Question Answering for African Languages
Odunayo Ogundepo
Tajuddeen Gwadabe
Clara E. Rivera
Jonathan H. Clark
Sebastian Ruder
Bonaventure F. P. Dossou
Abdou Aziz DIOP
Claytone Sikasote
Gilles HACHEME
Happy Buzaaba
Ignatius Ezeani
Rooweither Mabuya
Salomey Osei
Albert Kahira
Shamsuddeen Hassan Muhammad
Akintunde Oladipo
Abraham Toluwase Owodunni
Atnafu Lambebo Tonja
Iyanuoluwa Shode
Akari Asai
Aremu Anuoluwapo
Ayodele Awokoya
Bernard Opoku
Chiamaka Ijeoma Chukwuneke
Christine Mwase
Clemencia Siro
Stephen Arthur
Oyinkansola Awosan
Tunde Oluwaseyi Ajayi
Verrah Akinyi Otiende
Andre Niyongabo Rubungo
Boyd Sinkala
Daniel Ajisafe
Emeka Felix Onwuegbuzia
Falalu Lawan
Ibrahim Ahmad
Jesujoba Oluwadara Alabi
Habib Mbow
CHINEDU EMMANUEL MBONU
Emile Niyomutabazi
Mofetoluwa Adeyemi
Eunice Mukonde
Mofya Phiri
Orevaoghene Ahia
Ruqayya Nasir Iro
Sonia Adhiambo
Martin Namukombo
Neo Putini
Ndumiso Mngoma
Priscilla A. Amuok
XTREME-UP: A User-Centric Scarce-Data Benchmark for Under-Represented Languages
Sebastian Ruder
Jonathan H. Clark
Alexander Gutkin
Mihir Kale
Min Ma
Massimo Nicosia
Shruti Rijhwani
Parker Riley
Jean Michel Amath Sarr
Xinyi Wang
John Frederick Wieting
Nitish Gupta
Anna Katanova
Christo Kirov
Dana L Dickinson
Brian Roark
Bidisha Samanta
Connie Tao
Vera Axelrod
Isaac Rayburn Caswell
Colin Cherry
Dan Garrette
Reeve Ingle
Melvin Johnson
Dmitry Panteleev
Partha Talukdar
Data scarcity is a crucial issue for the development of highly multilingual NLP systems. Yet for many under-represented languages (ULs) -- languages for which NLP research is particularly far behind in meeting user needs -- it is feasible to annotate small amounts of data. Motivated by this, we propose XTREME-UP, a benchmark defined by: its focus on the scarce-data scenario rather than zero-shot; its focus on user-centric tasks -- tasks with broad adoption by speakers of high-resource languages; and its focus on under-represented languages where this scarce-data scenario tends to be most realistic. XTREME-UP evaluates the capabilities of language models across 88 under-represented languages over 9 key user-centric technologies including ASR, OCR, MT, and information access tasks that are of general utility. We create new datasets for OCR, autocomplete, semantic parsing, and transliteration, and build on and refine existing datasets for other tasks. XTREME-UP provides methodology for evaluating many modeling scenarios including text-only, multi-modal (vision, audio, and text), supervised parameter tuning, and in-context learning. We evaluate commonly used models on the benchmark. We release all code and scripts to train and evaluate models.
AfroBench: How Good are Large Language Models on African Languages?
Kelechi Ogueji
Pontus Stenetorp
How good are Large Language Models on African Languages?
Kelechi Ogueji
Pontus Stenetorp
Better Quality Pre-training Data and T5 Models for African Languages
Akintunde Oladipo
Mofetoluwa Adeyemi
Orevaoghene Ahia
Abraham Toluwase Owodunni
Odunayo Ogundepo
Jimmy Lin
In this study, we highlight the importance of enhancing the quality of pretraining data in multilingual language models. Existing web crawls have demonstrated quality issues, particularly in the context of low-resource languages. Consequently, we introduce a new multilingual pretraining corpus for
Improving Language Plasticity via Pretraining with Active Forgetting
Yihong Chen
Kelly Marchisio
Roberta Raileanu
Pontus Stenetorp
Sebastian Riedel
Mikel Artetxe
Pretrained language models (PLMs) are today the primary model for natural language processing. Despite their impressive downstream performance, it can be difficult to apply PLMs to new languages, a barrier to making their capabilities universally accessible. While prior work has shown it possible to address this issue by learning a new embedding layer for the new language, doing so is both data and compute inefficient. We propose to use an active forgetting mechanism during pretraining, as a simple way of creating PLMs that can quickly adapt to new languages. Concretely, by resetting the embedding layer every K updates during pretraining, we encourage the PLM to improve its ability of learning new embeddings within a limited number of updates, similar to a meta-learning effect. Experiments with RoBERTa show that models pretrained with our forgetting mechanism not only demonstrate faster convergence during language adaptation, but also outperform standard ones in a low-data regime, particularly for languages that are distant from English. Code will be available at https://github.com/facebookresearch/language-model-plasticity.
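
The reset schedule described in this abstract (re-initializing the embedding layer every K updates while the rest of the model keeps training) can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the placeholder "+0.01" update, the initialization scale, and the default sizes are all assumptions made purely for illustration, and a real run would apply gradient steps from a language-modeling loss to an actual RoBERTa model.

```python
import random


def init_embeddings(vocab_size, dim, rng):
    """Draw a fresh embedding table (hypothetical init: small Gaussian values)."""
    return [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(vocab_size)]


def pretrain_with_active_forgetting(num_updates, reset_every_k,
                                    vocab_size=8, dim=4, seed=0):
    """Toy loop: the embeddings are reset every K updates, the body never is."""
    rng = random.Random(seed)
    embeddings = init_embeddings(vocab_size, dim, rng)
    body = [0.0] * dim  # stand-in for all non-embedding parameters
    resets = 0
    for step in range(1, num_updates + 1):
        # Placeholder for a real optimizer step (assumption: constant increment).
        for row in embeddings:
            for j in range(dim):
                row[j] += 0.01
        for j in range(dim):
            body[j] += 0.01
        # Active forgetting: discard the learned embeddings every K updates.
        if step % reset_every_k == 0:
            embeddings = init_embeddings(vocab_size, dim, rng)
            resets += 1
    return embeddings, body, resets


if __name__ == "__main__":
    _, body, resets = pretrain_with_active_forgetting(num_updates=10, reset_every_k=3)
    print("resets:", resets)
    print("body parameter after training:", round(body[0], 2))
```

In this sketch the body parameters accumulate every update while the embedding table is periodically thrown away, which is the asymmetry the abstract credits with encouraging fast learning of new embeddings.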