
David Ifeoluwa Adelani

Core Academic Member
Canada CIFAR AI Chair
McGill University
Research Topics
Deep Learning
Natural Language Processing
Representation Learning
Speech Processing

Biography

David Adelani is an assistant professor at McGill University’s School of Computer Science under the Fighting Inequities initiative, and a core academic member of Mila – Quebec Artificial Intelligence Institute.

Adelani’s research focuses on multilingual natural language processing with special attention to under-resourced languages.

Current Students

Master's Research - McGill University
Master's Research - McGill University
Research Intern - McGill University
Collaborating researcher - McGill University
Postdoctorate - McGill University
Research Intern - McGill University
PhD - McGill University
Research Intern - McGill University
PhD - McGill University
PhD - McGill University
Research Intern - McGill University
Master's Research - McGill University
Research Intern - McGill University
Professional Master's - Université de Montréal
Research Intern - McGill University
Master's Research - McGill University

Publications

Sudanese-Flores: Extending FLORES+ to Sudanese Arabic Dialect
Hadia Mohmmedosman Ahmed Samil
In this work, we introduce Sudanese-Flores, an extension of the popular Flores+ machine translation (MT) benchmark to the Sudanese Arabic dialect. We translate both the DEV and DEVTEST splits of the Modern Standard Arabic dataset into the corresponding Sudanese dialect, resulting in a total of 2,009 sentences. While the dialect was recently introduced in Google Translate, there is no available benchmark for it despite it being spoken by over 40 million people. Our evaluation of two leading LLMs, GPT-4.1 and Gemini 2.5 Flash, shows that while their performance on English-to-Arabic translation is impressive (more than 23 BLEU), they struggle on the Sudanese dialect (less than 11 BLEU) in zero-shot settings. In the few-shot scenario, we achieved only a slight boost in performance.
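The BLEU gaps reported above can be made concrete with a simplified, pure-Python corpus BLEU (uniform 4-gram weights, single reference, exponential brevity penalty). This is only an illustrative sketch of how the metric is computed; the paper presumably relies on a standard implementation such as sacreBLEU:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def corpus_bleu(hypotheses, references, max_n=4):
    """Simplified corpus-level BLEU: clipped n-gram precisions,
    uniform weights, single reference, brevity penalty."""
    precisions = []
    for n in range(1, max_n + 1):
        match, total = 0, 0
        for hyp, ref in zip(hypotheses, references):
            h, r = hyp.split(), ref.split()
            hc, rc = ngrams(h, n), ngrams(r, n)
            match += sum(min(c, rc[g]) for g, c in hc.items())  # clipped counts
            total += max(len(h) - n + 1, 0)
        precisions.append(match / total if total else 0.0)
    hyp_len = sum(len(h.split()) for h in hypotheses)
    ref_len = sum(len(r.split()) for r in references)
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / max(hyp_len, 1))
    if min(precisions) == 0:
        return 0.0  # any zero precision collapses the geometric mean
    return 100 * bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

print(corpus_bleu(["the cat sat on the mat"], ["the cat sat on the mat"]))  # → 100.0
```

A perfect match scores 100; scores below 11, as observed for the Sudanese dialect, indicate very little n-gram overlap with the reference.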
Multilinguality as Sense Adaptation
Jan Christian Blaise Cruz
Alham Fikri Aji
Afri-MCQA: Multimodal Cultural Question Answering for African Languages
Atnafu Lambebo Tonja
Srija Anand
Emilio Villa Cueva
Israel Abebe Azime
Jesujoba Oluwadara Alabi
Muhidin A. Mohamed
Debela Desalegn Yadeta
Negasi Haile Abadi
Abigail Oppong
Nnaemeka Casmir Obiefuna
Idris Abdulmumin
Naome Etori
Eric Peter Wairagala
Kanda Patrick Tshinu
Imanigirimbabazi Emmanuel
Gabofetswe Malema
Alham Fikri Aji
Thamar Solorio
Africa is home to over one-third of the world's languages, yet remains underrepresented in AI research. We introduce Afri-MCQA, the first Multilingual Cultural Question-Answering benchmark, covering 7.5k Q&A pairs across 15 African languages from 12 countries. The benchmark offers parallel English-African language Q&A pairs across text and speech modalities and was entirely created by native speakers. Benchmarking large language models (LLMs) on Afri-MCQA shows that open-weight models perform poorly across the evaluated cultures, with near-zero accuracy on open-ended VQA when queried in the native language or in speech. To separate linguistic competence from cultural knowledge, we include control experiments targeting this specific aspect, and we observe significant performance gaps between native languages and English for both text and speech. These findings underscore the need for speech-first approaches, culturally grounded pretraining, and cross-lingual cultural transfer. To support more inclusive multimodal AI development in African languages, we release Afri-MCQA under an academic license or CC BY-NC 4.0 on HuggingFace (https://huggingface.co/datasets/Atnafu/Afri-MCQA).
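As a rough illustration of how accuracy on a multiple-choice benchmark of this kind might be scored, here is a minimal sketch. The `question`/`choices`/`answer` field names and the trivial first-choice baseline are assumptions for illustration, not the dataset's actual schema:

```python
def mcq_accuracy(examples, predict):
    """Fraction of multiple-choice questions answered correctly.
    `predict` maps (question, choices) to one of the choices."""
    correct = sum(predict(ex["question"], ex["choices"]) == ex["answer"]
                  for ex in examples)
    return correct / len(examples)

# Toy data and a trivial "always pick the first choice" baseline.
toy = [
    {"question": "Q1", "choices": ["A", "B"], "answer": "A"},
    {"question": "Q2", "choices": ["C", "D"], "answer": "D"},
]
print(mcq_accuracy(toy, lambda q, choices: choices[0]))  # → 0.5
```

In practice `predict` would wrap an LLM call, and a random-guess baseline on a four-option MCQ sits at 0.25, which is why near-zero open-ended accuracy is so striking.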
AfriqueLLM: How Data Mixing and Model Architecture Impact Continued Pre-training for African Languages
Hao Yu
Tianyi Xu
Michael A. Hedderich
Wassim Hamidouche
Syed Waqas Zamir
Ibom NLP: A Step Toward Inclusive Natural Language Processing for Nigeria's Minority Languages
Oluwadara Kalejaiye
Luel Hagos Beyene
Mmekut-Mfon Gabriel Edet
A. D. Akpan
Eno-Abasi Urua
Anietie U Andy
Evaluating WMT 2025 Metrics Shared Task Submissions on the SSA-MTE African Challenge Set
Senyu Li
Felermino Dario Mario Ali
Jiayi Wang
Rui Sousa-Silva
Henrique Lopes Cardoso
Pontus Stenetorp
Colin Cherry
Findings of the WMT25 Shared Task on Automated Translation Evaluation Systems: Linguistic Diversity is Challenging and References Still Help
Alon Lavie
Greg Hanneman
Sweta Agrawal
Diptesh Kanojia
Chi-kiu Lo
Vilém Zouhar
Frédéric Blain
Chrysoula Zerva
Eleftherios Avramidis
Sourabh Dattatray Deoghare
Archchana Sindhujan
Jiayi Wang
Brian Thompson
Tom Kocmi
Markus Freitag
Daniel Deutsch
AfriMTEB and AfriE5: Benchmarking and Adapting Text Embedding Models for African Languages
DIVERS-Bench: Evaluating Language Identification Across Domain Shifts and Code-Switching
Fleurs-SLU: A Massively Multilingual Benchmark for Spoken Language Understanding
Fabian David Schmidt
Goran Glavaš
Spoken language understanding (SLU) is indispensable for half of all living languages that lack a formal writing system, since these languages cannot pair automatic speech recognition (ASR) with language models to benefit from language technology. Even if low-resource languages possess a writing system, ASR for these languages remains unreliable due to limited bimodal speech and text training data. Better SLU can strengthen the robustness of massively multilingual ASR by leveraging language semantics to disambiguate utterances via context or exploiting semantic similarities across languages. However, the evaluation of multilingual SLU remains limited to shallow tasks such as intent classification or language identification. To address this, we present Fleurs-SLU, a multilingual SLU benchmark that encompasses (i) 692 hours of speech for topical utterance classification in 102 languages and (ii) multiple-choice question answering through listening comprehension spanning 944 hours of speech across 92 languages. We extensively evaluate both end-to-end speech classification models and cascaded systems that combine speech-to-text transcription with subsequent classification by large language models on Fleurs-SLU. Our results show that cascaded systems exhibit greater robustness in multilingual SLU tasks, though speech encoders can achieve competitive performance in topical speech classification when appropriately pre-trained. We further find a strong correlation between robust multilingual ASR, effective speech-to-text translation, and strong multilingual SLU, highlighting the mutual benefits between acoustic and semantic speech representations.
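The cascaded systems described above can be sketched schematically; `transcribe` and `classify` below are toy stand-ins for a real ASR model and LLM classifier, not the paper's actual components:

```python
def cascaded_slu(audio, transcribe, classify):
    """Cascaded SLU: first speech-to-text (ASR), then text classification."""
    text = transcribe(audio)   # stage 1: ASR model
    return classify(text)      # stage 2: LLM classifies the transcript

# Toy stand-ins so the pipeline shape is runnable.
fake_asr = lambda audio: "please play some music"
fake_classifier = lambda text: "music" if "music" in text else "other"

print(cascaded_slu(b"<raw audio bytes>", fake_asr, fake_classifier))  # → music
```

An end-to-end system would instead map audio to a label with a single speech encoder; the cascade's advantage in the results above comes from the LLM's semantic knowledge in stage 2.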
AfroBench: How Good are Large Language Models on African Languages?
Kelechi Ogueji
Pontus Stenetorp
Natural language processing for African languages
Recent advances in word embeddings and language models use large-scale, unlabelled data and self-supervised learning to boost NLP performance. Multilingual models, often trained on web-sourced data like Wikipedia, face challenges: few low-resource languages are included, their data is often noisy, and a lack of labelled datasets makes it hard to evaluate performance outside high-resource languages like English. In this dissertation, we focus on languages spoken in Sub-Saharan Africa, all of whose indigenous languages can be regarded as low-resourced in terms of both labelled data for NLP tasks and unlabelled data found on the web. We analyse the noise in publicly available corpora and curate a high-quality corpus, demonstrating that the quality of semantic representations learned in word embeddings depends not only on the amount of data but also on the quality of the pre-training data. We demonstrate empirically the limitations of word embeddings, and the opportunities that multilingual pre-trained language models (PLMs) offer, especially for languages unseen during pre-training and in low-resource scenarios. We further study how to adapt and specialize multilingual PLMs to unseen African languages using a small amount of monolingual text. To address the under-representation of African languages in NLP research, we developed large-scale human-annotated labelled datasets for 21 African languages in two impactful NLP tasks: named entity recognition and machine translation. We conduct an extensive empirical evaluation using state-of-the-art methods across supervised, weakly-supervised, and transfer learning settings.
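As a hedged illustration of the kind of noise filtering involved in corpus curation, a minimal heuristic filter might drop very short or mostly non-alphabetic lines. The thresholds here are arbitrary assumptions for illustration, not the dissertation's actual curation criteria:

```python
def is_clean(line, min_words=3, max_nonalpha=0.3):
    """Heuristic corpus filter: keep a line only if it has at least
    `min_words` tokens and its share of non-alphabetic, non-space
    characters is at most `max_nonalpha`."""
    words = line.split()
    if len(words) < min_words:
        return False
    nonalpha = sum(1 for ch in line if not (ch.isalpha() or ch.isspace()))
    return nonalpha / max(len(line), 1) <= max_nonalpha

corpus = ["this is a clean sentence", "@@@ ### !!!", "hi"]
print([line for line in corpus if is_clean(line)])  # → ['this is a clean sentence']
```

Real web-corpus cleaning typically adds language identification, deduplication, and boilerplate removal on top of character-level heuristics like this.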