David Ifeoluwa Adelani

Core Academic Member
Canada CIFAR AI Chair
McGill University
Research Topics
Deep Learning
Natural Language Processing
Representation Learning
Speech Processing

Biography

David Adelani is an assistant professor at McGill University’s School of Computer Science under the Fighting Inequities initiative, and a core academic member of Mila – Quebec Artificial Intelligence Institute.

Adelani’s research focuses on multilingual natural language processing with special attention to under-resourced languages.

Current Students

Master's Research - McGill University
Research Intern - McGill University
Research Intern - McGill University
Research Intern - McGill University
Research Intern - McGill University
PhD - McGill University
Research Intern - McGill University
Research Intern - McGill University
PhD - McGill University
PhD - McGill University
Master's Research - McGill University
Collaborating Alumni - McGill University
Research Intern - McGill University
Professional Master's - Université de Montréal
Research Intern - McGill University
Master's Research - McGill University

Publications

Fleurs-SLU: A Massively Multilingual Benchmark for Spoken Language Understanding
Fabian David Schmidt
Ivan Vulić
Goran Glavaš
Spoken language understanding (SLU) is indispensable for half of all living languages that lack a formal writing system, since these languages cannot pair automatic speech recognition (ASR) with language models to benefit from language technology. Even if low-resource languages possess a writing system, ASR for these languages remains unreliable due to limited bimodal speech and text training data. Better SLU can strengthen the robustness of massively multilingual ASR by leveraging language semantics to disambiguate utterances via context or exploiting semantic similarities across languages. However, the evaluation of multilingual SLU remains limited to shallow tasks such as intent classification or language identification. To address this, we present Fleurs-SLU, a multilingual SLU benchmark that encompasses (i) 692 hours of speech for topical utterance classification in 102 languages and (ii) multiple-choice question answering through listening comprehension spanning 944 hours of speech across 92 languages. We extensively evaluate both end-to-end speech classification models and cascaded systems that combine speech-to-text transcription with subsequent classification by large language models on Fleurs-SLU. Our results show that cascaded systems exhibit greater robustness in multilingual SLU tasks, though speech encoders can achieve competitive performance in topical speech classification when appropriately pre-trained. We further find a strong correlation between robust multilingual ASR, effective speech-to-text translation, and strong multilingual SLU, highlighting the mutual benefits between acoustic and semantic speech representations.
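The cascaded setup described above is simple to prototype. The sketch below is a rough approximation rather than the paper's exact pipeline: it chains a multilingual ASR model with a zero-shot text classifier for topical utterance classification. The model names, topic labels, and audio file are illustrative assumptions.

```python
# Minimal cascaded-SLU sketch: transcribe speech, then classify the
# transcript. Models and labels are illustrative stand-ins, not the
# systems benchmarked on Fleurs-SLU.
from transformers import pipeline

asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

TOPICS = ["politics", "sports", "health", "science", "travel"]  # hypothetical label set

def classify_utterance(audio_path: str) -> str:
    """Transcribe an utterance, then assign it the highest-scoring topic."""
    transcript = asr(audio_path)["text"]
    result = classifier(transcript, candidate_labels=TOPICS)
    return result["labels"][0]

print(classify_utterance("utterance.wav"))  # hypothetical audio file
```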
AfroBench: How Good are Large Language Models on African Languages?
Kelechi Ogueji
Pontus Stenetorp
Natural language processing for African languages
Recent advances in word embeddings and language models use large-scale, unlabelled data and self-supervised learning to boost NLP performance. Multilingual models, often trained on web-sourced data like Wikipedia, face challenges: few low-resource languages are included, their data is often noisy, and a lack of labelled datasets makes it hard to evaluate performance outside high-resource languages like English. In this dissertation, we focus on languages spoken in Sub-Saharan Africa, where all the indigenous languages can be regarded as low-resourced in terms of the availability of labelled data for NLP tasks and of unlabelled data found on the web. We analyse the noise in the publicly available corpora and curate a high-quality corpus, demonstrating that the quality of semantic representations learned in word embeddings depends not only on the amount of data but also on the quality of the pre-training data. We demonstrate empirically the limitations of word embeddings and the opportunities that multilingual pre-trained language models (PLMs) offer, especially for languages unseen during pre-training and in low-resource scenarios. We further study how to adapt and specialize multilingual PLMs to unseen African languages using a small amount of monolingual text. To address the under-representation of African languages in NLP research, we developed large-scale human-annotated datasets for 21 African languages in two impactful NLP tasks: named entity recognition and machine translation. We conduct an extensive empirical evaluation using state-of-the-art methods across supervised, weakly-supervised, and transfer learning settings.
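The adaptation step the dissertation describes, specializing a multilingual PLM to an unseen language with a small monolingual corpus, is commonly implemented as continued masked language modelling. The sketch below shows that recipe with Hugging Face Transformers; the corpus file, base model, and hyperparameters are illustrative assumptions, not the dissertation's exact setup.

```python
# Language-adaptive fine-tuning sketch: continue MLM pre-training of a
# multilingual PLM on monolingual text in the target language.
from datasets import load_dataset
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "xlm-roberta-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)

# One sentence per line; "yoruba_corpus.txt" is a hypothetical file.
corpus = load_dataset("text", data_files={"train": "yoruba_corpus.txt"})
tokenized = corpus.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=256),
    batched=True,
    remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="xlmr-adapted", num_train_epochs=3,
                           per_device_train_batch_size=8),
    train_dataset=tokenized["train"],
    # Randomly masks 15% of tokens for the MLM objective.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15),
)
trainer.train()
```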
mSTEB: Massively Multilingual Evaluation of LLMs on Speech and Text Tasks
Luel Hagos Beyene
Min Ma
Jesujoba Oluwadara Alabi
Fabian David Schmidt
Joyce Nakatumba-Nabende
Large language models (LLMs) have demonstrated impressive performance on a wide range of tasks, including in multimodal settings such as speech. However, their evaluation is often limited to English and a few high-resource languages. For low-resource languages, there is no standardized evaluation benchmark. In this paper, we address this gap by introducing mSTEB, a new benchmark that evaluates the performance of LLMs on a wide range of tasks, covering language identification, text classification, question answering, and translation in both speech and text modalities. We evaluate the performance of leading LLMs such as Gemini 2.0 Flash and GPT-4o (Audio) and state-of-the-art open models such as Qwen 2 Audio and Gemma 3 27B. Our evaluation shows a wide gap in performance between high-resource and low-resource languages, especially for languages spoken in Africa and the Americas/Oceania. Our findings show that more investment is needed to address their under-representation in LLM coverage.
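As a rough illustration of this style of zero-shot evaluation, the sketch below prompts a chat LLM to identify the language of a snippet and scores it against gold labels. The prompt wording, model name, and examples are assumptions for illustration, not the mSTEB protocol.

```python
# Zero-shot language-identification probe of a chat LLM; not the paper's
# evaluation harness.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def identify_language(text: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": f"Reply with only the English name of the language of this text:\n{text}",
        }],
    )
    return response.choices[0].message.content.strip()

# Hypothetical gold-labelled examples.
examples = [("Bawo ni o ṣe wa?", "Yoruba"), ("Habari yako?", "Swahili")]
correct = sum(identify_language(text).lower() == label.lower()
              for text, label in examples)
print(f"accuracy: {correct / len(examples):.2f}")
```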
SSA-COMET: Do LLMs Outperform Learned Metrics in Evaluating MT for Under-Resourced African Languages?
Senyu Li
Jiayi Wang
Felermino Dario Mario Ali
Colin Cherry
Daniel Deutsch
Eleftheria Briakou
Rui Sousa-Silva
Henrique Lopes Cardoso
Pontus Stenetorp
Evaluating machine translation (MT) quality for under-resourced African languages remains a significant challenge, as existing metrics often suffer from limited language coverage and poor performance in low-resource settings. While recent efforts, such as AfriCOMET, have addressed some of these issues, they are still constrained by small evaluation sets, a lack of publicly available training data tailored to African languages, and inconsistent performance in extremely low-resource scenarios. In this work, we introduce SSA-MTE, a large-scale human-annotated MT evaluation (MTE) dataset covering 13 African language pairs from the news domain, with over 63,000 sentence-level annotations from a diverse set of MT systems. Based on this data, we develop SSA-COMET and SSA-COMET-QE, improved reference-based and reference-free evaluation metrics. We also benchmark prompting-based approaches using state-of-the-art LLMs like GPT-4o and Claude. Our experimental results show that SSA-COMET models significantly outperform AfriCOMET and are competitive with the strongest LLM (Gemini 2.5 Pro) evaluated in our study, particularly on low-resource languages such as Twi, Luo, and Yoruba. All resources are released under open licenses to support future research.
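For readers unfamiliar with COMET-style learned metrics, the sketch below scores one translation with the unbabel-comet package. It loads a public COMET checkpoint as a stand-in; a released SSA-COMET checkpoint would be loaded the same way, and the sentence triple is made up.

```python
# Segment-level MT scoring with a COMET-style learned metric.
from comet import download_model, load_from_checkpoint

model_path = download_model("Unbabel/wmt22-comet-da")  # public stand-in checkpoint
model = load_from_checkpoint(model_path)

data = [{
    "src": "How are you today?",      # source sentence
    "mt": "Bawo ni o se wa loni?",    # system output (illustrative)
    "ref": "Bawo ni o ṣe wa lónìí?",  # human reference (illustrative)
}]
output = model.predict(data, batch_size=8, gpus=0)
print(output.scores)  # one quality score per segment
```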
Charting the Landscape of African NLP: Mapping Progress and Shaping the Road Ahead
Jesujoba Oluwadara Alabi
Michael A. Hedderich
Dietrich Klakow
The NaijaVoices Dataset: Cultivating Large-Scale, High-Quality, Culturally-Rich Speech Data for African Languages
The NaijaVoices Community
Busayo Awobade
Abraham Owodunni
Handel Emezue
Gloria Monica Tobechukwu Emezue
N. N. Emezue
Sewade Ogun
Bunmi Akinremi
The development of high-performing, robust, and reliable speech technologies depends on large, high-quality datasets. However, African languages -- including our focus, Igbo, Hausa, and Yoruba -- remain under-represented due to insufficient data. Popular voice-enabled technologies do not support any of the 2,000+ African languages, limiting accessibility for around one billion people. While previous dataset efforts exist for the target languages, they lack the scale and diversity needed for robust speech models. To bridge this gap, we introduce the NaijaVoices dataset, a 1,800-hour speech-text dataset with 5,000+ speakers. We outline our unique data collection approach, analyze its acoustic diversity, and demonstrate its impact through finetuning experiments on automatic speech recognition, achieving average WER improvements of 75.86% (Whisper), 52.06% (MMS), and 42.33% (XLSR). These results highlight NaijaVoices' potential to advance multilingual speech processing for African languages.
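The WER figures above measure word-level edit distance between reference transcripts and ASR output. A minimal computation with the jiwer package, using a made-up sentence pair:

```python
# Word error rate: (substitutions + deletions + insertions) / reference words.
import jiwer

reference = "mo fẹ́ lọ sí ọjà lọ́la"   # illustrative Yoruba reference transcript
hypothesis = "mo fe lo si oja lola"    # illustrative ASR hypothesis (missing diacritics)

print(f"WER: {jiwer.wer(reference, hypothesis):.2%}")
```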
Improving Multilingual Math Reasoning for African Languages
Odunayo Ogundepo
Akintunde Oladipo
Kelechi Ogueji
Esther Adenuga
Jimmy Lin
Researchers working on low-resource languages face persistent challenges due to limited data availability and restricted access to computational resources. Although most large language models (LLMs) are predominantly trained on high-resource languages, adapting them to low-resource contexts, particularly African languages, requires specialized techniques. Several strategies have emerged for adapting models to low-resource languages in today's LLM landscape, defined by multi-stage pre-training and post-training paradigms. However, the most effective approaches remain uncertain. This work systematically investigates which adaptation strategies yield the best performance when extending existing LLMs to African languages. We conduct extensive experiments and ablation studies to evaluate different combinations of data types (translated versus synthetically generated), training stages (pre-training versus post-training), and other model adaptation configurations. Our experiments focus on mathematical reasoning tasks, using the Llama 3.1 model family as our base model.
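One concrete step in such post-training is rendering translated question-answer pairs into the base model's chat format before fine-tuning. The sketch below does this with a Hugging Face chat template; the Yoruba example pair is invented for illustration and is not from the paper's data.

```python
# Format a translated math problem as a chat example for post-training.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")

example = {
    "question": "Bí a bá fi 3 kún 5, kí ni àpapọ̀ rẹ̀?",  # "What is 3 plus 5?" (illustrative)
    "answer": "Àpapọ̀ rẹ̀ ni 8.",                          # "The sum is 8." (illustrative)
}
messages = [
    {"role": "user", "content": example["question"]},
    {"role": "assistant", "content": example["answer"]},
]
# Render the conversation in the model's expected training format.
print(tokenizer.apply_chat_template(messages, tokenize=False))
```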
Lugha-Llama: Adapting Large Language Models for African Languages
Happy Buzaaba
Alexander Wettig
Christiane Fellbaum