
David Ifeoluwa Adelani

Core Academic Member
Canada CIFAR AI Chair
McGill University
Research Topics
Deep Learning
Natural Language Processing
Representation Learning
Speech Processing

Biography

David Adelani is an assistant professor at McGill University’s School of Computer Science under the Fighting Inequities initiative, and a core academic member of Mila – Quebec Artificial Intelligence Institute.

Adelani’s research focuses on multilingual natural language processing with special attention to under-resourced languages.

Current Students

Postdoctorate - McGill University (1)
PhD - McGill University (3)
Master's Research - McGill University (4)
Professional Master's - Université de Montréal (1)
Research Intern - McGill University (6)
Collaborating researcher - McGill University (1)

Publications

Multilinguality as Sense Adaptation
Jan Christian Blaise Cruz
Alham Fikri Aji
Afri-MCQA: Multimodal Cultural Question Answering for African Languages
Atnafu Lambebo Tonja
Srija Anand
Emilio Villa Cueva
Israel Abebe Azime
Jesujoba Oluwadara Alabi
Muhidin A. Mohamed
Debela Desalegn Yadeta
Negasi Haile Abadi
Abigail Oppong
Nnaemeka Casmir Obiefuna
Idris Abdulmumin
Naome Etori
Eric Peter Wairagala
Kanda Patrick Tshinu
Imanigirimbabazi Emmanuel
Gabofetswe Malema
Alham Fikri Aji
Thamar Solorio
Africa is home to over one-third of the world's languages, yet remains underrepresented in AI research. We introduce Afri-MCQA, the first Multilingual Cultural Question-Answering benchmark, covering 7.5k Q&A pairs across 15 African languages from 12 countries. The benchmark offers parallel English-African language Q&A pairs across text and speech modalities and was created entirely by native speakers. Benchmarking large language models (LLMs) on Afri-MCQA shows that open-weight models perform poorly across the evaluated cultures, with near-zero accuracy on open-ended VQA when queried in a native language, in either text or speech. To assess linguistic competence separately from cultural knowledge, we include control experiments, and we observe significant performance gaps between native languages and English for both text and speech. These findings underscore the need for speech-first approaches, culturally grounded pretraining, and cross-lingual cultural transfer. To support more inclusive multimodal AI development for African languages, we release Afri-MCQA under an academic license or CC BY-NC 4.0 on HuggingFace (https://huggingface.co/datasets/Atnafu/Afri-MCQA).
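Since the dataset is released on HuggingFace at the link above, a minimal sketch of loading it with the datasets library follows; the configuration name and split used here are assumptions, so consult the dataset card for the actual layout.

```python
# Minimal sketch: load Afri-MCQA from the Hugging Face Hub.
# The dataset ID comes from the abstract above; the configuration
# name ("hau" for Hausa) and the split ("test") are assumptions --
# check the dataset card for the real layout and field names.
from datasets import load_dataset

afri_mcqa = load_dataset("Atnafu/Afri-MCQA", "hau", split="test")
print(afri_mcqa[0])  # inspect one parallel Q&A pair
```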
AfriqueLLM: How Data Mixing and Model Architecture Impact Continued Pre-training for African Languages
Hao Yu
Tianyi Xu
Michael A. Hedderich
Wassim Hamidouche
Syed Waqas Zamir
Ibom NLP: A Step Toward Inclusive Natural Language Processing for Nigeria's Minority Languages
Oluwadara Kalejaiye
Luel Hagos Beyene
Mmekut-Mfon Gabriel Edet
A. D. Akpan
Eno-Abasi Urua
Anietie U Andy
Evaluating WMT 2025 Metrics Shared Task Submissions on the SSA-MTE African Challenge Set
Senyu Li
Felermino Dario Mario Ali
Jiayi Wang
Rui Sousa-Silva
Henrique Lopes Cardoso
Pontus Stenetorp
Colin Cherry
Findings of the WMT25 Shared Task on Automated Translation Evaluation Systems: Linguistic Diversity is Challenging and References Still Help
Alon Lavie
Greg Hanneman
Sweta Agrawal
Diptesh Kanojia
Chi-Kiu Lo
Vilém Zouhar
Frédéric Blain
Chrysoula Zerva
Eleftherios Avramidis
Sourabh Dattatray Deoghare
Archchana Sindhujan
Jiayi Wang
Brian Thompson
Tom Kocmi
Markus Freitag
Daniel Deutsch
DIVERS-Bench: Evaluating Language Identification Across Domain Shifts and Code-Switching
Language Identification (LID) is a core task in multilingual NLP, yet current systems often overfit to clean, monolingual data. This work introduces DIVERS-BENCH, a comprehensive evaluation of state-of-the-art LID models across diverse domains, including speech transcripts, web text, social media texts, children's stories, and code-switched text. Our findings reveal that while models achieve high accuracy on curated datasets, performance degrades sharply on noisy and informal inputs. We also introduce DIVERS-CS, a diverse code-switching benchmark dataset spanning 10 language pairs, and show that existing models struggle to detect multiple languages within the same sentence. These results highlight the need for more robust and inclusive LID systems in real-world settings.
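The evaluation the abstract describes reduces to scoring a language-identification model on labelled text from several domains and comparing per-domain accuracy. A minimal sketch of that protocol follows, using the off-the-shelf fastText LID model as a stand-in for the systems benchmarked in the paper; the sample sentences, gold labels, and model file are illustrative assumptions, not DIVERS-Bench data.

```python
# Sketch of per-domain LID accuracy, in the spirit of DIVERS-Bench.
# fastText's lid.176 model stands in for the evaluated systems; the
# sample sentences and labels below are toy examples, not benchmark data.
import fasttext

model = fasttext.load_model("lid.176.bin")  # download from the fastText site

# (domain, text, gold ISO 639-1 code)
samples = [
    ("web",    "The committee will meet on Thursday.", "en"),
    ("social", "lol ok see u tmrw",                    "en"),
    ("story",  "Il était une fois une petite fille.",  "fr"),
]

correct, total = {}, {}
for domain, text, gold in samples:
    labels, _ = model.predict(text)                 # e.g. ("__label__en",)
    pred = labels[0].replace("__label__", "")
    total[domain] = total.get(domain, 0) + 1
    correct[domain] = correct.get(domain, 0) + (pred == gold)

for domain in total:
    print(f"{domain}: {correct[domain] / total[domain]:.2f}")
```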
Fleurs-SLU: A Massively Multilingual Benchmark for Spoken Language Understanding
Fabian David Schmidt
Goran Glavaš
Spoken language understanding (SLU) is indispensable for half of all living languages that lack a formal writing system, since these languages cannot pair automatic speech recognition (ASR) with language models to benefit from language technology. Even if low-resource languages possess a writing system, ASR for these languages remains unreliable due to limited bimodal speech and text training data. Better SLU can strengthen the robustness of massively multilingual ASR by leveraging language semantics to disambiguate utterances via context or exploiting semantic similarities across languages. However, the evaluation of multilingual SLU remains limited to shallow tasks such as intent classification or language identification. To address this, we present Fleurs-SLU, a multilingual SLU benchmark that encompasses (i) 692 hours of speech for topical utterance classification in 102 languages and (ii) multiple-choice question answering through listening comprehension spanning 944 hours of speech across 92 languages. We extensively evaluate both end-to-end speech classification models and cascaded systems that combine speech-to-text transcription with subsequent classification by large language models on Fleurs-SLU. Our results show that cascaded systems exhibit greater robustness in multilingual SLU tasks, though speech encoders can achieve competitive performance in topical speech classification when appropriately pre-trained. We further find a strong correlation between robust multilingual ASR, effective speech-to-text translation, and strong multilingual SLU, highlighting the mutual benefits between acoustic and semantic speech representations.
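The cascaded systems evaluated in the paper chain a speech-to-text model with a text classifier. A minimal sketch of that pipeline follows, using openai-whisper for the transcription stage and a trivial keyword matcher standing in for the LLM classification stage; the model size, audio file name, and label set are assumptions.

```python
# Sketch of a cascaded SLU system (ASR, then text classification),
# as evaluated on Fleurs-SLU. Whisper stands in for the ASR stage;
# classify_topic is a hypothetical placeholder for the LLM stage.
# The audio file name and topic labels are assumptions.
import whisper

TOPICS = ["politics", "sports", "science", "health"]  # illustrative label set

def classify_topic(text: str) -> str:
    """Placeholder for the LLM stage; a real system would prompt an LLM.
    Here: trivial keyword matching, purely illustrative."""
    lowered = text.lower()
    for topic in TOPICS:
        if topic in lowered:
            return topic
    return "unknown"

asr = whisper.load_model("small")          # ASR stage
result = asr.transcribe("utterance.wav")   # speech -> text
print("transcript:", result["text"])
print("topic:", classify_topic(result["text"]))
```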
AfroBench: How Good are Large Language Models on African Languages?
Kelechi Ogueji
Pontus Stenetorp
Natural language processing for African languages
Recent advances in word embeddings and language models use large-scale, unlabelled data and self-supervised learning to boost NLP performance. Multilingual models, often trained on web-sourced data like Wikipedia, face challenges: few low-resource languages are included, their data is often noisy, and the lack of labelled datasets makes it hard to evaluate performance outside high-resource languages like English. In this dissertation, we focus on languages spoken in Sub-Saharan Africa, where all the indigenous languages can be regarded as low-resourced in terms of both the labelled data available for NLP tasks and the unlabelled data found on the web. We analyse the noise in publicly available corpora and curate a high-quality corpus, demonstrating that the quality of the semantic representations learned by word embeddings depends not only on the amount of data but also on the quality of the pre-training data. We demonstrate empirically the limitations of word embeddings and the opportunities that multilingual pre-trained language models (PLMs) offer, especially for languages unseen during pre-training and in low-resource scenarios. We further study how to adapt and specialize multilingual PLMs to unseen African languages using a small amount of monolingual text. To address the under-representation of African languages in NLP research, we develop large-scale human-annotated labelled datasets for 21 African languages in two impactful NLP tasks: named entity recognition and machine translation. We conduct an extensive empirical evaluation using state-of-the-art methods across supervised, weakly-supervised, and transfer learning settings.
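The adaptation step mentioned above, specializing a multilingual PLM to an unseen language with a small amount of monolingual text, is typically realized as continued masked-language-model training. The sketch below shows that recipe with the Hugging Face transformers API; the checkpoint, corpus file, and hyperparameters are illustrative assumptions, not the dissertation's actual configuration.

```python
# Sketch of language-adaptive fine-tuning: continue masked-language-model
# training of a multilingual PLM on a small monolingual corpus, in the
# spirit of the adaptation studied in the dissertation. The checkpoint,
# corpus path, and hyperparameters are assumptions.
from datasets import load_dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

checkpoint = "xlm-roberta-base"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForMaskedLM.from_pretrained(checkpoint)

# Small monolingual corpus, one sentence per line (hypothetical file).
corpus = load_dataset("text", data_files={"train": "monolingual.txt"})["train"]
corpus = corpus.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
    batched=True,
    remove_columns=["text"],
)

collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)
args = TrainingArguments(
    output_dir="adapted-plm",
    per_device_train_batch_size=16,
    num_train_epochs=3,
)
Trainer(model=model, args=args, train_dataset=corpus, data_collator=collator).train()
```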
mSTEB: Massively Multilingual Evaluation of LLMs on Speech and Text Tasks
Luel Hagos Beyene
Min Ma
Jesujoba Oluwadara Alabi
Fabian David Schmidt
Joyce Nakatumba-Nabende
Large language models (LLMs) have demonstrated impressive performance on a wide range of tasks, including in multimodal settings such as speech. However, their evaluation is often limited to English and a few high-resource languages. For low-resource languages, there is no standardized evaluation benchmark. In this paper, we address this gap by introducing mSTEB, a new benchmark to evaluate the performance of LLMs on a wide range of tasks covering language identification, text classification, question answering, and translation, on both speech and text modalities. We evaluate the performance of leading LLMs such as Gemini 2.0 Flash and GPT-4o (Audio) and state-of-the-art open models such as Qwen 2 Audio and Gemma 3 27B. Our evaluation shows a wide gap in performance between high-resource and low-resource languages, especially for languages spoken in Africa and the Americas/Oceania. Our findings show that more investment is needed to address their under-representation in LLM coverage.