David Ifeoluwa Adelani

Core Academic Member
Canada CIFAR AI Chair
McGill University
Research Topics
Representation Learning
Deep Learning
Speech Processing
Natural Language Processing

Biography

David Adelani is an Assistant Professor in Computer Science and Fighting Inequities at McGill University, and a Core Academic Member at Mila – Quebec Artificial Intelligence Institute. His research focuses on multilingual natural language processing, with a particular emphasis on low-resource languages.

Current Students

Master's Research - McGill
Master's Research - McGill
Research Intern - McGill
Research Collaborator - McGill
Postdoctorate - McGill
Research Intern - McGill
PhD - McGill
Research Intern - McGill
PhD - McGill
PhD - McGill
Research Intern - McGill
Master's Research - McGill
Research Intern - McGill
Professional Master's - UdeM
Research Intern - McGill
Master's Research - McGill

Publications

Sudanese-Flores: Extending FLORES+ to Sudanese Arabic Dialect
Hadia Mohmmedosman Ahmed Samil
In this work, we introduce Sudanese-Flores, an extension of the popular Flores+ machine translation (MT) benchmark to the Sudanese Arabic dialect. We translate both the DEV and DEVTEST splits of the Modern Standard Arabic dataset into the corresponding Sudanese dialect, resulting in a total of 2,009 sentences. While the dialect was recently added to Google Translate, no benchmark is available for it despite being spoken by over 40 million people. Our evaluation of two leading LLMs, GPT-4.1 and Gemini 2.5 Flash, shows that while English-to-Arabic performance is impressive (more than 23 BLEU), both models struggle on the Sudanese dialect (less than 11 BLEU) in zero-shot settings. In the few-shot scenario, we achieved only a slight boost in performance.
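As a rough illustration of the scoring used above, here is a minimal sketch of corpus-level BLEU with the sacrebleu library; the sentences are invented placeholders, not data or outputs from the paper.

import sacrebleu

# Hypothetical system outputs and reference translations (placeholders).
hypotheses = ["The market opens at dawn.", "Rain is expected tomorrow."]
references = [["The market opens at dawn.", "Rain is expected tomorrow evening."]]

# corpus_bleu takes the hypotheses plus one list per reference set.
bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(f"BLEU: {bleu.score:.2f}")  # the paper reports >23 BLEU for MSA, <11 for Sudanese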
Multilinguality as Sense Adaptation
Jan Christian Blaise Cruz
Alham Fikri Aji
Afri-MCQA: Multimodal Cultural Question Answering for African Languages
Atnafu Lambebo Tonja
Srija Anand
Emilio Villa Cueva
Israel Abebe Azime
Jesujoba Oluwadara Alabi
Muhidin A. Mohamed
Debela Desalegn Yadeta
Negasi Haile Abadi
Abigail Oppong
Nnaemeka Casmir Obiefuna
Idris Abdulmumin
Naome Etori
Eric Peter Wairagala
Kanda Patrick Tshinu
Imanigirimbabazi Emmanuel
Gabofetswe Malema
Alham Fikri Aji
Thamar Solorio
Africa is home to over one-third of the world's languages, yet remains underrepresented in AI research. We introduce Afri-MCQA, the first Multilingual Cultural Question-Answering benchmark, covering 7.5k Q&A pairs across 15 African languages from 12 countries. The benchmark offers parallel English-African language Q&A pairs across text and speech modalities and was created entirely by native speakers. Benchmarking large language models (LLMs) on Afri-MCQA shows that open-weight models perform poorly across the evaluated cultures, with near-zero accuracy on open-ended VQA when queried in a native language, whether in text or speech. To evaluate linguistic competence separately from cultural knowledge, we include control experiments, and we observe significant performance gaps between native languages and English for both text and speech. These findings underscore the need for speech-first approaches, culturally grounded pretraining, and cross-lingual cultural transfer. To support more inclusive multimodal AI development in African languages, we release Afri-MCQA under an academic license (CC BY-NC 4.0) on HuggingFace (https://huggingface.co/datasets/Atnafu/Afri-MCQA).
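For readers who want to inspect the benchmark, a minimal sketch of loading it from the Hugging Face Hub follows; the dataset ID comes from the paper's release URL, while the split and field names are assumptions about the released version.

from datasets import load_dataset

# Dataset ID taken from the paper; a language configuration may be required.
ds = load_dataset("Atnafu/Afri-MCQA")
print(ds)               # lists the available splits
split = next(iter(ds))  # first split name
print(ds[split][0])     # one Q&A record; field names depend on the release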
AfriqueLLM: How Data Mixing and Model Architecture Impact Continued Pre-training for African Languages
Hao Yu
Tianyi Xu
Michael A. Hedderich
Wassim Hamidouche
Syed Waqas Zamir
Ibom NLP: A Step Toward Inclusive Natural Language Processing for Nigeria's Minority Languages
Oluwadara Kalejaiye
Luel Hagos Beyene
Mmekut-Mfon Gabriel Edet
A. D. Akpan
Eno-Abasi Urua
Anietie U Andy
Evaluating WMT 2025 Metrics Shared Task Submissions on the SSA-MTE African Challenge Set
Senyu Li
Felermino Dario Mario Ali
Jiayi Wang
Rui Sousa-Silva
Henrique Lopes Cardoso
Pontus Stenetorp
Colin Cherry
Findings of the WMT25 Shared Task on Automated Translation Evaluation Systems: Linguistic Diversity is Challenging and References Still Help
Alon Lavie
Greg Hanneman
Sweta Agrawal
Diptesh Kanojia
Chi-kiu Lo
Vilém Zouhar
Frédéric Blain
Chrysoula Zerva
Eleftherios Avramidis
Sourabh Dattatray Deoghare
Archchana Sindhujan
Jiayi Wang
Brian Thompson
Tom Kocmi
Markus Freitag
Daniel Deutsch
AfriMTEB and AfriE5: Benchmarking and Adapting Text Embedding Models for African Languages
DIVERS-Bench: Evaluating Language Identification Across Domain Shifts and Code-Switching
Fleurs-SLU: A Massively Multilingual Benchmark for Spoken Language Understanding
Fabian David Schmidt
Goran Glavaš
Spoken language understanding (SLU) is indispensable for half of all living languages that lack a formal writing system, since these languages cannot pair automatic speech recognition (ASR) with language models to benefit from language technology. Even if low-resource languages possess a writing system, ASR for these languages remains unreliable due to limited bimodal speech and text training data. Better SLU can strengthen the robustness of massively multilingual ASR by leveraging language semantics to disambiguate utterances via context or by exploiting semantic similarities across languages. However, the evaluation of multilingual SLU remains limited to shallow tasks such as intent classification or language identification. To address this, we present Fleurs-SLU, a multilingual SLU benchmark that encompasses (i) 692 hours of speech for topical utterance classification in 102 languages and (ii) multiple-choice question answering through listening comprehension spanning 944 hours of speech across 92 languages. We extensively evaluate both end-to-end speech classification models and cascaded systems that combine speech-to-text transcription with subsequent classification by large language models on Fleurs-SLU. Our results show that cascaded systems exhibit greater robustness in multilingual SLU tasks, though speech encoders can achieve competitive performance in topical speech classification when appropriately pre-trained. We further find a strong correlation between robust multilingual ASR, effective speech-to-text translation, and strong multilingual SLU, highlighting the mutual benefits between acoustic and semantic speech representations.
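The cascaded setup evaluated above can be sketched as a two-step pipeline; the checkpoints below (Whisper for ASR, an NLI model for zero-shot topic classification) are illustrative stand-ins, not the systems benchmarked in the paper.

from transformers import pipeline

# Step 1: multilingual speech-to-text; step 2: text classification.
asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

def cascaded_topic(audio_path: str, topics: list[str]) -> str:
    transcript = asr(audio_path)["text"]                      # speech -> text
    scores = classifier(transcript, candidate_labels=topics)  # text -> topic
    return scores["labels"][0]                                # highest-scoring topic

# Example: cascaded_topic("utterance.wav", ["politics", "sports", "health"])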
AfroBench: How Good are Large Language Models on African Languages?
Kelechi Ogueji
Pontus Stenetorp
Natural language processing for African languages
Recent advances in word embeddings and language models use large-scale, unlabelled data and self-supervised learning to boost NLP performance. Multilingual models, often trained on web-sourced data like Wikipedia, face challenges: few low-resource languages are included, their data is often noisy, and the lack of labelled datasets makes it hard to evaluate performance outside high-resource languages like English. In this dissertation, we focus on languages spoken in Sub-Saharan Africa, where all indigenous languages can be regarded as low-resourced in terms of both the labelled data available for NLP tasks and the unlabelled data found on the web. We analyse the noise in the publicly available corpora and curate a high-quality corpus, demonstrating that the quality of the semantic representations learned by word embeddings depends not only on the amount of data but also on the quality of the pre-training data. We empirically demonstrate the limitations of word embeddings and the opportunities that multilingual pre-trained language models (PLMs) offer, especially for languages unseen during pre-training and in low-resource scenarios. We further study how to adapt and specialize multilingual PLMs to unseen African languages using a small amount of monolingual text. To address the under-representation of African languages in NLP research, we developed large-scale human-annotated datasets for 21 African languages in two impactful NLP tasks: named entity recognition and machine translation. We conduct an extensive empirical evaluation using state-of-the-art methods across supervised, weakly-supervised, and transfer learning settings.
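The adaptation step mentioned above, specializing a multilingual PLM to a new language with a small monolingual corpus, is commonly done by continuing masked language modelling. A minimal sketch follows, assuming the Hugging Face transformers and datasets libraries; the checkpoint, file path, and hyperparameters are placeholders, not the dissertation's exact setup.

from datasets import load_dataset
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "xlm-roberta-base"  # illustrative multilingual PLM
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)

# Small monolingual corpus in the target language, one sentence per line
# (hypothetical file path).
corpus = load_dataset("text", data_files={"train": "monolingual.txt"})
tokenized = corpus.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=256),
    batched=True, remove_columns=["text"])

# Randomly mask 15% of tokens and continue MLM training on the new language.
collator = DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15)
args = TrainingArguments(output_dir="adapted-plm", num_train_epochs=3,
                         per_device_train_batch_size=8)
Trainer(model=model, args=args, train_dataset=tokenized["train"],
        data_collator=collator).train()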