David Ifeoluwa Adelani

Core Academic Member
Canada CIFAR AI Chair
McGill University
Research Topics
Deep Learning
Natural Language Processing
Representation Learning
Speech Processing

Biography

David Adelani is an assistant professor at McGill University’s School of Computer Science under the Fighting Inequities initiative, and a core academic member of Mila – Quebec Artificial Intelligence Institute.

Adelani’s research focuses on multilingual natural language processing with special attention to under-resourced languages.

Current Students

Research Intern - McGill University
PhD - McGill University
Research Intern - McGill University
Master's Research - McGill University
Collaborating Alumni - McGill University
Professional Master's - Université de Montréal
Research Intern - McGill University
Master's Research - McGill University

Publications

SemEval-2025 Task 11: Bridging the Gap in Text-Based Emotion Detection
Shamsuddeen Hassan Muhammad
Nedjma OUSIDHOUM
Idris Abdulmumin
Seid Muhie Yimam
Jan Philip Wahle
Terry Lima Ruas
Meriem Beloucif
Christine de Kock
Tadesse Belay
Ibrahim Ahmad
Nirmal Surange
Daniela Teodorescu
Alham Fikri Aji
Felermino Ali
Vladimir Araujo
Abinew Ayele
Oana Ignat
Alexander Panchenko
Yi Zhou
Saif M. Mohammad
Warmup Generations: A Task-Agnostic Approach for Guiding Sequence-to-Sequence Learning with Unsupervised Initial State Generation
Senyu Li
Zipeng Sun
Jiayi Wang
Pontus Stenetorp
INJONGO: A Multicultural Intent Detection and Slot-filling Dataset for 16 African Languages
Hao Yu
Jesujoba Oluwadara Alabi
Andiswa Bukula
Zhuang Yun Jian
En-Shiun Annie Lee
Tadesse Kebede Guge
Israel Abebe Azime
Happy Buzaaba
Blessing Kudzaishe Sibanda
Godson Kalipe
Jonathan Mukiibi
S. Kabenamualu
M. Setaka
Lolwethu Ndolela
Nkiruka Bridget Odu
Rooweither Mabuya
Shamsuddeen Hassan Muhammad
Salomey Osei
Sokhar Samb
Juliet W. Murage
Dietrich Klakow
Slot-filling and intent detection are well-established tasks in Conversational AI. However, current large-scale benchmarks for these tasks often exclude evaluations of low-resource languages and rely on translations from English benchmarks, thereby predominantly reflecting Western-centric concepts. In this paper, we introduce Injongo -- a multicultural, open-source benchmark dataset for 16 African languages with utterances generated by native speakers across diverse domains, including banking, travel, home, and dining. Through extensive experiments, we benchmark the fine-tuning multilingual transformer models and the prompting large language models (LLMs), and show the advantage of leveraging African-cultural utterances over Western-centric utterances for improving cross-lingual transfer from the English language. Experimental results reveal that current LLMs struggle with the slot-filling task, with GPT-4o achieving an average performance of 26 F1-score. In contrast, intent detection performance is notably better, with an average accuracy of 70.6%, though it still falls behind the fine-tuning baselines. Compared to the English language, GPT-4o and fine-tuning baselines perform similarly on intent detection, achieving an accuracy of approximately 81%. Our findings suggest that the performance of LLMs is still behind for many low-resource African languages, and more work is needed to further improve their downstream performance.
AfriHate: A Multilingual Collection of Hate Speech and Abusive Language Datasets for African Languages
Shamsuddeen Hassan Muhammad
Idris Abdulmumin
Abinew Ayele
Ibrahim Ahmad
Saminu Mohammad Aliyu
Nelson Odhiambo Onyango
Lilian D. A. Wanzare
Samuel Rutunda
Lukman Jibril Aliyu
Esubalew Alemneh
Oumaima Hourrane
Hagos Gebremichael
Elyas Abdi Ismail
Meriem Beloucif
Ebrahim Chekol Jibril
Andiswa Bukula
Rooweither Mabuya
Salomey Osei
Abigail Oppong
Tadesse Belay
Tadesse Kebede Guge
Tesfa Tegegne Asfaw
Chiamaka Ijeoma Chukwuneke
Paul Rottger
Seid Muhie Yimam
Nedjma OUSIDHOUM
Hate speech and abusive language are global phenomena that need socio-cultural background knowledge to be understood, identified, and moderated. However, in many regions of the Global South, there have been several documented occurrences of (1) absence of moderation and (2) censorship due to the reliance on keyword spotting out of context. Further, high-profile individuals have frequently been at the center of the moderation process, while large and targeted hate speech campaigns against minorities have been overlooked. These limitations are mainly due to the lack of high-quality data in the local languages and the failure to include local communities in the collection, annotation, and moderation processes. To address this issue, we present AfriHate: a multilingual collection of hate speech and abusive language datasets in 15 African languages. Each instance in AfriHate is annotated by native speakers familiar with the local culture. We report the challenges related to the construction of the datasets and present various classification baseline results with and without using LLMs. The datasets, individual annotations, and hate speech and offensive language lexicons are available on https://github.com/AfriHate/AfriHate
AFRIDOC-MT: Document-level MT Corpus for African Languages
Jesujoba Oluwadara Alabi
Israel Abebe Azime
Miaoran Zhang
Cristina España-Bonet
Rachel Bawden
Dawei Zhu
Clement Odoje
Idris Akinade
Iffat Maab
Davis David
Shamsuddeen Hassan Muhammad
Neo Putini
David O. Ademuyiwa
Andrew Caines
Dietrich Klakow
This paper introduces AFRIDOC-MT, a document-level multi-parallel translation dataset covering English and five African languages: Amharic, Hausa, Swahili, Yorùbá, and Zulu. The dataset comprises 334 health and 271 information technology news documents, all human-translated from English to these languages. We conduct document-level translation benchmark experiments by evaluating neural machine translation (NMT) models and large language models (LLMs) for translations between English and these languages, at both the sentence and pseudo-document levels. These outputs are realigned to form complete documents for evaluation. Our results indicate that NLLB-200 achieved the best average performance among the standard NMT models, while GPT-4o outperformed general-purpose LLMs. Fine-tuning selected models led to substantial performance gains, but models trained on sentences struggled to generalize effectively to longer documents. Furthermore, our analysis reveals that some LLMs exhibit issues such as under-generation, repetition of words or phrases, and off-target translations, especially for African languages.
AfriHG: News headline generation for African Languages
Toyib Ogunremi
Serah Akojenu
Anthony Soronnadi
Olubayo Adekanmbi
This paper introduces AfriHG -- a news headline generation dataset created by combining the XLSum and MasakhaNEWS datasets, focusing on 16 languages widely spoken in Africa. We experimented with two seq2seq models (mT5-base and AfriTeVa V2) and the Aya-101 LLM. Our results show that Africa-centric seq2seq models such as AfriTeVa V2 outperform the massively multilingual mT5-base model. Finally, we show that the performance of fine-tuning AfriTeVa V2 with 313M parameters is competitive with prompting the Aya-101 LLM with more than 13B parameters.
The Responsible Foundation Model Development Cheatsheet: A Review of Tools & Resources
Shayne Longpre
Stella Biderman
Alon Albalak
Hailey Schoelkopf
Daniel McDuff
Sayash Kapoor
Kevin Klyman
Kyle Lo
Gabriel Ilharco
Nay San
Maribeth Rauh
Aviya Skowron
Bertie Vidgen
Laura Weidinger
Arvind Narayanan
Victor Sanh
Percy Liang
Rishi Bommasani
Peter Henderson
Sasha Luccioni
Yacine Jernite
Luca Soldaini