David Ifeoluwa Adelani

Core Academic Member
Canada CIFAR AI Chair
McGill University
Research Topics
Representation Learning
Deep Learning
Speech Processing
Natural Language Processing

Biography

David Adelani is an assistant professor in computer science and inequality studies at McGill University, and a Core Academic Member at Mila – Quebec Artificial Intelligence Institute. His research focuses on multilingual natural language processing, with a particular emphasis on low-resource languages.

Current Students

Research Intern - McGill
PhD - McGill
Research Intern - McGill
Master's (Research) - McGill
Collaborating Alumni - McGill
Professional Master's - UdeM
Master's (Research) - McGill

Publications

Lugha-Llama: Adapting Large Language Models for African Languages
Happy Buzaaba
Alexander Wettig
Christiane Fellbaum
SemEval-2025 Task 11: Bridging the Gap in Text-Based Emotion Detection
Shamsuddeen Hassan Muhammad
Nedjma OUSIDHOUM
Idris Abdulmumin
Seid Muhie Yimam
Jan Philip Wahle
Terry Lima Ruas
Meriem Beloucif
Christine de Kock
Tadesse Belay
Ibrahim Ahmad
Nirmal Surange
Daniela Teodorescu
Alham Fikri Aji
Felermino Ali
Vladimir Araujo
Abinew Ayele
Oana Ignat
Alexander Panchenko
Yi Zhou … (see 1 more)
Saif M. Mohammad
Warmup Generations: A Task-Agnostic Approach for Guiding Sequence-to-Sequence Learning with Unsupervised Initial State Generation
Senyu Li
Zipeng Sun
Jiayi Wang
Pontus Stenetorp
INJONGO: A Multicultural Intent Detection and Slot-filling Dataset for 16 African Languages
Hao Yu
Jesujoba Oluwadara Alabi
Andiswa Bukula
Zhuang Yun Jian
En-Shiun Annie Lee
Tadesse Kebede Guge
Israel Abebe Azime
Happy Buzaaba
Blessing Kudzaishe Sibanda
Godson Kalipe
Jonathan Mukiibi
S. Kabenamualu
M. Setaka
Lolwethu Ndolela
Nkiruka Bridget Odu
Rooweither Mabuya
Shamsuddeen Hassan Muhammad
Salomey Osei
Sokhar Samb
Juliet W. Murage … (see 2 more)
Dietrich Klakow
Slot-filling and intent detection are well-established tasks in conversational AI. However, current large-scale benchmarks for these tasks often exclude evaluations of low-resource languages and rely on translations from English benchmarks, thereby predominantly reflecting Western-centric concepts. In this paper, we introduce Injongo -- a multicultural, open-source benchmark dataset for 16 African languages with utterances generated by native speakers across diverse domains, including banking, travel, home, and dining. Through extensive experiments, we benchmark fine-tuning of multilingual transformer models and prompting of large language models (LLMs), and show the advantage of leveraging African-cultural utterances over Western-centric utterances for improving cross-lingual transfer from English. Experimental results reveal that current LLMs struggle with the slot-filling task, with GPT-4o achieving an average F1-score of 26. In contrast, intent detection performance is notably better, with an average accuracy of 70.6%, though it still falls behind the fine-tuning baselines. Compared to English, GPT-4o and the fine-tuning baselines perform similarly on intent detection, achieving an accuracy of approximately 81%. Our findings suggest that LLM performance still lags for many low-resource African languages, and more work is needed to further improve their downstream performance.
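Slot-filling F1 of the kind reported above is typically computed at the span level: labelled slot spans are extracted from BIO tag sequences and compared as sets. A minimal stdlib sketch of such a metric (this is a common convention, not the authors' exact evaluation code):

```python
def extract_spans(tags):
    """Extract (label, start, end) spans from a BIO tag sequence."""
    spans, label, start = [], None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a final span
        if tag.startswith("B-") or tag == "O":
            if label is not None:
                spans.append((label, start, i))
                label, start = None, None
            if tag.startswith("B-"):
                label, start = tag[2:], i
        elif tag.startswith("I-") and label != tag[2:]:
            # a stray I- tag with no matching open span starts a new span
            if label is not None:
                spans.append((label, start, i))
            label, start = tag[2:], i
    return spans

def slot_f1(gold_seqs, pred_seqs):
    """Micro-averaged span-level F1 over parallel lists of BIO sequences."""
    tp = fp = fn = 0
    for gold, pred in zip(gold_seqs, pred_seqs):
        g, p = set(extract_spans(gold)), set(extract_spans(pred))
        tp += len(g & p)
        fp += len(p - g)
        fn += len(g - p)
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0
```

A predicted span counts as correct only if its label and both boundaries match the gold span exactly, which is why slot-filling scores drop sharply when models truncate or over-extend spans.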
AfriHate: A Multilingual Collection of Hate Speech and Abusive Language Datasets for African Languages
Shamsuddeen Hassan Muhammad
Idris Abdulmumin
Abinew Ayele
Ibrahim Ahmad
Saminu Mohammad Aliyu
Nelson Odhiambo Onyango
Lilian D. A. Wanzare
Samuel Rutunda
Lukman Jibril Aliyu
Esubalew Alemneh
Oumaima Hourrane
Hagos Gebremichael
Elyas Abdi Ismail
Meriem Beloucif
Ebrahim Chekol Jibril
Andiswa Bukula
Rooweither Mabuya
Salomey Osei
Abigail Oppong … (see 7 more)
Tadesse Belay
Tadesse Kebede Guge
Tesfa Tegegne Asfaw
Chiamaka Ijeoma Chukwuneke
Paul Rottger
Seid Muhie Yimam
Nedjma OUSIDHOUM
Hate speech and abusive language are global phenomena that require socio-cultural background knowledge to be understood, identified, and moderated. However, in many regions of the Global South, there have been several documented occurrences of (1) absence of moderation and (2) censorship due to reliance on out-of-context keyword spotting. Further, high-profile individuals have frequently been at the center of the moderation process, while large and targeted hate speech campaigns against minorities have been overlooked. These limitations are mainly due to the lack of high-quality data in the local languages and the failure to include local communities in the collection, annotation, and moderation processes. To address this issue, we present AfriHate: a multilingual collection of hate speech and abusive language datasets in 15 African languages. Each instance in AfriHate is annotated by native speakers familiar with the local culture. We report the challenges related to the construction of the datasets and present various classification baseline results, with and without using LLMs. The datasets, individual annotations, and hate speech and offensive language lexicons are available at https://github.com/AfriHate/AfriHate
AFRIDOC-MT: Document-level MT Corpus for African Languages
Jesujoba Oluwadara Alabi
Israel Abebe Azime
Miaoran Zhang
Cristina España-Bonet
Rachel Bawden
Dawei Zhu
Clement Odoje
Idris Akinade
Iffat Maab
Davis David
Shamsuddeen Hassan Muhammad
Neo Putini
David O. Ademuyiwa
Andrew Caines
Dietrich Klakow
This paper introduces AFRIDOC-MT, a document-level multi-parallel translation dataset covering English and five African languages: Amharic, Hausa, Swahili, Yorùbá, and Zulu. The dataset comprises 334 health and 271 information technology news documents, all human-translated from English into these languages. We conduct document-level translation benchmark experiments by evaluating neural machine translation (NMT) models and large language models (LLMs) on translation between English and these languages, at both the sentence and pseudo-document levels. These outputs are realigned to form complete documents for evaluation. Our results indicate that NLLB-200 achieved the best average performance among the standard NMT models, while GPT-4o outperformed the general-purpose LLMs. Fine-tuning selected models led to substantial performance gains, but models trained on sentences struggled to generalize effectively to longer documents. Furthermore, our analysis reveals that some LLMs exhibit issues such as under-generation, repetition of words or phrases, and off-target translations, especially for African languages.
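Translation at the pseudo-document level, as described above, generally means packing consecutive sentences into chunks that fit a model's input budget, translating each chunk, and realigning the outputs into full documents. A minimal sketch of such greedy chunking, using a word-count budget as a stand-in for the paper's actual token budget (an assumption for illustration):

```python
def make_pseudo_documents(sentences, max_words=200):
    """Greedily pack consecutive sentences into chunks of at most
    max_words words each, preserving sentence order, so that a
    sentence-level MT model can translate longer, more coherent
    inputs ('pseudo-documents')."""
    chunks, current, count = [], [], 0
    for sent in sentences:
        n = len(sent.split())
        if current and count + n > max_words:
            # budget exceeded: close the current chunk and start a new one
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sent)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Because chunk boundaries never split a sentence, the translated chunks can be concatenated back in order to reconstruct a complete translated document for document-level evaluation.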
AfriHG: News headline generation for African Languages
Toyib Ogunremi
Serah Akojenu
Anthony Soronnadi
Olubayo Adekanmbi
This paper introduces AfriHG -- a news headline generation dataset created by combining the XLSum and MasakhaNEWS datasets, focusing on 16 languages widely spoken in Africa. We experimented with two seq2seq models (mT5-base and AfriTeVa V2) and the Aya-101 LLM. Our results show that Africa-centric seq2seq models such as AfriTeVa V2 outperform the massively multilingual mT5-base model. Finally, we show that the performance of fine-tuning AfriTeVa V2, with 313M parameters, is competitive with prompting the Aya-101 LLM, which has more than 13B parameters.
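Headline generation is commonly scored with ROUGE-style n-gram overlap between the generated and reference headlines. A simplified, stdlib-only sketch of ROUGE-1 F1 with whitespace tokenisation (an illustrative assumption, not the paper's stated evaluation setup):

```python
from collections import Counter

def rouge_1_f(reference, candidate):
    """Unigram-overlap ROUGE-1 F1 between a reference headline and a
    generated candidate, with lowercased whitespace tokenisation."""
    ref = Counter(reference.lower().split())
    cand = Counter(candidate.lower().split())
    overlap = sum((ref & cand).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```

Counting via the multiset intersection clips each matched word at its reference frequency, so a candidate cannot inflate its score by repeating a single overlapping word.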