David Ifeoluwa Adelani

Core Academic Member
Canada CIFAR AI Chair
McGill University
Research Topics
Deep Learning
Natural Language Processing
Representation Learning
Speech Processing

Biography

David Adelani is an assistant professor at McGill University’s School of Computer Science under the Fighting Inequities initiative, and a core academic member of Mila – Quebec Artificial Intelligence Institute.

Adelani’s research focuses on multilingual natural language processing with special attention to under-resourced languages.

Current Students

Research Intern - McGill University
PhD - McGill University
Research Intern - McGill University
Master's Research - McGill University
Collaborating Alumni - McGill University
Professional Master's - Université de Montréal
Research Intern - McGill University
Master's Research - McGill University

Publications

AFRIDOC-MT: Document-level MT Corpus for African Languages
Jesujoba Oluwadara Alabi
Israel Abebe Azime
Miaoran Zhang
Cristina España-Bonet
Rachel Bawden
Dawei Zhu
Clement Odoje
Idris Akinade
Iffat Maab
Davis David
Shamsuddeen Hassan Muhammad
Neo Putini
David O. Ademuyiwa
Andrew Caines
Dietrich Klakow
This paper introduces AFRIDOC-MT, a document-level multi-parallel translation dataset covering English and five African languages: Amharic, Hausa, Swahili, Yorùbá, and Zulu. The dataset comprises 334 health and 271 information technology news documents, all human-translated from English to these languages. We conduct document-level translation benchmark experiments by evaluating neural machine translation (NMT) models and large language models (LLMs) for translations between English and these languages, at both the sentence and pseudo-document levels. These outputs are realigned to form complete documents for evaluation. Our results indicate that NLLB-200 achieved the best average performance among the standard NMT models, while GPT-4o outperformed general-purpose LLMs. Fine-tuning selected models led to substantial performance gains, but models trained on sentences struggled to generalize effectively to longer documents. Furthermore, our analysis reveals that some LLMs exhibit issues such as under-generation, repetition of words or phrases, and off-target translations, especially for African languages.
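The realignment step described above — grouping sentence-level translations back into complete documents before scoring — can be sketched as follows. This is a minimal illustration only; the function name, data format, and toy sentences are hypothetical and not taken from the AFRIDOC-MT release.

```python
# Sketch: realign sentence-level MT outputs into full documents for
# document-level evaluation. Hypothetical helper, not the paper's code.

def realign(sentence_translations, doc_lengths):
    """Group a flat list of translated sentences back into documents,
    using the number of sentences each source document contained."""
    docs, start = [], 0
    for n in doc_lengths:
        docs.append(" ".join(sentence_translations[start:start + n]))
        start += n
    return docs

# Toy example: two source documents of 2 and 1 sentences respectively.
hyps = ["Afya ni muhimu.", "Kunywa maji mengi.", "Habari za teknolojia."]
docs = realign(hyps, doc_lengths=[2, 1])
print(docs[0])  # prints "Afya ni muhimu. Kunywa maji mengi."
```

Once realigned, each reconstructed document can be scored against its reference with a document-level metric, which is how sentence-trained and document-trained systems are compared on equal footing.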
Fleurs-SLU: A Massively Multilingual Benchmark for Spoken Language Understanding
Fabian David Schmidt
Ivan Vulić
Goran Glavaš
AfriHG: News headline generation for African Languages
Toyib Ogunremi
Serah Akojenu
Anthony Soronnadi
Olubayo Adekanmbi
This paper introduces AfriHG -- a news headline generation dataset created by combining the XLSum and MasakhaNEWS datasets, focusing on 16 languages widely spoken in Africa. We experimented with two seq2seq models (mT5-base and AfriTeVa V2) and the Aya-101 LLM. Our results show that Africa-centric seq2seq models such as AfriTeVa V2 outperform the massively multilingual mT5-base model. Finally, we show that the performance of fine-tuning AfriTeVa V2, with 313M parameters, is competitive with prompting the Aya-101 LLM, which has more than 13B parameters.
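Comparing generated headlines against references, as in the experiments above, is typically done with unigram-overlap metrics such as ROUGE-1. The sketch below computes a ROUGE-1-style F1 score; it is illustrative only and not the AfriHG evaluation code.

```python
# Sketch: ROUGE-1-style F1 between a generated headline and a reference.
# Illustrative scorer, not the evaluation script used in the paper.
from collections import Counter

def rouge1_f(hypothesis: str, reference: str) -> float:
    hyp = Counter(hypothesis.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((hyp & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(hyp.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

score = rouge1_f("president signs new health bill",
                 "president signs health bill into law")
print(round(score, 3))  # prints 0.727
```

Scores like this, averaged over a test set, are what allow a 313M-parameter fine-tuned model to be compared directly against a much larger prompted LLM.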
The Responsible Foundation Model Development Cheatsheet: A Review of Tools & Resources
Shayne Longpre
Stella Biderman
Alon Albalak
Hailey Schoelkopf
Daniel McDuff
Sayash Kapoor
Kevin Klyman
Kyle Lo
Gabriel Ilharco
Nay San
Maribeth Rauh
Aviya Skowron
Bertie Vidgen
Laura Weidinger
Arvind Narayanan
Victor Sanh
Percy Liang
Rishi Bommasani
Peter Henderson … (3 more authors)
Sasha Luccioni
Yacine Jernite
Luca Soldaini
Global MMLU: Understanding and Addressing Cultural and Linguistic Biases in Multilingual Evaluation
Shivalika Singh
Angelika Romanou
Clémentine Fourrier
Jian Gang Ngui
Daniel Vila-Suero
Peerat Limkonchotiwat
Kelly Marchisio
Wei Qi Leong
Yosephine Susanto
Raymond Ng
Shayne Longpre
Wei-Yin Ko
Madeline Smith
Antoine Bosselut
Alice Oh
André F. T. Martins
Leshem Choshen
Daphne Ippolito
Enzo Ferrante … (3 more authors)
Marzieh Fadaee
Beyza Ermis
Sara Hooker
Uhura: A Benchmark for Evaluating Scientific Question Answering and Truthfulness in Low-Resource African Languages
Edward Bayes
Israel Abebe Azime
Jesujoba Oluwadara Alabi
Jonas Kgomo
Tyna Eloundou
Elizabeth Proehl
Kai Chen
Imaan Khadir
Naome Etori
Shamsuddeen Hassan Muhammad
Choice Mpanza
Igneciah Pocia Thete
Dietrich Klakow
Evaluations of Large Language Models (LLMs) on knowledge-intensive tasks and factual accuracy often focus on high-resource languages, primarily because datasets for low-resource languages (LRLs) are scarce. In this paper, we present Uhura -- a new benchmark that focuses on two tasks in six typologically-diverse African languages, created via human translation of existing English benchmarks. The first dataset, Uhura-ARC-Easy, is composed of multiple-choice science questions. The second, Uhura-TruthfulQA, is a safety benchmark testing the truthfulness of models on topics including health, law, finance, and politics. We highlight the challenges of creating benchmarks with highly technical content for LRLs and outline mitigation strategies. Our evaluation reveals a significant performance gap between proprietary models (such as GPT-4o, o1-preview, and the Claude models) and open-source models like Meta's LLaMA and Google's Gemma. Additionally, all models perform better in English than in African languages. These results indicate that LMs struggle with answering scientific questions and are more prone to generating false claims in low-resource African languages. Our findings underscore the necessity of continuous improvement of multilingual LM capabilities in LRL settings to ensure safe and reliable use in real-world contexts. We open-source the Uhura Benchmark and Uhura Platform to foster further research and development in NLP for LRLs.
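Benchmarks like Uhura-ARC-Easy are scored by simple multiple-choice accuracy: the fraction of questions where the model's chosen option matches the gold answer. A minimal sketch, with a hypothetical data format not taken from the Uhura release:

```python
# Sketch: multiple-choice accuracy of the kind used for ARC-style
# benchmarks. Data format and scorer are hypothetical, for illustration.

def accuracy(predictions, gold_labels):
    """Fraction of questions where the predicted choice matches gold."""
    correct = sum(p == g for p, g in zip(predictions, gold_labels))
    return correct / len(gold_labels)

# Toy run: four questions, one wrong answer.
preds = ["A", "C", "B", "D"]
gold  = ["A", "B", "B", "D"]
print(accuracy(preds, gold))  # prints 0.75
```

Reporting this accuracy per language is what surfaces the English-versus-African-language gap the abstract describes.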
WorldCuisines: A Massive-Scale Benchmark for Multilingual and Multicultural Visual Question Answering on Global Cuisines
Genta Indra Winata
Frederikus Hudi
Patrick Amadeus Irawan
David Anugraha
Rifki Afina Putri
Yutong Wang
Adam Nohejl
Ubaidillah Ariq Prathama
Nedjma Ousidhoum
Afifa Amriani
Anar Rzayev
Anirban Das
Ashmari Pramodya
Aulia Adila
Bryan Wilie
Candy Olivia Mawalim
Ching Lam Cheng
Daud Abolade
Emmanuele Chersoni
Enrico Santus … (31 more authors)
Fariz Ikhwantri
Garry Kuwanto
Hanyang Zhao
Haryo Akbarianto Wibowo
Holy Lovenia
Jan Christian Blaise Cruz
Jan Wira Gotama Putra
Junho Myung
Lucky Susanto
Maria Angelica Riera Machin
Marina Zhukova
Michael Anugraha
Muhammad Farid Adilazuarda
Natasha Santosa
Peerat Limkonchotiwat
Raj Dabre
Rio Alexander Audino
Samuel Cahyawijaya
Shi-Xiong Zhang
Stephanie Yulia Salim
Yi Zhou
Yinxuan Gui
En-Shiun Annie Lee
Shogo Okada
Ayu Purwarianti
Alham Fikri Aji
Taro Watanabe
Derry Tanti Wijaya
Alice Oh
Chong-Wah Ngo
CVQA: Culturally-diverse Multilingual Visual Question Answering Benchmark
David LE MEUR
David Orlando Romero Mogrovejo
Chenyang Lyu
Haryo Akbarianto Wibowo
Teresa Lynn
Injy Hamed
Aditya Nanda Kishore Khandavally
Aishik Mandal
Alina Dragonetti
Artem Abzaliev
Atnafu Lambebo Tonja
Bontu Fufa Balcha
Chenxi Whitehouse
Christian Salamea-Palacios
Dan John Velasco
D. Meur
Emilio Villa Cueva
Fajri Koto
Fauzan Farooqui … (57 more authors)
Frederico Belcavello
Ganzorig Batnasan
Gisela Vallejo
Gráinne Caulfield
Guido Ivetta
Haiyue Song
Henok Biadglign Ademtew
Hernán Maina
Holy Lovenia
Israel Abebe Azime
Jan Christian Blaise Cruz
Jay Gala
Jiahui Geng
Jesus-German Ortiz-Barajas
Jinheon Baek
Jocelyn Dunstan
Laura Alonso Alemany
Teresa Clifford
Kumaranage Ravindu Yasas Nagasinghe
Luciana Benotti
Luis Fernando D'Haro
Marcelo Viridiano
Marcos Estecha-Garitagoitia
Maria Camila Buitrago Cabrera
Mario Rodríguez-Cantelar
Mélanie Jouitteau
Mihail Minkov Mihaylov
Mohamed Fazli Mohamed Imam
Muhammad Farid Adilazuarda
Munkhjargal Gochoo
Munkh-Erdene Otgonbold
Naome Etori
Olivier Niyomugisha
Paula Mónica Silva
Pranjal A Chitale
Raj Dabre
Rendi Chevi
Ruochen Zhang
Ryandito Diandaru
Samuel Cahyawijaya
Santiago Góngora
Soyeong Jeong
Sukannya Purkayastha
Tatsuki Kuribayashi
Thanmay Jayakumar
Tiago Timponi Torrent
Toqeer Ehsan
Vladimir Araujo
Yova Kementchedjhieva
Zara Burzo
Zheng Wei Lim
Zheng Xin Yong
Oana Ignat
Joan Nwatu
Rada Mihalcea
Thamar Solorio
Alham Fikri Aji