David Ifeoluwa Adelani

Core Academic Member
Canada CIFAR AI Chair
McGill University
Research Topics
Representation Learning
Deep Learning
Speech Processing
Natural Language Processing

Biography

David Adelani is an assistant professor of computer science and fighting inequities at McGill University, and a core academic member at Mila – Quebec Artificial Intelligence Institute. His research focuses on multilingual natural language processing, with a particular emphasis on under-resourced languages.

Current Students

Research Master's - McGill
Research Master's - McGill
Research Intern - McGill
Research Collaborator - McGill
Postdoctorate - McGill
Research Intern - McGill
PhD - McGill
Research Intern - McGill
PhD - McGill
PhD - McGill
Research Intern - McGill
Research Master's - McGill
Research Intern - McGill
Professional Master's - UdeM
Research Intern - McGill
Research Master's - McGill

Publications

Multilingual Language Model Pretraining using Machine-translated Data
Jiayi Wang
Maurice Weber
Max Ryabinin
Yihong Chen
Raphael Tang
Pontus Stenetorp
Reassessing Speech Translation for Low-Resource Languages: Do LLMs Redefine the State-of-the-Art Against Cascaded Models?
Training of LLM-Based List-Wise Multilingual Reranker
Warmup Generations: A Task-Agnostic Approach for Guiding Sequence-to-Sequence Learning with Unsupervised Initial State Generation
Senyu Li
Jiayi Wang
Xue Liu
Pontus Stenetorp
Traditional supervised fine-tuning (SFT) strategies for sequence-to-sequence tasks often train models to directly generate the target output. Recent work has shown that guiding models with intermediate steps, such as keywords, outlines, or reasoning chains, can significantly improve performance, coherence, and interpretability. However, these methods often depend on predefined intermediate formats and annotated data, limiting their scalability and generalizability. In this work, we introduce a task-agnostic framework that enables models to generate intermediate "warmup" sequences. These warmup sequences, serving as an initial state for subsequent generation, are optimized to enhance the probability of generating the target sequence without relying on external supervision or human-designed structures. Drawing inspiration from reinforcement learning principles, our method iteratively refines these intermediate steps to maximize their contribution to the final output, similar to reward-driven optimization in reinforcement learning with human feedback. Experimental results across tasks such as translation, summarization, and multi-choice question answering for logical reasoning show that our approach outperforms traditional SFT methods, and offers a scalable and flexible solution for sequence-to-sequence tasks.
The Responsible Foundation Model Development Cheatsheet: A Review of Tools & Resources
Shayne Longpre
Stella Biderman
Alon Albalak
Hailey Schoelkopf
Daniel McDuff
Sayash Kapoor
Kevin Klyman
Kyle Lo
Gabriel Ilharco
Nay San
Maribeth Rauh
Aviya Skowron
Bertie Vidgen
Laura Weidinger
Arvind Narayanan
Victor Sanh
Percy Liang
Rishi Bommasani
Yacine Jernite
Luca Soldaini
Global MMLU: Understanding and Addressing Cultural and Linguistic Biases in Multilingual Evaluation
Shivalika Singh
Angelika Romanou
Clémentine Fourrier
Jian Gang Ngui
Daniel Vila-Suero
Peerat Limkonchotiwat
Kelly Marchisio
Wei Qi Leong
Yosephine Susanto
Raymond Ng
Shayne Longpre
Wei-Yin Ko
Madeline Smith
Antoine Bosselut
Alice Oh
André F. T. Martins
Leshem Choshen
Daphne Ippolito
Enzo Ferrante
Marzieh Fadaee
Beyza Ermis
Sara Hooker
Uhura: A Benchmark for Evaluating Scientific Question Answering and Truthfulness in Low-Resource African Languages
Edward Bayes
Israel Abebe Azime
Jesujoba Oluwadara Alabi
Jonas Kgomo
Tyna Eloundou
Elizabeth Proehl
Kai Chen
Imaan Khadir
Naome Etori
Shamsuddeen Hassan Muhammad
C. Mpanza
Igneciah Pocia Thete
Dietrich Klakow
Evaluations of Large Language Models (LLMs) on knowledge-intensive tasks and factual accuracy often focus on high-resource languages primarily because datasets for low-resource languages (LRLs) are scarce. In this paper, we present Uhura -- a new benchmark that focuses on two tasks in six typologically-diverse African languages, created via human translation of existing English benchmarks. The first dataset, Uhura-ARC-Easy, is composed of multiple-choice science questions. The second, Uhura-TruthfulQA, is a safety benchmark testing the truthfulness of models on topics including health, law, finance, and politics. We highlight the challenges creating benchmarks with highly technical content for LRLs and outline mitigation strategies. Our evaluation reveals a significant performance gap between proprietary models such as GPT-4o and o1-preview, and Claude models, and open-source models like Meta's LLaMA and Google's Gemma. Additionally, all models perform better in English than in African languages. These results indicate that LMs struggle with answering scientific questions and are more prone to generating false claims in low-resource African languages. Our findings underscore the necessity for continuous improvement of multilingual LM capabilities in LRL settings to ensure safe and reliable use in real-world contexts. We open-source the Uhura Benchmark and Uhura Platform to foster further research and development in NLP for LRLs.
WorldCuisines: A Massive-Scale Benchmark for Multilingual and Multicultural Visual Question Answering on Global Cuisines
Genta Indra Winata
Frederikus Hudi
Patrick Amadeus Irawan
David Anugraha
Rifki Afina Putri
Yutong Wang
Adam Nohejl
Ubaidillah Ariq Prathama
Nedjma Ousidhoum
Afifa Amriani
Anar Rzayev
Anirban Das
Ashmari Pramodya
Aulia Adila
Bryan Wilie
Candy Olivia Mawalim
Ching Lam Cheng
Daud Abolade
Emmanuele Chersoni
Enrico Santus
Fariz Ikhwantri
Garry Kuwanto
Hanyang Zhao
Haryo Akbarianto Wibowo
Holy Lovenia
Jan Christian Blaise Cruz
Jan Wira Gotama Putra
Junho Myung
Lucky Susanto
Maria Angelica Riera Machin
Marina Zhukova
Michael Anugraha
Muhammad Farid Adilazuarda
Natasha Santosa
Peerat Limkonchotiwat
Raj Dabre
Rio Alexander Audino
Samuel Cahyawijaya
Shi-Xiong Zhang
Stephanie Yulia Salim
Yi Zhou
Yinxuan Gui
En-Shiun Annie Lee
Shogo Okada
Ayu Purwarianti
Alham Fikri Aji
Taro Watanabe
Derry Tanti Wijaya
Alice Oh
Chong-Wah Ngo
CVQA: Culturally-diverse Multilingual Visual Question Answering Benchmark
David Le Meur
David Orlando Romero Mogrovejo
Chenyang Lyu
Haryo Akbarianto Wibowo
Teresa Lynn
Injy Hamed
Aditya Nanda Kishore Khandavally
Aishik Mandal
Alina Dragonetti
Artem Abzaliev
Atnafu Lambebo Tonja
Bontu Fufa Balcha
Chenxi Whitehouse
Christian Salamea-Palacios
Dan John Velasco
D. Meur
Emilio Villa Cueva
Fajri Koto
Fauzan Farooqui
Frederico Belcavello
Ganzorig Batnasan
Gisela Vallejo
Gráinne Caulfield
Guido Ivetta
Haiyue Song
Henok Biadglign Ademtew
Hernán Maina
Holy Lovenia
Israel Abebe Azime
Jan Christian Blaise Cruz
Jiahui Geng
Jesus-German Ortiz-Barajas
Jinheon Baek
Jocelyn Dunstan
Laura Alonso Alemany
Teresa Clifford
Kumaranage Ravindu Yasas Nagasinghe
Luciana Benotti
Luis Fernando D'Haro
Marcelo Viridiano
Marcos Estecha-Garitagoitia
Maria Camila Buitrago Cabrera
Mario Rodríguez-Cantelar
Mélanie Jouitteau
Mihail Minkov Mihaylov
Mohamed Fazli Mohamed Imam
Muhammad Farid Adilazuarda
Munkhjargal Gochoo
Munkh-Erdene Otgonbold
Naome Etori
Olivier Niyomugisha
Paula Mónica Silva
Pranjal A Chitale
Raj Dabre
Rendi Chevi
Ruochen Zhang
Ryandito Diandaru
Samuel Cahyawijaya
Santiago Góngora
Soyeong Jeong
Sukannya Purkayastha
Tatsuki Kuribayashi
Thanmay Jayakumar
Tiago Timponi Torrent
Toqeer Ehsan
Vladimir Araujo
Yova Kementchedjhieva
Zara Burzo
Zheng Wei Lim
Zheng Xin Yong
Oana Ignat
Joan Nwatu
Rada Mihalcea
Thamar Solorio
Alham Fikri Aji
MINERS: Multilingual Language Models as Semantic Retrievers
Genta Indra Winata
Ruochen Zhang
Words have been represented in a high-dimensional vector space that encodes their semantic similarities, enabling downstream applications such as retrieving synonyms, antonyms, and relevant contexts. However, despite recent advances in multilingual language models (LMs), the effectiveness of these models' representations in semantic retrieval contexts has not been comprehensively explored. To fill this gap, this paper introduces MINERS, a benchmark designed to evaluate the ability of multilingual LMs in semantic retrieval tasks, including bitext mining and classification via retrieval-augmented contexts. We create a comprehensive framework to assess the robustness of LMs in retrieving samples across over 200 diverse languages, including extremely low-resource languages in challenging cross-lingual and code-switching settings. Our results demonstrate that solely retrieving semantically similar embeddings yields performance competitive with state-of-the-art approaches, without requiring any fine-tuning.
IrokoBench: A New Benchmark for African Languages in the Age of Large Language Models
Israel Abebe Azime
Zhuang Yun Jian
Jesujoba Oluwadara Alabi
Xuanli He
Millicent Ochieng
Sara Hooker
Andiswa Bukula
En-Shiun Annie Lee
Chiamaka Ijeoma Chukwuneke
Happy Buzaaba
Blessing Kudzaishe Sibanda
Godson Kalipe
Jonathan Mukiibi
Salomon Kabongo
Foutse Yuehgoh
M. Setaka
Lolwethu Ndolela
Nkiruka Bridget Odu
Rooweither Mabuya
Shamsuddeen Hassan Muhammad
Salomey Osei
Sokhar Samb
Tadesse Kebede Guge
Pontus Stenetorp
Despite the widespread adoption of Large language models (LLMs), their remarkable capabilities remain limited to a few high-resource languages. Additionally, many low-resource languages (e.g. African languages) are often evaluated only on basic text classification tasks due to the lack of appropriate or comprehensive benchmarks outside of high-resource languages. In this paper, we introduce IrokoBench -- a human-translated benchmark dataset for 16 typologically-diverse low-resource African languages covering three tasks: natural language inference (AfriXNLI), mathematical reasoning (AfriMGSM), and multi-choice knowledge-based QA (AfriMMLU). We use IrokoBench to evaluate zero-shot, few-shot, and translate-test settings (where test sets are translated into English) across 10 open and four proprietary LLMs. Our evaluation reveals a significant performance gap between high-resource languages (such as English and French) and low-resource African languages. We observe a significant performance gap between open and proprietary models, with the highest performing open model, Aya-101 only at 58% of the best-performing proprietary model GPT-4o performance. Machine translating the test set to English before evaluation helped to close the gap for larger models that are English-centric, like LLaMa 3 70B. These findings suggest that more efforts are needed to develop and adapt LLMs for African languages.
AfriMTE and AfriCOMET: Enhancing COMET to Embrace Under-resourced African Languages
Jiayi Wang
Sweta Agrawal
Marek Masiak
Ricardo Rei
Eleftheria Briakou
Marine Carpuat
Xuanli He
Sofia Bourhim
Andiswa Bukula
Muhidin A. Mohamed
Temitayo Olatoye
Tosin Adewumi
Hamam Mokayed
Christine Mwase
Wangui Kimotho
Foutse Yuehgoh
Aremu Anuoluwapo
Shamsuddeen Hassan Muhammad
Salomey Osei
Abdul-Hakeem Omotayo
Chiamaka Ijeoma Chukwuneke
Perez Ogayo
Oumaima Hourrane
Salma El Anigri
Lolwethu Ndolela
Thabiso Mangwana
Shafie Abdi Mohamed
Hassan Ayinde
Ayinde Hassan
Oluwabusayo Olufunke Awoyomi
Lama Alkhaled
Sana Sabah Al-Azzawi
Naome Etori
Millicent Ochieng
Clemencia Siro
Samuel Njoroge
Njoroge Kiragu
Eric Muchiri
Wangari Kimotho
Lyse Naomi Wamba
Daud Abolade
Simbiat Ajao
Iyanuoluwa Shode
Ricky Macharm
Ruqayya Nasir Iro
Saheed Salahudeen Abdullahi
Stephen Moore
Bernard Opoku
Zainab Akinjobi
Abeeb Afolabi
Nnaemeka Casmir Obiefuna
Onyekachi Ogbu
Sam Brian
Sam Ochieng’
Verrah Akinyi Otiende
Chinedu Emmanuel Mbonu
Toadoum Sari Sakayo
Pontus Stenetorp
Despite the recent progress on scaling multilingual machine translation (MT) to several under-resourced African languages, accurately measuring this progress remains challenging, since evaluation is often performed on n-gram matching metrics such as BLEU, which typically show a weaker correlation with human judgments. Learned metrics such as COMET have higher correlation; however, the lack of evaluation data with human ratings for under-resourced languages, complexity of annotation guidelines like Multidimensional Quality Metrics (MQM), and limited language coverage of multilingual encoders have hampered their applicability to African languages. In this paper, we address these challenges by creating high-quality human evaluation data with simplified MQM guidelines for error detection and direct assessment (DA) scoring for 13 typologically diverse African languages. Furthermore, we develop AfriCOMET: COMET evaluation metrics for African languages by leveraging DA data from well-resourced languages and an African-centric multilingual encoder (AfroXLM-R) to create the state-of-the-art MT evaluation metrics for African languages with respect to Spearman-rank correlation with human judgments (0.441).