Publications

A Robot Walks into a Bar: Can Language Models Serve as Creativity Support Tools for Comedy? An Evaluation of LLMs' Humour Alignment with Comedians
Piotr Mirowski
Juliette Love
Shakir Mohamed
Temporal trends in disparities in COVID-19 seropositivity among Canadian blood donors
Yuan Yu
Matthew J Knight
Diana Gibson
Sheila F O’Brien
W Alton Russell
Background: In Canada’s largest COVID-19 serological study, SARS-CoV-2 antibodies in blood donors have been monitored since 2020. No study has analysed changes in the association between anti-N seropositivity (a marker of recent infection) and geographic and sociodemographic characteristics over the pandemic. Methods: Using Bayesian multi-level models with spatial effects at the census division level, we analysed changes in correlates of SARS-CoV-2 anti-N seropositivity across three periods in which different variants predominated (pre-Delta, Delta and Omicron). We analysed disparities by geographic area, individual traits (age, sex, race) and neighbourhood factors (urbanicity, material deprivation and social deprivation). Data were from 420 319 blood donations across four regions (Ontario, British Columbia [BC], the Prairies and the Atlantic region) from December 2020 to November 2022. Results: Seropositivity was higher for racialized minorities, males and individuals in more materially deprived neighbourhoods in the pre-Delta and Delta waves. These subgroup differences dissipated in the Omicron wave as large swaths of the population became infected. Across all waves, seropositivity was higher in younger individuals and those with lower neighbourhood social deprivation. Rural residents had high seropositivity in the Prairies, but not other regions. Compared to generalized linear models, multi-level models with spatial effects had better fit and lower error when predicting SARS-CoV-2 anti-N seropositivity by geographic region. Conclusions: Correlates of recent COVID-19 infection have evolved over the pandemic. Many disparities lessened during the Omicron wave, but public health intervention may be warranted to address persistently higher burden among young people and those with less social deprivation.
Towards Geographic Inclusion in the Evaluation of Text-to-Image Models
Melissa Hall
Samuel J. Bell
Candace Ross
Adina Williams
Michal Drozdzal
Rapid progress in text-to-image generative models coupled with their deployment for visual content creation has magnified the importance of thoroughly evaluating their performance and identifying potential biases. In pursuit of models that generate images that are realistic, diverse, visually appealing, and consistent with the given prompt, researchers and practitioners often turn to automated metrics to facilitate scalable and cost-effective performance profiling. However, commonly used metrics often fail to account for the full diversity of human preference; often even in-depth human evaluations face challenges with subjectivity, especially as interpretations of evaluation criteria vary across regions and cultures. In this work, we conduct a large, cross-cultural study to examine how much annotators in Africa, Europe, and Southeast Asia vary in their perception of geographic representation, visual appeal, and consistency in real and generated images from state-of-the-art public APIs. We collect over 65,000 image annotations and 20 survey responses. We contrast human annotations with common automated metrics, finding that human preferences vary notably across geographic location and that current metrics do not fully account for this diversity. For example, annotators in different locations often disagree on whether exaggerated, stereotypical depictions of a region are considered geographically representative. In addition, the utility of automatic evaluations is dependent on assumptions about their set-up, such as the alignment of feature extractors with human perception of object similarity or the definition of "appeal" captured in reference datasets used to ground evaluations. We recommend steps for improved automatic and human evaluations.
Visibility into AI Agents
Alan Chan
Carson Ezell
Max Kaufmann
Kevin Wei
Lewis Hammond
Herbie Bradley
Emma Bluemke
Nitarshan Rajkumar
Noam Kolt
Lennart Heim
Markus Anderljung
Efficient Leverage Score Sampling for Tensor Train Decomposition
Vivek Bharadwaj
Beheshteh T. Rakhshan
Osman Asif Malik
Milnor-Myerson Games and The Principles of Artificial Principal-Agent Problems
Manfred Diaz
Joel Z Leibo
In this paper, we introduce Milnor-Myerson games, a multiplayer interaction structure at the core of machine learning (ML), to shed light on the fundamental principles and implications the artificial principal-agent problem has had in landmark ML results like AlphaGo and large language models (LLMs).
PETRA: Parallel End-to-end Training with Reversible Architectures
Stephane Rivaud
Louis Fournier
Thomas Pumir
Michael Eickenberg
Edouard Oyallon
Reversible architectures have been shown to be capable of performing on par with their non-reversible counterparts, being applied in deep learning for memory savings and generative modeling. In this work, we show how reversible architectures can solve challenges in parallelizing deep model training. We introduce PETRA, a novel alternative to backpropagation for parallelizing gradient computations. PETRA facilitates effective model parallelism by enabling stages (i.e., a set of layers) to compute independently on different devices, while only needing to communicate activations and gradients between each other. By decoupling the forward and backward passes and keeping a single updated version of the parameters, the need for weight stashing is also removed. We develop a custom autograd-like training framework for PETRA, and we demonstrate its effectiveness on CIFAR-10, ImageNet32, and ImageNet, achieving competitive accuracies comparable to backpropagation using ResNet-18, ResNet-34, and ResNet-50 models.
A Unifying Framework for Action-Conditional Self-Predictive Reinforcement Learning
Zhaohan Daniel Guo
Bernardo Avila Pires
Yunhao Tang
Clare Lyle
Mark Rowland
Nicolas Heess
Diana Borsa
Arthur Guez
Will Dabney
ACCO: Accumulate while you Communicate, Hiding Communications in Distributed LLM Training
Adel Nabli
Louis Fournier
Pierre Erbacher
Louis Serrano
Edouard Oyallon
From Feature Visualization to Visual Circuits: Effect of Adversarial Model Manipulation
Géraldin Nanfack
Michael Eickenberg
Understanding the inner workings of large-scale deep neural networks is challenging yet crucial in several high-stakes applications. Mechanistic interpretability is an emergent field that tackles this challenge, often by identifying human-understandable subgraphs in deep neural networks known as circuits. In vision-pretrained models, these subgraphs are usually interpreted by visualizing their node features through a popular technique called feature visualization. Recent works have analyzed the stability of different feature visualization types under the adversarial model manipulation framework. This paper starts by addressing limitations in existing works by proposing a novel attack called ProxPulse that simultaneously manipulates the two types of feature visualizations. Surprisingly, when analyzing these attacks under the umbrella of visual circuits, we find that visual circuits show some robustness to ProxPulse. We, therefore, introduce a new attack based on ProxPulse that unveils the manipulability of visual circuits, shedding light on their lack of robustness. The effectiveness of these attacks is validated using pre-trained AlexNet and ResNet-50 models on ImageNet.
MOSEAC: Streamlined Variable Time Step Reinforcement Learning
Dong Wang
AfriMTE and AfriCOMET: Enhancing COMET to Embrace Under-resourced African Languages
Jiayi Wang
Sweta Agrawal
Marek Masiak
Ricardo Rei
Eleftheria Briakou
Marine Carpuat
Xuanli He
Sofia Bourhim
Andiswa Bukula
Muhidin A. Mohamed
Temitayo Olatoye
Tosin Adewumi
Hamam Mokayed
Christine Mwase
Wangui Kimotho
Foutse Yuehgoh
Aremu Anuoluwapo
Jessica Ojo
Shamsuddeen Hassan Muhammad
Salomey Osei
Abdul-Hakeem Omotayo
Chiamaka Ijeoma Chukwuneke
Perez Ogayo
Oumaima Hourrane
Salma El Anigri
Lolwethu Ndolela
Thabiso Mangwana
Shafie Abdi Mohamed
Hassan Ayinde
Ayinde Hassan
Oluwabusayo Olufunke Awoyomi
Lama Alkhaled
Sana Sabah Al-Azzawi
Naome Etori
Millicent Ochieng
Clemencia Siro
Samuel Njoroge
Njoroge Kiragu
Eric Muchiri
Wangari Kimotho
Lyse Naomi Wamba
Daud Abolade
Simbiat Ajao
Iyanuoluwa Shode
Ricky Macharm
Ruqayya Nasir Iro
Saheed Salahudeen Abdullahi
Stephen Moore
Bernard Opoku
Zainab Akinjobi
Abeeb Afolabi
Nnaemeka Casmir Obiefuna
Onyekachi Ogbu
Sam Brian
Sam Ochieng’
Verrah Akinyi Otiende
Chinedu Emmanuel Mbonu
Toadoum Sari Sakayo
Yao Lu
Pontus Stenetorp
Despite the recent progress on scaling multilingual machine translation (MT) to several under-resourced African languages, accurately measuring this progress remains challenging, since evaluation is often performed on n-gram matching metrics such as BLEU, which typically show a weaker correlation with human judgments. Learned metrics such as COMET have higher correlation; however, the lack of evaluation data with human ratings for under-resourced languages, complexity of annotation guidelines like Multidimensional Quality Metrics (MQM), and limited language coverage of multilingual encoders have hampered their applicability to African languages. In this paper, we address these challenges by creating high-quality human evaluation data with simplified MQM guidelines for error detection and direct assessment (DA) scoring for 13 typologically diverse African languages. Furthermore, we develop AfriCOMET: COMET evaluation metrics for African languages by leveraging DA data from well-resourced languages and an African-centric multilingual encoder (AfroXLM-R) to create the state-of-the-art MT evaluation metrics for African languages with respect to Spearman-rank correlation with human judgments (0.441).