Publications

Multilingual Language Model Pretraining using Machine-translated Data

Jiayi Wang

Yao Lu

Maurice Weber

Max Ryabinin

David Ifeoluwa Adelani

Yihong Chen

Raphael Tang

Pontus Stenetorp

2025-02-18

ArXiv (preprint)

doi.org

arxiv.org

Random Forest Autoencoders for Guided Representation Learning

Adrien Aumon

Shuang Ni

Myriam Lizotte

Guy Wolf

Kevin R. Moon

Jake S. Rhodes

Decades of research have produced robust methods for unsupervised data visualization, yet supervised visualization…

2025-02-18

ArXiv (preprint)

doi.org

arxiv.org

Adversarial Alignment for LLMs Requires Simpler, Reproducible, and More Measurable Objectives

Leo Schwinn

Yan Scholten

Tom Wollschlager

Sophie Xhonneux

Stephen Casper

Stephan Günnemann

Gauthier Gidel

Misaligned research objectives have considerably hindered progress in adversarial robustness research over the past decade. For instance, an extensive focus on optimizing target metrics, while neglecting rigorous standardized evaluation, has led researchers to pursue ad-hoc heuristic defenses that were seemingly effective. Yet most of these were exposed as flawed by subsequent evaluations, ultimately contributing little measurable progress to the field. In this position paper, we illustrate that current research on the robustness of large language models (LLMs) risks repeating past patterns with potentially worsened real-world implications. To address this, we argue that realigned objectives are necessary for meaningful progress in adversarial alignment. To this end, we build on an established cybersecurity taxonomy to formally define differences between past and emerging threat models that apply to LLMs. Using this framework, we illustrate that progress requires disentangling adversarial alignment into addressable sub-problems and returning to core academic principles, such as measurability, reproducibility, and comparability. Although the field presents significant challenges, the fresh start on adversarial robustness offers a unique opportunity to build on past experience while avoiding previous mistakes.

2025-02-17

ArXiv (preprint)

doi.org

arxiv.org

Automatic Pruning of Fine-tuning Datasets for Transformer-based Language Models

Sayed Mohammadreza Tayaranian Hosseini

Seyyed Hasan Mozafari

Brett H. Meyer

James J. Clark

Warren Gross

Transformer-based language models have shown state-of-the-art performance on a variety of natural language understanding tasks. To achieve this performance, these models are first pre-trained on a general corpus and then fine-tuned on downstream tasks. Previous work studied the effect of pruning the training set of the downstream tasks on the performance of the model on its evaluation set. In this work, we propose an automatic dataset pruning method for the training set of fine-tuning tasks. Our method is based on the model’s success rate in correctly classifying each training data point. Unlike previous work, which relies on user feedback to determine subset size, our method automatically extracts training subsets that are adapted for each pair of model and fine-tuning task. Our method provides multiple subsets for use in dataset pruning that navigate the trade-off between subset size and evaluation accuracy. Our largest subset, which we also refer to as the winning ticket subset, is on average…
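The abstract describes pruning driven by each training example's success rate but does not spell out how subsets are chosen. The Python sketch below is a minimal illustration under stated assumptions, not the authors' method: it assumes correctness is recorded for every training example at each fine-tuning epoch, and it applies a hypothetical rule in which examples the model always classifies correctly are dropped and one nested subset is produced per distinct success-rate level. The names `success_rates` and `nested_subsets` are illustrative only.

```python
from fractions import Fraction


def success_rates(correct_history):
    """correct_history[e][i] is True if example i was classified correctly at epoch e.

    Returns, for each example, the fraction of epochs in which it was correct.
    """
    num_epochs = len(correct_history)
    num_examples = len(correct_history[0])
    return [
        Fraction(sum(epoch[i] for epoch in correct_history), num_epochs)
        for i in range(num_examples)
    ]


def nested_subsets(rates):
    """Hypothetical selection rule (an assumption, not the paper's):
    build one subset per distinct success-rate level below 1, each keeping
    every example whose rate is at most that level. Examples that were always
    classified correctly (rate == 1) are never kept.

    Returns subsets of example indices, largest first.
    """
    levels = sorted(set(r for r in rates if r < 1))
    subsets = []
    for level in reversed(levels):  # highest level gives the largest subset
        subsets.append([i for i, r in enumerate(rates) if r <= level])
    return subsets
```

With three epochs and four examples, an example that is correct in every epoch has rate 1 and is excluded from every subset, while the largest subset keeps all remaining examples. Exact fractions are used so that rates such as 2/3 compare without floating-point error.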

2025-02-17

Proceedings of The 3rd Conference on Lifelong Learning Agents (published)

proceedings.mlr.press

BRIGHTER: BRIdging the Gap in Human-Annotated Textual Emotion Recognition Datasets for 28 Languages

Shamsuddeen Hassan Muhammad

Nedjma Ousidhoum

Idris Abdulmumin

Jan Philip Wahle

Terry Lima Ruas

Meriem Beloucif

Christine de Kock

Nirmal Surange

Daniela Teodorescu

Ibrahim Ahmad

David Ifeoluwa Adelani

Alham Fikri Aji

Felermino Ali

Ilseyar Alimova

Vladimir Araujo

Nikolay Babakov

Naomi Baes

Ana-Maria Bucur

Andiswa Bukula

Guanqun Cao

Rodrigo Tufino Cardenas

Rendi Chevi

Chiamaka Ijeoma Chukwuneke

Alexandra Ciobotaru

Daryna Dementieva

Murja Sani Gadanya

Robert Geislinger

Bela Gipp

Oumaima Hourrane

Oana Ignat

Falalu Lawan

Rooweither Mabuya

Rahmad Mahendra

Vukosi Marivate

Andrew Piper

Alexander Panchenko

Charles Henrique Porto Ferreira

Vitaly Protasov

Samuel Rutunda

Manish Shrivastava

Aura Cristina Udrea

Lilian D. A. Wanzare

Sophie Wu

Florian Valentin Wunderlich

Hanif Muhammad Zhafran

Tianhui Zhang

Yi Zhou

Saif M. Mohammad

2025-02-17

ArXiv (preprint)

doi.org

arxiv.org

Channel-Selective Normalization for Label-Shift Robust Test-Time Adaptation

Pedro Vianna

Muawiz Chaudhary

Paria Mehrbod

An Tang

Guy Cloutier

Guy Wolf

Michael Eickenberg

Eugene Belilovsky

Deep neural networks have useful applications in many different tasks; however, their performance can be severely affected by changes in the data distribution. For example, in the biomedical field, their performance can be affected by changes in the data (different machines, populations) between training and test datasets. To ensure robustness and generalization to real-world scenarios, test-time adaptation has recently been studied as an approach to adjust models to a new data distribution during inference. Test-time batch normalization is a simple and popular method that has achieved compelling performance on domain shift benchmarks. It is implemented by recalculating batch normalization statistics on test batches. Prior work has focused on analysis with test data that has the same label distribution as the training data. However, in many practical applications this technique is vulnerable to label distribution shifts, sometimes producing catastrophic failure. This presents a risk in applying test-time adaptation methods in deployment. We propose to tackle this challenge by selectively adapting only some channels in a deep network, minimizing drastic adaptation that is sensitive to label shifts. Our selection scheme is based on two principles that we empirically motivate: (1) later layers of networks are more sensitive to label shift, and (2) individual features can be sensitive to specific classes. We apply the proposed technique to three classification tasks, namely CIFAR10-C, ImageNet-C, and diagnosis of fatty liver, where we explore both covariate and label distribution shifts. We find that our method retains the benefits of test-time adaptation (TTA) while significantly reducing the risk of failure common in other methods, and that it is robust to the choice of hyperparameters.
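Test-time batch normalization, as described above, replaces the stored training statistics with statistics recomputed on each test batch. A minimal Python sketch of a channel-selective variant follows, assuming activations arrive as a batch of per-channel values (spatial dimensions and the learned affine scale and shift are omitted for brevity). The boolean `adapt_mask` is a hypothetical input standing in for the paper's selection scheme, whose exact rule the abstract does not give.

```python
import math


def batch_norm(batch, mean, var, eps=1e-5):
    """Normalize each channel of each sample with the given per-channel statistics."""
    return [
        [(v - m) / math.sqrt(s + eps) for v, m, s in zip(row, mean, var)]
        for row in batch
    ]


def selective_test_time_bn(batch, train_mean, train_var, adapt_mask, eps=1e-5):
    """Normalize a test batch (rows = samples, columns = channels).

    Channels with adapt_mask[c] == True use mean and variance computed on the
    test batch (test-time BN); all other channels keep the stored training
    statistics. adapt_mask is a hypothetical stand-in for a channel-selection rule.
    """
    n = len(batch)
    num_channels = len(batch[0])
    mean, var = [], []
    for c in range(num_channels):
        if adapt_mask[c]:
            column = [row[c] for row in batch]
            mu = sum(column) / n
            mean.append(mu)
            # Biased (population) variance, as batch normalization uses.
            var.append(sum((v - mu) ** 2 for v in column) / n)
        else:
            mean.append(train_mean[c])
            var.append(train_var[c])
    return batch_norm(batch, mean, var, eps)
```

Setting every entry of `adapt_mask` to `True` recovers plain test-time batch normalization, and setting every entry to `False` recovers the unadapted model; a selective mask sits between the two.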

2025-02-17

Proceedings of The 3rd Conference on Lifelong Learning Agents (published)

doi.org

arxiv.org

Characterizing co-purchased food products with soda, fresh fruits, and fresh vegetables using loyalty card purchasing data in Montréal, Canada, 2015–2017

Hiroshi Mamiya

Kody Crowell

Catherine L. Mah

Amélie Quesnel-Vallée

Aman Verma

David Buckeridge

2025-02-17

The International Journal of Behavioral Nutrition and Physical Activity (published)

doi.org

In-Context Parametric Inference: Point or Distribution Estimators?

Sarthak Mittal

Yoshua Bengio

Nikolay Malkin

Guillaume Lajoie

Bayesian and frequentist inference are two fundamental paradigms in statistical estimation. Bayesian methods treat hypotheses as random variables, incorporating priors and updating beliefs via Bayes' theorem, whereas frequentist methods assume fixed but unknown hypotheses, relying on estimators like maximum likelihood. While extensive research has compared these approaches, the frequentist paradigm of obtaining point estimates has become predominant in deep learning, as Bayesian inference is challenging due to the computational complexity and the approximation gap of posterior estimation methods. However, a good understanding of the trade-offs between the two approaches is lacking in the regime of amortized estimators, where in-context learners are trained, conditioned on observations, to estimate either point values via maximum likelihood or maximum a posteriori estimation, or full posteriors using normalizing flows, score-based diffusion samplers, or diagonal Gaussian approximations. To help resolve this, we conduct a rigorous comparative analysis spanning diverse problem settings, from linear models to shallow neural networks, with a robust evaluation framework assessing both in-distribution and out-of-distribution generalization on tractable tasks. Our experiments indicate that amortized point estimators generally outperform posterior inference, though the latter remains competitive in some low-dimensional problems, and we further discuss why this might be the case.
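To make the distinction between point and distribution estimators concrete, the sketch below computes, in closed form, the maximum likelihood estimate, the maximum a posteriori estimate, and the full Gaussian posterior for the weight of a one-dimensional linear model y = w·x + ε. It assumes a known noise variance and a zero-mean Gaussian prior on w. These are the kinds of targets an amortized in-context estimator would be trained to approximate; the example is a textbook illustration, not the paper's experimental setup, and the names `Estimates` and `estimate_weight` are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Estimates:
    mle: float             # maximum likelihood point estimate of w
    map_est: float         # maximum a posteriori point estimate of w
    posterior_mean: float  # mean of the Gaussian posterior over w
    posterior_var: float   # variance of the Gaussian posterior over w


def estimate_weight(xs, ys, noise_var=1.0, prior_var=1.0):
    """Model (assumed): y_i = w * x_i + eps_i, eps_i ~ N(0, noise_var), prior w ~ N(0, prior_var)."""
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    # Under Gaussian noise, the MLE is the least-squares solution.
    mle = sxy / sxx
    # Posterior precision adds the data precision to the prior precision.
    precision = sxx / noise_var + 1.0 / prior_var
    posterior_mean = (sxy / noise_var) / precision
    # Gaussian prior times Gaussian likelihood gives a Gaussian posterior,
    # whose mode (the MAP estimate) coincides with its mean.
    return Estimates(
        mle=mle,
        map_est=posterior_mean,
        posterior_mean=posterior_mean,
        posterior_var=1.0 / precision,
    )
```

On the two points (1, 2) and (2, 4) with unit noise and prior variance, the MLE is 2, while the prior pulls the MAP estimate and posterior mean to 10/6 ≈ 1.67 with posterior variance 1/6; only the distribution estimator reports that uncertainty.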

2025-02-17

ArXiv (preprint)

doi.org

arxiv.org
