Publications

Uncovering a Universal Abstract Algorithm for Modular Addition in Neural Networks

Gavin McCracken

Gabriela Moisescu-Pareja

Vincent Létourneau

Jonathan Love

We propose a testable universality hypothesis, asserting that seemingly disparate neural network solutions observed in the simple task of modular addition are unified under a common abstract algorithm. While prior work interpreted variations in neuron-level representations as evidence for distinct algorithms, we demonstrate, through multi-level analyses spanning neurons, neuron clusters, and entire networks, that multilayer perceptrons and transformers universally implement the abstract algorithm we call the approximate Chinese Remainder Theorem. Crucially, we introduce approximate cosets and show that neurons activate exclusively on them. Furthermore, our theory works for deep neural networks (DNNs). It predicts that universally learned solutions in DNNs with trainable embeddings or more than one hidden layer require only O(log n) features, a result we empirically confirm. This work thus provides the first theory-backed interpretation of multilayer networks solving modular addition. It advances generalizable interpretability and opens a testable universality hypothesis for group multiplication beyond modular addition.

2025-05-01

arXiv (published)

doi.org

arxiv.org
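
For readers unfamiliar with the Chinese Remainder Theorem named in the abstract above, the following is a minimal sketch of the classical, exact version applied to modular addition. It is not the paper's approximate variant and not a model of what the networks learn. The function names and the choice of moduli (3, 5, 7) are assumptions made for illustration only.

```python
from math import prod


def crt_combine(residues, moduli):
    """Recombine residues into the unique value modulo prod(moduli).

    Assumes the moduli are pairwise coprime (classical CRT).
    """
    n = prod(moduli)
    total = 0
    for r, m in zip(residues, moduli):
        partial = n // m
        # pow(x, -1, m) returns the modular inverse of x mod m (Python 3.8+).
        total += r * partial * pow(partial, -1, m)
    return total % n


def add_via_crt(a, b, moduli):
    """Compute (a + b) mod prod(moduli) using only small per-modulus additions."""
    residues = [(a % m + b % m) % m for m in moduli]
    return crt_combine(residues, moduli)


# Illustrative moduli (an assumption): 3 * 5 * 7 = 105.
MODULI = [3, 5, 7]
result = add_via_crt(100, 47, MODULI)
```

The point of the sketch is that addition modulo a large n can be decomposed into independent additions modulo a few small coprime factors, which are then recombined.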

VinePPO: Refining Credit Assignment in RL Training of LLMs

Amirhossein Kazemnejad

Milad Aghajohari

Large language models (LLMs) are increasingly applied to complex reasoning tasks that require executing several complex steps before receiving any reward. Properly assigning credit to these steps is essential for enhancing model performance. Proximal Policy Optimization (PPO), a common reinforcement learning (RL) algorithm used for LLM finetuning, employs value networks to tackle credit assignment. However, recent approaches achieve strong results without it, raising questions about the efficacy of value networks in practice. In this work, we systematically evaluate the efficacy of value networks and reveal their significant shortcomings in reasoning-heavy LLM tasks, showing that they often produce poor estimates of expected return and barely outperform a random baseline when comparing alternative steps. This motivates our key question: Can improved credit assignment enhance RL training for LLMs? To address this, we propose VinePPO, a straightforward approach that leverages the flexibility of language environments to compute unbiased Monte Carlo-based estimates. Our method consistently outperforms PPO and other baselines across MATH and GSM8K datasets in less wall-clock time (up to 3.0x). Crucially, it achieves higher test accuracy for a given training accuracy, capturing more generalization signal per sample. These results emphasize the importance of accurate credit assignment in RL training of LLMs.

2025-05-01

ICML.cc/2025/Conference (poster)

openreview.net
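
The Monte Carlo idea mentioned in the VinePPO abstract above can be sketched generically as follows. This is not the authors' implementation: the toy environment, in which a state is simply a success probability, and all function names are hypothetical stand-ins for sampling completions from an intermediate reasoning step.

```python
import random


def toy_rollout(state, rng):
    """Hypothetical environment: `state` is the probability of a final reward of 1."""
    return 1.0 if rng.random() < state else 0.0


def mc_value(state, rollout, num_rollouts, rng):
    """Unbiased Monte Carlo estimate of expected return: average of sampled returns."""
    returns = [rollout(state, rng) for _ in range(num_rollouts)]
    return sum(returns) / len(returns)


def step_advantage(state, next_state, rollout, num_rollouts, rng, reward=0.0):
    """Credit for one step: reward plus the change in estimated value."""
    v_next = mc_value(next_state, rollout, num_rollouts, rng)
    v_curr = mc_value(state, rollout, num_rollouts, rng)
    return reward + v_next - v_curr


rng = random.Random(0)
# A step that moves from a 20% to an 80% chance of success should get positive credit.
advantage = step_advantage(0.2, 0.8, toy_rollout, 2000, rng)
```

Because each value is an average of independent sampled returns, no learned value network is involved; the cost is the extra rollouts per intermediate state.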

Virtual Cells: Predict, Explain, Discover

Emmanuel Noutahi

Jason Hartford

Prudencio Tossou

Shawn Whitfield

Ali Denton

Cas Wognum

Kristina Ulicna

Michael Craig

Jonathan Hsu

Michael Cuccarese

Emmanuel Bengio

Dominique Beaini

Christopher Gibson

Daniel Cohen

Berton Earnshaw

2025-05-01

arXiv (published)

doi.org

arxiv.org

When to retrain a machine learning model

Florence Regol

Leo Schwinn

Kyle Sprague

Mark Coates

Thomas Markovich

A significant challenge in maintaining real-world machine learning models is responding to the continuous and unpredictable evolution of data. Most practitioners are faced with the difficult question: when should I retrain or update my machine learning model? This seemingly straightforward problem is particularly challenging for three reasons: 1) decisions must be made based on very limited information - we usually have access to only a few examples, 2) the nature, extent, and impact of the distribution shift are unknown, and 3) it involves specifying a cost ratio between retraining and poor performance, which can be hard to characterize. Existing works address certain aspects of this problem, but none offer a comprehensive solution. Distribution shift detection falls short as it cannot account for the cost trade-off; the scarcity of the data, paired with its unusual structure, makes it a poor fit for existing offline reinforcement learning methods, and the online learning formulation overlooks key practical considerations. To address this, we present a principled formulation of the retraining problem and propose an uncertainty-based method that makes decisions by continually forecasting the evolution of model performance evaluated with a bounded metric. Our experiments, addressing classification tasks, show that the method consistently outperforms existing baselines on 7 datasets. We thoroughly assess its robustness to varying cost trade-off values and mis-specified cost trade-offs.

2025-05-01

ICML.cc/2025/Conference (poster)

openreview.net

Caffeine induces age-dependent increases in brain complexity and criticality during sleep

Philipp Thölke

Maxine Arcand-Lavigne

Tarek Lajnef

Sonia Frenette

Julie Carrier

Karim Jerbi

2025-04-30

Communications Biology (published)

doi.org

ImmunoStruct: a multimodal neural network framework for immunogenicity prediction from peptide-MHC sequence, structure, and biochemical properties

Kevin Bijan Givechian

João Felipe Rocha

Edward Yang

Chen Liu

Kerrie Greene

Rex Ying

Etienne Caron

Akiko Iwasaki

Smita Krishnaswamy

2025-04-30

bioRxiv (preprint)

doi.org

Rootlets-based registration to the spinal cord PAM50 template

Sandrine Bédard

Jan Valovsek

Valeria Oliva

Kenneth A. Weber

Julien Cohen-Adad

2025-04-30

arXiv (preprint)

arxiv.org

JPerfEvo: A Tool for Tracking Method-Level Performance Changes in Java Projects

Kaveh Shahedi

Maxime Lamothe

Foutse Khomh

Heng Li

Performance regressions and improvements are common phenomena in software development, occurring periodically as software evolves and matures. When developers introduce new changes to a program’s codebase, unforeseen performance variations may arise. Identifying these changes at the method level, however, can be challenging due to the complexity and scale of modern codebases. In this work, we present JPerfEvo, a tool designed to automate the evaluation of the method-level performance impact of each code commit (i.e., the performance variations between the two versions before and after a commit). Leveraging the Java Microbenchmark Harness (JMH) module for benchmarking the modified methods, JPerfEvo instruments their execution and applies robust statistical evaluations to detect performance changes. The tool can classify these changes as performance improvements, regressions, or neutral (i.e., no change), with the change magnitude. We evaluated JPerfEvo on three popular and mature open-source Java projects, demonstrating its effectiveness in identifying performance changes throughout their development histories.

2025-04-28

IEEE Working Conference on Mining Software Repositories (published)

doi.org

Proceedings of 1st Workshop on Advancing Artificial Intelligence through Theory of Mind

Mouad Abrini

Omri Abend

Dina M. Acklin

Henny Admoni

Gregor Aichinger

Nitay Alon

Zahra Ashktorab

Ashish Atreja

Moises Auron

Alexander Aufreiter

Raghav Awasthi

Soumya Banerjee

Joseph Barnby

Rhea Basappa

Severin Bergsmann

Djallel Bouneffouf

Patrick Callaghan

Marc Cavazza

Thierry Chaminade

Sonia Chernova

Mohamed Chetouan

Moumita Choudhury

Axel Cleeremans

J. Cywinski

Fabio Cuzzolin

Hokin Deng

N'yoma Diamond

C. D. Pasquasio

Guillaume Dumas

Max J. van Duijn

Mahapatra Dwarikanath

Qingying Gao

Ashok Goel

Rebecca R. Goldstein

Matthew C. Gombolay

Gabriel Enrique Gonzalez

Amar Halilovic

Tobias Halmdienst

Mahimul Islam

Julian Jara-Ettinger

Natalie Kastel

Renana Keydar

Ashish K. Khanna

Mahdi Khoramshahi

Jihyun Kim

Mihyeon Kim

Youngbin Kim

Senka Krivic

Nikita Krasnytskyi

Arun Kumar

Junehyoung Kwon

EunJu Lee

Shane Lee

Peter R. Lewis

Xue Li

Yijiang Li

Michal Lewandowski

Nathan Lloyd

Matthew B. Luebbers

Dezhi Luo

Haiyun Lyu

Dwarikanath Mahapatra

Kamal Maheshwari

Mallika Mainali

P. Mathur

Patrick Mederitsch

Shuwa Miura

Manuel Preston de Miranda

Reuth Mirsky

Shreya Mishra

Nina M. Moorman

Katelyn Morrison

John Muchovej

Bernhard Nessler

Felix Nessler

Hieu Minh Jord Nguyen

Abby Ortego

F. Papay

Antoine Pasquali

Hamed Rahimi

C. Raghu

Amanda L. Royka

Stefan Sarkadi

Jaelle Scheuerman

Simon Schmid

Paul Schrater

Anik Sen

Zahra Sheikhbahaee

Ke Shi

Reid G. Simmons

Nishant Singh

Mason O. Smith

Ramira van der Meulen

Anthia Solaki

Haoran Sun

Viktor Szolga

Matthew E. Taylor

Travis Taylor

Sanne van Waveren

Juan David Vargas

R. Verbrugge

Eitan Wagner

Justin D. Weisz

Ximing Wen

William Yeoh

Wenlong Zhang

Michelle Zhao

Shlomo Zilberstein

2025-04-28

arXiv (preprint)

arxiv.org

Solving Combinatorial Pricing Problems using Embedded Dynamic Programming Models

Quang Minh Bui

Margarida Carvalho

José Neto

The combinatorial pricing problem (CPP) is a bilevel problem in which the leader maximizes their revenue by imposing tolls on certain items that they can control. Based on the tolls set by the leader, the follower selects a subset of items corresponding to an optimal solution of a combinatorial optimization problem. To accomplish the leader's goal, the tolls need to be sufficiently low to discourage the follower from choosing the items offered by the competitors. In this paper, we derive a single-level reformulation for the CPP by rewriting the follower's problem as a longest path problem using a dynamic programming model, and then taking its dual and applying strong duality. We proceed to solve the reformulation in a dynamic fashion with a cutting plane method. We apply this methodology to 2 distinct dynamic programming models, namely, a novel formulation designated as selection diagram and the well-known decision diagram. We also produce numerical results to evaluate their performances across 3 different specializations of the CPP and a closely related problem that is the knapsack interdiction problem. Our results showcase the potential of the 2 proposed reformulations over the natural value function approach, expanding the set of tools to solve combinatorial bilevel programs.

2025-04-28

INFORMS Journal on Computing (published)

doi.org

arxiv.org

How Programmers Interact with Multimodal Software Documentation

Deeksha M. Arya

Jin Guo

Martin P. Robillard

There is a wide variety of online documentation to learn about a given software technology, and prior research has reported that programmers must invest time and effort to identify one that best suits their need. We evaluated five modalities to present information that enable a software document to cater to the different presentation needs of programmers. We developed a prototype tutorial with these modalities on three topics in Java, namely, regular expressions, inheritance, and exception handling. We investigated how people interact with the modalities in the tutorial given a programming topic and a type of task. We conducted a survey study with 56 respondents and confirm that although text content is most useful for solving conceptual tasks, code examples support deeper comprehension of the underlying concepts. Furthermore, we report that respondents' contradicting preferences for the modalities suggest the need to have multiple alternatives in a software tutorial.

2025-04-27

2025 IEEE/ACM 18th International Conference on Cooperative and Human Aspects of Software Engineering (CHASE) (published)

doi.org

A Survey on Model MoErging: Recycling and Routing Among Specialized Experts for Collaborative Learning

Prateek Yadav

Colin Raffel

Mohammed Muqeeth

Lucas Caccia

Haokun Liu

Tianlong Chen

Mohit Bansal

Leshem Choshen

Alessandro Sordoni

The availability of performant pre-trained models has led to a proliferation of fine-tuned expert models that are specialized to a particular domain or task. Model MoErging methods aim to recycle expert models to create an aggregate system with improved performance or generalization. A key component of MoErging methods is the creation of a router that decides which expert model(s) to use for a particular input or application. The promise, effectiveness, and large design space of MoErging have spurred the development of many new methods over the past few years. This rapid pace of development has made it challenging to compare different MoErging methods, which are rarely compared to one another and are often validated in different experimental setups. To remedy such gaps, we present a comprehensive survey of MoErging methods that includes a novel taxonomy for cataloging key design choices and clarifying suitable applications for each method. Apart from surveying MoErging research, we inventory software tools and applications that make use of MoErging. We additionally discuss related fields of study such as model merging, multitask learning, and mixture-of-experts models. Taken as a whole, our survey provides a unified overview of existing MoErging methods and creates a solid foundation for future work in this burgeoning field.

2025-04-25

TMLR (accepted)

doi.org

openreview.net
