Publications

Opening the Scope of Openness in AI

Tamara Paris

AJung Moon

Jin Guo

2025-06-23

Proceedings of the 2025 ACM Conference on Fairness, Accountability, and Transparency (published)

doi.org

arxiv.org

A Survey of State Representation Learning for Deep Reinforcement Learning

Ayoub Echchahed

Pablo Samuel Castro

Representation learning methods are an important tool for addressing the challenges posed by complex observation spaces in sequential decision-making problems. Recently, a wide variety of approaches have been used to learn meaningful state representations in reinforcement learning, allowing better sample efficiency, generalization, and performance. This survey aims to provide a broad categorization of these methods within a model-free online setting, exploring how they tackle the learning of state representations differently. We categorize the methods into six main classes, detailing their mechanisms, benefits, and limitations. Through this taxonomy, we aim to enhance the understanding of this field and provide a guide for new researchers. We also discuss techniques for assessing the quality of representations and detail relevant future directions.

2025-06-23

TMLR (accepted)

openreview.net

The challenge of hidden gifts in multi-agent reinforcement learning

Dane Malenfant

Blake Richards

Cooperation between people is not always obvious. Sometimes we benefit from actions that others have taken even when we are unaware that they took those actions. For example, if your neighbor chooses not to take a parking spot in front of your house when you are not there, you can benefit, even without being aware that they took this action. These “hidden gifts” represent an interesting challenge for multi-agent reinforcement learning (MARL), since assigning credit to your own actions correctly when the beneficial actions of others are hidden is non-trivial. Here, we study the impact of hidden gifts with a very simple MARL task. In this task, agents in a grid-world environment have individual doors to unlock in order to obtain individual rewards. In addition, if all the agents unlock their doors, the group receives a larger collective reward. However, there is only one key for all of the doors, such that the collective reward can only be obtained when the agents drop the key for others after they use it. Notably, there is nothing to indicate to an agent that the other agents have dropped the key, thus the act of dropping the key for others is a “hidden gift”. We show that several different state-of-the-art RL algorithms, including MARL algorithms, fail to learn how to obtain the collective reward in this simple task. Interestingly, we find that independent model-free policy gradient agents can solve the task when we provide them with information about their action history, but MARL agents still cannot solve the task with action history. Finally, we derive a correction term for these independent agents, inspired by learning-aware approaches, which reduces the variance in learning and helps them to converge to collective success more reliably. These results show how credit assignment in multi-agent settings can be particularly challenging in the presence of “hidden gifts”, and demonstrate that learning awareness can benefit these settings.

2025-06-23

rl-conference.cc/RLC/2025/Workshop/CoCoMARL (poster)

openreview.net

Towards Sustainable Investment Policies Informed by Opponent Shaping

Juan Agustin Duque

Razvan Ciuca

Ayoub Echchahed

Hugo Larochelle

Aaron Courville

Addressing climate change requires global coordination, yet rational economic actors often prioritize immediate gains over collective welfare, resulting in social dilemmas. InvestESG is a recently proposed multi-agent simulation that captures the dynamic interplay between investors and companies under climate risk. We provide a formal characterization of the conditions under which InvestESG exhibits an intertemporal social dilemma, deriving theoretical thresholds at which individual incentives diverge from collective welfare. Building on this, we apply Advantage Alignment, a scalable opponent shaping algorithm shown to be effective in general-sum games, to influence agent learning in InvestESG. We offer theoretical insights into why Advantage Alignment systematically favors socially beneficial equilibria by biasing learning dynamics toward cooperative outcomes. Our results demonstrate that strategically shaping the learning processes of economic agents can result in better outcomes that could inform policy mechanisms to better align market incentives with long-term sustainability goals.

2025-06-23

rl-conference.cc/RLC/2025/Workshop/CoCoMARL (poster)

openreview.net

Rethinking Full Finetuning from Pretraining Checkpoints in Active Learning for African Languages

Bonaventure F. P. Dossou

Ines Arous

Jackie Cheung

2025-06-22

aclweb.org/ACL/2025/SRW (poster)

openreview.net

Alveolar epithelial cell plasticity and injury memory in human pulmonary fibrosis

Taylor S Adams

Jonas C Schupp

Agshin Balayev

Johad Khoury

Aurelien Justet

Fadi Nikola

De Sadeleer J Laurens

Juan Cala Garcia

Marta Zapata-Ortega

Benos V Panayiotis

P.V. Benos

John E McDonough

Farida Ahangari

Melanie Koenigshoff

Jun Ding

Robert J Homer

Ivan O Rosas

Xiting Yan

Bart M Vanaudenaerde

Wim A Wuyts …

Naftali Kaminski

2025-06-21

bioRxiv (preprint)

doi.org

Neuromorphic hierarchical modular reservoirs

Filip Milisav

Andrea I Luppi

Laura E Suárez

Guillaume Lajoie

Bratislav Mišić

2025-06-21

bioRxiv (preprint)

doi.org

A deep generative model for deciphering cellular dynamics and in silico drug discovery in complex diseases

Yumin Zheng

Jonas C Schupp

Taylor S Adams

Geremy Clair

Aurelien Justet

Farida Ahangari

Xiting Yan

Paul Hansen

Marianne Carlon

Emanuela Cortesi

Marie Vermant

Robin Vos

De Sadeleer J Laurens

Ivan O Rosas

Ricardo Pineda

John Sembrat

Melanie Königshoff

John E McDonough

Bart M. Vanaudenaerde

Wim A Wuyts …

Naftali Kaminski

Jun Ding

2025-06-20

Nature Biomedical Engineering (published)

doi.org

Robust Reinforcement Learning for Discrete Compositional Generation via General Soft Operators

Marco Jiralerspong

Esther Derman

Danilo Vucetic

Nikolay Malkin

Bilun Sun

Tianyu Zhang

Pierre-Luc Bacon

Gauthier Gidel

A major bottleneck in scientific discovery involves narrowing a large combinatorial set of objects, such as proteins or molecules, to a small set of promising candidates. While this process largely relies on expert knowledge, recent methods leverage reinforcement learning (RL) to enhance this filtering. They achieve this by estimating proxy reward functions from available datasets and using regularization to generate more diverse candidates. These reward functions are inherently uncertain, raising a particularly salient challenge for scientific discovery. In this work, we show that existing methods, often framed as sampling proportional to a reward function, are inadequate and yield suboptimal candidates, especially in large search spaces. To remedy this issue, we take a robust RL approach and introduce a unified operator that seeks robustness to the uncertainty of the proxy reward function. This general operator targets peakier sampling distributions while encompassing known soft RL operators. It also leads us to a novel algorithm that identifies higher-quality, diverse candidates in both synthetic and real-world tasks. Ultimately, our work offers a new, flexible perspective on discrete compositional generation tasks. Code: https://github.com/marcojira/tgm.

2025-06-20

ArXiv (preprint)

arxiv.org

Sparse-Reg: Improving Sample Complexity in Offline Reinforcement Learning using Sparsity

Samin Yeasar Arnob

Scott Fujimoto

Doina Precup

In this paper, we investigate the use of small datasets in the context of offline reinforcement learning (RL). While many common offline RL benchmarks employ datasets with over a million data points, many offline RL applications rely on considerably smaller datasets. We show that offline RL algorithms can overfit on small datasets, resulting in poor performance. To address this challenge, we introduce "Sparse-Reg": a regularization technique based on sparsity to mitigate overfitting in offline reinforcement learning, enabling effective learning in limited data settings and outperforming state-of-the-art baselines in continuous control.

2025-06-20

ArXiv (preprint)

arxiv.org

Next-Token Prediction Should be Ambiguity-Sensitive: A Meta-Learning Perspective

Leo Gagnon

Eric Elmoznino

Sarthak Mittal

Tom Marty

Tejas Kasetty

Dhanya Sridhar

Guillaume Lajoie

The rapid adaptation ability of auto-regressive foundation models is often attributed to the diversity of their pre-training data. This is because, from a Bayesian standpoint, minimizing prediction error in such settings requires integrating over all plausible latent hypotheses consistent with observations. While this behavior is desirable in principle, it often proves too ambitious in practice: under high ambiguity, the number of plausible latent alternatives makes Bayes-optimal prediction computationally intractable. Cognitive science has long recognized this limitation, suggesting that under such conditions, heuristics or information-seeking strategies are preferable to exhaustive inference. Translating this insight to next-token prediction, we hypothesize that low- and high-ambiguity predictions pose different computational demands, making ambiguity-agnostic next-token prediction a detrimental inductive bias. To test this, we introduce MetaHMM, a synthetic sequence meta-learning benchmark with rich compositional structure and a tractable Bayesian oracle. We show that Transformers indeed struggle with high-ambiguity predictions across model sizes. Motivated by cognitive theories, we propose a method to convert pre-trained models into Monte Carlo predictors that decouple task inference from token prediction. Preliminary results show substantial gains in ambiguous contexts through improved capacity allocation and test-time scalable inference, though challenges remain.

2025-06-19

ArXiv (preprint)

doi.org

arxiv.org
