Publications

Opening the Scope of Openness in AI

Tamara Paris

AJung Moon

Jin Guo

2025-06-23

Proceedings of the 2025 ACM Conference on Fairness, Accountability, and Transparency (published)

doi.org

arxiv.org

A Survey of State Representation Learning for Deep Reinforcement Learning

Ayoub Echchahed

Pablo Samuel Castro

Representation learning methods are an important tool for addressing the challenges posed by complex observation spaces in sequential decision making problems. Recently, many methods have used a wide variety of approaches for learning meaningful state representations in reinforcement learning, allowing better sample efficiency, generalization, and performance. This survey aims to provide a broad categorization of these methods within a model-free online setting, exploring how they tackle the learning of state representations differently. We categorize the methods into six main classes, detailing their mechanisms, benefits, and limitations. Through this taxonomy, our aim is to enhance the understanding of this field and provide a guide for new researchers. We also discuss techniques for assessing the quality of representations, and detail relevant future directions.

2025-06-23

TMLR (accepted)

openreview.net

The challenge of hidden gifts in multi-agent reinforcement learning

Dane Malenfant

Blake Richards

Cooperation between people is not always obvious. Sometimes we benefit from actions that others have taken even when we are unaware that they took those actions. For example, if your neighbor chooses not to take a parking spot in front of your house when you are not there, you can benefit, even without being aware that they took this action. These “hidden gifts” represent an interesting challenge for multi-agent reinforcement learning (MARL), since assigning credit to your own actions correctly when the beneficial actions of others are hidden is non-trivial. Here, we study the impact of hidden gifts with a very simple MARL task. In this task, agents in a grid-world environment have individual doors to unlock in order to obtain individual rewards. As well, if all the agents unlock their door the group receives a larger collective reward. However, there is only one key for all of the doors, such that the collective reward can only be obtained when the agents drop the key for others after they use it. Notably, there is nothing to indicate to an agent that the other agents have dropped the key, thus the act of dropping the key for others is a “hidden gift”. We show that several different state-of-the-art RL algorithms, including MARL algorithms, fail to learn how to obtain the collective reward in this simple task. Interestingly, we find that independent model-free policy gradient agents can solve the task when we provide them with information about their action history, but MARL agents still cannot solve the task with action history. Finally, we derive a correction term for these independent agents, inspired by learning-aware approaches, which reduces the variance in learning and helps them to converge to collective success more reliably. These results show how credit assignment in multi-agent settings can be particularly challenging in the presence of “hidden gifts”, and demonstrate that learning awareness can benefit these settings.

2025-06-23

rl-conference.cc/RLC/2025/Workshop/CoCoMARL (poster)

openreview.net

Towards Sustainable Investment Policies Informed by Opponent Shaping

Juan Agustin Duque

Razvan Ciuca

Ayoub Echchahed

Hugo Larochelle

Aaron Courville

Addressing climate change requires global coordination, yet rational economic actors often prioritize immediate gains over collective welfare, resulting in social dilemmas. InvestESG is a recently proposed multi-agent simulation that captures the dynamic interplay between investors and companies under climate risk. We provide a formal characterization of the conditions under which InvestESG exhibits an intertemporal social dilemma, deriving theoretical thresholds at which individual incentives diverge from collective welfare. Building on this, we apply Advantage Alignment, a scalable opponent shaping algorithm shown to be effective in general-sum games, to influence agent learning in InvestESG. We offer theoretical insights into why Advantage Alignment systematically favors socially beneficial equilibria by biasing learning dynamics toward cooperative outcomes. Our results demonstrate that strategically shaping the learning processes of economic agents can result in better outcomes that could inform policy mechanisms to better align market incentives with long-term sustainability goals.

2025-06-23

rl-conference.cc/RLC/2025/Workshop/CoCoMARL (poster)

openreview.net

Rethinking Full Finetuning from Pretraining Checkpoints in Active Learning for African Languages

Bonaventure F. P. Dossou

Ines Arous

Jackie Cheung

2025-06-22

aclweb.org/ACL/2025/SRW (poster)

openreview.net

Alveolar epithelial cell plasticity and injury memory in human pulmonary fibrosis

Taylor S Adams

Jonas C Schupp

Agshin Balayev

Johad Khoury

Aurelien Justet

Fadi Nikola

De Sadeleer J Laurens

Juan Cala Garcia

Marta Zapata-Ortega

Benos V Panayiotis

P.V. Benos

John E McDonough

Farida Ahangari

Melanie Koenigshoff

Jun Ding

Robert J Homer

Ivan O Rosas

Xiting Yan

Bart M Vanaudenaerde

Wim A Wuyts … (see 1 more)

Naftali Kaminski

2025-06-21

bioRxiv (preprint)

doi.org

Neuromorphic hierarchical modular reservoirs

Filip Milisav

Andrea I Luppi

Laura E Suárez

Guillaume Lajoie

Bratislav Mišić

2025-06-21

bioRxiv (preprint)

doi.org

Behavioral Suite Analysis of Self-Supervised Learning in Atari

Somjit Nath

Rishav

Gopeshh Subbaraj

Derek Nowrouzezahrai

Samira Ebrahimi Kahou

2025-06-20

rl-conference.cc/RLC/2025/Workshop/RLVG (published)

openreview.net

A deep generative model for deciphering cellular dynamics and in silico drug discovery in complex diseases.

Yumin Zheng

Jonas C Schupp

Taylor S Adams

Geremy Clair

Aurelien Justet

Farida Ahangari

Xiting Yan

Paul Hansen

Marianne Carlon

Emanuela Cortesi

Marie Vermant

Robin Vos

De Sadeleer J Laurens

Ivan O Rosas

Ricardo Pineda

John Sembrat

Melanie Königshoff

John E McDonough

Bart M. Vanaudenaerde

Wim A Wuyts … (see 2 more)

Naftali Kaminski

Jun Ding

2025-06-20

Nature Biomedical Engineering (published)

doi.org

Human-AI Alignment of Learning Trajectories in Video Games: a continual RL benchmark proposal

Yann Harel

Lune Bellec

François Paugam

Hugo Delhaye

Audrey Durand

We propose a design for a continual reinforcement learning (CRL) benchmark called GHAIA, centered on human-AI alignment of learning trajectories in structured video game environments. Using Super Mario Bros. as a case study, gameplay is decomposed into short, annotated scenes organized into diverse task sequences based on gameplay patterns and difficulty. Evaluation protocols measure both plasticity and stability, with flexible revisit and pacing schedules. A key innovation is the inclusion of high-resolution human gameplay data collected under controlled conditions, enabling direct comparison of human and agent learning. In addition to adapting classical CRL metrics like forgetting and backward transfer, we introduce semantic transfer metrics capturing learning over groups of scenes sharing similar game patterns. We demonstrate the feasibility of our approach on human and agent data, and discuss key aspects of the first release for community input.

2025-06-20

rl-conference.cc/RLC/2025/Workshop/RLVG (published)

openreview.net

Robust Reinforcement Learning for Discrete Compositional Generation via General Soft Operators

Marco Jiralerspong

Esther Derman

Danilo Vucetic

Nikolay Malkin

Bilun Sun

Tianyu Zhang

Pierre-Luc Bacon

Gauthier Gidel

A major bottleneck in scientific discovery involves narrowing a large combinatorial set of objects, such as proteins or molecules, to a small set of promising candidates. While this process largely relies on expert knowledge, recent methods leverage reinforcement learning (RL) to enhance this filtering. They achieve this by estimating proxy reward functions from available datasets and using regularization to generate more diverse candidates. These reward functions are inherently uncertain, raising a particularly salient challenge for scientific discovery. In this work, we show that existing methods, often framed as sampling proportional to a reward function, are inadequate and yield suboptimal candidates, especially in large search spaces. To remedy this issue, we take a robust RL approach and introduce a unified operator that seeks robustness to the uncertainty of the proxy reward function. This general operator targets peakier sampling distributions while encompassing known soft RL operators. It also leads us to a novel algorithm that identifies higher-quality, diverse candidates in both synthetic and real-world tasks. Ultimately, our work offers a new, flexible perspective on discrete compositional generation tasks. Code: https://github.com/marcojira/tgm.

2025-06-20

ArXiv (preprint)

arxiv.org

Sparse-Reg: Improving Sample Complexity in Offline Reinforcement Learning using Sparsity

Samin Yeasar Arnob

Scott Fujimoto

Doina Precup

In this paper, we investigate the use of small datasets in the context of offline reinforcement learning (RL). While many common offline RL benchmarks employ datasets with over a million data points, many offline RL applications rely on considerably smaller datasets. We show that offline RL algorithms can overfit on small datasets, resulting in poor performance. To address this challenge, we introduce "Sparse-Reg": a regularization technique based on sparsity to mitigate overfitting in offline reinforcement learning, enabling effective learning in limited data settings and outperforming state-of-the-art baselines in continuous control.

2025-06-20

ArXiv (preprint)

arxiv.org
