Publications

Training PPO-Clip with Parallelized Data Generation: A Case of Fixed-Point Convergence

Roger Creus Castanyer

In recent years, with the increase in the compute power of GPUs, parallelized data collection has become the dominant approach for training … (voir plus)reinforcement learning (RL) agents. Proximal Policy Optimization (PPO) is one of the widely-used on-policy methods for training RL agents. In this paper, we focus on the training behavior of PPO-Clip with the increase in the number of parallel environments. In particular, we show that as we increase the amount of data used to train PPO-Clip, the optimized policy would converge to a fixed distribution. We use the results to study the behavior of PPO-Clip in two case studies: the effect of change in the minibatch size and the effect of increase in the number of parallel environments versus the increase in the rollout lengths. The experiments show that settings with high-return PPO runs result in slower convergence to the fixed-distribution and higher consecutive KL divergence changes. Our results aim to offer a better understanding for the prediction of the performance of PPO with the scaling of the parallel environments.

2025-06-21

rl-conference.cc/RLC/2025/Workshop/IBRL (publié)

openreview.net

Neuromorphic hierarchical modular reservoirs

Filip Milisav

Andrea I Luppi

Laura E Suarez

Guillaume Lajoie

Bratislav Misic

Modularity is a fundamental principle of brain organization, reflected in the presence of segregated sub-networks that enable specialized in… (voir plus)formation processing. These small, densely connected modules are often nested within larger, higher-order modules, giving rise to a hierarchical modular architecture. This structure is posited to balance information segregation in specialized neuronal communities and global integration via intermodular communication. Yet, how hierarchical modularity shapes network function remains unclear. Here we introduce a simple blockmodeling framework for generating and comparing multi-level hierarchical modular networks and implement them as recurrent neural network reservoirs to evaluate their computational capacity. We show that hierarchical modular networks enhance memory capacity, support multitasking, and give rise to a broader range of temporal dynamics compared to strictly modular and random networks. These functional advantages can be traced to topological features enriched in hierarchical modular networks, which include reciprocal and cyclic network motifs. To test whether the computational advantages of hierarchical modularity subsist in empirical human brain structural connectivity patterns, we develop a novel hierarchical modularity-preserving network null model, allowing us to isolate the positive effect of empirical hierarchical modularity patterns on memory capacity. To evaluate the biomimetic validity of connectome-informed reservoir dynamics, we compare reservoir timescales to empirical brain timescales derived from MEG data and find that hierarchical modularity contributes to shaping brain-like neural timescales. Altogether, across multiple benchmarks, these results show that hierarchical modularity endows networks with computationally advantageous properties, providing insight into the relationship between neural network structure and function with potential applications for the design of neuromorphic computing architectures.

2025-06-20

bioRxiv (prépublication)

doi.org

Behavioral Suite Analysis of Self-Supervised Learning in Atari

Somjit Nath

Rishav

Gopeshh Subbaraj

D. Nowrouzezahrai

S Ebrahimi Kahou

2025-06-19

rl-conference.cc/RLC/2025/Workshop/RLVG (accepté)

openreview.net

A deep generative model for deciphering cellular dynamics and in silico drug discovery in complex diseases

Yumin Zheng

Jonas C. Schupp

Taylor Adams

Geremy Clair

Aurelien Justet

Farida Ahangari

Xiting Yan

Paul Hansen

Marianne Carlon

Emanuela Cortesi

Marie Vermant

Robin Vos

Laurens J. De Sadeleer

Ivan O. Rosas

Ricardo Pineda

John Sembrat

Melanie Königshoff

John E. McDonough

Bart M. Vanaudenaerde

Wim A. Wuyts … (voir 2 de plus)

Naftali Kaminski

Jun Ding

Human diseases are characterized by intricate cellular dynamics. Single-cell transcriptomics provides critical insights, yet a persistent ga… (voir plus)p remains in computational tools for detailed disease progression analysis and targeted in silico drug interventions. Here we introduce UNAGI, a deep generative neural network tailored to analyse time-series single-cell transcriptomic data. This tool captures the complex cellular dynamics underlying disease progression, enhancing drug perturbation modelling and screening. When applied to a dataset from patients with idiopathic pulmonary fibrosis, UNAGI learns disease-informed cell embeddings that sharpen our understanding of disease progression, leading to the identification of potential therapeutic drug candidates. Validation using proteomics reveals the accuracy of UNAGI’s cellular dynamics analysis, and the use of the fibrotic cocktail-treated human precision-cut lung slices confirms UNAGI’s predictions that nifedipine, an antihypertensive drug, may have anti-fibrotic effects on human tissues. UNAGI’s versatility extends to other diseases, including COVID, demonstrating adaptability and confirming its broader applicability in decoding complex cellular dynamics beyond idiopathic pulmonary fibrosis, amplifying its use in the quest for therapeutic solutions across diverse pathological landscapes.

2025-06-19

Nature Biomedical Engineering (publié)

doi.org

Human-AI Alignment of Learning Trajectories in Video Games: a continual RL benchmark proposal

Yann Harel

Lune P Bellec

François Paugam

Hugo Delhaye

Audrey Durand

We propose a design for a continual reinforcement learning (CRL) benchmark called GHAIA, centered on human-AI alignment of learning trajecto… (voir plus)ries in structured video game environments. Using \textit{Super Mario Bros.} as a case study, gameplay is decomposed into short, annotated scenes organized into diverse task sequences based on gameplay patterns and difficulty. Evaluation protocols measure both plasticity and stability, with flexible revisit and pacing schedules. A key innovation is the inclusion of high-resolution human gameplay data collected under controlled conditions, enabling direct comparison of human and agent learning. In addition to adapting classical CRL metrics like forgetting and backward transfer, we introduce semantic transfer metrics capturing learning over groups of scenes sharing similar game patterns. We demonstrate the feasibility of our approach on human and agent data, and discuss key aspects of the first release for community input.

2025-06-19

rl-conference.cc/RLC/2025/Workshop/RLVG (accepté)

openreview.net

An Empirical Study of Sensitive Information in Logs

Roozbeh Aghili

Heng Li

Foutse Khomh

2025-06-18

Proceedings of the ACM on Software Engineering (publié)

doi.org

arxiv.org

Next-Token Prediction Should be Ambiguity-Sensitive: A Meta-Learning Perspective

The rapid adaptation ability of auto-regressive foundation models is often attributed to the diversity of their pre-training data. This is b… (voir plus)ecause, from a Bayesian standpoint, minimizing prediction error in such settings requires integrating over all plausible latent hypotheses consistent with observations. While this behavior is desirable in principle, it often proves too ambitious in practice: under high ambiguity, the number of plausible latent alternatives makes Bayes-optimal prediction computationally intractable. Cognitive science has long recognized this limitation, suggesting that under such conditions, heuristics or information-seeking strategies are preferable to exhaustive inference. Translating this insight to next-token prediction, we hypothesize that low- and high-ambiguity predictions pose different computational demands, making ambiguity-agnostic next-token prediction a detrimental inductive bias. To test this, we introduce MetaHMM, a synthetic sequence meta-learning benchmark with rich compositional structure and a tractable Bayesian oracle. We show that Transformers indeed struggle with high-ambiguity predictions across model sizes. Motivated by cognitive theories, we propose a method to convert pre-trained models into Monte Carlo predictors that decouple task inference from token prediction. Preliminary results show substantial gains in ambiguous contexts through improved capacity allocation and test-time scalable inference, though challenges remain.

2025-06-18

ArXiv (prépublication)

doi.org

arxiv.org

SlepNet: Spectral Subgraph Representation Learning for Neural Dynamics

Siddharth Viswanath

Rahul Singh

Yanlei Zhang

J. Adam Noah

Joy Hirsch

Smita Krishnaswamy

Graph neural networks have been useful in machine learning on graph-structured data, particularly for node classification and some types of … (voir plus)graph classification tasks. However, they have had limited use in representing patterning of signals over graphs. Patterning of signals over graphs and in subgraphs carries important information in many domains including neuroscience. Neural signals are spatiotemporally patterned, high dimensional and difficult to decode. Graph signal processing and associated GCN models utilize the graph Fourier transform and are unable to efficiently represent spatially or spectrally localized signal patterning on graphs. Wavelet transforms have shown promise here, but offer non-canonical representations and cannot be tightly confined to subgraphs. Here we propose SlepNet, a novel GCN architecture that uses Slepian bases rather than graph Fourier harmonics. In SlepNet, the Slepian harmonics optimally concentrate signal energy on specifically relevant subgraphs that are automatically learned with a mask. Thus, they can produce canonical and highly resolved representations of neural activity, focusing energy of harmonics on areas of the brain which are activated. We evaluated SlepNet across three fMRI datasets, spanning cognitive and visual tasks, and two traffic dynamics datasets, comparing its performance against conventional GNNs and graph signal processing constructs. SlepNet outperforms the baselines in all datasets. Moreover, the extracted representations of signal patterns from SlepNet offers more resolution in distinguishing between similar patterns, and thus represent brain signaling transients as informative trajectories. Here we have shown that these extracted trajectory representations can be used for other downstream untrained tasks. Thus we establish that SlepNet is useful both for prediction and representation learning in spatiotemporal data.

2025-06-18

ArXiv (prépublication)

doi.org

arxiv.org

Learning From the Past with Cascading Eligibility Traces

Tokiniaina Raharison Ralambomihanta

Ivan Anokhin

Roman Pogodin

Samira Ebrahimi Kahou

Jonathan Cornford

Blake A. Richards

2025-06-16

arXiv (prépublication)

doi.org

openreview.net

Less is More: Undertraining Experts Improves Model Upcycling

Stefan Horoi

Eugene Belilovsky

Guy Wolf

Gintare Karolina Dziugaite

Modern deep learning is increasingly characterized by the use of open-weight foundation models that can be fine-tuned on specialized dataset… (voir plus)s. This has led to a proliferation of expert models and adapters, often shared via platforms like HuggingFace and AdapterHub. To leverage these resources, numerous model upcycling methods have emerged, enabling the reuse of fine-tuned models in multi-task systems. A natural pipeline has thus formed to harness the benefits of transfer learning and amortize sunk training costs: models are pre-trained on general data, fine-tuned on specific tasks, and then upcycled into more general-purpose systems. A prevailing assumption is that improvements at one stage of this pipeline propagate downstream, leading to gains at subsequent steps. In this work, we challenge that assumption by examining how expert fine-tuning affects model upcycling. We show that long fine-tuning of experts that optimizes for their individual performance leads to degraded merging performance, both for fully fine-tuned and LoRA-adapted models, and to worse downstream results when LoRA adapters are upcycled into MoE layers. We trace this degradation to the memorization of a small set of difficult examples that dominate late fine-tuning steps and are subsequently forgotten during merging. Finally, we demonstrate that a task-dependent aggressive early stopping strategy can significantly improve upcycling performance.

2025-06-16

arXiv (Cornell University) (prépublication)

doi.org

arxiv.org

Scalable Tree Search over Graphs with Learned Action Pruning for Power Grid Control

Florence Cloutier

Cyrus Neary

Adriana Hugessen

Viktor Todosijevic

Zina Kamel

Glen Berseth

As real-world infrastructure systems become increasingly complex and large-scale, there is a growing need for learning-based control strateg… (voir plus)ies that can make informed decisions in complex and dynamic environments. However, large-scale problems — such as power grid control — introduce high-dimensional action spaces and necessitate transferability across varying grid topologies. We introduce **H**ierarchical **E**xpert-Guided **R**econfiguration **O**ptimization for **G**raph **T**opologies, **HERO-GT**, a model-based planning approach that combines a pretrained graph neural network (GNN) for topology-aware action pruning with a Monte Carlo Tree Search (MCTS) planner for targeted, structured exploration. More specifically, the high-level GNN predicts a promising subset of actions, which the low-level MCTS agent uses to focus its search and reduce computational overhead while remaining adaptable to unseen graph structures. Furthermore, the MCTS planner leverages a given *default policy*---which may be defined, for example, by heuristics, problem relaxations, or rule-based methods---to bias the search and prioritize actions that are expected to improve performance over the default. We deploy HERO-GT in power grid environments, demonstrating that it not only improves over a strong default policy, but also scales to a realistic operational setting where exhaustive search becomes computationally infeasible.

2025-06-16

rl-conference.cc/RLC/2025/Workshop/RL4RS (publié)

openreview.net

Discovering Temporal Structure: An Overview of Hierarchical Reinforcement Learning

Martin Klissarov

Akhil Bagaria

Ziyan Luo

George Konidaris

Doina Precup

Marlos C. Machado

Developing agents capable of exploring, planning and learning in complex open-ended environments is a grand challenge in artificial intellig… (voir plus)ence (AI). Hierarchical reinforcement learning (HRL) offers a promising solution to this challenge by discovering and exploiting the temporal structure within a stream of experience. The strong appeal of the HRL framework has led to a rich and diverse body of literature attempting to discover a useful structure. However, it is still not clear how one might define what constitutes good structure in the first place, or the kind of problems in which identifying it may be helpful. This work aims to identify the benefits of HRL from the perspective of the fundamental challenges in decision-making, as well as highlight its impact on the performance trade-offs of AI agents. Through these benefits, we then cover the families of methods that discover temporal structure in HRL, ranging from learning directly from online experience to offline datasets, to leveraging large language models (LLMs). Finally, we highlight the challenges of temporal structure discovery and the domains that are particularly well-suited for such endeavours.

2025-06-15

ArXiv (prépublication)

doi.org

arxiv.org

TRAIL : IA responsable pour les professionnels et les leaders

Fondateur en résidence Mila Ventures

Avantage IA : productivité dans la fonction publique

Publications

TRAIL : IA responsable pour les professionnels et les leaders

Fondateur en résidence Mila Ventures

Avantage IA : productivité dans la fonction publique

Mots-clés populaires:

Publications