Publications

Revisiting Laplacian Representations for Value Function Approximation in Deep RL
Rishav
A. Chandar
Yash Chandak
S Ebrahimi Kahou
Proto-value functions (PVFs) introduced Laplacian embeddings as an effective feature basis for value-function approximation; however, their … (see more)utility remained limited to small, fully known state spaces. Recent work has scaled Laplacian embeddings to high-dimensional inputs, using them for reward shaping and option discovery in goal-directed tasks, yet only as auxiliary signals, rather than directly using them as features for value functions. In this paper, we learn Laplacian eigenvectors online and employ them as features for Q-learning in 23 Atari games. We empirically demonstrate that these online–learned embeddings substantially improve model-free RL in large, high-dimensional domains. We demonstrate that enriching state representations with action embeddings yields additional gains under both behavior-policy and uniform-random policies. Additionally, we introduce the Fusion architecture, which augments the representation with useful inductive bias at the embedding level. To assess the usefulness of each embedding used in the Fusion architecture, we use Shapley values analysis.
Training PPO-Clip with Parallelized Data Generation: A Case of Fixed-Point Convergence
In recent years, with the increase in the compute power of GPUs, parallelized data collection has become the dominant approach for training … (see more)reinforcement learning (RL) agents. Proximal Policy Optimization (PPO) is one of the widely-used on-policy methods for training RL agents. In this paper, we focus on the training behavior of PPO-Clip with the increase in the number of parallel environments. In particular, we show that as we increase the amount of data used to train PPO-Clip, the optimized policy would converge to a fixed distribution. We use the results to study the behavior of PPO-Clip in two case studies: the effect of change in the minibatch size and the effect of increase in the number of parallel environments versus the increase in the rollout lengths. The experiments show that settings with high-return PPO runs result in slower convergence to the fixed-distribution and higher consecutive KL divergence changes. Our results aim to offer a better understanding for the prediction of the performance of PPO with the scaling of the parallel environments.
Neuromorphic hierarchical modular reservoirs
Filip Milisav
Andrea I Luppi
Laura E Suarez
Bratislav Misic
Modularity is a fundamental principle of brain organization, reflected in the presence of segregated sub-networks that enable specialized in… (see more)formation processing. These small, densely connected modules are often nested within larger, higher-order modules, giving rise to a hierarchical modular architecture. This structure is posited to balance information segregation in specialized neuronal communities and global integration via intermodular communication. Yet, how hierarchical modularity shapes network function remains unclear. Here we introduce a simple blockmodeling framework for generating and comparing multi-level hierarchical modular networks and implement them as recurrent neural network reservoirs to evaluate their computational capacity. We show that hierarchical modular networks enhance memory capacity, support multitasking, and give rise to a broader range of temporal dynamics compared to strictly modular and random networks. These functional advantages can be traced to topological features enriched in hierarchical modular networks, which include reciprocal and cyclic network motifs. To test whether the computational advantages of hierarchical modularity subsist in empirical human brain structural connectivity patterns, we develop a novel hierarchical modularity-preserving network null model, allowing us to isolate the positive effect of empirical hierarchical modularity patterns on memory capacity. To evaluate the biomimetic validity of connectome-informed reservoir dynamics, we compare reservoir timescales to empirical brain timescales derived from MEG data and find that hierarchical modularity contributes to shaping brain-like neural timescales. Altogether, across multiple benchmarks, these results show that hierarchical modularity endows networks with computationally advantageous properties, providing insight into the relationship between neural network structure and function with potential applications for the design of neuromorphic computing architectures.
Behavioral Suite Analysis of Self-Supervised Learning in Atari
Rishav
D. Nowrouzezahrai
S Ebrahimi Kahou
A deep generative model for deciphering cellular dynamics and in silico drug discovery in complex diseases
Yumin Zheng
Jonas C. Schupp
Taylor Adams
Geremy Clair
Aurelien Justet
Farida Ahangari
Xiting Yan
Paul Hansen
Marianne Carlon
Emanuela Cortesi
Marie Vermant
Robin Vos
Laurens J. De Sadeleer
Iván O. Rosas
Ricardo Pineda
John Sembrat
Melanie Königshoff
John E. McDonough
Bart M. Vanaudenaerde
Wim A. Wuyts … (see 2 more)
Naftali Kaminski
Human diseases are characterized by intricate cellular dynamics. Single-cell transcriptomics provides critical insights, yet a persistent ga… (see more)p remains in computational tools for detailed disease progression analysis and targeted in silico drug interventions. Here we introduce UNAGI, a deep generative neural network tailored to analyse time-series single-cell transcriptomic data. This tool captures the complex cellular dynamics underlying disease progression, enhancing drug perturbation modelling and screening. When applied to a dataset from patients with idiopathic pulmonary fibrosis, UNAGI learns disease-informed cell embeddings that sharpen our understanding of disease progression, leading to the identification of potential therapeutic drug candidates. Validation using proteomics reveals the accuracy of UNAGI’s cellular dynamics analysis, and the use of the fibrotic cocktail-treated human precision-cut lung slices confirms UNAGI’s predictions that nifedipine, an antihypertensive drug, may have anti-fibrotic effects on human tissues. UNAGI’s versatility extends to other diseases, including COVID, demonstrating adaptability and confirming its broader applicability in decoding complex cellular dynamics beyond idiopathic pulmonary fibrosis, amplifying its use in the quest for therapeutic solutions across diverse pathological landscapes.
Discrete Compositional Generation via General Soft Operators and Robust Reinforcement Learning
A major bottleneck in scientific discovery consists of narrowing an exponentially large set of objects, such as proteins or molecules, to a … (see more)small set of promising candidates with desirable properties. While this process can rely on expert knowledge, recent methods leverage reinforcement learning (RL) guided by a proxy reward function to enable this filtering. By employing various forms of entropy regularization, these methods aim to learn samplers that generate diverse candidates that are highly rated by the proxy function. In this work, we make two main contributions. First, we show that these methods are liable to generate overly diverse, suboptimal candidates in large search spaces. To address this issue, we introduce a novel unified operator that combines several regularized RL operators into a general framework that better targets peakier sampling distributions. Secondly, we offer a novel, robust RL perspective of this filtering process. The regularization can be interpreted as robustness to a compositional form of uncertainty in the proxy function (i.e., the true evaluation of a candidate differs from the proxy's evaluation). Our analysis leads us to a novel, easy-to-use algorithm we name trajectory general mellowmax (TGM): we show it identifies higher quality, diverse candidates than baselines in both synthetic and real-world tasks. Code: https://github.com/marcojira/tgm.
Human-AI Alignment of Learning Trajectories in Video Games: a continual RL benchmark proposal
Yann Harel
Lune P Bellec
We propose a design for a continual reinforcement learning (CRL) benchmark called GHAIA, centered on human-AI alignment of learning trajecto… (see more)ries in structured video game environments. Using \textit{Super Mario Bros.} as a case study, gameplay is decomposed into short, annotated scenes organized into diverse task sequences based on gameplay patterns and difficulty. Evaluation protocols measure both plasticity and stability, with flexible revisit and pacing schedules. A key innovation is the inclusion of high-resolution human gameplay data collected under controlled conditions, enabling direct comparison of human and agent learning. In addition to adapting classical CRL metrics like forgetting and backward transfer, we introduce semantic transfer metrics capturing learning over groups of scenes sharing similar game patterns. We demonstrate the feasibility of our approach on human and agent data, and discuss key aspects of the first release for community input.
SlepNet: Spectral Subgraph Representation Learning for Neural Dynamics
Rahul Singh
Yanlei Zhang
J. Adam Noah
Joy Hirsch
Graph neural networks have been useful in machine learning on graph-structured data, particularly for node classification and some types of … (see more)graph classification tasks. However, they have had limited use in representing patterning of signals over graphs. Patterning of signals over graphs and in subgraphs carries important information in many domains including neuroscience. Neural signals are spatiotemporally patterned, high dimensional and difficult to decode. Graph signal processing and associated GCN models utilize the graph Fourier transform and are unable to efficiently represent spatially or spectrally localized signal patterning on graphs. Wavelet transforms have shown promise here, but offer non-canonical representations and cannot be tightly confined to subgraphs. Here we propose SlepNet, a novel GCN architecture that uses Slepian bases rather than graph Fourier harmonics. In SlepNet, the Slepian harmonics optimally concentrate signal energy on specifically relevant subgraphs that are automatically learned with a mask. Thus, they can produce canonical and highly resolved representations of neural activity, focusing energy of harmonics on areas of the brain which are activated. We evaluated SlepNet across three fMRI datasets, spanning cognitive and visual tasks, and two traffic dynamics datasets, comparing its performance against conventional GNNs and graph signal processing constructs. SlepNet outperforms the baselines in all datasets. Moreover, the extracted representations of signal patterns from SlepNet offers more resolution in distinguishing between similar patterns, and thus represent brain signaling transients as informative trajectories. Here we have shown that these extracted trajectory representations can be used for other downstream untrained tasks. Thus we establish that SlepNet is useful both for prediction and representation learning in spatiotemporal data.
Sparse-Reg: Improving Sample Complexity in Offline Reinforcement Learning using Sparsity
In this paper, we investigate the use of small datasets in the context of offline reinforcement learning (RL). While many common offline RL … (see more)benchmarks employ datasets with over a million data points, many offline RL applications rely on considerably smaller datasets. We show that offline RL algorithms can overfit on small datasets, resulting in poor performance. To address this challenge, we introduce"Sparse-Reg": a regularization technique based on sparsity to mitigate overfitting in offline reinforcement learning, enabling effective learning in limited data settings and outperforming state-of-the-art baselines in continuous control.
An Empirical Study of Sensitive Information in Logs
Roozbeh Aghili
Heng Li
Next-Token Prediction Should be Ambiguity-Sensitive: A Meta-Learning Perspective
The rapid adaptation ability of auto-regressive foundation models is often attributed to the diversity of their pre-training data. This is b… (see more)ecause, from a Bayesian standpoint, minimizing prediction error in such settings requires integrating over all plausible latent hypotheses consistent with observations. While this behavior is desirable in principle, it often proves too ambitious in practice: under high ambiguity, the number of plausible latent alternatives makes Bayes-optimal prediction computationally intractable. Cognitive science has long recognized this limitation, suggesting that under such conditions, heuristics or information-seeking strategies are preferable to exhaustive inference. Translating this insight to next-token prediction, we hypothesize that low- and high-ambiguity predictions pose different computational demands, making ambiguity-agnostic next-token prediction a detrimental inductive bias. To test this, we introduce MetaHMM, a synthetic sequence meta-learning benchmark with rich compositional structure and a tractable Bayesian oracle. We show that Transformers indeed struggle with high-ambiguity predictions across model sizes. Motivated by cognitive theories, we propose a method to convert pre-trained models into Monte Carlo predictors that decouple task inference from token prediction. Preliminary results show substantial gains in ambiguous contexts through improved capacity allocation and test-time scalable inference, though challenges remain.
Visual symbolic mechanisms: Emergent symbol processing in vision language models
Declan Campbell
To accurately process a visual scene, observers must bind features together to represent individual objects. This capacity is necessary, for… (see more) instance, to distinguish an image containing a red square and a blue circle from an image containing a blue square and a red circle. Recent work has found that language models solve this'binding problem'via a set of symbol-like, content-independent indices, but it is unclear whether similar mechanisms are employed by vision language models (VLMs). This question is especially relevant, given the persistent failures of VLMs on tasks that require binding. Here, we identify a set of emergent symbolic mechanisms that support binding in VLMs via a content-independent, spatial indexing scheme. Moreover, we find that binding errors can be traced directly to failures in these mechanisms. Taken together, these results shed light on the mechanisms that support symbol-like processing in VLMs, and suggest possible avenues for addressing the persistent binding failures exhibited by these models.