Publications

Adaptation, Comparison and Practical Implementation of Fairness Schemes in Kidney Exchange Programs

William St-Arnaud

In Kidney Exchange Programs (KEPs), each participating patient is registered together with an incompatible donor. Donors without an incompat… (see more)ible patient can also register. Then, KEPs typically maximize overall patient benefit through donor exchanges. This aggregation of benefits calls into question potential individual patient disparities in terms of access to transplantation in KEPs. Considering solely this utilitarian objective may become an issue in the case where multiple exchange plans are optimal or near-optimal. In fact, current KEP policies are all-or-nothing, meaning that only one exchange plan is determined. Each patient is either selected or not as part of that unique solution. In this work, we seek instead to find a policy that contemplates the probability of patients of being in a solution. To guide the determination of our policy, we adapt popular fairness schemes to KEPs to balance the usual approach of maximizing the utilitarian objective. Different combinations of fairness and utilitarian objectives are modelled as conic programs with an exponential number of variables. We propose a column generation approach to solve them effectively in practice. Finally, we make an extensive comparison of the different schemes in terms of the balance of utility and fairness score, and validate the scalability of our methodology for benchmark instances from the literature.

2025-08-01

European Journal of Operational Research (published)

doi.org

arxiv.org

Celo: Training Versatile Learned Optimizers on a Compute Diet

Abhinav Moudgil

Boris Knyazev

Guillaume Lajoie

Eugene Belilovsky

Learned optimization has emerged as a promising alternative to hand-crafted optimizers, with the potential to discover stronger learned upda… (see more)te rules that enable faster, hyperparameter-free training of neural networks. A critical element for practically useful learned optimizers, that can be used off-the-shelf after meta-training, is strong meta-generalization: the ability to apply the optimizers to new tasks. Recent state-of-the-art work in learned optimizers, VeLO (Metz et al., 2022), requires a large number of highly diverse meta-training tasks along with massive computational resources, 4000 TPU months, to achieve meta-generalization. This makes further improvements to such learned optimizers impractical. In this work, we identify several key elements in learned optimizer architectures and meta-training procedures that can lead to strong meta-generalization. We also propose evaluation metrics to reliably assess quantitative performance of an optimizer at scale on a set of evaluation tasks. Our proposed approach, Celo, makes a significant leap in improving the meta-generalization performance of learned optimizers and also outperforms tuned state-of-the-art optimizers on a diverse set of out-of-distribution tasks, despite being meta-trained for just 24 GPU hours.

2025-07-08

TMLR (accepted)

openreview.net

Assemblies, synapse clustering, and network topology interact with plasticity to explain structure-function relationships of the cortical connectome

András Ecker

Daniela Egas Santander

Marwan Abdellah

Jorge Blanco Alonso

Sirio Bolaños-Puchet

Giuseppe Chindemi

Dhuruva Priyan Gowri Mariyappan

James B. Isbister

James King

Pramod Kumbhar

Ioannis Magkanaris

Eilif Benjamin Muller

Michael W. Reimann

Synaptic plasticity underlies the brain’s ability to learn and adapt. While experiments in brain slices have revealed mechanisms and proto… (see more)cols for the induction of plasticity between pairs of neurons, how these synaptic changes are coordinated in biological neuronal networks to ensure the emergence of learning remains poorly understood. Simulation and modeling have emerged as important tools to study learning in plastic networks, but have yet to achieve a scale that incorporates realistic network structure, active dendrites, and multi-synapse interactions, key determinants of synaptic plasticity. To rise to this challenge, we endowed an existing large-scale cortical network model, incorporating data-constrained dendritic processing and multi-synaptic connections, with a calcium-based model of functional plasticity that captures the diversity of excitatory connections extrapolated to in vivo-like conditions. This allowed us to study how dendrites and network structure interact with plasticity to shape stimulus representations at the microcircuit level. In our exploratory simulations, plasticity acted sparsely and specifically, firing rates and weight distributions remained stable without additional homeostatic mechanisms. At the circuit level, we found plasticity was driven by co-firing stimulus-evoked functional assemblies, spatial clustering of synapses on dendrites, and the topology of the network connectivity. As a result of the plastic changes, the network became more reliable with more stimulus-specific responses. We confirmed our testable predictions in the MICrONS datasets, an openly available electron microscopic reconstruction of a large volume of cortical tissue. Our results quantify at a large scale how the dendritic architecture and higher-order structure of cortical microcircuits play a central role in functional plasticity and provide a foundation for elucidating their role in learning.

2025-07-03

eLife (published)

doi.org

Curiosity-Driven Exploration via Temporal Contrastive Learning

Faisal Mohamed

Catherine Ji

Benjamin Eysenbach

Glen Berseth

Effective exploration in reinforcement learning requires keeping track not just of where the agent has been, but also of how the agent think… (see more)s about and represents the world: an agent should explore states that enable it to learn powerful representations. Temporal representations can include the information required to solve any potential task while avoiding the computational cost of reconstruction. In this paper, we propose an exploration method that uses temporal contrastive representations to drive exploration, maximizing coverage as seen through the lens of these temporal representations. We demonstrate complex exploration behaviors in locomotion, manipulation, and embodied-AI tasks, revealing previously unknown capabilities and behaviors once achievable only via extrinsic rewards.

2025-07-01

rl-conference.cc/RLC/2025/Workshop/RLBrew (published)

openreview.net

Curiosity-Driven Exploration via Temporal \\ Contrastive Learning

Faisal Mohamed

Catherine Ji

Benjamin Eysenbach

Glen Berseth

Exploration remains a key challenge in reinforcement learning (RL), especially in long-horizon tasks and environments with high-dimensional … (see more)observations. A common strategy for effective exploration is to promote state coverage or novelty, which often involves estimating the agent's state visitation distribution. In this paper, we propose \textbf{C}uriosity-Driven Exploration via \textbf{Te}mporal \textbf{C}ontrastive Learning (\methodName), an exploration method based on temporal contrastive learning that rewards agents for reaching states with unexpected futures. This incentivizes uncovering meaningful less-visited states. \methodName is simple and does not require explicit density or uncertainty estimation, while learning representations aligned with the RL objective. It consistently outperforms standard baselines in complex mazes using different embodiments (Ant and Humanoid) and robotic manipulation tasks, while also yielding more diverse behaviors in Craftax without requiring task-specific information.

2025-07-01

rl-conference.cc/RLC/2025/Workshop/RLBrew (published)

openreview.net

A Geometric Lens on RL Environment Complexity Based on Ricci Curvature

Ali Saheb Pasand

Pablo Samuel Castro

Pouya Bashivan

We introduce Ollivier-Ricci Curvature (ORC) as an information-geometric tool for analyzing the local structure of reinforcement learning (RL… (see more)) environments. We establish a novel connection between ORC and the Successor Representation (SR), enabling a geometric interpretation of environment dynamics decoupled from reward signals. Our analysis shows that states with positive and negative ORC values correspond to regions where random walks converge and diverge respectively, which are often critical for effective exploration. ORC is highly correlated with established environment complexity metrics, yet integrates naturally with standard RL frameworks based on SR and provides both global and local complexity measures. Leveraging this property, we propose an ORC-based intrinsic reward that guides agents toward divergent regions and away from convergent traps. Empirical results demonstrate that our curvature-driven reward substantially improves exploration performance across diverse environments, outperforming both random and count-based intrinsic reward baselines.

2025-07-01

rl-conference.cc/RLC/2025/Workshop/RLBrew (published)

openreview.net

Harnessing agent-based frameworks in CellAgentChat to unravel cell-cell interactions from single-cell and spatial transcriptomics

Vishvak Raghavan

Yumin Zheng

Yue Li

Jun Ding

2025-07-01

Genome Research (published)

doi.org

Health data issues in Africa: time for digitization, standardization and harmonization

Abdoelnaser Degoot

Ismaël Koné

Shakuntala Baichoo

Mercy Ngungu

Nzisa Liku

Judit Kumuthini

Joyce Nakatumba-Nabende

Foutse Khomh

Bubacarr Bah

2025-07-01

Nature Communications (published)

doi.org

HVAC-GRACE: Transferable Building Control via Heterogeneous Graph Neural Network Policies

Anaïs Berkes

Donna Vakalis

David Rolnick

Yoshua Bengio

Buildings consume 40% of global energy, with HVAC systems responsible for up to half of that demand. As energy use grows, optimizing HVAC ef… (see more)ficiency is critical to meeting climate goals. While reinforcement learning (RL) offers a promising alternative to rule-based control, real-world adoption is limited by poor sample efficiency and generalisation. We introduce HVAC-GRACE, a graph-based RL framework that models buildings as heterogeneous graphs and integrates spatial message passing directly into temporal GRU gates. This enables each zone to learn control actions informed by both its own history and its structural context. Our architecture supports zero-shot transfer by learning topology-agnostic functions—but initial experiments reveal that this benefit depends on sufficient conditioned zone connectivity to maintain gradient flow. These findings highlight both the promise and the architectural requirements of scalable, transferable RL for building control

2025-07-01

ICML.cc/2025/Workshop/CO-BUILD (poster)

openreview.net