Publications

Is Exploration or Optimization the Problem for Deep Reinforcement Learning?

In the era of deep reinforcement learning, making progress is more complex, as the collected experience must be compressed into a deep model for future exploitation and sampling. Many papers have shown that training a deep learning policy under the changing state and action distribution leads to sub-optimal performance or even collapse. This naturally leads to the concern that even if the community creates improved exploration algorithms or reward objectives, those improvements will fall on the "deaf ears" of optimization difficulties. This work proposes a new practical sub-optimality estimator to determine optimization limitations of deep reinforcement learning algorithms. Through experiments across environments and RL algorithms, it is shown that the difference between the best data generated is

2025-07-01

rl-conference.cc/RLC/2025/Workshop/Finding_the_Frame (published)

openreview.net

Filter Equivariant Functions: A symmetric account of length-general extrapolation on lists

Owen Lewis

Neil Ghani

Andrew Joseph Dudzik

Christos Perivolaropoulos

Razvan Pascanu

Petar Veličković

2025-07-01

arXiv (published)

doi.org

From Black Box to Biomarker: Sparse Autoencoders for Interpreting Speech Models of Parkinson's Disease

Peter William VanHarn Plantinga

Jen-Kai Chen

Roozbeh Sattari

Mirco Ravanelli

Denise Klein

Speech holds promise as a cost-effective and non-invasive biomarker for neurological conditions such as Parkinson's disease (PD). While deep learning systems trained on raw audio can find subtle signals not available from hand-crafted features, their black-box nature hinders clinical adoption. To address this, we apply sparse autoencoders (SAEs) to uncover interpretable internal representations from a speech-based PD detection system. We introduce a novel mask-based activation for adapting SAEs to small biomedical datasets, creating sparse disentangled dictionary representations. These dictionary entries are found to have strong associations with characteristic articulatory deficits in PD speech, such as reduced spectral flux and increased spectral flatness in the low-energy regions highlighted by the model attention. We further show that the spectral flux is related to volumetric measurements of the putamen from MRI scans, demonstrating the potential of SAEs to reveal clinically relevant biomarkers for disease monitoring and diagnosis.

2025-07-01

arXiv (published)

doi.org

arxiv.org

A Geometric Lens on RL Environment Complexity Based on Ricci Curvature

Ali Saheb Pasand

Pablo Samuel Castro

Pouya Bashivan

We introduce Ollivier-Ricci Curvature (ORC) as an information-geometric tool for analyzing the local structure of reinforcement learning (RL) environments. We establish a novel connection between ORC and the Successor Representation (SR), enabling a geometric interpretation of environment dynamics decoupled from reward signals. Our analysis shows that states with positive and negative ORC values correspond to regions where random walks converge and diverge, respectively, which are often critical for effective exploration. ORC is highly correlated with established environment complexity metrics, yet integrates naturally with standard RL frameworks based on SR and provides both global and local complexity measures. Leveraging this property, we propose an ORC-based intrinsic reward that guides agents toward divergent regions and away from convergent traps. Empirical results demonstrate that our curvature-driven reward substantially improves exploration performance across diverse environments, outperforming both random and count-based intrinsic baselines.

2025-07-01

rl-conference.cc/RLC/2025/Workshop/Finding_the_Frame (published)

openreview.net

A Geometric Lens on RL Environment Complexity Based on Ricci Curvature

Ali Saheb Pasand

Pablo Samuel Castro

Pouya Bashivan

We introduce Ollivier-Ricci Curvature (ORC) as an information-geometric tool for analyzing the local structure of reinforcement learning (RL) environments. We establish a novel connection between ORC and the Successor Representation (SR), enabling a geometric interpretation of environment dynamics decoupled from reward signals. Our analysis shows that states with positive and negative ORC values correspond to regions where random walks converge and diverge, respectively, which are often critical for effective exploration. ORC is highly correlated with established environment complexity metrics, yet integrates naturally with standard RL frameworks based on SR and provides both global and local complexity measures. Leveraging this property, we propose an ORC-based intrinsic reward that guides agents toward divergent regions and away from convergent traps. Empirical results demonstrate that our curvature-driven reward substantially improves exploration performance across diverse environments, outperforming both random and count-based intrinsic reward baselines.

2025-07-01

rl-conference.cc/RLC/2025/Workshop/RLBrew (published)

openreview.net

GitChameleon 2.0: Evaluating AI Code Generation Against Python Library Version Incompatibilities

Victor May

Justine Gehring

Antonio Orvieto

Muawiz Chaudhary

Eilif Benjamin Muller

Irina Rish

Samira Ebrahimi Kahou

Massimo Caccia

The rapid evolution of software libraries poses a considerable hurdle for code generation, necessitating continuous adaptation to frequent version updates while preserving backward compatibility. While existing code evolution benchmarks provide valuable insights, they typically lack execution-based evaluation for generating code compliant with specific library versions. To address this, we introduce GitChameleon 2.0, a novel, meticulously curated dataset comprising 328 Python code completion problems, each conditioned on specific library versions and accompanied by executable unit tests. GitChameleon 2.0 rigorously evaluates the capacity of contemporary large language models (LLMs), LLM-powered agents, code assistants, and RAG systems to perform version-conditioned code generation that demonstrates functional accuracy through execution. Our extensive evaluations indicate that state-of-the-art systems encounter significant challenges with this task: enterprise models achieve baseline success rates in the 48-51% range, underscoring the intricacy of the problem. By offering an execution-based benchmark emphasizing the dynamic nature of code libraries, GitChameleon 2.0 enables a clearer understanding of this challenge and helps guide the development of more adaptable and dependable AI code generation methods. We make the dataset and evaluation code publicly available at https://github.com/mrcabbage972/GitChameleonBenchmark.

2025-07-01

arXiv (published)

doi.org

Hallucination Detox: Sensitivity Dropout (SenD) for Large Language Model Training

Shahrad Mohammadzadeh

Juan David Guerra

Marco Bonizzato

Reihaneh Rabbany

Golnoosh Farnadi

2025-07-01

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (published)

doi.org

Harnessing agent-based frameworks in CellAgentChat to unravel cell-cell interactions from single-cell and spatial transcriptomics

Vishvak Raghavan

Yumin Zheng

Yue Li

Jun Ding

2025-07-01

Genome Research (published)

doi.org

Health data issues in Africa: time for digitization, standardization and harmonization

Abdoelnaser Degoot

Ismaël Koné

Shakuntala Baichoo

Mercy Ngungu

Nzisa Liku

Judit Kumuthini

Joyce Nakatumba-Nabende

Foutse Khomh

Bubacarr Bah

2025-07-01

Nature Communications (published)

doi.org

How Overconfidence in Initial Choices and Underconfidence Under Criticism Modulate Change of Mind in Large Language Models

Dharshan Kumaran

Stephen M Fleming

Larisa Markeeva

Joseph Heyward

Andrea Banino

Mrinal Mathur

Razvan Pascanu

Simon Kayode Osindero

Benedetto De Martino

Petar Veličković

Viorica Patraucean

Large language models (LLMs) exhibit strikingly conflicting behaviors: they can appear steadfastly overconfident in their initial answers whilst at the same time being prone to excessive doubt when challenged. To investigate this apparent paradox, we developed a novel experimental paradigm, exploiting the unique ability to obtain confidence estimates from LLMs without creating memory of their initial judgments -- something impossible in human participants. We show that LLMs -- Gemma 3, GPT4o and o1-preview -- exhibit a pronounced choice-supportive bias that reinforces and boosts their estimate of confidence in their answer, resulting in a marked resistance to change their mind. We further demonstrate that LLMs markedly overweight inconsistent compared to consistent advice, in a fashion that deviates qualitatively from normative Bayesian updating. Finally, we demonstrate that these two mechanisms -- a drive to maintain consistency with prior commitments and hypersensitivity to contradictory feedback -- parsimoniously capture LLM behavior in a different domain. Together, these findings furnish a mechanistic account of LLM confidence that explains both their stubbornness and excessive sensitivity to criticism.

2025-07-01

arXiv (published)

doi.org

arxiv.org

HVAC-GRACE: Transferable Building Control via Heterogeneous Graph Neural Network Policies

Anaïs Berkes

Donna Vakalis

David Rolnick

Yoshua Bengio

Buildings consume 40% of global energy, with HVAC systems responsible for up to half of that demand. As energy use grows, optimizing HVAC efficiency is critical to meeting climate goals. While reinforcement learning (RL) offers a promising alternative to rule-based control, real-world adoption is limited by poor sample efficiency and generalisation. We introduce HVAC-GRACE, a graph-based RL framework that models buildings as heterogeneous graphs and integrates spatial message passing directly into temporal GRU gates. This enables each zone to learn control actions informed by both its own history and its structural context. Our architecture supports zero-shot transfer by learning topology-agnostic functions, but initial experiments reveal that this benefit depends on sufficient conditioned zone connectivity to maintain gradient flow. These findings highlight both the promise and the architectural requirements of scalable, transferable RL for building control.

2025-07-01

ICML.cc/2025/Workshop/CO-BUILD (poster)

openreview.net
