Alex Hernandez-Garcia

Jinkyoo Park

Designing biological sequences with desired properties is a significant challenge due to the combinatorially vast search space and the high cost of evaluating each candidate sequence. To address these challenges, reinforcement learning (RL) methods, such as GFlowNets, utilize proxy models for rapid reward evaluation and annotated data for policy training. Although these approaches have shown promise in generating diverse and novel sequences, the limited training data relative to the vast search space often leads to the misspecification of the proxy for out-of-distribution inputs. We introduce

2025-10-06

Proceedings of the 42nd International Conference on Machine Learning (published)

Catalyst GFlowNet for electrocatalyst design: A hydrogen evolution reaction case study

Lena Podina

Christina Humer

Alexandre AGM Duval

Victor Schmidt

Ali Ramlaoui

Shahana Chatterjee

David Rolnick

Félix Therrien

Efficient and inexpensive energy storage is essential for accelerating the adoption of renewable energy and ensuring a stable supply, despite fluctuations in sources such as wind and solar. Electrocatalysts play a key role in hydrogen energy storage (HES), allowing the energy to be stored as hydrogen. However, the development of affordable and high-performance catalysts for this process remains a significant challenge. We introduce Catalyst GFlowNet, a generative model that leverages machine learning-based predictors of formation and adsorption energy to design crystal surfaces that act as efficient catalysts. We demonstrate the performance of the model through a proof-of-concept application to the hydrogen evolution reaction, a key reaction in HES, for which we successfully identified platinum as the most efficient known catalyst. In future work, we aim to extend this approach to the oxygen evolution reaction, where current optimal catalysts are expensive metal oxides, and open the search space to discover new materials. This generative modeling framework offers a promising pathway for accelerating the search for novel and efficient catalysts.

2025-10-02

ArXiv (preprint)

Multiscale Neural PDE Surrogates for Prediction and Downscaling: Application to Ocean Currents

Abdessamad El-Kabid

Loubna Benabbou

Redouane Lguensat

Alex Hernández-García

Accurate modeling of physical systems governed by partial differential equations is a central challenge in scientific computing. In oceanography, high-resolution current data are critical for coastal management, environmental monitoring, and maritime safety. However, available satellite products, such as Copernicus data for sea water velocity at ~0.08 degrees spatial resolution and global ocean models, often lack the spatial granularity required for detailed local analyses. In this work, we (a) introduce a supervised deep learning framework based on neural operators for solving PDEs and providing arbitrary resolution solutions, and (b) propose downscaling models with an application to Copernicus ocean current data. Additionally, our method can model surrogate PDEs and predict solutions at arbitrary resolution, regardless of the input resolution. We evaluated our model on real-world Copernicus ocean current data and synthetic Navier-Stokes simulation datasets.

2025-07-24

ArXiv (preprint)

Torsional-GFN: a conditional conformation generator for small molecules

Alexandra Volokhova

Lena Nehale Ezzine

Piotr Gaiński

Luca Scimeca

Emmanuel Bengio

Prudencio Tossou

Generating stable molecular conformations is crucial in several drug discovery applications, such as estimating the binding affinity of a molecule to a target. Recently, generative machine learning methods have emerged as a promising, more efficient alternative to molecular dynamics for sampling conformations from the Boltzmann distribution. In this paper, we introduce Torsional-GFN, a conditional GFlowNet specifically designed to sample conformations of molecules proportionally to their Boltzmann distribution, using only a reward function as training signal. Conditioned on a molecular graph and its local structure (bond lengths and angles), Torsional-GFN samples rotations of its torsion angles. Our results demonstrate that Torsional-GFN is able to sample conformations approximately proportional to the Boltzmann distribution for multiple molecules with a single model, and allows for zero-shot generalization to unseen bond lengths and angles coming from the MD simulations for such molecules. Our work presents a promising avenue for scaling the proposed approach to larger molecular systems, achieving zero-shot generalization to unseen molecules, and including the generation of the local structure into the GFlowNet model.

2025-07-15

ArXiv (preprint)

Multiscale Neural PDE Surrogates for Prediction and Downscaling: Application to Ocean Currents

Abdessamad El-Kabid

Loubna Benabbou

Redouane Lguensat

Accurate modeling of physical systems governed by partial differential equations is a central challenge in scientific computing. In oceanography, high-resolution current data are critical for coastal management, environmental monitoring, and maritime safety. However, available satellite products, such as Copernicus data for sea water velocity at ~0.08 degrees spatial resolution and global ocean models, often lack the spatial granularity required for detailed local analyses. In this work, we (a) introduce a supervised deep learning framework based on neural operators for solving PDEs and providing arbitrary resolution solutions, and (b) propose downscaling models with an application to Copernicus ocean current data. Additionally, our method can model surrogate PDEs and predict solutions at arbitrary resolution, regardless of the input resolution. We evaluated our model on real-world Copernicus ocean current data and synthetic Navier-Stokes simulation datasets.

2025-07-01

arXiv (published)

RainShift: A Benchmark for Precipitation Downscaling Across Geographies

Paula Harder

Luca Schmidt

Francis Pelletier

Nicole Ludwig

Matthew Chantry

Christian Lessig

David Rolnick

Earth System Models (ESMs) are our main tool for projecting the impacts of climate change. However, running these models at sufficient resolution for local-scale risk assessments is not computationally feasible. Deep learning-based super-resolution models offer a promising solution to downscale ESM outputs to higher resolutions by learning from data. Yet, due to regional variations in climatic processes, these models typically require retraining for each geographical area, demanding high-resolution observational data, which is unevenly available across the globe. This highlights the need to assess how well these models generalize across geographic regions. To address this, we introduce RainShift, a dataset and benchmark for evaluating downscaling under geographic distribution shifts. We evaluate state-of-the-art downscaling approaches including GANs and diffusion models in generalizing across data gaps between the Global North and Global South. Our findings reveal substantial performance drops in out-of-distribution regions, depending on model and geographic area. While expanding the training domain generally improves generalization, it is insufficient to overcome shifts between geographically distinct regions. We show that addressing these shifts through, for example, data alignment can improve spatial generalization. Our work advances the global applicability of downscaling methods and represents a step toward reducing inequities in access to high-resolution climate information.

2025-07-01

arXiv (published)

Torsional-GFN: a conditional conformation generator for small molecules

Alexandra Volokhova

Lena Nehale Ezzine

Piotr Gaiński

Luca Scimeca

Emmanuel Bengio

Prudencio Tossou

2025-06-11

ICML.cc/2025/Workshop/GenBio (poster)

openreview.net

Learning Decision Trees as Amortized Structure Inference

Mohammed Mahfoud

Ghait Boukachab

Michał Koziarski

Stefan Bauer

Nikolay Malkin

Building predictive models for tabular data presents fundamental challenges, notably in scaling consistently, i.e., more resources translating to better performance, and generalizing systematically beyond the training data distribution. Designing decision tree models remains especially challenging given the intractably large search space, and most existing methods rely on greedy heuristics, while deep learning inductive biases expect a temporal or spatial structure not naturally present in tabular data. We propose a hybrid amortized structure inference approach to learn predictive decision tree ensembles given data, formulating decision tree construction as a sequential planning problem. We train a deep reinforcement learning (GFlowNet) policy to solve this problem, yielding a generative model that samples decision trees from the Bayesian posterior. We show that our approach, DT-GFN, outperforms state-of-the-art decision tree and deep learning methods on standard classification benchmarks derived from real-world data, robustness to distribution shifts, and anomaly detection, all while yielding interpretable models with shorter description lengths. Samples from the trained DT-GFN model can be ensembled to construct a random forest, and we further show that performance scales consistently with ensemble size, yielding ensembles of predictors that continue to generalize systematically.

2025-03-10

ArXiv (preprint)

Learning Decision Trees as Amortized Structure Inference

Mohammed Mahfoud

Ghait Boukachab

Michał Koziarski

Stefan Bauer