Publications

Revisiting the 2023 wildfire season in Canada

Flavie Pelletier

Jeff Cardille

Michael A. Wulder

Joanne C. White

Txomin Hermosilla

2024-06-01

Science of Remote Sensing (published)

doi.org

RGFN: Synthesizable Molecular Generation Using GFlowNets

Michał Koziarski

Andrei Rekesh

Dmytro Shevchuk

Almer M. van der Sloot

Piotr Gainski

Yoshua Bengio

Cheng-Hao Liu

Mike Tyers

Robert A. Batey

Generative models hold great promise for small molecule discovery, significantly increasing the size of the search space compared to traditional in silico screening libraries. However, most existing machine learning methods for small molecule generation suffer from poor synthesizability of candidate compounds, making experimental validation difficult. In this paper we propose Reaction-GFlowNet (RGFN), an extension of the GFlowNet framework that operates directly in the space of chemical reactions, thereby allowing out-of-the-box synthesizability while maintaining comparable quality of generated candidates. We demonstrate that with the proposed set of reactions and building blocks, it is possible to obtain a search space of molecules orders of magnitude larger than existing screening libraries, coupled with a low cost of synthesis. We also show that the approach scales to very large fragment libraries, further increasing the number of potential molecules. We demonstrate the effectiveness of the proposed approach across a range of oracle models, including pretrained proxy models and GPU-accelerated docking.

2024-06-01

arXiv (preprint)

doi.org

arxiv.org

State Soup: In-Context Skill Learning, Retrieval and Mixing

Maciej Pióro

Maciej Wolczyk

Razvan Pascanu

Johannes Von Oswald

João Sacramento

A new breed of gated-linear recurrent neural networks has reached state-of-the-art performance on a range of sequence modeling problems. Such models naturally handle long sequences efficiently, as the cost of processing a new input is independent of sequence length. Here, we explore another advantage of these stateful sequence models, inspired by the success of model merging through parameter interpolation. Building on parallels between fine-tuning and in-context learning, we investigate whether we can treat internal states as task vectors that can be stored, retrieved, and then linearly combined, exploiting the linearity of recurrence. We study this form of fast model merging on Mamba-2.8b, a pretrained recurrent model, and present preliminary evidence that simple linear state interpolation methods suffice to improve next-token perplexity as well as downstream in-context learning task performance.

2024-06-01

arXiv (published)

doi.org

arxiv.org
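A minimal sketch of why linearly combining stored states is well-defined, assuming a single fixed diagonal linear recurrence h_t = a * h_{t-1} + b * x_t rather than Mamba's input-dependent, multi-layer architecture. The names `step`, `run` and `mix` and all numbers are hypothetical. Because the recurrence is affine in the state, mixing states with weights that sum to one and then continuing the sequence gives the same result as continuing each state and mixing afterwards.

```python
from typing import List

def step(h: List[float], x: float, a: List[float], b: List[float]) -> List[float]:
    """One update of a hypothetical diagonal linear recurrence: h_t = a*h_{t-1} + b*x_t."""
    return [ai * hi + bi * x for ai, hi, bi in zip(a, h, b)]

def run(h: List[float], xs: List[float], a: List[float], b: List[float]) -> List[float]:
    """Feed a sequence of scalar inputs through the recurrence, starting from state h."""
    for x in xs:
        h = step(h, x, a, b)
    return h

def mix(states: List[List[float]], weights: List[float]) -> List[float]:
    """Weighted linear combination of stored states (a 'state soup')."""
    dim = len(states[0])
    return [sum(w * s[i] for s, w in zip(states, weights)) for i in range(dim)]

# Two "task" states obtained by processing different hypothetical contexts.
a, b = [0.9, 0.5], [1.0, 2.0]
s1 = run([0.0, 0.0], [1.0, 2.0, 3.0], a, b)
s2 = run([0.0, 0.0], [-1.0, 0.5], a, b)
mixed = mix([s1, s2], [0.25, 0.75])
```

With weights summing to one, `run(mixed, xs, a, b)` matches `mix([run(s1, xs, a, b), run(s2, xs, a, b)], [0.25, 0.75])` up to floating-point error; in a stacked network with nonlinearities between layers this only holds per recurrence, not end to end.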

Transformers meet Neural Algorithmic Reasoners

Wilfried Bounsi

Borja Ibarz

Andrew Joseph Dudzik

Jessica B. Hamrick

Larisa Markeeva

Alex Vitvitskyi

Razvan Pascanu

Petar Veličković

Transformers have revolutionized machine learning with their simple yet effective architecture. Pre-training Transformers on massive text datasets from the Internet has led to unmatched generalization for natural language understanding (NLU) tasks. However, such language models remain fragile when tasked with algorithmic forms of reasoning, where computations must be precise and robust. To address this limitation, we propose a novel approach that combines the Transformer's language understanding with the robustness of graph neural network (GNN)-based neural algorithmic reasoners (NARs). Such NARs proved effective as generic solvers for algorithmic tasks, when specified in graph form. To make their embeddings accessible to a Transformer, we propose a hybrid architecture with a two-phase training procedure, allowing the tokens in the language model to cross-attend to the node embeddings from the NAR. We evaluate our resulting TransNAR model on CLRS-Text, the text-based version of the CLRS-30 benchmark, and demonstrate significant gains over Transformer-only models for algorithmic reasoning, both in and out of distribution.

2024-06-01

arXiv (published)

doi.org

arxiv.org

Transformers need glasses! Information over-squashing in language tasks

Federico Barbero

Andrea Banino

Steven Kapturowski

Dharshan Kumaran

João Guilherme Madeira Araújo

Alex Vitvitskyi

Razvan Pascanu

Petar Veličković

We study how information propagates in decoder-only Transformers, which are the architectural backbone of most existing frontier large language models (LLMs). We rely on a theoretical signal propagation analysis; specifically, we analyse the representations of the last token in the final layer of the Transformer, as this is the representation used for next-token prediction. Our analysis reveals a representational collapse phenomenon: we prove that certain distinct sequences of inputs to the Transformer can yield arbitrarily close representations in the final token. This effect is exacerbated by the low-precision floating-point formats frequently used in modern LLMs. As a result, the model is provably unable to respond to these sequences in different ways, leading to errors in, e.g., tasks involving counting or copying. Further, we show that decoder-only Transformer language models can lose sensitivity to specific tokens in the input, which relates to the well-known phenomenon of over-squashing in graph neural networks. We provide empirical evidence supporting our claims on contemporary LLMs. Our theory also points to simple solutions towards ameliorating these issues.

2024-06-01

arXiv (published)

doi.org

arxiv.org
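A toy illustration of representational collapse under low precision, assuming the final-token representation is simply a uniform attention average of scalar token values (real Transformers use learned softmax attention, positional information and many layers, so this is a simplification, not the paper's construction). The token values ('1' maps to 1.0, '0' to 0.0) and the helper names `to_float16`, `last_token_repr` and `collapses` are hypothetical. The averages for the sequences 1^n 0 and 1^(n+1) 0 differ by 1/((n+1)(n+2)), which eventually falls below the float16 spacing near 1.0.

```python
import struct

def to_float16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def last_token_repr(seq: str) -> float:
    """Toy final-token representation: uniform average of hypothetical token values."""
    values = [1.0 if tok == "1" else 0.0 for tok in seq]
    return sum(values) / len(values)

def collapses(n: int) -> bool:
    """True if the sequences 1^n 0 and 1^(n+1) 0 map to the same float16 value."""
    a = last_token_repr("1" * n + "0")
    b = last_token_repr("1" * (n + 1) + "0")
    return to_float16(a) == to_float16(b)
```

For short sequences the two representations remain distinguishable, while for long ones (for example n = 1000) they round to the same half-precision value even though they still differ in float64.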

When does Self-Prediction help? Understanding Auxiliary Tasks in Reinforcement Learning

Claas Voelcker

Tyler Kastner

Igor Gilitschenski

Amir-massoud Farahmand

We investigate the impact of auxiliary learning tasks such as observation reconstruction and latent self-prediction on the representation learning problem in reinforcement learning. We also study how they interact with distractions and observation functions in the MDP. We provide a theoretical analysis of the learning dynamics of observation reconstruction, latent self-prediction, and TD learning in the presence of distractions and observation functions under linear model assumptions. With this formalization, we are able to explain why latent self-prediction is a helpful auxiliary task, while observation reconstruction can provide more useful features when used in isolation. Our empirical analysis shows that the insights obtained from our learning dynamics framework predict the behavior of these loss functions beyond the linear model assumption, in non-linear neural networks. This reinforces the usefulness of the linear model framework not only for theoretical analysis but also as a practical tool for applied problems.

2024-06-01

arXiv (published)

doi.org

arxiv.org

Amortizing intractable inference in diffusion models for vision, language, and control

Moksh J. Jain

Diffusion models have emerged as effective distribution estimators in vision, language, and reinforcement learning, but their use as priors in downstream tasks poses an intractable posterior inference problem. This paper studies amortized sampling of the posterior over data,

2024-05-31

arXiv (preprint)

doi.org

arxiv.org
