Publications

Flaky Performances when Pretraining on Relational Databases
A debriefing tool to acquire non-technical skills in trauma courses
Fabio Botelho
Jason M. Harley
Natalie Yanchar
Simone Abib
Ilana Bank
Dissociable brain structural asymmetry patterns reveal unique phenome-wide profiles
Ralph Adolphs
Lynn K. Paul
Vaibhav Sharma
Joern Diedrichsen
B. T. Thomas Yeo
Object-centric causal representation learning
PaReco: patched clones and missed patches among the divergent variants of a software family
Poedjadevie Kadjel Ramkisoen
John Businge
Brent van Bladel
Alexandre Decan
Serge Demeyer
Coen De Roover
Re-using whole repositories as a starting point for new projects is often done by maintaining a variant fork parallel to the original. However, the common artifacts between both are not always kept up to date. As a result, patches are not optimally integrated across the two repositories, which may lead to sub-optimal maintenance between the variant and the original project. A bug existing in both repositories can be patched in one but not the other (we see this as a missed opportunity) or it can be manually patched in both, probably by different developers (we see this as effort duplication). In this paper we present a tool (named PaReCo) which relies on clone detection to mine cases of missed opportunity and effort duplication from a pool of patches. We analyzed 364 (source to target) variant pairs with 8,323 patches, resulting in a curated dataset containing 1,116 cases of effort duplication and 1,008 cases of missed opportunities. We achieve a precision of 91%, recall of 80%, accuracy of 88%, and F1-score of 85%. Furthermore, we investigated the time interval between patches and found that, on average, missed patches in the target variants were introduced in the source variants 52 weeks earlier. Consequently, PaReCo can be used to manage variability in "time" by automatically identifying interesting patches in later project releases to be backported to supported earlier releases.
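A minimal sketch of the core matching idea: flag a source patch as likely "effort duplication" when a near-identical patch already exists in the variant, and otherwise treat it as a "missed opportunity" candidate. All names here are hypothetical, and plain textual similarity stands in for the full clone detector PaReCo uses.

```python
from difflib import SequenceMatcher

def similarity(patch_a: str, patch_b: str) -> float:
    """Textual similarity ratio between two patch bodies (0.0 to 1.0)."""
    return SequenceMatcher(None, patch_a, patch_b).ratio()

def classify_patch(patch: str, variant_patches: list, threshold: float = 0.8) -> str:
    """Classify a source patch against a pool of patches from the variant:
    a near-duplicate means the fix was applied twice (effort duplication);
    no close match means the variant may still be missing the fix."""
    best = max((similarity(patch, p) for p in variant_patches), default=0.0)
    return "effort duplication" if best >= threshold else "missed opportunity"

src_patch = "fix: guard against null pointer in parse()"
variant = ["fix: guard against null pointer in parse()", "docs: update README"]
print(classify_patch(src_patch, variant))  # effort duplication
```

In practice a clone detector operates on normalized token streams rather than raw text, which makes the matching robust to renamed identifiers and formatting differences.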
Posterior samples of source galaxies in strong gravitational lenses with score-based priors
Inferring accurate posteriors for high-dimensional representations of the brightness of gravitationally-lensed sources is a major challenge, in part due to the difficulties of accurately quantifying the priors. Here, we report the use of a score-based model to encode the prior for the inference of undistorted images of background galaxies. This model is trained on a set of high-resolution images of undistorted galaxies. By adding the likelihood score to the prior score and using a reverse-time stochastic differential equation solver, we obtain samples from the posterior. Our method produces independent posterior samples and models the data almost down to the noise level. We show how the balance between the likelihood and the prior meets our expectations in an experiment with out-of-distribution data.
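The key step, adding the likelihood score to the prior score to get a posterior score, can be illustrated on a toy 1-D conjugate-Gaussian problem where both scores are analytic. This is a hedged sketch only: the paper uses a learned score network and a reverse-time SDE solver, whereas here plain Langevin dynamics with hand-derived scores suffices to show the idea.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (illustrative): prior x ~ N(0, 1), observation y = x + noise.
sigma_n = 0.5
y = 1.2

prior_score = lambda x: -x                    # d/dx log N(x; 0, 1)
lik_score = lambda x: (y - x) / sigma_n**2    # d/dx log N(y; x, sigma_n^2)

# Langevin sampling with posterior score = prior score + likelihood score.
step = 1e-3
samples = []
x = 0.0
for i in range(50_000):
    score = prior_score(x) + lik_score(x)
    x = x + step * score + np.sqrt(2 * step) * rng.standard_normal()
    if i > 10_000:  # discard burn-in
        samples.append(x)

# Analytic posterior mean for this conjugate case, for comparison:
post_mean = y / (1 + sigma_n**2)
print(np.mean(samples), post_mean)  # both close to 0.96
```

In the lensing setting the prior score comes from a network trained on undistorted galaxy images, and the reverse-time SDE solver replaces the fixed-step Langevin loop, but the score addition is the same.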
Bayesian learning of Causal Structure and Mechanisms with GFlowNets and Variational Bayes
Bayesian causal structure learning aims to learn a posterior distribution over directed acyclic graphs (DAGs), and the mechanisms that define the relationship between parent and child variables. By taking a Bayesian approach, it is possible to reason about the uncertainty of the causal model. The notion of modelling the uncertainty over models is particularly crucial for causal structure learning since the model could be unidentifiable when given only a finite amount of observational data. In this paper, we introduce a novel method to jointly learn the structure and mechanisms of the causal model using Variational Bayes, which we call Variational Bayes-DAG-GFlowNet (VBG). We extend the method of Bayesian causal structure learning using GFlowNets to learn not only the posterior distribution over the structure, but also the parameters of a linear-Gaussian model. Our results on simulated data suggest that VBG is competitive against several baselines in modelling the posterior over DAGs and mechanisms, while offering several advantages over existing methods, including the guarantee to sample acyclic graphs, and the flexibility to generalize to non-linear causal mechanisms.
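The linear-Gaussian mechanism underlying this model family is concrete enough to sketch: given a candidate DAG, each child variable is a weighted sum of its parents plus Gaussian noise, and the data log-likelihood scores how well a (structure, weights) pair explains the data. A hedged sketch with hypothetical names, not VBG's actual implementation:

```python
import numpy as np

def linear_gaussian_loglik(X, adj, W, sigma=1.0):
    """Log-likelihood of data X under a linear-Gaussian causal model:
    each node x_j = sum_i adj[i, j] * W[i, j] * x_i + N(0, sigma^2).

    X:   (n_samples, d) data matrix
    adj: (d, d) binary adjacency (adj[i, j] = 1 means edge i -> j)
    W:   (d, d) edge weights
    """
    mean = X @ (adj * W)  # each column: predicted mean of that node
    resid = X - mean
    n, d = X.shape
    return (-0.5 * np.sum(resid**2) / sigma**2
            - 0.5 * n * d * np.log(2 * np.pi * sigma**2))

# Two-node chain x0 -> x1 with true weight 2.0 (simulated data).
rng = np.random.default_rng(0)
x0 = rng.standard_normal(500)
x1 = 2.0 * x0 + 0.1 * rng.standard_normal(500)
X = np.column_stack([x0, x1])
adj = np.array([[0, 1], [0, 0]])
W = np.array([[0.0, 2.0], [0.0, 0.0]])

good = linear_gaussian_loglik(X, adj, W)
bad = linear_gaussian_loglik(X, adj, W * 0)  # wrong mechanism parameters
print(good > bad)  # True: correct weights explain the data far better
```

VBG places a variational posterior over the weights W while the GFlowNet handles the posterior over adjacency structures, with the GFlowNet's sequential edge-adding construction guaranteeing acyclicity.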
Spectral Regularization: an Inductive Bias for Sequence Modeling
Adult neurogenesis acts as a neural regularizer
Lina M. Tran
Adam Santoro
Lulu Liu
Sheena A. Josselyn
Blake A. Richards
Paul W. Frankland
New neurons are continuously generated in the subgranular zone of the dentate gyrus throughout adulthood. These new neurons gradually integrate into hippocampal circuits, forming new naive synapses. Viewed from this perspective, these new neurons may represent a significant source of "wiring" noise in hippocampal networks. In machine learning, such noise injection is commonly used as a regularization technique. Regularization techniques help prevent overfitting training data and allow models to generalize learning to new, unseen data. Using a computational modeling approach, here we ask whether a neurogenesis-like process similarly acts as a regularizer, facilitating generalization in a category learning task. In a convolutional neural network (CNN) trained on the CIFAR-10 object recognition dataset, we modeled neurogenesis as a replacement/turnover mechanism, where weights for a randomly chosen small subset of hidden layer neurons were reinitialized to new values as the model learned to categorize 10 different classes of objects. We found that neurogenesis enhanced generalization on unseen test data compared to networks with no neurogenesis. Moreover, neurogenic networks either outperformed or performed similarly to networks with conventional noise injection (i.e., dropout, weight decay, and neural noise). These results suggest that neurogenesis can enhance generalization in hippocampal learning through noise injection, expanding on the roles that neurogenesis may have in cognition.
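The replacement/turnover mechanism described above is simple to sketch: periodically pick a small random subset of hidden units and reinitialize both their incoming and outgoing weights, mimicking newborn naive neurons. A minimal sketch with hypothetical names (the paper's experiments use a CNN trained on CIFAR-10, not this toy layer):

```python
import numpy as np

def neurogenesis_turnover(W_in, W_out, frac=0.05, rng=None):
    """Replacement-style neurogenesis for one hidden layer:
    reinitialize the incoming and outgoing weights of a random
    fraction of hidden units, in place.

    W_in:  (n_inputs, n_hidden) incoming weights
    W_out: (n_hidden, n_outputs) outgoing weights
    Returns the indices of the "reborn" hidden units.
    """
    rng = rng or np.random.default_rng()
    n_hidden = W_in.shape[1]
    n_new = max(1, int(frac * n_hidden))
    idx = rng.choice(n_hidden, size=n_new, replace=False)
    scale = 1.0 / np.sqrt(W_in.shape[0])  # naive small-weight init
    W_in[:, idx] = rng.normal(0.0, scale, size=(W_in.shape[0], n_new))
    W_out[idx, :] = rng.normal(0.0, scale, size=(n_new, W_out.shape[1]))
    return idx

W_in = np.ones((100, 64))
W_out = np.ones((64, 10))
reborn = neurogenesis_turnover(W_in, W_out, frac=0.1,
                               rng=np.random.default_rng(0))
print(len(reborn))  # 6 hidden units replaced
```

Called every few training epochs, this injects structured "wiring" noise, the analogue of the regularizing turnover the abstract hypothesizes.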
Automatic measure and normalization of spinal cord cross-sectional area using the pontomedullary junction
Julien Cohen‐Adad
Spinal cord cross-sectional area (CSA) is a relevant biomarker to assess spinal cord atrophy in neurodegenerative diseases. However, the considerable inter-subject variability among healthy participants currently limits its usage. Previous studies explored factors contributing to the variability, yet the normalization models required manual intervention and used vertebral levels as a reference, which is an imprecise prediction of the spinal levels. In this study we implemented a method to measure CSA automatically from a spatial reference based on the central nervous system (the pontomedullary junction, PMJ), we investigated factors to explain variability, and developed normalization strategies on a large cohort (N = 804). Following automatic spinal cord segmentation, vertebral labeling and PMJ labeling, the spinal cord CSA was computed on T1w MRI scans from the UK Biobank database. The CSA was computed using two methods. For the first method, the CSA was computed at the level of the C2–C3 intervertebral disc. For the second method, the CSA was computed 64 mm caudal to the PMJ, a distance corresponding to the average distance between the PMJ and the C2–C3 disc across all participants. The effect of various demographic and anatomical factors was explored, and a stepwise regression found significant predictors; the coefficients of the best fit model were used to normalize CSA. CSA measured at the C2–C3 disc and using the PMJ differed significantly (paired t-test, p-value = 0.0002). The best normalization model included thalamus volume, brain volume, sex and the interaction between brain volume and sex. The coefficient of variation for PMJ CSA went down from 10.09% (without normalization) to 8.59%, a reduction of 14.85%. For CSA at C2–C3, it went down from 9.96% to 8.42%, a reduction of 15.13%. This study introduces an end-to-end automatic pipeline to measure and normalize cord CSA from a neurological reference. The inter-subject variability of CSA can be partly accounted for by demographic and anatomical factors. This approach requires further validation to assess atrophy in longitudinal studies.
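The normalization step can be sketched as regressing CSA on the selected covariates and removing the covariate-driven deviation while preserving the cohort mean, which is what shrinks the coefficient of variation. A hedged illustration on simulated numbers, not the paper's exact stepwise-selected model or its fitted coefficients:

```python
import numpy as np

def normalize_csa(csa, covariates):
    """Adjust CSA for demographic/anatomical covariates by removing
    the variance explained by a linear model, keeping the cohort mean.

    csa:        (n,) cross-sectional areas in mm^2
    covariates: (n, k) predictors (e.g. brain volume, sex, ...)
    """
    X = np.column_stack([np.ones(len(csa)), covariates])
    beta, *_ = np.linalg.lstsq(X, csa, rcond=None)
    predicted = X @ beta
    return csa - (predicted - csa.mean())

# Simulated cohort: CSA partly driven by brain volume, plus noise.
rng = np.random.default_rng(0)
brain_vol = rng.normal(1100, 100, size=200)            # cm^3, simulated
csa = 0.05 * brain_vol + rng.normal(0, 3, size=200)    # mm^2, simulated

norm = normalize_csa(csa, brain_vol[:, None])
cv_before = csa.std() / csa.mean() * 100
cv_after = norm.std() / norm.mean() * 100
print(cv_before > cv_after)  # covariate adjustment reduces the CV
```

Because the regression includes an intercept, the mean of the normalized values equals the raw mean exactly; only the between-subject spread attributable to the covariates is removed.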
A General-Purpose Neural Architecture for Geospatial Systems
Nasim Rahaman
Francesco Locatello
Alexandre Lacoste
Christopher Pal
Li Erran Li
Bernhard Schölkopf
Active Keyword Selection to Track Evolving Topics on Twitter
How can we study social interactions on evolving topics at a mass scale? Over the past decade, researchers from diverse fields such as economics, political science, and public health have often done this by querying Twitter's public API endpoints with hand-picked topical keywords to search or stream discussions. However, despite the API's accessibility, it remains difficult to select and update keywords to collect high-quality data relevant to topics of interest. In this paper, we propose an active learning method for rapidly refining query keywords to increase both the yielded topic relevance and dataset size. We leverage a large open-source COVID-19 Twitter dataset to illustrate the applicability of our method in tracking Tweets around the key sub-topics of Vaccine, Mask, and Lockdown. Our experiments show that our method achieves an average topic-related keyword recall 2x higher than baselines. We open-source our code along with a web interface for keyword selection to make data collection from Twitter more systematic for researchers.
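One signal a keyword-refinement loop can use is co-occurrence: terms that frequently appear alongside the current seed keywords in already-collected tweets are promising candidates for the next query round. A minimal, hedged sketch of that idea (illustrative only; the paper's active learning method and scoring are more involved):

```python
from collections import Counter

def rank_candidate_keywords(tweets, seed_keywords, top_k=3):
    """Score candidate keywords by how often they co-occur with any
    seed keyword in the collected tweets, and return the top_k."""
    scores = Counter()
    seeds = set(seed_keywords)
    for tweet in tweets:
        words = set(tweet.lower().split())
        if words & seeds:                 # tweet matched a seed keyword
            for w in words - seeds:       # count the co-occurring terms
                scores[w] += 1
    return [w for w, _ in scores.most_common(top_k)]

tweets = [
    "vaccine rollout starts today",
    "new vaccine rollout sites announced",
    "lockdown extended again",
]
print(rank_candidate_keywords(tweets, ["vaccine"]))  # "rollout" ranks first
```

In a full loop, the top-ranked candidates would be human-vetted for topic relevance before being added to the query, which is the active-learning step that keeps the keyword set both high-yield and on-topic.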