Publications

Learning Optimizers for Local SGD

Charles-Étienne Joseph

Benjamin Thérien

Abhinav Moudgil

Boris Knyazev

Eugene Belilovsky

2023-10-27

NeurIPS.cc/2023/Workshop/Federated_Learning (poster)

openreview.net

Learning to Scale Logits for Temperature-Conditional GFlowNets

Minsu Kim

Joohwan Ko

Dinghuai Zhang

Ling Pan

Taeyoung Yun

Woo Chang Kim

Jinkyoo Park

Yoshua Bengio

GFlowNets are probabilistic models that learn a stochastic policy that sequentially generates compositional structures, such as molecular gr… (see more)aphs. They are trained with the objective of sampling such objects with probability proportional to the object's reward. Among GFlowNets, the temperature-conditional GFlowNets represent a family of policies indexed by temperature, and each is associated with the correspondingly tempered reward function. The major benefit of temperature-conditional GFlowNets is the controllability of GFlowNets' exploration and exploitation through adjusting temperature. We propose a \textit{Learning to Scale Logits for temperature-conditional GFlowNets} (LSL-GFN), a novel architectural design that greatly accelerates the training of temperature-conditional GFlowNets. It is based on the idea that previously proposed temperature-conditioning approaches introduced numerical challenges in the training of the deep network because different temperatures may give rise to very different gradient profiles and ideal scales of the policy's logits. We find that the challenge is greatly reduced if a learned function of the temperature is used to scale the policy's logits directly. We empirically show that our strategy dramatically improves the performances of GFlowNets, outperforming other baselines, including reinforcement learning and sampling methods, in terms of discovering diverse modes in multiple biochemical tasks.

2023-10-27

NeurIPS.cc/2023/Workshop/AI4Science (poster)

doi.org

openreview.net

Learning Silicon Dopant Transitions in Graphene using Scanning Transmission Electron Microscopy

Max Schwarzer

Jesse Farebrother

Joshua Greaves

Kevin Roccapriore

Ekin Dogus Cubuk

Rishabh Agarwal

Aaron Courville

Marc Gendron-Bellemare

Sergei Kalinin

Igor Mordatch

Pablo Samuel Castro

We introduce a machine learning approach to determine the transition rates of silicon atoms on a single layer of carbon atoms, when stimulat… (see more)ed by the electron beam of a scanning transmission electron microscope (STEM). Our method is data-centric, leveraging data collected on a STEM. The data samples are processed and filtered to produce symbolic representations, which we use to train a neural network to predict transition rates. These rates are then applied to guide a single silicon atom throughout the lattice to pre-determined target destinations. We present empirical analyses that demonstrate the efficacy and generality of our approach.

2023-10-27

NeurIPS.cc/2023/Workshop/AI4Mat (spotlight)

openreview.net

Learning Silicon Dopant Transitions in Graphene using Scanning Transmission Electron Microscopy

Max Schwarzer

Jesse Farebrother

Joshua Greaves

Kevin Roccapriore

Ekin Dogus Cubuk

Rishabh Agarwal

Aaron Courville

Marc Gendron-Bellemare

Sergei Kalinin

Igor Mordatch

Pablo Samuel Castro

We introduce a machine learning approach to determine the transition rates of silicon atoms on a single layer of carbon atoms, when stimulat… (see more)ed by the electron beam of a scanning transmission electron microscope (STEM). Our method is data-centric, leveraging data collected on a STEM. The data samples are processed and filtered to produce symbolic representations, which we use to train a neural network to predict transition rates. These rates are then applied to guide a single silicon atom throughout the lattice to pre-determined target destinations. We present empirical analyses that demonstrate the efficacy and generality of our approach.

2023-10-27

NeurIPS.cc/2023/Workshop/AI4Mat (spotlight)

openreview.net

Self-Supervised Disentanglement by Leveraging Structure in Data Augmentations

Cian Eastwood

Julius von Kügelgen

Linus Ericsson

Diane Bouchacourt

Pascal Vincent

Mark Ibrahim

Bernhard Schölkopf

Self-supervised representation learning often uses data augmentations to induce some invariance to "style" attributes of the data. However, … (see more)with downstream tasks generally unknown at training time, it is difficult to deduce a priori which attributes of the data are indeed "style" and can be safely discarded. To address this, we introduce a more principled approach that seeks to disentangle style features rather than discard them. The key idea is to add multiple style embedding spaces where: (i) each is invariant to all-but-one augmentation; and (ii) joint entropy is maximized. We formalize our structured data-augmentation procedure from a causal latent-variable-model perspective, and prove identifiability of both content and (multiple blocks of) style variables. We empirically demonstrate the benefits our approach on synthetic datasets and then present promising but limited results on ImageNet.

2023-10-27

NeurIPS.cc/2023/Workshop/CRL (poster)

openreview.net

SORBET: A Siamese Network for Ontology Embeddings Using a Distance-Based Regression Loss and BERT

Francis Gosselin

Amal Zouaq

2023-10-27

The Semantic Web – ISWC 2023 (published)

doi.org

On the importance of catalyst-adsorbate 3D interactions for relaxed energy predictions

Alvaro Carbonero

Alexandre AGM Duval

Victor Schmidt

Santiago Miret

Alex Hernandez-Garcia

Yoshua Bengio

David Rolnick

The use of machine learning for material property prediction and discovery has traditionally centered on graph neural networks that incorpor… (see more)ate the geometric configuration of all atoms. However, in practice not all this information may be readily available, e.g.~when evaluating the potentially unknown binding of adsorbates to catalyst. In this paper, we investigate whether it is possible to predict a system's relaxed energy in the OC20 dataset while ignoring the relative position of the adsorbate with respect to the electro-catalyst. We consider SchNet, DimeNet++ and FAENet as base architectures and measure the impact of four modifications on model performance: removing edges in the input graph, pooling independent representations, not sharing the backbone weights and using an attention mechanism to propagate non-geometric relative information. We find that while removing binding site information impairs accuracy as expected, modified models are able to predict relaxed energies with remarkably decent MAE. Our work suggests future research directions in accelerated materials discovery where information on reactant configurations can be reduced or altogether omitted.

2023-10-27

NeurIPS.cc/2023/Workshop/AI4Mat (poster)

doi.org

openreview.net

Towards equilibrium molecular conformation generation with GFlowNets

Alexandra Volokhova

Michał Koziarski

Alex Hernandez-Garcia

Cheng-Hao Liu

Santiago Miret

Pablo Lemos

Luca Thiede

Zichao Yan

Alán Aspuru-Guzik

Yoshua Bengio

Sampling diverse, thermodynamically feasible molecular conformations plays a crucial role in predicting properties of a molecule. In this pa… (see more)per we propose to use GFlowNet for sampling conformations of small molecules from the Boltzmann distribution, as determined by the molecule's energy. The proposed approach can be used in combination with energy estimation methods of different fidelity and discovers a diverse set of low-energy conformations for highly flexible drug-like molecules. We demonstrate that GFlowNet can reproduce molecular potential energy surfaces by sampling proportionally to the Boltzmann distribution.

2023-10-27

NeurIPS.cc/2023/Workshop/AI4Mat (poster)

doi.org

openreview.net

Interacting Diffusion Processes for Event Sequence Forecasting

Mai Zeng

Florence Regol

Mark Coates

Neural Temporal Point Processes (TPPs) have emerged as the primary framework for predicting sequences of events that occur at irregular time… (see more) intervals, but their sequential nature can hamper performance for long-horizon forecasts. To address this, we introduce a novel approach that incorporates a diffusion generative model. The model facilitates sequence-to-sequence prediction, allowing multi-step predictions based on historical event sequences. In contrast to previous approaches, our model directly learns the joint probability distribution of types and inter-arrival times for multiple events. This allows us to fully leverage the high dimensional modeling capability of modern generative models. Our model is composed of two diffusion processes, one for the time intervals and one for the event types. These processes interact through their respective denoising functions, which can take as input intermediate representations from both processes, allowing the model to learn complex interactions. We demonstrate that our proposal outperforms state-of-the-art baselines for long-horizon forecasting of TPP.

2023-10-26

ArXiv (preprint)

doi.org

arxiv.org

Surrogate Minimization: An Optimization Algorithm for Training Large Neural Networks with Model Parallelism

Reza Asad

Reza Babanezhad Harikandeh

Issam Hadj Laradji

Nicolas Le Roux

Sharan Vaswani

2023-10-26

NeurIPS.cc/2023/Workshop/OPT (poster)

openreview.net

The value of standards for health datasets in artificial intelligence-based applications

Anmol Arora

Joseph E. Alderman

Joanne Palmer

Shaswath Ganapathi

Elinor Laws

Melissa D. McCradden

Lauren Oakden-Rayner

Stephen R. Pfohl

Marzyeh Ghassemi

Francis McKay

Darren Treanor

Negar Rostamzadeh

Bilal Mateen

Jacqui Gath

Adewole O. Adebajo

Stephanie Kuku

Rubeta Matin

Katherine Heller

Elizabeth Sapey

Neil J. Sebire … (see 4 more)

Heather Cole-Lewis

Melanie Calvert

Alastair Denniston

Xiaoxuan Liu

2023-10-26

Nature Medicine (published)

doi.org

CL-MASR: A Continual Learning Benchmark for Multilingual ASR

Luca Della Libera

Pooneh Mousavi

Salah Zaiem

Cem (Yusuf) Subakan

Mirco Ravanelli

2023-10-25

ArXiv (preprint)

doi.org

arxiv.org

AI Research Driven by Real-World Problems

AI Policy Compass

Student Life and Resources

Publications

AI Research Driven by Real-World Problems

AI Policy Compass

Student Life and Resources

Popular keywords:

Publications