Publications

Self-Supervised Disentanglement by Leveraging Structure in Data Augmentations
Cian Eastwood
Julius von Kügelgen
Linus Ericsson
Diane Bouchacourt
Mark Ibrahim
Bernhard Schölkopf
Self-supervised representation learning often uses data augmentations to induce some invariance to "style" attributes of the data. However, with downstream tasks generally unknown at training time, it is difficult to deduce a priori which attributes of the data are indeed "style" and can be safely discarded. To address this, we introduce a more principled approach that seeks to disentangle style features rather than discard them. The key idea is to add multiple style embedding spaces where: (i) each is invariant to all-but-one augmentation; and (ii) joint entropy is maximized. We formalize our structured data-augmentation procedure from a causal latent-variable-model perspective, and prove identifiability of both content and (multiple blocks of) style variables. We empirically demonstrate the benefits of our approach on synthetic datasets and then present promising but limited results on ImageNet.
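The two objectives named in the abstract, per-space invariance to all-but-one augmentation and maximized joint entropy, can be sketched as a toy loss. All function names here are hypothetical illustrations, not the paper's implementation; the entropy surrogate (log-determinant of the embedding covariance) is one common differentiable proxy, assumed for concreteness.

```python
import numpy as np

def invariance_loss(z1, z2):
    # Penalize differences between embeddings of two augmented views.
    return float(np.mean((z1 - z2) ** 2))

def entropy_proxy(z):
    # Log-determinant of the covariance as a differentiable entropy surrogate
    # (an assumption for this sketch; the paper's estimator may differ).
    cov = np.cov(z, rowvar=False) + 1e-4 * np.eye(z.shape[1])
    return float(np.linalg.slogdet(cov)[1])

def structured_style_loss(content_views, style_views, lam=1.0):
    """content_views: (z1, z2) pairs that should be invariant to every augmentation.
    style_views[i]: a (z1, z2) pair whose views differ only in augmentation i,
    so style space i must stay invariant to all *other* augmentations."""
    loss = sum(invariance_loss(a, b) for a, b in content_views)
    for z1, z2 in style_views:
        loss += invariance_loss(z1, z2)                    # (i) all-but-one invariance
        loss -= lam * entropy_proxy(np.vstack([z1, z2]))   # (ii) maximize joint entropy
    return loss
```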
SORBET: A Siamese Network for Ontology Embeddings Using a Distance-Based Regression Loss and BERT
Francis Gosselin
On the importance of catalyst-adsorbate 3D interactions for relaxed energy predictions
Alvaro Carbonero
Alexandre AGM Duval
Victor Schmidt
Santiago Miret
Alex Hernandez-Garcia
The use of machine learning for material property prediction and discovery has traditionally centered on graph neural networks that incorporate the geometric configuration of all atoms. However, in practice not all of this information may be readily available, e.g. when evaluating the potentially unknown binding of adsorbates to a catalyst. In this paper, we investigate whether it is possible to predict a system's relaxed energy in the OC20 dataset while ignoring the relative position of the adsorbate with respect to the electro-catalyst. We consider SchNet, DimeNet++ and FAENet as base architectures and measure the impact of four modifications on model performance: removing edges in the input graph, pooling independent representations, not sharing the backbone weights, and using an attention mechanism to propagate non-geometric relative information. We find that while removing binding site information impairs accuracy as expected, modified models are able to predict relaxed energies with remarkably low MAE. Our work suggests future research directions in accelerated materials discovery where information on reactant configurations can be reduced or altogether omitted.
Towards equilibrium molecular conformation generation with GFlowNets
Alexandra Volokhova
Michał Koziarski
Alex Hernandez-Garcia
Cheng-Hao Liu
Santiago Miret
Pablo Lemos
Luca Thiede
Zichao Yan
Alán Aspuru-Guzik
Sampling diverse, thermodynamically feasible molecular conformations plays a crucial role in predicting properties of a molecule. In this paper we propose to use GFlowNet for sampling conformations of small molecules from the Boltzmann distribution, as determined by the molecule's energy. The proposed approach can be used in combination with energy estimation methods of different fidelity and discovers a diverse set of low-energy conformations for highly flexible drug-like molecules. We demonstrate that GFlowNet can reproduce molecular potential energy surfaces by sampling proportionally to the Boltzmann distribution.
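The sampling target described here, conformations drawn with probability proportional to the Boltzmann factor of their energy, can be illustrated with a toy enumerable set. This is only a sketch of the target distribution a trained GFlowNet would match; the function names and the exact-sampling shortcut are assumptions for illustration, not the paper's method.

```python
import math
import random

def boltzmann_weights(energies, kT=1.0):
    # GFlowNet reward for a conformation x is R(x) = exp(-E(x) / kT);
    # normalizing over an enumerable set gives the Boltzmann distribution.
    w = [math.exp(-e / kT) for e in energies]
    z = sum(w)  # partition function over the enumerated conformations
    return [x / z for x in w]

def sample_conformation(conformations, energies, kT=1.0, rng=random):
    # A trained GFlowNet samples proportionally to R(x); for a toy
    # enumerable set we can sample exactly from the target instead.
    probs = boltzmann_weights(energies, kT)
    return rng.choices(conformations, weights=probs, k=1)[0]
```

Lower-energy conformations receive exponentially larger weight, so diverse low-energy modes dominate the samples, which is the behavior the paper evaluates.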
Interacting Diffusion Processes for Event Sequence Forecasting
Mai Zeng
Florence Regol
Neural Temporal Point Processes (TPPs) have emerged as the primary framework for predicting sequences of events that occur at irregular time intervals, but their sequential nature can hamper performance for long-horizon forecasts. To address this, we introduce a novel approach that incorporates a diffusion generative model. The model facilitates sequence-to-sequence prediction, allowing multi-step predictions based on historical event sequences. In contrast to previous approaches, our model directly learns the joint probability distribution of types and inter-arrival times for multiple events. This allows us to fully leverage the high-dimensional modeling capability of modern generative models. Our model is composed of two diffusion processes, one for the time intervals and one for the event types. These processes interact through their respective denoising functions, which can take as input intermediate representations from both processes, allowing the model to learn complex interactions. We demonstrate that our proposal outperforms state-of-the-art baselines for long-horizon forecasting of TPPs.
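The interaction mechanism described above, two denoisers that each condition on the other process's intermediate state, can be sketched with placeholder linear "denoisers". Everything here (the update rule, the function names, the exp/argmax decoding) is a simplified assumption to show the data flow, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def eps_time(x_time, x_type, t):
    # Hypothetical denoiser for inter-arrival times; it takes the
    # event-type process's intermediate state as extra input.
    return 0.1 * x_time + 0.05 * x_type.mean(axis=-1, keepdims=True)

def eps_type(x_type, x_time, t):
    # Hypothetical denoiser for event-type logits, conditioned on times.
    return 0.1 * x_type + 0.05 * x_time

def reverse_diffusion(n_events, n_types, steps=50):
    x_time = rng.standard_normal((n_events, 1))        # noisy inter-arrival times
    x_type = rng.standard_normal((n_events, n_types))  # noisy type logits
    for t in range(steps, 0, -1):
        # Simplified reverse step: each process subtracts a noise estimate
        # that depends on *both* intermediate representations.
        x_time = x_time - eps_time(x_time, x_type, t)
        x_type = x_type - eps_type(x_type, x_time, t)
    times = np.exp(x_time)          # map to positive inter-arrival times
    types = x_type.argmax(axis=-1)  # decode discrete event types
    return times, types
```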
Surrogate Minimization: An Optimization Algorithm for Training Large Neural Networks with Model Parallelism
Reza Asad
Reza Babanezhad Harikandeh
Issam Hadj Laradji
Sharan Vaswani
The value of standards for health datasets in artificial intelligence-based applications
Anmol Arora
Joseph E. Alderman
Joanne Palmer
Shaswath Ganapathi
Elinor Laws
Melissa D. McCradden
Lauren Oakden-Rayner
Stephen R. Pfohl
Marzyeh Ghassemi
Francis McKay
Darren Treanor
Bilal Mateen
Jacqui Gath
Adewole O. Adebajo
Stephanie Kuku
Rubeta Matin
Katherine Heller
Elizabeth Sapey
Neil J. Sebire (and 4 more authors)
Heather Cole-Lewis
Melanie Calvert
Alastair Denniston
Xiaoxuan Liu
CL-MASR: A Continual Learning Benchmark for Multilingual ASR
Luca Della Libera
Pooneh Mousavi
Salah Zaiem
PDB-Struct: A Comprehensive Benchmark for Structure-based Protein Design
Chuanrui Wang
Bozitao Zhong
Zuobai Zhang
Narendra Chaudhary
Sanchit Misra
Structure-based protein design has attracted increasing interest, with numerous methods being introduced in recent years. However, a universally accepted method for evaluation has not been established, since wet-lab validation can be overly time-consuming for the development of new algorithms, and the …
Role of Structural and Conformational Diversity for Machine Learning Potentials
Nikhil Shenoy
Prudencio Tossou
Emmanuel Noutahi
Hadrien Mary
Jiarui Ding
In the field of Machine Learning Interatomic Potentials (MLIPs), understanding the intricate relationship between data biases, specifically conformational and structural diversity, and model generalization is critical in improving the quality of Quantum Mechanics (QM) data generation efforts. We investigate these dynamics through two distinct experiments: a fixed-budget experiment, where the dataset size remains constant, and a fixed-molecular-set experiment, which fixes structural diversity while varying conformational diversity. Our results reveal nuanced patterns in generalization metrics. Notably, for optimal structural and conformational generalization, a careful balance between structural and conformational diversity is required, but existing QM datasets do not meet that trade-off. Additionally, our results highlight the limitations of MLIP models in generalizing beyond their training distribution, emphasizing the importance of defining the applicability domain during model deployment. These findings provide valuable insights and guidelines for QM data generation efforts.
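The two experimental designs in the abstract can be made concrete with a toy dataset-construction sketch. The function names and the (molecule, conformer-index) representation are assumptions for illustration; the point is the trade-off: under a fixed budget, more conformers per molecule means fewer distinct molecules.

```python
import random

def fixed_budget_split(molecules, budget, confs_per_mol, rng):
    """Fixed-budget experiment: total dataset size is constant, so raising
    conformational diversity (confs_per_mol) lowers structural diversity
    (the number of distinct molecules sampled)."""
    n_mols = budget // confs_per_mol
    chosen = rng.sample(molecules, n_mols)
    return [(m, c) for m in chosen for c in range(confs_per_mol)]

def fixed_molecule_split(molecules, confs_per_mol):
    """Fixed-molecular-set experiment: structural diversity is held constant
    while conformational diversity varies with confs_per_mol."""
    return [(m, c) for m in molecules for c in range(confs_per_mol)]
```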
Understanding Graph Neural Networks with Generalized Geometric Scattering Transforms
Michael Perlmutter
Alexander Tong
Feng Gao
Matthew Hirn
The scattering transform is a multilayered wavelet-based deep learning architecture that acts as a model of convolutional neural networks. Recently, several works have introduced generalizations of the scattering transform for non-Euclidean settings such as graphs. Our work builds upon these constructions by introducing windowed and non-windowed geometric scattering transforms for graphs based upon a very general class of asymmetric wavelets. We show that these asymmetric graph scattering transforms have many of the same theoretical guarantees as their symmetric counterparts. As a result, the proposed construction unifies and extends known theoretical results for many of the existing graph scattering architectures. In doing so, this work helps bridge the gap between geometric scattering and other graph neural networks by introducing a large family of networks with provable stability and invariance guarantees. These results lay the groundwork for future deep learning architectures for graph-structured data that have learned filters and also provably have desirable theoretical properties.
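A minimal sketch of a graph scattering transform with asymmetric wavelets: diffusion wavelets built from a lazy random walk matrix (which is generally non-symmetric), with features obtained by alternating wavelet filtering, pointwise absolute value, and summation. This is one standard instance of the family the paper generalizes, assumed here for concreteness; the paper's wavelet class is broader.

```python
import numpy as np

def lazy_random_walk(A):
    # P = (I + A D^{-1}) / 2 is column-stochastic and, for general graphs,
    # asymmetric, which is the setting the generalized wavelets cover.
    d = A.sum(axis=0)
    return 0.5 * (np.eye(len(A)) + A / d)

def graph_wavelets(P, J):
    # Diffusion wavelets Psi_j = P^(2^(j-1)) - P^(2^j), j = 1..J.
    return [np.linalg.matrix_power(P, 2 ** (j - 1))
            - np.linalg.matrix_power(P, 2 ** j)
            for j in range(1, J + 1)]

def scattering_features(x, wavelets, order=2):
    # Non-windowed scattering: iterate u -> |Psi_j u| and aggregate each
    # layer with a sum, yielding 1 + J + J^2 + ... coefficients.
    feats = [float(np.abs(x).sum())]
    layer = [x]
    for _ in range(order):
        layer = [np.abs(W @ u) for W in wavelets for u in layer]
        feats += [float(u.sum()) for u in layer]
    return feats
```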