Publications

GFlowNet-EM for Learning Compositional Latent Variable Models

Edward J. Hu

Alexandros Graikos

Latent variable models (LVMs) with discrete compositional latents are an important but challenging setting due to a combinatorially large nu… (see more)mber of possible configurations of the latents. A key tradeoff in modeling the posteriors over latents is between expressivity and tractable optimization. For algorithms based on expectation-maximization (EM), the E-step is often intractable without restrictive approximations to the posterior. We propose the use of GFlowNets, algorithms for sampling from an unnormalized density by learning a stochastic policy for sequential construction of samples, for this intractable E-step. By training GFlowNets to sample from the posterior over latents, we take advantage of their strengths as amortized variational inference algorithms for complex distributions over discrete structures. Our approach, GFlowNet-EM, enables the training of expressive LVMs with discrete compositional latents, as shown by experiments on non-context-free grammar induction and on images using discrete variational autoencoders (VAEs) without conditional independence enforced in the encoder.

2023-07-02

Proceedings of the 40th International Conference on Machine Learning (published)

doi.org

proceedings.mlr.press

A Group Symmetric Stochastic Differential Equation Model for Molecule Multi-modal Pretraining

Shengchao Liu

Weitao Du

Zhiming Ma

Hongyu Guo

Jian Tang

Molecule pretraining has quickly become the go-to schema to boost the performance of AI-based drug discovery. Naturally, molecules can be re… (see more)presented as 2D topological graphs or 3D geometric point clouds. Although most existing pertaining methods focus on merely the single modality, recent research has shown that maximizing the mutual information (MI) between such two modalities enhances the molecule representation ability. Meanwhile, existing molecule multi-modal pretraining approaches approximate MI based on the representation space encoded from the topology and geometry, thus resulting in the loss of critical structural information of molecules. To address this issue, we propose MoleculeSDE. MoleculeSDE leverages group symmetric (e.g., SE(3)-equivariant and reflection-antisymmetric) stochastic differential equation models to generate the 3D geometries from 2D topologies, and vice versa, directly in the input space. It not only obtains tighter MI bound but also enables prosperous downstream tasks than the previous work. By comparing with 17 pretraining baselines, we empirically verify that MoleculeSDE can learn an expressive representation with state-of-the-art performance on 26 out of 32 downstream tasks.

2023-07-02

Proceedings of the 40th International Conference on Machine Learning (published)

doi.org

proceedings.mlr.press

Guided-topic modelling of single-cell transcriptomes enables sub-cell-type and disease-subtype deconvolution of bulk transcriptomes

Lakshmipuram Seshadri Swapna

Michael Huang

Yuemei Li

Cell-type composition is an important indicator of health. We present Guided Topic Model for deconvolution (GTM-decon) to automatically infe… (see more)r cell-type-specific gene topic distributions from single-cell RNA-seq data for deconvolving bulk transcriptomes. GTM-decon performs competitively on deconvolving simulated and real bulk data compared with the state-of-the-art methods. Moreover, as demonstrated in deconvolving disease transcriptomes, GTM-decon can infer multiple cell-type-specific gene topic distributions per cell type, which captures sub-cell-type variations. GTM-decon can also use phenotype labels from single-cell or bulk data as a guide to infer phenotype-specific gene distributions. In a nested-guided design, GTM-decon identified cell-type-specific differentially expressed genes from bulk breast cancer transcriptomes.

2023-07-02

bioRxiv (preprint)

doi.org

Hidden Symmetries of ReLU Networks

J. Elisenda Grigsby

Kathryn Lindsey

David Rolnick

The parameter space for any fixed architecture of feedforward ReLU neural networks serves as a proxy during training for the associated clas… (see more)s of functions - but how faithful is this representation? It is known that many different parameter settings can determine the same function. Moreover, the degree of this redundancy is inhomogeneous: for some networks, the only symmetries are permutation of neurons in a layer and positive scaling of parameters at a neuron, while other networks admit additional hidden symmetries. In this work, we prove that, for any network architecture where no layer is narrower than the input, there exist parameter settings with no hidden symmetries. We also describe a number of mechanisms through which hidden symmetries can arise, and empirically approximate the functional dimension of different network architectures at initialization. These experiments indicate that the probability that a network has no hidden symmetries decreases towards 0 as depth increases, while increasing towards 1 as width and input dimension increase.

2023-07-02

Proceedings of the 40th International Conference on Machine Learning (published)

doi.org

proceedings.mlr.press

High-Probability Bounds for Stochastic Optimization and Variational Inequalities: the Case of Unbounded Variance

Abdurakhmon Sadiev

Marina Danilova

Eduard Gorbunov

Samuel Horváth

Gauthier Gidel

Pavel Dvurechensky

Alexander Gasnikov

Peter Richtárik

During the recent years the interest of optimization and machine learning communities in high-probability convergence of stochastic optimiza… (see more)tion methods has been growing. One of the main reasons for this is that high-probability complexity bounds are more accurate and less studied than in-expectation ones. However, SOTA high-probability non-asymptotic convergence results are derived under strong assumptions such as boundedness of the gradient noise variance or of the objective’s gradient itself. In this paper, we propose several algorithms with high-probability convergence results under less restrictive assumptions. In particular, we derive new high-probability convergence results under the assumption that the gradient/operator noise has bounded central

2023-07-02

Proceedings of the 40th International Conference on Machine Learning (published)

doi.org

proceedings.mlr.press

Maximal Initial Learning Rates in Deep ReLU Networks

Gaurav Iyer

Boris Hanin

David Rolnick

Training a neural network requires choosing a suitable learning rate, which involves a trade-off between speed and effectiveness of converge… (see more)nce. While there has been considerable theoretical and empirical analysis of how large the learning rate can be, most prior work focuses only on late-stage training. In this work, we introduce the maximal initial learning rate

2023-07-02

Proceedings of the 40th International Conference on Machine Learning (published)

doi.org

proceedings.mlr.press

Mechanistic Mode Connectivity

Ekdeep Singh Lubana

Eric J Bigelow

Robert P. Dick

David M. Krueger

Hidenori Tanaka

2023-07-02

Proceedings of the 40th International Conference on Machine Learning (published)

doi.org

proceedings.mlr.press

Multi-Objective GFlowNets

Moksh Jain

Sharath Chandra Raparthy

Alex Hernández-García

Jarrid Rector-Brooks

Yoshua Bengio

Santiago Miret

Emmanuel Bengio

We study the problem of generating diverse candidates in the context of Multi-Objective Optimization. In many applications of machine learni… (see more)ng such as drug discovery and material design, the goal is to generate candidates which simultaneously optimize a set of potentially conflicting objectives. Moreover, these objectives are often imperfect evaluations of some underlying property of interest, making it important to generate diverse candidates to have multiple options for expensive downstream evaluations. We propose Multi-Objective GFlowNets (MOGFNs), a novel method for generating diverse Pareto optimal solutions, based on GFlowNets. We introduce two variants of MOGFNs: MOGFN-PC, which models a family of independent sub-problems defined by a scalarization function, with reward-conditional GFlowNets, and MOGFN-AL, which solves a sequence of sub-problems defined by an acquisition function in an active learning loop. Our experiments on wide variety of synthetic and benchmark tasks demonstrate advantages of the proposed methods in terms of the Pareto performance and importantly, improved candidate diversity, which is the main contribution of this work.

2023-07-02

Proceedings of the 40th International Conference on Machine Learning (published)

doi.org

proceedings.mlr.press

Neural FIM for learning Fisher information metrics from point cloud data

Oluwadamilola Fasina

Guillaume Huguet

Alexander Tong

Yanlei Zhang

Guy Wolf

Maximilian Nickel

Ian Adelstein

Smita Krishnaswamy

Although data diffusion embeddings are ubiquitous in unsupervised learning and have proven to be a viable technique for uncovering the under… (see more)lying intrinsic geometry of data, diffusion embeddings are inherently limited due to their discrete nature. To this end, we propose neural FIM, a method for computing the Fisher information metric (FIM) from point cloud data - allowing for a continuous manifold model for the data. Neural FIM creates an extensible metric space from discrete point cloud data such that information from the metric can inform us of manifold characteristics such as volume and geodesics. We demonstrate Neural FIM's utility in selecting parameters for the PHATE visualization method as well as its ability to obtain information pertaining to local volume illuminating branching points and cluster centers embeddings of a toy dataset and two single-cell datasets of IPSC reprogramming and PBMCs (immune cells).

2023-07-02

Proceedings of the 40th International Conference on Machine Learning (published)

doi.org

proceedings.mlr.press

PAC-Bayesian Generalization Bounds for Adversarial Generative Models

Sokhna Diarra Mbacke

Florence Clerc

Pascal Germain

2023-07-02

Proceedings of the 40th International Conference on Machine Learning (published)

doi.org

proceedings.mlr.press

Privacy-Aware Compression for Federated Learning Through Numerical Mechanism Design

Chuan Guo

Kamalika Chaudhuri

Pierre Stock

Michael G. Rabbat

2023-07-02

Proceedings of the 40th International Conference on Machine Learning (published)

proceedings.mlr.press

ProtST: Multi-Modality Learning of Protein Sequences and Biomedical Texts

Minghao Xu

Xinyu Yuan

Santiago Miret

Jian Tang

Current protein language models (PLMs) learn protein representations mainly based on their sequences, thereby well capturing co-evolutionary… (see more) information, but they are unable to explicitly acquire protein functions, which is the end goal of protein representation learning. Fortunately, for many proteins, their textual property descriptions are available, where their various functions are also described. Motivated by this fact, we first build the ProtDescribe dataset to augment protein sequences with text descriptions of their functions and other important properties. Based on this dataset, we propose the ProtST framework to enhance Protein Sequence pre-training and understanding by biomedical Texts. During pre-training, we design three types of tasks, i.e., unimodal mask prediction, multimodal representation alignment and multimodal mask prediction, to enhance a PLM with protein property information with different granularities and, at the same time, preserve the PLM's original representation power. On downstream tasks, ProtST enables both supervised learning and zero-shot prediction. We verify the superiority of ProtST-induced PLMs over previous ones on diverse representation learning benchmarks. Under the zero-shot setting, we show the effectiveness of ProtST on zero-shot protein classification, and ProtST also enables functional protein retrieval from a large-scale database without any function annotation.

2023-07-02

Proceedings of the 40th International Conference on Machine Learning (published)

doi.org

proceedings.mlr.press

Mila Techaide 2026

Venture Scientist Bootcamp

AI Advantage: Productivity in Public Service

Publications

Mila Techaide 2026

Venture Scientist Bootcamp

AI Advantage: Productivity in Public Service

Popular keywords:

Publications