Publications

Amortizing intractable inference in diffusion models for vision, language, and control
Siddarth Venkatraman
Moksh J. Jain
Luca Scimeca
Minsu Kim
Marcin Sendera
Mohsin Hasan
Luke Rowe
Sarthak Mittal
Pablo Lemos
Emmanuel Bengio
Alexandre Adam
Jarrid Rector-Brooks
Nikolay Malkin
Diffusion models have emerged as effective distribution estimators in vision, language, and reinforcement learning, but their use as priors in downstream tasks poses an intractable posterior inference problem. This paper studies amortized sampling of the posterior over data, …
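To make the inference problem concrete, here is a toy NumPy sketch (not the paper's amortized method): a downstream constraint r(x) reweights samples from a stand-in prior p(x) via self-normalized importance sampling, the brute-force baseline that an amortized posterior sampler would replace. The 1-D mixture prior and the sigmoid constraint are purely illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a diffusion prior p(x): a 1-D Gaussian mixture.
def sample_prior(n):
    comp = rng.integers(0, 2, size=n)
    return np.where(comp == 0, rng.normal(-2.0, 0.5, n), rng.normal(2.0, 0.5, n))

# Hypothetical downstream constraint r(x): a soft preference for x > 0.
def log_r(x):
    return -np.logaddexp(0.0, -4.0 * x)  # log sigmoid(4x)

# Self-normalized importance sampling targets the intractable posterior
# p(x | y) ∝ p(x) r(x); amortized methods train a sampler to hit it directly.
x = sample_prior(100_000)
w = np.exp(log_r(x) - log_r(x).max())
w /= w.sum()
print("posterior mean ≈", np.sum(w * x))  # pulled toward the x > 0 mode
```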
Balancing Context Length and Mixing Times for Reinforcement Learning at Scale
Matthew D Riemer
Janarthanan Rajendran
Cell ontology guided transcriptome foundation model
Xinyu Yuan
Zhihao Zhan
Zuobai Zhang
Manqi Zhou
Jianan Zhao
Boyu Han
Transcriptome foundation models (TFMs) hold great promise of deciphering the transcriptomic language that dictates diverse cell functions by self-supervised learning on large-scale single-cell gene expression data, and ultimately unraveling the complex mechanisms of human diseases. However, current TFMs treat cells as independent samples and ignore the taxonomic relationships between cell types, which are available in cell ontology graphs. We argue that effectively leveraging this ontology information during TFM pre-training can improve the learning of biologically meaningful gene co-expression patterns while preserving the TFM as a general-purpose foundation model for downstream zero-shot and fine-tuning tasks. To this end, we present **s**ingle **c**ell, **Cell**-**o**ntology guided TFM (scCello). We introduce a cell-type coherence loss and an ontology alignment loss, which are minimized along with the masked gene expression prediction loss during pre-training. These novel loss components guide scCello to learn cell-type-specific representations and the structural relations between cell types from the cell ontology graph, respectively. We pre-trained scCello on 22 million cells from the CellxGene database, leveraging their cell-type labels mapped to the cell ontology graph from the Open Biological and Biomedical Ontology Foundry. Our TFM demonstrates competitive generalization and transferability performance over existing TFMs on biologically important tasks, including identifying novel cell types of unseen cells, predicting cell-type-specific marker genes, and predicting cancer drug responses. Source code and model weights are available at https://github.com/DeepGraphLearning/scCello.
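As a rough sketch of how the three pre-training objectives could combine (schematic PyTorch under assumed tensor shapes, not the released scCello implementation), the masked gene prediction, cell-type coherence, and ontology alignment losses might look as follows; it assumes every cell type in the ontology similarity matrix S appears in the batch.

```python
import torch
import torch.nn.functional as F

def pretrain_loss(gene_logits, gene_targets, mask, z, y, S, w_coh=1.0, w_ont=1.0):
    """gene_logits: (B, G, V) logits for masked genes; z: (B, d) cell embeddings;
    y: (B,) cell-type ids; S: (T, T) ontology-derived similarity between types."""
    # 1) Masked gene expression prediction: the standard TFM objective.
    mlm = F.cross_entropy(gene_logits[mask], gene_targets[mask])

    # 2) Cell-type coherence: cells sharing a type should embed nearby.
    z = F.normalize(z, dim=-1)
    sim = z @ z.t() / 0.1                               # cosine similarity / temperature
    pos = (y.unsqueeze(0) == y.unsqueeze(1)).float().fill_diagonal_(0.0)
    coh = -(F.log_softmax(sim, dim=1) * pos).sum(1) / pos.sum(1).clamp(min=1)

    # 3) Ontology alignment: mean type embeddings should mirror graph similarity.
    proto = torch.stack([z[y == t].mean(0) for t in range(S.shape[0])])
    proto = F.normalize(proto, dim=-1)
    ont = F.mse_loss(proto @ proto.t(), S)

    return mlm + w_coh * coh.mean() + w_ont * ont
```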
Code Repair with LLMs gives an Exploration-Exploitation Tradeoff
Hao Tang
Keya Hu
Jin Peng Zhou
Si Cheng Zhong
Wei-Long Zheng
Kevin Ellis
Conformal Inverse Optimization
Bo Lin
Timothy Chan
Inverse optimization has been increasingly used to estimate unknown parameters in an optimization model based on decision data. We show that such a point estimate is insufficient in a prescriptive setting where the estimated parameters are used to prescribe new decisions. The prescribed decisions may be low-quality and misaligned with human intuition and thus are unlikely to be adopted. To tackle this challenge, we propose conformal inverse optimization, which seeks to learn an uncertainty set for the unknown parameters and then solve a robust optimization model to prescribe new decisions. Under mild assumptions, we show that our method enjoys provable guarantees on solution quality, as evaluated using both the ground-truth parameters and the decision maker's perception of the unknown parameters. Our method demonstrates strong empirical performance compared to classic inverse optimization.
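The recipe (learn an uncertainty set from decision data, then prescribe by robust optimization) can be shown end-to-end in one dimension. The toy sketch below assumes the decision maker solves min_x (x − θ*)², so observed decisions are noisy readings of θ*; none of the specifics come from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

# Observed decisions: noisy minimizers of f(x; theta*) = (x - theta*)^2.
theta_star = 3.0
decisions = theta_star + rng.normal(0.0, 0.4, size=50)

# Split: fit a point estimate, then conformally calibrate an uncertainty set.
fit, cal = decisions[:25], decisions[25:]
theta_hat = fit.mean()
scores = np.abs(cal - theta_hat)                       # nonconformity scores
alpha = 0.1                                            # target miscoverage
q = np.quantile(scores, np.ceil((len(cal) + 1) * (1 - alpha)) / len(cal))
lo, hi = theta_hat - q, theta_hat + q                  # uncertainty set for theta

# Robust prescription: min over x of max over theta in [lo, hi] of (x - theta)^2.
# For an interval, the minimax decision is its midpoint.
x_robust = 0.5 * (lo + hi)
print(f"set = [{lo:.2f}, {hi:.2f}], prescribed x = {x_robust:.2f}")
```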
Density-based User Representation using Gaussian Process Regression for Multi-interest Personalized Retrieval
Haolun Wu
Ofer Meshi
Masrour Zoghi
Craig Boutilier
Maryam Karimzadehgan
Detecting Brittle Decisions for Free: Leveraging Margin Consistency in Deep Robust Classifiers
Jonas Ngnawe
Sabyasachi Sahoo
Yann Batiste Pequignot
Frederic Precioso
Efficient Adversarial Training in LLMs with Continuous Attacks
Sophie Xhonneux
Stephan Günnemann
Leo Schwinn
Large language models (LLMs) are vulnerable to adversarial attacks that can bypass their safety guardrails. In many domains, adversarial training has proven to be one of the most promising methods to reliably improve robustness against such attacks. Yet, in the context of LLMs, current methods for adversarial training are hindered by the high computational costs required to perform discrete adversarial attacks at each training iteration. We address this problem by instead calculating adversarial attacks in the continuous embedding space of the LLM, which is orders of magnitude more efficient. We propose a fast adversarial training algorithm (C-AdvUL) composed of two losses: the first makes the model robust to continuous embedding attacks computed on an adversarial behaviour dataset; the second ensures the usefulness of the final model by fine-tuning on utility data. Moreover, we introduce C-AdvIPO, an adversarial variant of IPO that does not require utility data for adversarially robust alignment. Our empirical evaluation on five models from different families (Gemma, Phi3, Mistral, Zephyr, Llama2) and at different scales (2B, 3.8B, 7B) shows that both algorithms substantially enhance LLM robustness against discrete attacks (GCG, AutoDAN, PAIR) while maintaining utility. Our results demonstrate that robustness to continuous perturbations can extrapolate to discrete threat models, presenting a path toward scalable adversarial training algorithms for robustly aligning LLMs.
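The core trick is that a perturbation in embedding space can be optimized by plain gradient ascent instead of discrete token search. The sketch below shows a PGD-style continuous attack against a HuggingFace-style causal LM (the `inputs_embeds`/`labels` interface); the step sizes and the training-step comment are illustrative assumptions, not the paper's exact C-AdvUL recipe.

```python
import torch

def embedding_attack(model, embeds, labels, eps=0.1, steps=10, lr=0.02):
    """PGD in the LLM's embedding space: cheap, since gradients flow
    directly to the perturbation instead of requiring discrete search."""
    delta = torch.zeros_like(embeds, requires_grad=True)
    for _ in range(steps):
        loss = model(inputs_embeds=embeds + delta, labels=labels).loss
        loss.backward()
        with torch.no_grad():
            delta += lr * delta.grad.sign()   # ascend the training loss
            delta.clamp_(-eps, eps)           # stay inside an L-infinity ball
            delta.grad.zero_()
    return (embeds + delta).detach()

# One adversarial training step (the "UL" part adds a clean utility loss):
# adv = embedding_attack(model, embeds, safe_labels)
# loss = model(inputs_embeds=adv, labels=safe_labels).loss + utility_loss
# loss.backward(); optimizer.step()
```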
Efficient Leverage Score Sampling for Tensor Train Decomposition
Vivek Bharadwaj
Beheshteh T. Rakhshan
Osman Asif Malik
Tensor Train (TT) decomposition is widely used in the machine learning and quantum physics communities as a popular tool to efficiently compress high-dimensional tensor data. In this paper, we propose an efficient algorithm to accelerate computing the TT decomposition with the Alternating Least Squares (ALS) algorithm, relying on exact leverage score sampling. For this purpose, we propose a data structure that allows us to efficiently sample from the tensor with time complexity logarithmic in the product of the tensor dimensions. Our contribution specifically leverages the canonical form of the TT decomposition. By maintaining the canonical form through each iteration of ALS, we can efficiently compute (and sample from) the leverage scores, thus achieving significant speed-up in solving each sketched least-squares problem. Experiments on synthetic and real data, on both dense and sparse tensors, demonstrate that our method outperforms SVD-based and ALS-based algorithms.
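The generic step being accelerated is a leverage-score-sketched least-squares solve; the paper's contribution is computing and sampling the exact scores cheaply by exploiting the TT canonical form. The NumPy sketch below shows only the generic step, with scores taken from a QR factorization and sizes chosen arbitrarily.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(10_000, 20))       # tall design matrix, as in one ALS subproblem
b = A @ rng.normal(size=20) + 0.01 * rng.normal(size=10_000)

# Exact leverage scores: squared row norms of an orthonormal basis for range(A).
Q, _ = np.linalg.qr(A)
lev = (Q ** 2).sum(axis=1)
p = lev / lev.sum()

# Sketch: sample rows by leverage, rescale, and solve the small system.
m = 200
idx = rng.choice(A.shape[0], size=m, p=p)
scale = 1.0 / np.sqrt(m * p[idx])
x_sk = np.linalg.lstsq(A[idx] * scale[:, None], b[idx] * scale, rcond=None)[0]
x_ex = np.linalg.lstsq(A, b, rcond=None)[0]
print("relative error vs. exact solve:",
      np.linalg.norm(x_sk - x_ex) / np.linalg.norm(x_ex))
```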
Efficient Reinforcement Learning by Discovering Neural Pathways
Samin Yeasar Arnob
Riyasat Ohib
Sergey Plis
Amy Zhang
Reinforcement learning (RL) algorithms have been very successful at tackling complex control problems, such as AlphaGo or fusion control. However, current research mainly emphasizes solution quality, often achieved by using large models trained on large amounts of data, and does not account for the financial, environmental, and societal costs associated with developing and deploying such models. Modern neural networks are often overparameterized, and a significant number of parameters can be pruned without meaningful loss in performance, resulting in more efficient use of the model's capacity, as in the lottery ticket hypothesis. We present a methodology for identifying sub-networks within a larger network in RL; we call such sub-networks neural pathways. We show empirically that even very small learned sub-networks, using less than 5% of the large network's parameters, can provide very good quality solutions. We also demonstrate the training of multiple pathways within the same network in a multitask setup, where each pathway is encouraged to tackle a separate task. We empirically evaluate our approach on several continuous control tasks, in both online and offline training settings.
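Mechanically, a pathway is a binary mask over a policy network's weights. The sketch below uses global magnitude pruning as a stand-in selection criterion (the paper learns its pathways), keeping roughly the 5% of parameters the abstract mentions; the tiny policy and its sizes are made up for illustration.

```python
import torch
import torch.nn as nn

def pathway_mask(model: nn.Module, keep: float = 0.05):
    """Global magnitude mask keeping the top `keep` fraction of weights;
    a learned, task-specific mask would replace this criterion."""
    flat = torch.cat([p.detach().abs().flatten()
                      for p in model.parameters() if p.dim() > 1])
    thresh = torch.quantile(flat, 1.0 - keep)
    return {name: (p.detach().abs() >= thresh).float()
            for name, p in model.named_parameters() if p.dim() > 1}

def apply_mask(model: nn.Module, mask):
    with torch.no_grad():
        for name, p in model.named_parameters():
            if name in mask:
                p.mul_(mask[name])            # zero out everything off the pathway

policy = nn.Sequential(nn.Linear(17, 256), nn.Tanh(), nn.Linear(256, 6))
mask = pathway_mask(policy, keep=0.05)
apply_mask(policy, mask)                      # the surviving weights form the pathway
```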
ET-Flow: Equivariant Flow-Matching for Molecular Conformer Generation
Majdi Hassan
Nikhil Shenoy
Jungyoon Lee
Hannes Stärk
Stephan Thaler
Predicting low-energy molecular conformations given a molecular graph is an important but challenging task in computational drug discovery. Existing state-of-the-art approaches either resort to large-scale transformer-based models that diffuse over conformer fields, or use computationally expensive methods to generate initial structures and diffuse over torsion angles. In this work, we introduce Equivariant Transformer Flow (ET-Flow). We showcase that a well-designed flow-matching approach with equivariance and a harmonic prior alleviates the need for complex internal geometry calculations and large architectures, contrary to the prevailing methods in the field. Our approach results in a straightforward and scalable method that directly operates on all-atom coordinates with minimal assumptions. With the advantages of equivariance and flow matching, ET-Flow significantly increases the precision and physical validity of the generated conformers, while being a lighter model and faster at inference. Code is available at https://github.com/shenoynikhil/ETFlow.
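Stripped of equivariance and the harmonic prior, the flow-matching objective regresses a velocity field along straight paths from prior samples to data conformers. The sketch below is that generic loss with a deliberately non-equivariant toy network; Gaussian noise stands in for the harmonic prior, and all shapes are assumptions.

```python
import torch
import torch.nn as nn

def flow_matching_loss(v_theta, x1, x0):
    """x1: data conformers, x0: prior samples, both (batch, atoms, 3)."""
    t = torch.rand(x1.shape[0], 1, 1)          # one time per molecule
    xt = (1 - t) * x0 + t * x1                 # linear interpolant
    target = x1 - x0                           # constant velocity along the path
    return ((v_theta(xt, t) - target) ** 2).mean()

class TinyVelocity(nn.Module):
    """Non-equivariant stand-in for the equivariant transformer."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(4, 64), nn.SiLU(), nn.Linear(64, 3))
    def forward(self, x, t):
        return self.net(torch.cat([x, t.expand(-1, x.shape[1], 1)], dim=-1))

model = TinyVelocity()
x1 = torch.randn(8, 12, 3)                     # "data" conformers
x0 = torch.randn_like(x1)                      # stand-in for the harmonic prior
flow_matching_loss(model, x1, x0).backward()
```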
A Foundation Model for Zero-shot Logical Query Reasoning
Mikhail Galkin
Jincheng Zhou
Bruno Ribeiro
Zhaocheng Zhu
Complex logical query answering (CLQA) in knowledge graphs (KGs) goes beyond simple KG completion and aims at answering compositional queries comprised of multiple projections and logical operations. Existing CLQA methods that learn parameters bound to certain entity or relation vocabularies can only be applied to the graph they are trained on, which requires substantial training time before being deployed on a new graph. Here we present UltraQuery, the first foundation model for inductive reasoning that can zero-shot answer logical queries on any KG. The core idea of UltraQuery is to derive both projections and logical operations as vocabulary-independent functions which generalize to new entities and relations in any KG. With the projection operation initialized from a pre-trained inductive KG completion model, UltraQuery can solve CLQA on any KG after fine-tuning on a single dataset. Experimenting on 23 datasets, UltraQuery in zero-shot inference mode shows competitive or better query answering performance than the best available baselines and sets a new state of the art on 15 of them.
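One way to picture vocabulary-independent query execution is with fuzzy per-entity scores: projection propagates a score vector through a relation, and the logical operators are parameter-free t-norms, so nothing is tied to a specific KG's vocabulary. The toy below replaces UltraQuery's pre-trained inductive projection with a plain adjacency product, purely for illustration.

```python
import numpy as np

# Parameter-free fuzzy logic over per-entity scores in [0, 1].
def conj(a, b): return a * b                   # product t-norm
def disj(a, b): return a + b - a * b           # probabilistic sum
def neg(a):     return 1.0 - a

def project(scores, rel_adj):
    # One-hop projection along a relation; UltraQuery would use a
    # pre-trained inductive KG-completion model here instead.
    return np.clip(rel_adj.T @ scores, 0.0, 1.0)

# Toy KG with 4 entities; relations are from-to adjacency matrices.
r1 = np.zeros((4, 4)); r1[0, 1] = r1[0, 2] = 1.0   # r1: 0 -> {1, 2}
r2 = np.zeros((4, 4)); r2[0, 1] = r2[0, 3] = 1.0   # r2: 0 -> {1, 3}
anchor = np.array([1.0, 0.0, 0.0, 0.0])

# Query: entities reachable from the anchor via r1 AND via r2.
print(conj(project(anchor, r1), project(anchor, r2)))  # entity 1 wins
```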