Publications

Interventional Causal Representation Learning
Kartik Ahuja
Yixin Wang
Divyat Mahajan
Causal representation learning seeks to extract high-level latent factors from low-level sensory data. Most existing methods rely on observational data and structural assumptions (e.g., conditional independence) to identify the latent factors. However, interventional data is prevalent across applications. Can interventional data facilitate causal representation learning? We explore this question in this paper. The key observation is that interventional data often carries geometric signatures of the latent factors' support (i.e. what values each latent can possibly take). For example, when the latent factors are causally connected, interventions can break the dependency between the intervened latents' support and their ancestors'. Leveraging this fact, we prove that the latent causal factors can be identified up to permutation and scaling given data from perfect
Mastering the Unsupervised Reinforcement Learning Benchmark from Pixels
Sai Rajeswar
Pietro Mazzaglia
Tim Verbelen
Alexandre Piché
Bart Dhoedt
Alexandre Lacoste
Controlling artificial agents from visual sensory data is an arduous task. Reinforcement learning (RL) algorithms can succeed but require large amounts of interactions between the agent and the environment. To alleviate the issue, unsupervised RL proposes to employ self-supervised interaction and learning, for adapting faster to future tasks. Yet, as shown in the Unsupervised RL Benchmark (URLB; Laskin et al. 2021), whether current unsupervised strategies can improve generalization capabilities is still unclear, especially in visual control settings. In this work, we study the URLB and propose a new method to solve it, using unsupervised model-based RL, for pre-training the agent, and a task-aware fine-tuning strategy combined with a new proposed hybrid planner, Dyna-MPC, to adapt the agent for downstream tasks. On URLB, our method obtains 93.59% overall normalized performance, surpassing previous baselines by a staggering margin. The approach is empirically evaluated through a large-scale empirical study, which we use to validate our design choices and analyze our models. We also show robust performance on the Real-World RL benchmark, hinting at resiliency to environment perturbations during adaptation. Project website: https://masteringurlb.github.io/
Maximal Initial Learning Rates in Deep ReLU Networks
Gaurav Iyer
Boris Hanin
Training a neural network requires choosing a suitable learning rate, which involves a trade-off between speed and effectiveness of convergence. While there has been considerable theoretical and empirical analysis of how large the learning rate can be, most prior work focuses only on late-stage training. In this work, we introduce the maximal initial learning rate
Neural FIM for learning Fisher Information Metrics from point cloud data
Oluwadamilola Fasina
Guillaume Huguet
Alexander Tong
Yanlei Zhang
Maximilian Nickel
Ian Adelstein
Smita Krishnaswamy
Although data diffusion embeddings are ubiquitous in unsupervised learning and have proven to be a viable technique for uncovering the underlying intrinsic geometry of data, diffusion embeddings are inherently limited due to their discrete nature. To this end, we propose neural FIM, a method for computing the Fisher information metric (FIM) from point cloud data - allowing for a continuous manifold model for the data. Neural FIM creates an extensible metric space from discrete point cloud data such that information from the metric can inform us of manifold characteristics such as volume and geodesics. We demonstrate Neural FIM's utility in selecting parameters for the PHATE visualization method as well as its ability to obtain information pertaining to local volume illuminating branching points and cluster centers embeddings of a toy dataset and two single-cell datasets of IPSC reprogramming and PBMCs (immune cells).
ProtST: Multi-Modality Learning of Protein Sequences and Biomedical Texts
Minghao Xu
Xinyu Yuan
Santiago Miret
Current protein language models (PLMs) learn protein representations mainly based on their sequences, thereby well capturing co-evolutionary information, but they are unable to explicitly acquire protein functions, which is the end goal of protein representation learning. Fortunately, for many proteins, their textual property descriptions are available, where their various functions are also described. Motivated by this fact, we first build the ProtDescribe dataset to augment protein sequences with text descriptions of their functions and other important properties. Based on this dataset, we propose the ProtST framework to enhance Protein Sequence pre-training and understanding by biomedical Texts. During pre-training, we design three types of tasks, i.e., unimodal mask prediction, multimodal representation alignment and multimodal mask prediction, to enhance a PLM with protein property information with different granularities and, at the same time, preserve the PLM's original representation power. On downstream tasks, ProtST enables both supervised learning and zero-shot prediction. We verify the superiority of ProtST-induced PLMs over previous ones on diverse representation learning benchmarks. Under the zero-shot setting, we show the effectiveness of ProtST on zero-shot protein classification, and ProtST also enables functional protein retrieval from a large-scale database without any function annotation.
Robust Perception through Equivariance
Chengzhi Mao
Lingyu Zhang
Abhishek Vaibhav Joshi
Junfeng Yang
Hao Wang
Carl Vondrick
R-U-SURE? Uncertainty-Aware Code Suggestions By Maximizing Utility Across Random User Intents
Daniel D. Johnson
Danny Tarlow
Christian Walder
Large language models show impressive results at predicting structured text such as code, but also commonly introduce errors and hallucinations in their output. When used to assist software developers, these models may make mistakes that users must go back and fix, or worse, introduce subtle bugs that users may miss entirely. We propose Randomized Utility-driven Synthesis of Uncertain REgions (R-U-SURE), an approach for building uncertainty-aware suggestions based on a decision-theoretic model of goal-conditioned utility, using random samples from a generative model as a proxy for the unobserved possible intents of the end user. Our technique combines minimum-Bayes-risk decoding, dual decomposition, and decision diagrams in order to efficiently produce structured uncertainty summaries, given only sample access to an arbitrary generative model of code and an optional AST parser. We demonstrate R-U-SURE on three developer-assistance tasks, and show that it can be applied to different user interaction patterns without retraining the model and leads to more accurate uncertainty estimates than token-probability baselines. We also release our implementation as an open-source library at https://github.com/google-research/r_u_sure.
Sampling-Based Accuracy Testing of Posterior Estimators for General Inference
Target-based Surrogates for Stochastic Optimization
Jonathan Wilder Lavington
Sharan Vaswani
Reza Babanezhad Harikandeh
Mark Schmidt
We consider minimizing functions for which it is expensive to compute the gradient. Such functions are prevalent in reinforcement learning, imitation learning and bilevel optimization. Our target optimization framework uses the (expensive) gradient computation to construct surrogate functions in a target space (e.g. the logits output by a linear model for classification) that can be minimized efficiently. This allows for multiple parameter updates to the model, amortizing the cost of gradient computation. In the full-batch setting, we prove that our surrogate is a global upper-bound on the loss, and can be (locally) minimized using a black-box optimization algorithm. We prove that the resulting majorization-minimization algorithm ensures convergence to a stationary point of the loss. Next, we instantiate our framework in the stochastic setting and propose the
Sarah Frank-Wolfe: Methods for Constrained Optimization with Best Rates and Practical Features
Aleksandr Beznosikov
David Dobre
The Frank-Wolfe (FW) method is a popular approach for solving optimization problems with structured constraints that arise in machine learning applications. In recent years, stochastic versions of FW have gained popularity, motivated by large datasets for which the computation of the full gradient is prohibitively expensive. In this paper, we present two new variants of the FW algorithms for stochastic finite-sum minimization. Our algorithms have the best convergence guarantees of existing stochastic FW approaches for both convex and non-convex objective functions. Our methods do not have the issue of permanently collecting large batches, which is common to many stochastic projection-free approaches. Moreover, our second approach does not require either large batches or full deterministic gradients, which is a typical weakness of many techniques for finite-sum problems. The faster theoretical rates of our approaches are confirmed experimentally.
Environmental Scan of Existing Digital Health Solutions for Older Adults Living with Neurocognitive Disorders (Mild and Major) and Their Informal Caregivers: Summary Report
Ambily Jose
Maxime Sasseville
Ellen Gorus
Anik Giguère
Anne Bourbonnais
Ronald Buyl
Marie-Pierre Gagnon
Digital health has added numerous promising solutions to enhance the health and wellness of people living with dementia and other cognitive problems and their informal caregivers. This work aims to summarize currently available digital health solutions and their related characteristics to develop a decision support tool for older adults living with mild or major neurocognitive disorders and their informal caregivers. We conducted an environmental scan to identify digital health solutions from a systematic review and targeted searches for grey literature covering the regions of Canada and Europe. Technological tools were scanned based on a preformatted extraction grid. We assessed their relevance based on selected attributes. We identified 100 available digital health solutions. The majority (56%) were not specific to dementia. Only 28% provided scientific evidence of their effectiveness. Remote patient care, movement tracking and cognitive exercises were the most common purposes of digital health solutions. Most solutions were presented as mobility aid tools, pill dispensers, apps, web, or a combination of these platforms. This knowledge will inform the development of a decision support tool to assist older adults and their informal caregivers in their search for adequate eHealth solutions according to their needs and preferences, based on trustable information.