Publications

Lower and Upper Bounds on the Pseudo-Dimension of Tensor Network Models
Multi-Objective SPIBB: Seldonian Offline Policy Improvement with Safety Constraints in Finite MDPs
Philip S. Thomas
Romain Laroche
We study the problem of Safe Policy Improvement (SPI) under constraints in the offline Reinforcement Learning (RL) setting. We consider the scenario where: (i) we have a dataset collected under a known baseline policy, (ii) multiple reward signals are received from the environment, inducing as many objectives to optimize. We present an SPI formulation for this RL setting that takes into account the preferences of the algorithm’s user for handling the trade-offs for different reward signals while ensuring that the new policy performs at least as well as the baseline policy along each individual objective. We build on traditional SPI algorithms and propose a novel method based on Safe Policy Improvement with Baseline Bootstrapping (SPIBB, Laroche et al., 2019) that provides high-probability guarantees on the performance of the agent in the true environment. We show the effectiveness of our method on a synthetic grid-world safety task as well as in a real-world critical care context to learn a policy for the administration of IV fluids and vasopressors to treat sepsis.
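The bootstrapping idea behind SPIBB-style methods can be sketched in a few lines: on state-action pairs with too few samples in the dataset, the new policy copies the baseline; elsewhere it acts greedily on the estimated values. This is a simplified single-objective illustration with made-up counts, values, and threshold, not the paper's multi-objective algorithm.

```python
def spibb_policy(counts, q_hat, baseline, n_min):
    """Return one action per state, trusting q_hat only where data is plentiful.

    counts[s][a]  -- number of times (s, a) appears in the dataset
    q_hat[s][a]   -- estimated action value
    baseline[s]   -- action the baseline policy takes in s
    n_min         -- minimum sample count before deviating from the baseline
    """
    policy = {}
    for s in q_hat:
        best = max(q_hat[s], key=q_hat[s].get)
        # Bootstrap on the baseline when the greedy action is under-sampled.
        policy[s] = best if counts[s][best] >= n_min else baseline[s]
    return policy

# Hypothetical two-state example.
counts = {"s0": {"a": 50, "b": 2}, "s1": {"a": 40, "b": 30}}
q_hat = {"s0": {"a": 1.0, "b": 5.0}, "s1": {"a": 0.2, "b": 0.9}}
baseline = {"s0": "a", "s1": "a"}
pi = spibb_policy(counts, q_hat, baseline, n_min=10)
# In "s0" the greedy action "b" has only 2 samples, so the baseline "a" is kept;
# in "s1" the greedy action "b" is well-sampled, so the policy deviates.
```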
Neural Production Systems
Nan Rosemary Ke
Charles Blundell
Philippe Beaudoin
Nicolas Heess
Michael Mozer
Visual environments are structured, consisting of distinct objects or entities. These entities have properties -- both visible and latent -- that determine the manner in which they interact with one another. To partition images into entities, deep-learning researchers have proposed structural inductive biases such as slot-based architectures. To model interactions among entities, equivariant graph neural nets (GNNs) are used, but these are not particularly well suited to the task for two reasons. First, GNNs do not predispose interactions to be sparse, as relationships among independent entities are likely to be. Second, GNNs do not factorize knowledge about interactions in an entity-conditional manner. As an alternative, we take inspiration from cognitive science and resurrect a classic approach, production systems, which consist of a set of rule templates that are applied by binding placeholder variables in the rules to specific entities. Rules are scored on their match to entities, and the best fitting rules are applied to update entity properties. In a series of experiments, we demonstrate that this architecture achieves a flexible, dynamic flow of control and serves to factorize entity-specific and rule-based information. This disentangling of knowledge achieves robust future-state prediction in rich visual environments, outperforming state-of-the-art methods using GNNs, and allows for the extrapolation from simple (few object) environments to more complex environments.
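The core control flow of a production system, match rule templates against entities and fire the best match to update entity properties, can be sketched as follows. The entities and rules here are hand-written toys and matching is a hard first-match stand-in for the soft, learned scoring in the paper.

```python
# Each rule is (name, condition on an entity, update applied to that entity).
rules = [
    ("fall", lambda e: not e["supported"], lambda e: {**e, "y": e["y"] - 1}),
    ("rest", lambda e: e["supported"], lambda e: e),
]

def step(entities):
    """Bind each entity to the first rule whose condition matches, then fire it."""
    out = []
    for e in entities:
        for _, cond, update in rules:
            if cond(e):
                out.append(update(e))
                break
    return out

world = [{"y": 5, "supported": False}, {"y": 0, "supported": True}]
world = step(world)
# The unsupported entity falls by one unit; the supported one is unchanged.
```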
Stochastic Gradient Descent-Ascent and Consensus Optimization for Smooth Games: Convergence Analysis under Expected Co-coercivity
Two of the most prominent algorithms for solving unconstrained smooth games are the classical stochastic gradient descent-ascent (SGDA) and the recently introduced stochastic consensus optimization (SCO) [Mescheder et al., 2017]. SGDA is known to converge to a stationary point for specific classes of games, but current convergence analyses require a bounded variance assumption. SCO is used successfully for solving large-scale adversarial problems, but its convergence guarantees are limited to its deterministic variant. In this work, we introduce the expected co-coercivity condition, explain its benefits, and provide the first last-iterate convergence guarantees of SGDA and SCO under this condition for solving a class of stochastic variational inequality problems that are potentially non-monotone. We prove linear convergence of both methods to a neighborhood of the solution when they use a constant step size, and we propose insightful step-size-switching rules to guarantee convergence to the exact solution. In addition, our convergence guarantees hold under the arbitrary sampling paradigm, and as such, we give insights into the complexity of minibatching.
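For a concrete picture of the last-iterate behaviour described above, here is SGDA on a toy regularized bilinear game min_x max_y xy + (a/2)x² - (a/2)y² with additive gradient noise: with a constant step size, the iterates settle into a small neighbourhood of the unique solution (0, 0). All constants are illustrative and not from the paper.

```python
import random

random.seed(0)
a, lr = 0.1, 0.05          # regularization strength and constant step size
x, y = 1.0, 1.0            # starting point

for _ in range(2000):
    # Noisy gradients of f(x, y) = x*y + (a/2)*x**2 - (a/2)*y**2.
    gx = y + a * x + random.gauss(0.0, 0.01)
    gy = x - a * y + random.gauss(0.0, 0.01)
    x, y = x - lr * gx, y + lr * gy   # simultaneous descent-ascent update

# (x, y) ends up close to (0, 0), within a noise-dependent neighbourhood.
```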
Techniques for Symbol Grounding with SATNet
Sever Topan
Many experts argue that the future of artificial intelligence is limited by the field's ability to integrate symbolic logical reasoning into deep learning architectures. The recently proposed differentiable MAXSAT solver, SATNet, was a breakthrough in its capacity to integrate with a traditional neural network and solve visual reasoning problems. For instance, it can learn the rules of Sudoku purely from image examples. Despite its success, SATNet was shown to succumb to a key challenge in neurosymbolic systems known as the Symbol Grounding Problem: the inability to map visual inputs to symbolic variables without explicit supervision ("label leakage"). In this work, we present a self-supervised pre-training pipeline that enables SATNet to overcome this limitation, thus broadening the class of problems that SATNet architectures can solve to include datasets where no intermediary labels are available at all. We demonstrate that our method allows SATNet to attain full accuracy even with a harder problem setup that prevents any label leakage. We additionally introduce a proofreading method that further improves the performance of SATNet architectures, beating the state-of-the-art on Visual Sudoku.
The Causal-Neural Connection: Expressiveness, Learnability, and Inference
Kevin Xia
Kai-Zhan Lee
Elias Bareinboim
One of the central elements of any causal inference is an object called structural causal model (SCM), which represents a collection of mechanisms and exogenous sources of random variation of the system under investigation (Pearl, 2000). An important property of many kinds of neural networks is universal approximability: the ability to approximate any function to arbitrary precision. Given this property, one may be tempted to surmise that a collection of neural nets is capable of learning any SCM by training on data generated by that SCM. In this paper, we show this is not the case by disentangling the notions of expressivity and learnability. Specifically, we show that the causal hierarchy theorem (Thm. 1, Bareinboim et al., 2020), which describes the limits of what can be learned from data, still holds for neural models. For instance, an arbitrarily complex and expressive neural net is unable to predict the effects of interventions given observational data alone. Given this result, we introduce a special type of SCM called a neural causal model (NCM), and formalize a new type of inductive bias to encode structural constraints necessary for performing causal inferences. Building on this new class of models, we focus on solving two canonical tasks found in the literature known as causal identification and estimation. Leveraging the neural toolbox, we develop an algorithm that is both sufficient and necessary to determine whether a causal effect can be learned from data (i.e., causal identifiability); it then estimates the effect whenever identifiability holds (causal estimation). Simulations corroborate the proposed approach.
The functional specialization of visual cortex emerges from training parallel pathways with self-supervised predictive learning
Patrick Mineault
Tim Lillicrap
Christopher C. Pack
Blake A. Richards
The visual system of mammals comprises parallel, hierarchical specialized pathways. Different pathways are specialized in so far as they use representations that are more suitable for supporting specific downstream behaviours. In particular, the clearest example is the specialization of the ventral (“what”) and dorsal (“where”) pathways of the visual cortex. These two pathways support behaviours related to visual recognition and movement, respectively. To date, deep neural networks have mostly been used as models of the ventral, recognition pathway. However, it is unknown whether both pathways can be modelled with a single deep ANN. Here, we ask whether a single model with a single loss function can capture the properties of both the ventral and the dorsal pathways. We explore this question using data from mice, which, like other mammals, have specialized pathways that appear to support recognition and movement behaviours. We show that when we train a deep neural network architecture with two parallel pathways using a self-supervised predictive loss function, we can outperform other models in fitting mouse visual cortex. Moreover, we can model both the dorsal and ventral pathways. These results demonstrate that a self-supervised predictive learning approach applied to parallel pathway architectures can account for some of the functional specialization seen in mammalian visual systems.
Problems in the deployment of machine-learned models in health care
Tianshi Cao
Joseph D Viviano
Michael Fralick
Marzyeh Ghassemi
Muhammad Mamdani
Russell Greiner
Learned Image Compression for Machine Perception
Recent work has shown that learned image compression strategies can outperform standard hand-crafted compression algorithms that have been developed over decades of intensive research on the rate-distortion trade-off. With growing applications of computer vision, high quality image reconstruction from a compressible representation is often a secondary objective. Compression that ensures high accuracy on computer vision tasks such as image segmentation, classification, and detection therefore has the potential for significant impact across a wide variety of settings. In this work, we develop a framework that produces a compression format suitable for both human perception and machine perception. We show that representations can be learned that simultaneously optimize for compression and performance on core vision tasks. Our approach allows models to be trained directly from compressed representations, and this approach yields increased performance on new tasks and in low-shot learning settings. We present results that improve upon segmentation and detection performance compared to standard high quality JPEGs, but with representations that are four to ten times smaller in terms of bits per pixel. Further, unlike naive compression methods, at a level ten times smaller than standard JPEGs, segmentation and detection models trained from our format suffer only minor degradation in performance.
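The trade-off described above can be expressed as a single training objective: a weighted sum of a rate term, a reconstruction-distortion term, and a task loss computed from the compressed representation. The function and weights below are a generic sketch of this kind of objective, not the paper's exact formulation.

```python
def joint_objective(rate, distortion, task_loss, lam_d=1.0, lam_t=1.0):
    """Rate-distortion-task objective: R + lam_d * D + lam_t * L_task.

    rate       -- estimated bits per pixel of the representation
    distortion -- reconstruction error for human viewing (e.g. MSE)
    task_loss  -- loss of a vision model (segmentation/detection/classification)
                  evaluated directly on the compressed representation
    """
    return rate + lam_d * distortion + lam_t * task_loss
```

Raising `lam_t` relative to `lam_d` trades reconstruction fidelity for machine-perception accuracy at a given bit rate.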
Vesicular trafficking is a key determinant of the statin response in acute myeloid leukemia
Jana Krosl
Marie-Eve Bordeleau
Céline Moison
Tara MacRae
Isabel Boivin
Nadine Mayotte
Deanne Gracias
Irène Baccelli
Vincent-Philippe Lavallee
Richard Bisaillon
Bernhard Lehnertz
Rodrigo Mendoza-Sanchez
Réjean Ruel
Thierry Bertomeu
Jasmin Coulombe-Huntington
Geneviève Boucher
Nandita Noronha
Caroline Pabst
Mike Tyers
Patrick Gendron
S. Lemieux
Frederic Barabe
Anne Marinier
Josée Hébert
Guy Sauvageau
Key Points: Inhibition of RAB protein function mediates the anti–acute myeloid leukemia activity of statins. Statin sensitivity is associated with enhanced vesicle-mediated traffic.
Back-Training excels Self-Training at Unsupervised Domain Adaptation of Question Generation and Passage Retrieval
Robert Belfer
Iulian V. Serban
In this work, we introduce back-training, an alternative to self-training for unsupervised domain adaptation (UDA). While self-training generates synthetic training data where natural inputs are aligned with noisy outputs, back-training results in natural outputs aligned with noisy inputs. This significantly reduces the gap between the target-domain and synthetic data distributions, and reduces model overfitting to the source domain. We run UDA experiments on question generation and passage retrieval from the Natural Questions domain to machine learning and biomedical domains. We find that back-training vastly outperforms self-training, by a mean improvement of 7.8 BLEU-4 points on generation and 17.6% top-20 retrieval accuracy across both domains. We further propose consistency filters to remove low-quality synthetic data before training. We also release a new domain-adaptation dataset - MLQuestions - containing 35K unaligned questions, 50K unaligned passages, and 3K aligned question-passage pairs.
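The difference between the two data-construction schemes can be sketched abstractly: self-training keeps natural inputs and pairs them with noisy model outputs, while back-training keeps natural target-domain outputs and pairs them with noisy model-generated inputs. The helper names and toy models below are illustrative, not the paper's implementation.

```python
def self_training_pairs(natural_inputs, model):
    """Natural inputs paired with noisy model predictions as pseudo-labels."""
    return [(x, model(x)) for x in natural_inputs]

def back_training_pairs(natural_outputs, inverse_model):
    """Noisy model-generated inputs paired with natural target-domain outputs."""
    return [(inverse_model(y), y) for y in natural_outputs]
```

In back-training, the supervision side of every pair is drawn from the real target-domain distribution, which is what narrows the gap between synthetic and target data.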
Estimating individual treatment effect on disability progression in multiple sclerosis using deep learning
Jean-Pierre R. Falet
Julien Schroeter
Francesca Bovis
Maria-Pia Sormani
Douglas Lorne Arnold
Disability progression in multiple sclerosis remains resistant to treatment. The absence of a suitable biomarker to allow for phase 2 clinical trials presents a high barrier for drug development. We propose to enable short proof-of-concept trials by increasing statistical power using a deep-learning predictive enrichment strategy. Specifically, a multi-headed multilayer perceptron is used to estimate the conditional average treatment effect (CATE) using baseline clinical and imaging features, and patients predicted to be most responsive are preferentially randomized into a trial. Leveraging data from six randomized clinical trials (n = 3,830), we first pre-trained the model on the subset of relapsing-remitting MS patients (n = 2,520), then fine-tuned it on a subset of primary progressive MS (PPMS) patients (n = 695). In a separate held-out test set of PPMS patients randomized to anti-CD20 antibodies or placebo (n = 297), the average treatment effect was larger for the 50% (HR, 0.492; 95% CI, 0.266-0.912; p = 0.0218) and 30% (HR, 0.361; 95% CI, 0.165-0.79; p = 0.008) predicted to be most responsive, compared to 0.743 (95% CI, 0.482-1.15; p = 0.179) for the entire group. The same model could also identify responders to laquinimod in another held-out test set of PPMS patients (n = 318). Finally, we show that using this model for predictive enrichment results in important increases in power.
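Estimating a conditional average treatment effect from baseline features can be illustrated with a toy T-learner: fit one outcome model per trial arm and take the difference of their predictions. This stand-in uses a single feature, closed-form linear regression, and fully synthetic data; the paper instead uses a multi-headed multilayer perceptron on clinical and imaging features with time-to-event outcomes.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = b0 + b1 * x (closed form)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return my - b1 * mx, b1

random.seed(1)
# Synthetic trial: outcome = x under control, 3*x under treatment, plus noise,
# so the true CATE at feature value x is 2*x.
x0 = [random.uniform(0, 2) for _ in range(200)]
y0 = [x + random.gauss(0, 0.1) for x in x0]
x1 = [random.uniform(0, 2) for _ in range(200)]
y1 = [3 * x + random.gauss(0, 0.1) for x in x1]

(a0, b0), (a1, b1) = fit_line(x0, y0), fit_line(x1, y1)

def cate(x):
    """Difference of the two per-arm outcome models at feature value x."""
    return (a1 + b1 * x) - (a0 + b0 * x)
```

Predictive enrichment then amounts to ranking patients by `cate` on their baseline features and preferentially randomizing those with the largest predicted effect.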