Publications

Head2Toe: Utilizing Intermediate Representations for Better Transfer Learning
Utku Evci
Vincent Dumoulin
Michael Curtis Mozer
3D Infomax improves GNNs for Molecular Property Prediction
Hannes Stärk
Gabriele Corso
Prudencio Tossou
Christian Dallago
Stephan Günnemann
Pietro Lio
Molecular property prediction is one of the fastest-growing applications of deep learning with critical real-world impacts. Including 3D mol… (voir plus)ecular structure as input to learned models improves their predictions for many molecular properties. However, this information is infeasible to compute at the scale required by most real-world applications. We propose pre-training a model to understand the geometry of molecules given only their 2D molecular graph. Using methods from self-supervised learning, we maximize the mutual information between a 3D summary vector and the representations of a Graph Neural Network (GNN) such that they contain latent 3D information. During fine-tuning on molecules with unknown geometry, the GNN still generates implicit 3D information and can use it to inform downstream tasks. We show that 3D pre-training provides significant improvements for a wide range of molecular properties, such as a 22% average MAE reduction on eight quantum mechanical properties. Crucially, the learned representations can be effectively transferred between datasets with vastly different molecules.
Learning To Cut By Looking Ahead: Cutting Plane Selection via Imitation Learning
Max B. Paulus
Giulia Zarpellon
Andreas Krause
Chris J. Maddison
Cutting planes are essential for solving mixed-integer linear problems (MILPs), because they facilitate bound improvements on the optimal so… (voir plus)lution value. For selecting cuts, modern solvers rely on manually designed heuristics that are tuned to gauge the potential effectiveness of cuts. We show that a greedy selection rule explicitly looking ahead to select cuts that yield the best bound improvement delivers strong decisions for cut selection - but is too expensive to be deployed in practice. In response, we propose a new neural architecture (NeuralCut) for imitation learning on the lookahead expert. Our model outperforms standard baselines for cut selection on several synthetic MILP benchmarks. Experiments with a B&C solver for neural network verification further validate our approach, and exhibit the potential of learning methods in this setting.
Only tails matter: Average-Case Universality and Robustness in the Convex Regime
Leonardo Cunha
Fabian Pedregosa
Damien Scieur
Towards efficient representation identification in supervised learning
Kartik Ahuja
Divyat Mahajan
Vasilis Syrgkanis
Humans have a remarkable ability to disentangle complex sensory inputs (e.g., image, text) into simple factors of variation (e.g., shape, co… (voir plus)lor) without much supervision. This ability has inspired many works that attempt to solve the following question: how do we invert the data generation process to extract those factors with minimal or no supervision? Several works in the literature on non-linear independent component analysis have established this negative result; without some knowledge of the data generation process or appropriate inductive biases, it is impossible to perform this inversion. In recent years, a lot of progress has been made on disentanglement under structural assumptions, e.g., when we have access to auxiliary information that makes the factors of variation conditionally independent. However, existing work requires a lot of auxiliary information, e.g., in supervised classification, it prescribes that the number of label classes should be at least equal to the total dimension of all factors of variation. In this work, we depart from these assumptions and ask: a) How can we get disentanglement when the auxiliary information does not provide conditional independence over the factors of variation? b) Can we reduce the amount of auxiliary information required for disentanglement? For a class of models where auxiliary information does not ensure conditional independence, we show theoretically and experimentally that disentanglement (to a large extent) is possible even when the auxiliary information dimension is much less than the dimension of the true latent representation.
Towards Scaling Difference Target Propagation by Learning Backprop Targets
Maxence Ernoult
Fabrice Normandin
Abhinav Moudgil
Sean Spinney
The development of biologically-plausible learning algorithms is important for understanding learning in the brain, but most of them fail to… (voir plus) scale-up to real-world tasks, limiting their potential as explanations for learning by real brains. As such, it is important to explore learning algorithms that come with strong theoretical guarantees and can match the performance of backpropagation (BP) on complex tasks. One such algorithm is Difference Target Propagation (DTP), a biologically-plausible learning algorithm whose close relation with Gauss-Newton (GN) optimization has been recently established. However, the conditions under which this connection rigorously holds preclude layer-wise training of the feedback pathway synaptic weights (which is more biologically plausible). Moreover, good alignment between DTP weight updates and loss gradients is only loosely guaranteed and under very specific conditions for the architecture being trained. In this paper, we propose a novel feedback weight training scheme that ensures both that DTP approximates BP and that layer-wise feedback weight training can be restored without sacrificing any theoretical guarantees. Our theory is corroborated by experimental results and we report the best performance ever achieved by DTP on CIFAR-10 and ImageNet 32
VIM: Variational Independent Modules for Video Prediction
Rim Assouel
Lluis Castrejon
Nicolas Ballas
We introduce a variational inference model called VIM, for Variational Independent Modules, for sequential data that learns and infers laten… (voir plus)t representations as a set of objects and discovers modular causal mechanisms over these objects. These mechanisms - which we call modules - are independently parametrized, define the stochastic transitions of entities and are shared across entities. At each time step, our model infers from a low-level input sequence a high-level sequence of categorical latent variables to select which transition modules to apply to which high-level object. We evaluate this model in video prediction tasks where the goal is to predict multi-modal future events given previous observations. We demonstrate empirically that VIM can model 2D visual sequences in an interpretable way and is able to identify the underlying dynamically instantiated mechanisms of the generation process. We additionally show that the learnt modules can be composed at test time to generalize to out-of-distribution observations.
PhyloPGM: boosting regulatory function prediction accuracy using evolutionary information
Faizy Ahsan
Zichao Yan
Abstract Motivation The computational prediction of regulatory function associated with a genomic sequence is of utter importance in -omics … (voir plus)study, which facilitates our understanding of the underlying mechanisms underpinning the vast gene regulatory network. Prominent examples in this area include the binding prediction of transcription factors in DNA regulatory regions, and predicting RNA–protein interaction in the context of post-transcriptional gene expression. However, existing computational methods have suffered from high false-positive rates and have seldom used any evolutionary information, despite the vast amount of available orthologous data across multitudes of extant and ancestral genomes, which readily present an opportunity to improve the accuracy of existing computational methods. Results In this study, we present a novel probabilistic approach called PhyloPGM that leverages previously trained TFBS or RNA–RBP binding predictors by aggregating their predictions from various orthologous regions, in order to boost the overall prediction accuracy on human sequences. Throughout our experiments, PhyloPGM has shown significant improvement over baselines such as the sequence-based RNA–RBP binding predictor RNATracker and the sequence-based TFBS predictor that is known as FactorNet. PhyloPGM is simple in principle, easy to implement and yet, yields impressive results. Availability and implementation The PhyloPGM package is available at https://github.com/BlanchetteLab/PhyloPGM Supplementary information Supplementary data are available at Bioinformatics online.
GP.2 Deep learning prediction of response to disease modifying therapy in primary progressive multiple sclerosis
JR Falet
Joshua D. Durso-Finley
Brennan Nichyporuk
Julien Schroeter
Francesca Bovis
M Sormani
D Precup
DL Arnold
Background: Only one disease modifying therapy (DMT), ocrelizumab, was found to slow disability progression in primary progressive multiple … (voir plus)sclerosis (PPMS). Modeling the conditional average treatment effect (CATE) using deep learning could identify individuals more responsive to DMTs, allowing for predictive enrichment to increase the power of future clinical trials. Methods: Baseline clinical and MRI data were acquired as part of three placebo-controlled randomized clinical trials: ORATORIO (ocrelizumab), OLYMPUS (rituximab) and ARPEGGIO (laquinimod). Data from ORATORIO and OLYMPUS was separated into a training (70%) and testing (30%) set, while ARPEGGIO served as additional validation. An ensemble of multitask multilayer perceptrons was trained to predict the rate of disability progression on both treatment and placebo to estimate CATE. Results: The model could separate individuals based on their predicted treatment effect. The top 25% of individuals predicted to respond most have a larger effect size (HR 0.442, p=0.0497) than the entire group (HR 0.787, p=0.292). The model could also identify responders to laquinimod. A simulated study where the 50% most responsive individuals are randomized would require 6-times less participants to detect a significant effect. Conclusions: Individuals with PPMS who respond favourably to DMTs can be identified using deep learning based on their baseline clinical and imaging characteristics.
Augmenting Human Selves Through Artificial Agents – Lessons From the Brain
Georg Northoff
Maia Fraser
John Griffiths
Dimitris A. Pinotsis
Rosalyn Moran
Karl Friston
Much of current artificial intelligence (AI) and the drive toward artificial general intelligence (AGI) focuses on developing machines for f… (voir plus)unctional tasks that humans accomplish. These may be narrowly specified tasks as in AI, or more general tasks as in AGI – but typically these tasks do not target higher-level human cognitive abilities, such as consciousness or morality; these are left to the realm of so-called “strong AI” or “artificial consciousness.” In this paper, we focus on how a machine can augment humans rather than do what they do, and we extend this beyond AGI-style tasks to augmenting peculiarly personal human capacities, such as wellbeing and morality. We base this proposal on associating such capacities with the “self,” which we define as the “environment-agent nexus”; namely, a fine-tuned interaction of brain with environment in all its relevant variables. We consider richly adaptive architectures that have the potential to implement this interaction by taking lessons from the brain. In particular, we suggest conjoining the free energy principle (FEP) with the dynamic temporo-spatial (TSD) view of neuro-mental processes. Our proposed integration of FEP and TSD – in the implementation of artificial agents – offers a novel, expressive, and explainable way for artificial agents to adapt to different environmental contexts. The targeted applications are broad: from adaptive intelligence augmenting agents (IA’s) that assist psychiatric self-regulation to environmental disaster prediction and personal assistants. This reflects the central role of the mind and moral decision-making in most of what we do as humans.
Dynamic compression and expansion in a classifying recurrent network
Matthew Farrell
Maxwell J. Farrell
Stefano Recanatesi
Timothy Moore
Eric Todd SheaBrown
Recordings of neural circuits in the brain reveal extraordinary dynamical richness and high variability. At the same time, dimensionality re… (voir plus)duction techniques generally uncover low-dimensional structures underlying these dynamics when tasks are performed. In general, it is still an open question what determines the dimensionality of activity in neural circuits, and what the functional role of this dimensionality in task learning is. In this work we probe these issues using a recurrent artificial neural network (RNN) model trained by stochastic gradient descent to discriminate inputs. The RNN family of models has recently shown promise in revealing principles behind brain function. Through simulations and mathematical analysis, we show how the dimensionality of RNN activity depends on the task parameters and evolves over time and over stages of learning. We find that common solutions produced by the network naturally compress dimensionality, while variability-inducing chaos can expand it. We show how chaotic networks balance these two factors to solve the discrimination task with high accuracy and good generalization properties. These findings shed light on mechanisms by which artificial neural networks solve tasks while forming compact representations that may generalize well.
Relationship Between Arterial Stiffness Index, Pulse Pressure, and Magnetic Resonance Imaging Markers of White Matter Integrity: A UK Biobank Study
Atef Badji
Hélène Girouard