Publications

Towards Scaling Difference Target Propagation by Learning Backprop Targets
The development of biologically plausible learning algorithms is important for understanding learning in the brain, but most of them fail to scale up to real-world tasks, limiting their potential as explanations for learning by real brains. As such, it is important to explore learning algorithms that come with strong theoretical guarantees and can match the performance of backpropagation (BP) on complex tasks. One such algorithm is Difference Target Propagation (DTP), a biologically plausible learning algorithm whose close relation with Gauss-Newton (GN) optimization has recently been established. However, the conditions under which this connection rigorously holds preclude layer-wise training of the feedback pathway synaptic weights (which is more biologically plausible). Moreover, good alignment between DTP weight updates and loss gradients is only loosely guaranteed, and only under very specific conditions on the architecture being trained. In this paper, we propose a novel feedback weight training scheme that ensures both that DTP approximates BP and that layer-wise feedback weight training can be restored without sacrificing any theoretical guarantees. Our theory is corroborated by experimental results, and we report the best performance ever achieved by DTP on CIFAR-10 and ImageNet 32×32.
Utility Theory for Sequential Decision Making
The von Neumann-Morgenstern (VNM) utility theorem shows that, under certain axioms of rationality, decision-making reduces to maximizing the expectation of some utility function. We extend these axioms to increasingly structured sequential decision-making settings and identify the structure of the corresponding utility functions. In particular, we show that memoryless preferences lead to a utility in the form of a per-transition reward and a multiplicative factor on the future return. This result motivates a generalization of Markov Decision Processes (MDPs) with this structure on the agent's returns, which we call Affine-Reward MDPs. A stronger constraint on preferences is needed to recover the commonly used cumulative sum of scalar rewards in MDPs. A yet stronger constraint simplifies the utility function for goal-seeking agents to a difference in some function of states, which we call a potential function. Our necessary and sufficient conditions demystify the reward hypothesis that underlies the design of rational agents in reinforcement learning by adding an axiom to the VNM rationality axioms, and they motivate new directions for AI research involving sequential decision making.
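The return structure described in this abstract (a per-transition reward plus a multiplicative factor on the future return) can be illustrated with a short sketch. This is not the paper's formal definition: the function name `affine_return`, the convention that the return after the last transition is zero, and the example numbers are all assumptions made for illustration.

```python
from typing import Sequence


def affine_return(rewards: Sequence[float], factors: Sequence[float]) -> float:
    """Fold a finite trajectory backwards with G_t = r_t + m_t * G_{t+1}.

    rewards[t] is the per-transition reward r_t and factors[t] is the
    multiplicative factor m_t applied to the future return.
    Assumption: the return after the final transition is 0.
    """
    if len(rewards) != len(factors):
        raise ValueError("rewards and factors must have the same length")
    future_return = 0.0
    for reward, factor in zip(reversed(rewards), reversed(factors)):
        future_return = reward + factor * future_return
    return future_return


rewards = [1.0, 2.0, 3.0]

# A constant factor gives the familiar discounted sum of rewards.
discounted = affine_return(rewards, [0.5, 0.5, 0.5])

# Factors of 1 give the plain cumulative sum of rewards.
cumulative = affine_return(rewards, [1.0, 1.0, 1.0])

# Transition-dependent factors are the more general case.
general = affine_return(rewards, [0.5, 2.0, 0.0])

print(discounted, cumulative, general)
```

With transition-dependent factors, the usual discounted and undiscounted returns appear as special cases of the same recursion.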
VIM: Variational Independent Modules for Video Prediction
Why Should I Trust You, Bellman? The Bellman Error is a Poor Replacement for Value Error
Ofir Nachum
Shixiang Shane Gu
In this work, we study the use of the Bellman equation as a surrogate objective for value prediction accuracy. While the Bellman equation is uniquely solved by the true value function over all state-action pairs, we find that the Bellman error (the difference between both sides of the equation) is a poor proxy for the accuracy of the value function. In particular, we show that (1) due to cancellations from both sides of the Bellman equation, the magnitude of the Bellman error is only weakly related to the distance to the true value function, even when considering all state-action pairs, and (2) in the finite data regime, the Bellman equation can be satisfied exactly by infinitely many suboptimal solutions. This means that the Bellman error can be minimized without improving the accuracy of the value function. We demonstrate these phenomena through a series of propositions, illustrative toy examples, and empirical analysis in standard benchmark domains.
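Point (2) of this abstract can be seen in a tiny hand-built example. The sketch below is not taken from the paper: the two-state chain, the discount factor of 0.5, the use of state values instead of state-action values, and all function names are assumptions chosen to keep the arithmetic exact.

```python
GAMMA = 0.5

# Assumed toy chain under a fixed policy:
#   s0 -> s1 with reward 0, and s1 -> s1 with reward 1.
# Its true values are V(s1) = 1 / (1 - 0.5) = 2 and V(s0) = 0 + 0.5 * 2 = 1.
TRUE_V = {"s0": 1.0, "s1": 2.0}

# Finite data: only the s0 -> s1 transition was observed.
dataset = [("s0", 0.0, "s1")]


def bellman_errors(v, transitions, gamma=GAMMA):
    """Bellman residual V(s) - (r + gamma * V(s')) for each observed transition."""
    return [v[s] - (r + gamma * v[s_next]) for s, r, s_next in transitions]


def max_value_error(v, true_v=TRUE_V):
    """Largest absolute distance to the true value function over all states."""
    return max(abs(v[s] - true_v[s]) for s in true_v)


def candidate(c):
    """Any choice of V(s1) = c with V(s0) = gamma * c fits the observed data."""
    return {"s0": GAMMA * c, "s1": c}


for c in (2.0, 0.0, -100.0):
    v = candidate(c)
    print(c, bellman_errors(v, dataset), max_value_error(v))
```

Every candidate has zero Bellman error on the dataset, yet only `c = 2.0` matches the true values; the other candidates can be arbitrarily far from them.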
Your Autoregressive Generative Model Can Be Better If You Treat It as an Energy-Based One
Yezhen Wang
Tong Che
Bin Li
Kaitao Song
Hengzhi Pei
Dongsheng Li
GP.2 Deep learning prediction of response to disease modifying therapy in primary progressive multiple sclerosis
JR Falet
Joshua D. Durso-Finley
Julien Schroeter
Francesca Bovis
M Sormani
D Precup
DL Arnold
Background: Only one disease modifying therapy (DMT), ocrelizumab, was found to slow disability progression in primary progressive multiple sclerosis (PPMS). Modeling the conditional average treatment effect (CATE) using deep learning could identify individuals more responsive to DMTs, allowing for predictive enrichment to increase the power of future clinical trials. Methods: Baseline clinical and MRI data were acquired as part of three placebo-controlled randomized clinical trials: ORATORIO (ocrelizumab), OLYMPUS (rituximab) and ARPEGGIO (laquinimod). Data from ORATORIO and OLYMPUS were split into training (70%) and testing (30%) sets, while ARPEGGIO served as additional validation. An ensemble of multitask multilayer perceptrons was trained to predict the rate of disability progression on both treatment and placebo to estimate CATE. Results: The model could separate individuals based on their predicted treatment effect. The top 25% of individuals predicted to respond most had a larger effect size (HR 0.442, p=0.0497) than the entire group (HR 0.787, p=0.292). The model could also identify responders to laquinimod. A simulated study in which the 50% most responsive individuals are randomized would require six times fewer participants to detect a significant effect. Conclusions: Individuals with PPMS who respond favourably to DMTs can be identified using deep learning based on their baseline clinical and imaging characteristics.
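The enrichment idea in this abstract (predict each individual's progression rate under treatment and under placebo, take the difference as the CATE estimate, then keep the most responsive fraction) can be sketched as follows. This is not the authors' pipeline: the function names, the sign convention (placebo rate minus treated rate, so larger means more predicted benefit), and the numbers are made up for illustration and are not trial data.

```python
def estimate_cate(pred_treated, pred_placebo):
    """Predicted benefit per individual.

    Assumed sign convention: predicted progression rate on placebo minus
    predicted rate on treatment, so larger values mean more benefit.
    """
    return [placebo - treated for treated, placebo in zip(pred_treated, pred_placebo)]


def top_responders(cate, fraction):
    """Indices of the `fraction` of individuals with the largest predicted benefit."""
    k = max(1, round(len(cate) * fraction))
    ranked = sorted(range(len(cate)), key=lambda i: cate[i], reverse=True)
    return sorted(ranked[:k])


# Made-up model outputs for four individuals (not real data).
pred_treated = [0.10, 0.30, 0.20, 0.05]
pred_placebo = [0.30, 0.30, 0.25, 0.40]

cate = estimate_cate(pred_treated, pred_placebo)
enriched = top_responders(cate, fraction=0.5)
print(enriched)
```

In a real analysis the two prediction lists would come from a trained model evaluated on held-out individuals; here they are hard-coded only to show the ranking step.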
Neural Networks as Paths through the Space of Representations
Richard D Lange
Jordan Kyle Matelsky
Xinyue Wang
Konrad Paul Kording
Relationship Between Arterial Stiffness Index, Pulse Pressure, and Magnetic Resonance Imaging Markers of White Matter Integrity: A UK Biobank Study
Atef Badji
Hélène Girouard
Alzheimer’s disease and dementia in general constitute one of the major public health problems of the 21st century. Research on arterial stiffness and pulse pressure (PP) plays an important role in the quest to reduce the risk of developing dementia by controlling modifiable risk factors. The aim of this study is to investigate the association between peripheral PP, arterial stiffness index (ASI) and brain integrity, and to determine whether ASI is a better predictor of white matter integrity than peripheral PP. Data from 17,984 UK Biobank participants (age 63.09 ± 7.31 years) were used for this study. ASI was estimated using infrared light (photoplethysmography), and peripheral PP was calculated by subtracting the diastolic from the systolic brachial blood pressure value. Fractional anisotropy (FA) was obtained from diffusion imaging to estimate white matter microstructural integrity. White matter hyperintensities were segmented from the combined T1 and T2-weighted FLAIR images as a measure of irreversible white matter damage. An important finding is that peripheral PP predicts white matter integrity better than ASI does. This finding is consistent up to 75 years of age. Interestingly, no significant relationship is found between either peripheral PP or ASI and white matter integrity after 75 years of age. These results suggest that ASI from plethysmography should not be used to estimate cerebrovascular integrity in older adults, and they further question the relationship between arterial stiffness, blood pressure, and white matter damage after the age of 75.
Healthsheet: Development of a Transparency Artifact for Health Datasets
Diana Mincu
Subhrajit Roy
Andrew J Smart
Lauren Wilcox
Mahima Pushkarna
Jessica Schrouff
Razvan Amironesei
Nyalleng Moorosi
Katherine Heller
Uniform Priors for Data-Efficient Learning
Samarth Sinha
Marzyeh Ghassemi
Zeynep Akata
Animesh Garg
Few- or zero-shot adaptation to novel tasks is important for the scalability and deployment of machine learning models. It is therefore crucial to find properties that encourage more transferable features in deep networks for generalization. In this paper, we show that models that learn uniformly distributed features from the training data are able to perform better transfer learning at test time. Motivated by this, we evaluate our method, uniformity regularization (UR), on its ability to facilitate adaptation to unseen tasks and data in six distinct domains: Few-shot Learning with Images, Few-shot Learning with Language, Deep Metric Learning, Zero-shot Domain Adaptation, Out-of-Distribution Classification, and Neural Radiance Fields. Across all experiments, we show that with UR we are able to learn robust vision systems that consistently offer benefits over baselines trained without uniformity regularization and achieve state-of-the-art performance in Deep Metric Learning and in Few-shot Learning with images and language.
From YouTube to the brain: Transfer learning can improve brain-imaging predictions with deep learning
Nahiyan Malik
Kubric: A scalable dataset generator
Klaus Greff
Francois Belletti
Lucas Beyer
Carl Doersch
Yilun Du
Daniel Duckworth
David J Fleet
Dan Gnanapragasam
Charles Herrmann
Thomas Kipf
Abhijit Kundu
Dmitry Lagun
Issam Hadj Laradji
Hsueh-Ti Liu
Henning Meyer
Yishu Miao
D. Nowrouzezahrai
Cengiz Oztireli
Etienne Pot
Noha Radwan
Daniel Rebain
Sara Sabour
Mehdi S. M. Sajjadi
Matan Sela
Vincent Sitzmann
Austin Stone
Deqing Sun
Suhani Vora
Ziyu Wang
Tianhao Wu
Kwang Moo Yi
Fangcheng Zhong
Andrea Tagliasacchi
Data is the driving force of machine learning, with the amount and quality of training data often being more important for the performance of a system than architecture and training details. But collecting, processing and annotating real data at scale is difficult, expensive, and frequently raises additional privacy, fairness and legal concerns. Synthetic data is a powerful tool with the potential to address these shortcomings: 1) it is cheap, 2) it supports rich ground-truth annotations, 3) it offers full control over data, and 4) it can circumvent or mitigate problems regarding bias, privacy and licensing. Unfortunately, software tools for effective data generation are less mature than those for architecture design and training, which leads to fragmented generation efforts. To address these problems we introduce Kubric, an open-source Python framework that interfaces with PyBullet and Blender to generate photo-realistic scenes with rich annotations, and that scales seamlessly to large jobs distributed over thousands of machines, generating TBs of data. We demonstrate the effectiveness of Kubric by presenting a series of 13 different generated datasets for tasks ranging from studying 3D NeRF models to optical flow estimation. We release Kubric, the assets used, all of the generation code, and the rendered datasets for reuse and modification.