Publications

Regeneration Learning: A Learning Paradigm for Data Generation

Xu Tan

Tao Qin

Jiang Bian

Tie-Yan Liu

Yoshua Bengio

2023-01-20

ArXiv (preprint)

doi.org

arxiv.org

Robustness and Sample Complexity of Model-Based MARL for General-Sum Markov Games

Jayakumar Subramanian

Amit Sinha

Aditya Mahajan

2023-01-20

Dynamic Games and Applications (published)

doi.org

arxiv.org

Disentangling poststroke cognitive deficits and their neuroanatomical correlates through combined multivariable and multioutcome lesion‐symptom mapping

Nick A. Weaver

Muhammad Hasnain Mamdani

Jae‐Sung Lim

Johannes Matthijs Biesbroek

Geert Jan Biessels

Irene M. C. Huenges Wajer

Yeonwook Kang

Beom Joon Kim

Byung‐Chul Lee

Keon‐Joo Lee

Kyung‐Ho Yu

Hee‐Joon Bae

Danilo Bzdok

Hugo J. Kuijf

Studies in patients with brain lesions play a fundamental role in unraveling the brain's functional anatomy. Lesion‐symptom mapping (LSM) techniques can relate lesion location to cognitive performance. However, a limitation of current LSM approaches is that they can only evaluate one cognitive outcome at a time, without considering interdependencies between different cognitive tests. To overcome this challenge, we implemented canonical correlation analysis (CCA) as a combined multivariable and multioutcome LSM approach. We performed a proof‐of‐concept study on 1075 patients with acute ischemic stroke to explore whether adding CCA to a multivariable single‐outcome LSM approach (support vector regression) could identify infarct locations associated with deficits in three well‐defined verbal memory functions (encoding, consolidation, retrieval) based on four verbal memory subscores derived from the Seoul Verbal Learning Test (immediate recall, delayed recall, recognition, learning ability). We evaluated whether CCA could extract cognitive score patterns that matched prior knowledge of these verbal memory functions, and whether these patterns could be linked to more specific infarct locations than through single‐outcome LSM alone. Two of the canonical modes identified with CCA showed distinct cognitive patterns that matched prior knowledge of encoding and consolidation. In addition, CCA revealed that each canonical mode was linked to a distinct infarct pattern, while with multivariable single‐outcome LSM individual verbal memory subscores were associated with largely overlapping patterns. In conclusion, our findings demonstrate that CCA can complement single‐outcome LSM techniques to help disentangle cognitive functions and their neuroanatomical correlates.

2023-01-19

Human Brain Mapping (published)

doi.org

Dynamic Consolidation for Continual Learning

Hang Li

Chen Ma

X. T. Chen

Xue Liu

Training deep learning models from a stream of nonstationary data is a critical problem to be solved to achieve general artificial intelligence. As a promising solution, the continual learning (CL) technique aims to build intelligent systems that have the plasticity to learn from new information without forgetting the previously obtained knowledge. Unfortunately, existing CL methods face two nontrivial limitations. First, when updating a model with new data, existing CL methods usually constrain the model parameters within the vicinity of the parameters optimized for old data, limiting the exploration ability of the model. Second, the important strength of each parameter (used to consolidate the previously learned knowledge) is fixed and thus is suboptimal for the dynamic parameter updates. To address these limitations, we first relax the vicinity constraints with a global definition of the important strength, which allows us to explore the full parameter space. Specifically, we define the important strength as the sensitivity of the global loss function to the model parameters. Moreover, we propose adjusting the important strength adaptively to align it with the dynamic parameter updates. Through extensive experiments on popular data sets, we demonstrate that our proposed method outperforms the strong baselines by up to 24% in terms of average accuracy.

2023-01-19

Neural Computation (published)

doi.org

lo-fi: distributed fine-tuning without communication

Mitchell Wortsman

Suchin Gururangan

Shen Li

Ali Farhadi

Ludwig Schmidt

Michael G. Rabbat

Ari S. Morcos

When fine-tuning large neural networks, it is common to use multiple nodes and to communicate gradients at each optimization step. By contrast, we investigate completely local fine-tuning, which we refer to as lo-fi. During lo-fi, each node fine-tunes independently without any communication. Then, the weights are averaged across nodes at the conclusion of fine-tuning. When fine-tuning DeiT-base and DeiT-large on ImageNet, this procedure matches accuracy in-distribution and improves accuracy under distribution shift compared to the baseline, which observes the same amount of data but communicates gradients at each step. We also observe that lo-fi matches the baseline's performance when fine-tuning OPT language models (up to 1.3B parameters) on Common Crawl. By removing the communication requirement, lo-fi reduces resource barriers for fine-tuning large models and enables fine-tuning in settings with prohibitive communication cost.

2023-01-19

TMLR (accepted)

doi.org

openreview.net
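The final step described in the lo-fi abstract above (averaging weights across nodes after independent fine-tuning) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `average_weights`, the parameter names, and the use of plain Python lists in place of tensors are all assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical representation: each node's model is a mapping from a
# parameter name to a flat list of floats (a stand-in for real tensors).
Weights = Dict[str, List[float]]


def average_weights(node_weights: List[Weights]) -> Weights:
    """Return the element-wise mean of every named parameter across nodes."""
    if not node_weights:
        raise ValueError("need at least one node")
    n = len(node_weights)
    averaged: Weights = {}
    for name in node_weights[0]:
        # Group the i-th element of this parameter from every node together.
        columns = zip(*(w[name] for w in node_weights))
        averaged[name] = [sum(col) / n for col in columns]
    return averaged


# Two nodes that were fine-tuned independently (toy values).
node_a = {"layer.weight": [1.0, 2.0], "layer.bias": [0.0]}
node_b = {"layer.weight": [3.0, 4.0], "layer.bias": [1.0]}
merged = average_weights([node_a, node_b])
```

The sketch assumes every node holds the same parameter names and shapes, which is the case when all nodes start from the same pretrained checkpoint.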

A Framework for Obtaining Accurate Posteriors of Strong Gravitational Lensing Parameters with Flexible Priors and Implicit Likelihoods using Density Estimation

Ronan Legin

Yashar Hezaveh

Laurence Perreault-Levasseur

Benjamin Wandelt

We report the application of implicit likelihood inference to the prediction of the macro-parameters of strong lensing systems with neural networks. This allows us to perform deep learning analysis of lensing systems within a well-defined Bayesian statistical framework to explicitly impose desired priors on lensing variables, to obtain accurate posteriors, and to guarantee convergence to the optimal posterior in the limit of perfect performance. We train neural networks to perform a regression task to produce point estimates of lensing parameters. We then interpret these estimates as compressed statistics in our inference setup and model their likelihood function using mixture density networks. We compare our results with those of approximate Bayesian neural networks, discuss their significance, and point to future directions. Based on a test set of 100,000 strong lensing simulations, our amortized model produces accurate posteriors for any arbitrary confidence interval, with a maximum percentage deviation of

2023-01-18

The Astrophysical Journal (published)

doi.org

arxiv.org

Label fusion and training methods for reliable representation of inter-rater uncertainty

Andréanne Lemay

Charley Gros

Julien Cohen-Adad

Enamundram Naga Karthik

Medical tasks are prone to inter-rater variability due to multiple factors such as image quality, professional experience and training, or guideline clarity. Training deep learning networks with annotations from multiple raters is a common practice that mitigates the model's bias towards a single expert. Reliable models generating calibrated outputs and reflecting the inter-rater disagreement are key to the integration of artificial intelligence in clinical practice. Various methods exist to take into account different expert labels. We focus on comparing three label fusion methods: STAPLE, average of the raters' segmentations, and random sampling of each rater's segmentation during training. Each label fusion method is studied using both the conventional training framework and the recently published SoftSeg framework that limits information loss by treating the segmentation task as a regression. Our results, across 10 data splits on two public datasets, indicate that SoftSeg models, regardless of the ground truth fusion method, had better calibration and preservation of the inter-rater variability compared with their conventional counterparts without impacting the segmentation performance. Conventional models, i.e., trained with a Dice loss, with binary inputs, and sigmoid/softmax final activation, were overconfident and underestimated the uncertainty associated with inter-rater variability. Conversely, fusing labels by averaging with the SoftSeg framework led to underconfident outputs and overestimation of the rater disagreement. In terms of segmentation performance, the best label fusion method was different for the two datasets studied, indicating this parameter might be task-dependent. However, SoftSeg had segmentation performance systematically superior or equal to the conventionally trained models and had the best calibration and preservation of the inter-rater variability.

2023-01-17

Machine Learning for Biomedical Imaging (published)

doi.org

arxiv.org
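Two of the three label fusion methods named in the abstract above, averaging the raters' segmentations and randomly sampling one rater's segmentation, are simple enough to sketch. This is an illustrative assumption of how they could look on toy 2D binary masks, not the authors' code; STAPLE is omitted, and the function names are hypothetical.

```python
import random
from typing import List

# Hypothetical representation: a segmentation mask is a 2D list of 0/1 labels.
Mask = List[List[int]]


def fuse_by_average(masks: List[Mask]) -> List[List[float]]:
    """Pixel-wise mean across raters, giving a soft label in [0, 1]."""
    n = len(masks)
    rows, cols = len(masks[0]), len(masks[0][0])
    return [
        [sum(m[r][c] for m in masks) / n for c in range(cols)]
        for r in range(rows)
    ]


def fuse_by_random_sampling(masks: List[Mask], rng: random.Random) -> Mask:
    """Pick one rater's mask at random, e.g. once per training sample."""
    return rng.choice(masks)


# Two raters who disagree on one pixel (toy values).
rater_1 = [[1, 1], [0, 0]]
rater_2 = [[1, 0], [0, 0]]

soft_label = fuse_by_average([rater_1, rater_2])
sampled = fuse_by_random_sampling([rater_1, rater_2], random.Random(0))
```

In the averaged label, the pixel the raters disagree on takes the value 0.5, so the disagreement is encoded directly in the training target, whereas random sampling exposes the disagreement only across repeated draws.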

Estimating causal effects with optimization-based methods: A review and empirical comparison

Martin Cousineau

Vedat Verter

Susan A. Murphy

Joelle Pineau

2023-01-15

European Journal of Operational Research (published)

doi.org

arxiv.org

Scalable Neural Network Algorithms for High Dimensional Data

Mukesh Soni

Marwan Ali Shnan

Yoshua Bengio

2023-01-14

Mesopotamian Journal of Big Data (published)

doi.org

From IID to the Independent Mechanisms assumption in continual learning

Oleksiy Ostapenko

Pau Rodríguez

Alexandre Lacoste

Laurent Charlin

2023-01-10

AAAI.org/2023/Bridge/CCBridge (accepted)

proceedings.mlr.press

Publisher Correction: Advancing ethics review practices in AI research

Madhulika Srikumar

Rebecca Finlay

Grace M. Abuhamad

Carolyn Ashurst

Rosie Campbell

Emily Campbell-Ratcliffe

Hudson Hongo

Sara Rene Jordan

Joseph Lindley

Aviv Ovadya

Joelle Pineau

2023-01-10

Nature Machine Intelligence (published)

doi.org

Studying Logging Practice in Machine Learning-based Applications

Patrick Loic Foalem

Foutse Khomh

Heng Li

Logging is a common practice in traditional software development. Several studies have investigated the characteristics of logging practices in traditional software systems (e.g., Android applications, Java applications, C/C++ applications). Nowadays, we are witnessing more and more development of Machine Learning-based applications (ML-based applications). Many popular libraries facilitate and contribute to the development of such applications, including PyTorch, TensorFlow, Theano, MXNet, Scikit-Learn, Caffe, and Keras. Despite the popularity of ML, we don't have a clear understanding of logging practices in ML applications. In this paper, we aim to fill this knowledge gap and help ML practitioners understand the characteristics of logging in ML-based applications. In particular, we conduct an empirical study on 110 open-source ML-based applications. Through a quantitative analysis, we find that logging practice in ML-based applications is less pervasive than in traditional applications including Android, Java, and C/C++ applications. Furthermore, the majority of logging statements in ML-based applications are at the info and warn levels, whereas in traditional applications info-level statements form the majority in C/C++ applications and debug- and error-level statements form the majority in Android applications. We also perform a quantitative and qualitative analysis of a random sample of logging statements to understand where ML developers put most logging statements and to examine why and how they use logging. These analyses led to the following observations: (i) ML developers put most of the logging statements in model training and in non-ML components. (ii) Data and model management appear to be the main reason behind the introduction of logging statements in ML-based applications.

2023-01-09

ArXiv (preprint)

doi.org

arxiv.org
