Francis Dutil

CMIM: Cross-Modal Information Maximization For Medical Imaging

Tess Berthier

Lisa Di Jorio

Margaux Luck

R Devon Hjelm

In hospitals, data are siloed to specific information systems that make the same information available under different modalities such as the different medical imaging exams the patient undergoes (CT scans, MRI, PET, Ultrasound, etc.) and their associated radiology reports. This offers unique opportunities to obtain and use at train-time those multiple views of the same information that might not always be available at test-time. In this paper, we propose an innovative framework that makes the most of available data by learning good representations of a multi-modal input that are resilient to modality dropping at test-time, using recent advances in mutual information maximization. By maximizing cross-modal information at train time, we are able to outperform several state-of-the-art baselines in two different settings: medical image classification and segmentation. In particular, our method is shown to have a strong impact on the inference-time performance of weaker modalities.

2021-06-05

IEEE International Conference on Acoustics, Speech, and Signal Processing (published)

doi.org

Saliency Is a Possible Red Herring When Diagnosing Poor Generalization

Joseph D. Viviano

Becks Simpson

Francis Dutil

Yoshua Bengio

Joseph Paul Cohen

Poor generalization is one symptom of models that learn to predict target variables using spuriously-correlated image features present only in the training distribution instead of the true image features that denote a class. It is often thought that this can be diagnosed visually using attribution (aka saliency) maps. We study if this assumption is correct. In some prediction tasks, such as for medical images, one may have some images with masks drawn by a human expert, indicating a region of the image containing relevant information to make the prediction. We study multiple methods that take advantage of such auxiliary labels, by training networks to ignore distracting features which may be found outside of the region of interest. This mask information is only used during training and has an impact on generalization accuracy depending on the severity of the shift between the training and test distributions. Surprisingly, while these methods improve generalization performance in the presence of a covariate shift, there is no strong correspondence between the correction of attribution towards the features a human expert has labelled as important and generalization performance. These results suggest that the root cause of poor generalization may not always be spatially defined, and raise questions about the utility of masks as "attribution priors" as well as saliency maps for explainable predictions.

2021-05-02

International Conference on Learning Representations (Poster)

doi.org

openreview.net

Cross-Modal Information Maximization for Medical Imaging: CMIM

Tristan Sylvain

Francis Dutil

Tess Berthier

Lisa Di Jorio

Margaux Luck

R Devon Hjelm

Yoshua Bengio

2020-10-19

ArXiv (preprint)

arxiv.org

InfoMask: Masked Variational Latent Representation to Localize Chest Disease

Saeid Asgari Taghanaki

Mohammad Havaei

Tess Berthier

Francis Dutil

Lisa Di Jorio

Ghassan Hamarneh

Yoshua Bengio

2019-10-09

Lecture Notes in Computer Science (published)

doi.org

arxiv.org

GradMask: Reduce Overfitting by Regularizing Saliency

Becks Simpson

Francis Dutil

Yoshua Bengio

Joseph Paul Cohen

With too few samples or too many model parameters, overfitting can inhibit the ability to generalise predictions to new data. Within medical imaging, this can occur when features such as distinct hospital-specific artifacts are incorrectly assigned importance, leading to poor performance on a new dataset from a different institution without those features. Most regularization methods do not explicitly penalize the incorrect association of these features to the target class and hence fail to address this issue. We propose a regularization method, GradMask, which penalizes saliency maps inferred from the classifier gradients when they are not consistent with the lesion segmentation. This prevents non-tumor related features from contributing to the classification of unhealthy samples. We demonstrate that this method can improve test accuracy by 1-3% compared to the baseline without GradMask, showing that it has an impact on reducing overfitting.

2019-04-14

Medical Imaging with Deep Learning (accepted)

doi.org

openreview.net

Saliency Is a Possible Red Herring When Diagnosing Poor Generalization

Joseph D. Viviano

Becks Simpson

Francis Dutil

Yoshua Bengio

Joseph Paul Cohen

Poor generalization is one symptom of models that learn to predict target variables using spuriously-correlated image features present only in the training distribution instead of the true image features that denote a class. It is often thought that this can be diagnosed visually using attribution (aka saliency) maps. We study if this assumption is correct. In some prediction tasks, such as for medical images, one may have some images with masks drawn by a human expert, indicating a region of the image containing relevant information to make the prediction. We study multiple methods that take advantage of such auxiliary labels, by training networks to ignore distracting features which may be found outside of the region of interest. This mask information is only used during training and has an impact on generalization accuracy depending on the severity of the shift between the training and test distributions. Surprisingly, while these methods improve generalization performance in the presence of a covariate shift, there is no strong correspondence between the correction of attribution towards the features a human expert has labelled as important and generalization performance. These results suggest that the root cause of poor generalization may not always be spatially defined, and raise questions about the utility of masks as "attribution priors" as well as saliency maps for explainable predictions.

2018-12-31

arXiv (preprint)

doi.org

openreview.net

Towards the Latent Transcriptome

Assya Trofimov

Francis Dutil

Claude Perreault

S. Lemieux

Yoshua Bengio

Joseph Paul Cohen

In this work we propose a method to compute continuous embeddings for kmers from raw RNA-seq data, in a reference-free fashion. We report that our model captures information of both DNA sequence similarity as well as DNA sequence abundance in the embedding latent space. We confirm the quality of these vectors by comparing them to known gene sub-structures and report that the latent space recovers exon information from raw RNA-Seq data from acute myeloid leukemia patients. Furthermore we show that this latent space allows the detection of genomic abnormalities such as translocations as well as patient-specific mutations, making this representation space both useful for visualization as well as analysis.

2018-09-26

ArXiv (preprint)

openreview.net

Towards Gene Expression Convolutions using Gene Interaction Graphs

We study the challenges of applying deep learning to gene expression data. We find experimentally that there exists non-linear signal in the data; however, it is not discovered automatically given the noise and low numbers of samples used in most research. We discuss how gene interaction graphs (same pathway, protein-protein, co-expression, or research paper text association) can be used to impose a bias on a deep model similar to the spatial bias imposed by convolutions on an image. We explore the usage of Graph Convolutional Neural Networks coupled with dropout and gene embeddings to utilize the graph information. We find this approach provides an advantage for particular tasks in a low data regime but is very dependent on the quality of the graph used. We conclude that more work should be done in this direction. We design experiments that show why existing methods fail to capture signal that is present in the data when features are added, which clearly isolates the problem that needs to be addressed.

2018-06-17

ArXiv (preprint)

arxiv.org

Graph Priors for Deep Neural Networks

In this work we explore how gene-gene interaction graphs can be used as a prior for the representation of a model to construct features based on known interactions between genes. Most existing machine learning work on graphs focuses on building models when data is confined to a graph structure. In this work we focus on using the information from a graph to build better representations in our models. We use the percolate task, determining if a path exists across a grid for a set of node values, as a proxy for gene pathways. We create variants of the percolate task to explore where existing methods fail. We test the limits of existing methods in order to determine what can be improved when applying these methods to a real task. This leads us to propose new methods based on Graph Convolutional Networks (GCN) that use pooling and dropout to deal with noise in the graph prior.

2018-02-11

(published)

openreview.net

Adversarial Generation of Natural Language

Sandeep Subramanian

Sai Rajeswar

Francis Dutil

Christopher Pal

Aaron Courville

Generative Adversarial Networks (GANs) have gathered a lot of attention from the computer vision community, yielding impressive results for image generation. Advances in the adversarial generation of natural language from noise however are not commensurate with the progress made in generating images, and still lag far behind likelihood based methods. In this paper, we take a step towards generating natural language with a GAN objective alone. We introduce a simple baseline that addresses the discrete output space problem without relying on gradient estimators and show that it is able to achieve state-of-the-art results on a Chinese poem generation dataset. We present quantitative results on generating sentences from context-free and probabilistic context-free grammars, and qualitative language modeling results. A conditional version is also described that can generate sequences conditioned on sentence characteristics.

2017-07-31

Proceedings of the 2nd Workshop on Representation Learning for NLP (published)

doi.org

arxiv.org

Plan, Attend, Generate: Character-Level Neural Machine Translation with Planning

Caglar Gulçehre

Francis Dutil

Adam Trischler

Yoshua Bengio

We investigate the integration of a planning mechanism into an encoder-decoder architecture with attention. We develop a model that can plan ahead when it computes alignments between the source and target sequences, not only for a single time-step but for the next k time-steps as well, by constructing a matrix of proposed future alignments and a commitment vector that governs whether to follow or recompute the plan. This mechanism is inspired by the strategic attentive reader and writer (STRAW) model, a recent neural architecture for planning with hierarchical reinforcement learning that can also learn higher level temporal abstractions. Our proposed model is end-to-end trainable with differentiable operations. We show that our model outperforms strong baselines on a character-level translation task from WMT’15 with fewer parameters and computes alignments that are qualitatively intuitive.

2017-07-31

Proceedings of the 2nd Workshop on Representation Learning for NLP (published)

doi.org

arxiv.org
