Publications

Pandemic policy assessment by artificial intelligence

Sirui Song

Xue Liu

Ying Li

Yang Yu

2022-08-15

Scientific Reports (published)

doi.org

High Fidelity Visualization of What Your Self-Supervised Representation Knows About

Florian Bordes

Randall Balestriero

P Vincent

Discovering what is learned by neural networks remains a challenge. In self-supervised learning, classification is the most common task used to evaluate how good a representation is. However, relying only on such a downstream task can limit our understanding of what information is retained in the representation of a given input. In this work, we showcase the use of a Representation Conditional Diffusion Model (RCDM) to visualize in data space the representations learned by self-supervised models. The use of RCDM is motivated by its ability to generate high-quality samples -- on par with state-of-the-art generative models -- while ensuring that the representations of those samples are faithful, i.e., close to the one used for conditioning. By using RCDM to analyze self-supervised models, we are able to clearly show visually that i) SSL (backbone) representations are not invariant to the data augmentations they were trained with -- thus debunking an often restated but mistaken belief; ii) SSL post-projector embeddings do appear invariant to these data augmentations, along with many other data symmetries; iii) SSL representations appear more robust to small adversarial perturbations of their inputs than representations trained in a supervised manner; and iv) SSL-trained representations exhibit an inherent structure that can be explored thanks to RCDM visualization and enables image manipulation.

2022-08-14

TMLR (accepted)

openreview.net

Automatic Phenotyping by a Seed-guided Topic Model.

Ziyang Song

Yuanyi Hu

Aman Verma

David L. Buckeridge

Yue Li

Electronic health records (EHRs) provide rich clinical information and the opportunity to extract epidemiological patterns to understand and predict patient disease risks with suitable machine learning methods such as topic models. However, existing topic models do not generate identifiable topics each predicting a unique phenotype. One promising direction is to use known phenotype concepts to guide topic inference. We present a seed-guided Bayesian topic model called MixEHR-Seed with 3 contributions: (1) for each phenotype, we infer a dual form of topic distribution: a seed-topic distribution over a small set of key EHR codes and a regular topic distribution over the entire EHR vocabulary; (2) we model age-dependent disease progression as Markovian dynamic topic priors; (3) we infer seed-guided multi-modal topics over distinct EHR data types. For inference, we developed a variational inference algorithm. Using MixEHR-Seed, we inferred 1569 PheCode-guided phenotype topics from an EHR database in Quebec, Canada covering 1.3 million patients with up to 20 years of follow-up and 122 million records for 8539 and 1126 unique diagnostic and drug codes, respectively. We observed (1) accurate phenotype prediction by the guided topics, (2) clinically relevant PheCode-guided disease topics, (3) meaningful age-dependent disease prevalence. Source code is available at GitHub: https://github.com/li-lab-mcgill/MixEHR-Seed.

2022-08-13

Knowledge Discovery and Data Mining (published)

doi.org

TITRATED: Learned Human Driving Behavior without Infractions via Amortized Inference

Vasileios Lioutas

Adam Ścibior

Frank N. Wood

2022-08-11

TMLR (accepted)

openreview.net

Heatmap Regression for Lesion Detection using Pointwise Annotations

Chelsea Myers-Colet

Julien Schroeter

Douglas Arnold

Tal Arbel

In many clinical contexts, detecting all lesions is imperative for evaluating disease activity. Standard approaches pose lesion detection as a segmentation problem despite the time-consuming nature of acquiring segmentation labels. In this paper, we present a lesion detection method which relies only on point labels. Our model, which is trained via heatmap regression, can detect a variable number of lesions in a probabilistic manner. In fact, our proposed post-processing method offers a reliable way of directly estimating the lesion existence uncertainty. Experimental results on Gad lesion detection show our point-based method performs competitively compared to training on expensive segmentation labels. Finally, our detection model provides a suitable pre-training for segmentation. When fine-tuning on only 17 segmentation samples, we achieve comparable performance to training with the full dataset.

2022-08-10

ArXiv (preprint)

doi.org

arxiv.org

RandomSCM: interpretable ensembles of sparse classifiers tailored for omics data

Thibaud Godon

Pier-Luc Plante

Baptiste Bauvin

Élina Francovic-Fontaine

Alexandre Drouin

J. Corbeil

Background: Understanding the relationship between omics data and the phenotype is a central problem in precision medicine. The high dimensionality of metabolomics data challenges learning algorithms in terms of scalability and generalization. Most learning algorithms do not produce interpretable models. Method: We propose an ensemble learning algorithm based on conjunctions or disjunctions of decision rules. Results: Applications on metabolomics data show that it produces models that achieve high predictive performance. The interpretability of the models makes them useful for biomarker discovery and pattern discovery in high-dimensional data.

2022-08-10

ArXiv (preprint)

doi.org

arxiv.org

Diversifying Design of Nucleic Acid Aptamers Using Unsupervised Machine Learning

Siba Moussa

Michael Kilgour

Clara Jans

Alex Hernández-García

Miroslava Cuperlovic-Culf

Yoshua Bengio

Lena Simine

2022-08-09

ArXiv (preprint)

doi.org

arxiv.org

Learning to Improve Code Efficiency

Binghong Chen

Daniel Tarlow

Kevin Swersky

Martin Maas

Pablo Heiber

Ashish V Naik

Milad Hashemi

Parthasarathy Ranganathan

2022-08-08

ArXiv (preprint)

doi.org

openreview.net

Endorsing Complexity Through Diversity: Computational Psychiatry Meets Big Data Analytics

Jakub Kopal

Danilo Bzdok

2022-08-04

Biological Psychiatry (unknown)

doi.org

Estimating the lagged effect of price discounting: a time-series study using transaction data of sugar sweetened beverages.

Hiroshi Mamiya

Alexandra M. Schmidt

Erica E. M. Moodie

David L. Buckeridge

Price discounting is an unregulated obesogenic environmental risk factor for the purchasing of unhealthy food, including Sugar Sweetened Beverages (SSB). Sales of price-discounted food items are known to increase during the period of discounting. However, the presence and extent of the lagged effect of discounting, a sustained level of sales after discounting ends, has previously been unaccounted for. We investigated the presence of the lagged effect of discounting on the sales of five SSB categories: soda, fruit juice, sports and energy drinks, sweetened coffee and tea, and sweetened drinkable yogurt. We fitted a distributed lag model to weekly volume-standardized sales and percent discounting generated by a supermarket in Montreal, Canada between 2008 and 2013. While the sales of SSB increased during the period of discounting, there was no evidence of a prominent lagged effect of discounting in four of the five SSB categories; the exception was sports and energy drinks, where a posterior mean of 28,459 servings (95% credible interval: 2,661 to 67,253) of excess sales can be attributed to the lagged effect in the target store during the study period. Our results indicate that previous studies may have underestimated the effect of price discounting for some food categories. Temporary price discounting is an important component of the obesogenic food environment, as it has been shown to increase the sales of discretionary food items during the period of discounting. Even after a period of price discounting has ended, the sales of sports and energy drinks remain at a higher level relative to the sales before discounting. Previous research focusing on the immediate effect (i.e., same time period) of price discounting may have systematically underestimated the impact of price discounting for some food categories.
The findings and analytical method in this study promote improved validity of future food environment research targeting the impact of discounting and other types of food promotions on the sales of energy-dense and nutrition-poor food items.

2022-08-04

BMC Public Health (published)

doi.org

Counterfactual Image Synthesis for Discovery of Personalized Predictive Image Markers

Amar Kumar

Anjun Hu

Brennan Nichyporuk

Jean-Pierre R. Falet

Douglas Arnold

Sotirios A. Tsaftaris

Tal Arbel

2022-08-02

ArXiv (preprint)

doi.org

arxiv.org

Galaxies and Halos on Graph Neural Networks: Deep Generative Modeling Scalar and Vector Quantities for Intrinsic Alignment

Yesukhei Jagvaral

François Lanusse

Sukhdeep Singh

Rachel Mandelbaum

Siamak Ravanbakhsh

Duncan Campbell

In order to prepare for the upcoming wide-field cosmological surveys, large simulations of the Universe with realistic galaxy populations are required. In particular, the tendency of galaxies to naturally align towards overdensities, an effect called intrinsic alignments (IA), can be a major source of systematics in the weak lensing analysis. As the details of galaxy formation and evolution relevant to IA cannot be simulated in practice on such volumes, we propose as an alternative a Deep Generative Model. This model is trained on the IllustrisTNG-100 simulation and is capable of sampling the orientations of a population of galaxies so as to recover the correct alignments. In our approach, we model the cosmic web as a set of graphs, where the graphs are constructed for each halo, and galaxy orientations as a signal on those graphs. The generative model is implemented on a Generative Adversarial Network architecture and uses specifically designed Graph-Convolutional Networks sensitive to the relative 3D positions of the vertices. Given (sub)halo masses and tidal fields, the model is able to learn and predict scalar features such as galaxy and dark matter subhalo shapes; and more importantly, vector features such as the 3D orientation of the major axis of the ellipsoid and the complex 2D ellipticities. For correlations of 3D orientations, the model is in good quantitative agreement with the measured values from the simulation, except at very small and transition scales. For correlations of 2D ellipticities, the model is in good quantitative agreement with the measured values from the simulation on all scales. Additionally, the model is able to capture the dependence of IA on mass, morphological type and central/satellite type.

2022-08-01

Monthly Notices of the Royal Astronomical Society (unknown)

doi.org

arxiv.org

Mila on Udemy

AI Policy Fellowship Publications

Mila Ventures Launchpad
