Publications

Curating the Twitter Election Integrity Datasets for Better Online Troll Characterization
Albert Manuel Orozco Camacho
Social media platforms now provide accessible channels for interaction and immediate reflection on the most important events happening around the world. In this paper, we first present a curated set of datasets originating from Twitter’s Information Operations efforts. Notably, these accounts, which have already been suspended, provide a notion of how state-backed human trolls operate. Second, we present detailed analyses of how these behaviours vary over time, and motivate their use and abstraction in the context of deep representation learning: for instance, to learn and potentially track troll behaviour. We present baselines for such tasks and highlight the differences that may exist within the literature. Finally, we utilize the representations learned for behaviour prediction to classify trolls from "real" users, using a sample of non-suspended active accounts.
Genomic epidemiology and associated clinical outcomes of a SARS-CoV-2 outbreak in a general adult hospital in Quebec
Bastien Paré
Marieke Rozendaal
Sacha Morin
Raphael Poujol
Fatima Mostefai
Shawn M. Simpson
Jean-Christophe Grenier
Léa Kaufmann
Henry Xing
Miguelle Sanchez
Ariane Yechouron
Ronald Racette
Ivan Pavlov
Martin Smith
Patient health records and whole viral genomes from an early SARS-CoV-2 outbreak in a Quebec hospital reveal features associated with favorable outcomes
Bastien Paré
Marieke Rozendaal
Sacha Morin
Léa Kaufmann
Shawn M. Simpson
Raphael Poujol
Fatima Mostefai
Jean-Christophe Grenier
Henry Xing
Miguelle Sanchez
Ariane Yechouron
Ronald Racette
Ivan Pavlov
Martin Smith
Online Partisan Polarization of COVID-19
Zachary Yang
Anne Imouza
Kellin Pelrine
Sacha Lévy
Jiewen Liu
Gabrielle Desrosiers-Brisebois
Jean-François Godbout
André Blais
In today’s age of (mis)information, many people utilize various social media platforms in an attempt to shape public opinion on several important issues, including elections and the COVID-19 pandemic. These two topics have recently become intertwined given the importance of complying with public health measures related to COVID-19 and politicians’ management of the pandemic. Motivated by this, we study the partisan polarization of COVID-19 discussions on social media. We propose and utilize a novel measure of partisan polarization to analyze more than 380 million posts from Twitter and Parler around the 2020 US presidential election. We find strong correlation between peaks in polarization and polarizing events, such as the January 6th Capitol Hill riot. We further classify each post into key COVID-19 issues of lockdown, masks, vaccines, as well as miscellaneous, to investigate both the volume and polarization on these topics and how they vary through time. Parler includes more negative discussions around lockdown and masks, as expected, but not much around vaccines. We also observe more balanced discussions on Twitter and a general disconnect between the discussions on Parler and Twitter.
Exploring the roles of artificial intelligence in surgical education: A scoping review.
Elif Bilgic
Andrew Gorgy
Alison Yang
Michelle Cwintal
Hamed Ranjbar
Kalin Kahla
Dheeksha Reddy
Kexin Li
Helin Ozturk
Eric Zimmermann
Andrea Quaiattini
Jason M. Harley
Hypo- and hyper- sensory processing heterogeneity in Autism Spectrum Disorder
Aline Lefebvre
Julian Tillmann
Freddy Cliquet
Frederique Amsellem
Anna Maruani
Claire Leblond
Anita Beggiato
David Germanaud
Anouck Amestoy
Myriam Ly‐Le Moal
Daniel Umbricht
Christopher Chattam
Lorraine Murtagh
Manuel Bouvard
Marion Leboyer
Tony Charman
Thomas Bourgeron
Richard Delorme
Background. Sensory processing atypicalities are part of the core symptoms of autism spectrum disorder (ASD) and could result from an excitation/inhibition imbalance. Yet, the convergence level of phenotypic sensory processing atypicalities with genetic alterations in GABAergic and glutamatergic pathways remains poorly understood. This study aimed to characterize the distribution of hypo/hyper-sensory profiles among individuals with ASD and investigate the role of deleterious mutations in GABAergic and glutamatergic pathway-related genes in sensory processing atypicalities. Method. From the Short Sensory Profile (SSP) questionnaire, we defined and explored a score, the differential Short Sensory Profile (dSSP), as a normalized and centralized hypo/hypersensitivity ratio for 1136 participants (533 with ASD, 210 first-degree relatives, and 267 controls) from two independent study samples (PARIS and LEAP). We also performed an unsupervised item-based clustering analysis on SSP item scores to validate this new categorization in terms of hypo- and hypersensitivity. We then explored the link between the dSSP score and the burden of deleterious mutations in a subset of individuals for whom whole-genome sequencing data were available. Results. We observed a mean dSSP score difference between ASD and controls, driven mostly by a high dSSP score variability among groups (PARIS: p0.0001, η2 = 0.0001, LEAP: p0.0001, Cohen’s d=3.67). First-degree relatives were with an intermediate distribution variability prof
Fixing Bias in Reconstruction-based Anomaly Detection with Lipschitz Discriminators
Alexander Tong
Smita Krishnaswamy
Anomaly detection is of great interest in fields where abnormalities need to be identified and corrected (e.g., medicine and finance). Deep learning methods for this task often rely on autoencoder reconstruction error, sometimes in conjunction with other penalties. We show that this approach exhibits intrinsic biases that lead to undesirable results. Reconstruction-based methods can sometimes show low error on simple-to-reconstruct points that are not part of the training data, for example, the all-black image. Instead, we introduce a new unsupervised Lipschitz anomaly discriminator (LAD) that does not suffer from these biases. Our anomaly discriminator is trained, similar to the discriminator of a GAN, to detect the difference between the training data and corruptions of the training data. We show that this procedure successfully detects unseen anomalies with guarantees on those that have a certain Wasserstein distance from the data or corrupted training set. These additions allow us to show improved performance on MNIST, CIFAR10, and health record data. Further, LAD does not require decoding back to the original data space, which makes anomaly detection possible in domains where it is difficult to define a decoder, such as in irregular graph structured data. Empirically, we show this framework leads to improved performance on image, health record, and graph data.
Processing visual ambiguity in fractal patterns: Pareidolia as a sign of creativity
Antoine Bellemare
Yann Harel
Jordan O’Byrne
Genevieve A. Mageau
Arne Dietrich
Medial Spectral Coordinates for 3D Shape Analysis
Morteza Rezanejad
Mohammad Khodadad
H. Mahyar
M. Gruninger
Dirk B. Walther
In recent years there has been a resurgence of interest in our community in the shape analysis of 3D objects represented by surface meshes, their voxelized interiors, or surface point clouds. In part, this interest has been stimulated by the increased availability of RGBD cameras, and by applications of computer vision to autonomous driving, medical imaging, and robotics. In these settings, spectral coordinates have shown promise for shape representation due to their ability to incorporate both local and global shape properties in a manner that is qualitatively invariant to isometric transformations. Yet, surprisingly, such coordinates have thus far typically considered only local surface positional or derivative information. In the present article, we propose to equip spectral coordinates with medial (object width) information, so as to enrich them. The key idea is to couple surface points that share a medial ball, via the weights of the adjacency matrix. We develop a spectral feature using this idea, and the algorithms to compute it. The incorporation of object width and medial coupling has direct benefits, as illustrated by our experiments on object classification, object part segmentation, and surface point correspondence.
Multi-label Iterated Learning for Image Classification with Label Ambiguity
Sai Rajeswar
Pau Rodriguez
Soumye Singhal
David Vazquez
Transfer learning from large-scale pre-trained models has become essential for many computer vision tasks. Recent studies have shown that datasets like ImageNet are weakly labeled since images with multiple object classes present are assigned a single label. This ambiguity biases models towards a single prediction, which could result in the suppression of classes that tend to co-occur in the data. Inspired by language emergence literature, we propose multi-label iterated learning (MILe) to incorporate the inductive biases of multi-label learning from single labels using the framework of iterated learning. MILe is a simple yet effective procedure that builds a multi-label description of the image by propagating binary predictions through successive generations of teacher and student networks with a learning bottleneck. Experiments show that our approach exhibits systematic benefits on ImageNet accuracy as well as ReaL F1 score, which indicates that MILe deals better with label ambiguity than the standard training procedure, even when fine-tuning from self-supervised weights. We also show that MILe is effective at reducing label noise, achieving state-of-the-art performance on real-world large-scale noisy data such as WebVision. Furthermore, MILe improves performance in class incremental settings such as IIRC and is robust to distribution shifts. Code: https://github.com/rajeswar18/MILe
The meaning of significant mean group differences for biomarker discovery
Eva Loth
Jumana Ahmad
Christopher H. Chatham
Beatriz López
Ben Carter
Daisy Crawley
Beth Oakley
Hannah Hayward
Jennifer Cooke
Antonia San José Cáceres
Emily J. H. Jones
Tony Charman
Christian Beckmann
Thomas Bourgeron
Roberto Toro
Jan K. Buitelaar
Declan Murphy
Splitting, Renaming, Removing: A Study of Common Cleaning Activities in Jupyter Notebooks
Helen Dong
Shurui Zhou
Christian Kästner
Data scientists commonly use computational notebooks because they provide a good environment for testing multiple models. However, once the scientist completes the code and finds the ideal model, he or she will have to dedicate time to clean up the code in order for others to easily understand it. In this paper, we perform a qualitative study on how scientists clean their code in hopes of being able to suggest a tool to automate this process. Our end goal is for tool builders to address possible gaps and provide additional aid to data scientists, who then can focus more on their actual work rather than the routine and tedious cleaning work. By sampling notebooks from GitHub and analyzing changes between subsequent commits, we identified common cleaning activities, such as changes to markdown (e.g., adding header sections or descriptions) or comments (both deleting dead code and adding descriptions) as well as reordering cells. We also find that common cleaning activities differ depending on the intended purpose of the notebook. Our results provide a valuable foundation for tool builders and notebook users, as many identified cleaning activities could benefit from codification of best practices and dedicated tool support, possibly tailored depending on intended use.