Publications

Why Exposure Bias Matters: An Imitation Learning Perspective of Error Accumulation in Language Generation
Kushal Arora
Layla El Asri
Hareesh Bahuleyan
Current language generation models suffer from issues such as repetition, incoherence, and hallucinations. An often-repeated hypothesis for this brittleness of generation models is that it is caused by the training and the generation procedure mismatch, also referred to as exposure bias. In this paper, we verify this hypothesis by analyzing exposure bias from an imitation learning perspective. We show that exposure bias leads to an accumulation of errors during generation, analyze why perplexity fails to capture this accumulation of errors, and empirically show that this accumulation results in poor generation quality.
Matching Feature Sets for Few-Shot Image Classification
Arman Afrasiyabi
Jean‐François Lalonde
In image classification, it is common practice to train deep networks to extract a single feature vector per input image. Few-shot classification methods also mostly follow this trend. In this work, we depart from this established direction and instead propose to extract sets of feature vectors for each image. We argue that a set-based representation intrinsically builds a richer representation of images from the base classes, which can subsequently better transfer to the few-shot classes. To do so, we propose to adapt existing feature extractors to instead produce sets of feature vectors from images. Our approach, dubbed SetFeat, embeds shallow self-attention mechanisms inside existing encoder architectures. The attention modules are lightweight, and as such our method results in encoders that have approximately the same number of parameters as their original versions. During training and inference, a set-to-set matching metric is used to perform image classification. The effectiveness of our proposed architecture and metrics is demonstrated via thorough experiments on standard few-shot datasets (namely miniImageNet, tieredImageNet, and CUB) in both the 1- and 5-shot scenarios. In all cases but one, our method outperforms the state of the art.
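As a rough illustration of what a set-to-set matching metric can look like, the sketch below classifies a query image, represented as a set of feature vectors, by comparing it with per-class sets. The "sum of minimum distances" metric, the function names, and the toy 2-D vectors are all assumptions made for illustration; the abstract does not specify SetFeat's actual metric.

```python
import math

def sum_min_distance(query_set, class_set):
    """Hypothetical set-to-set metric: for each query vector, take the distance
    to its nearest vector in the class set, then sum over the query set."""
    return sum(min(math.dist(q, c) for c in class_set) for q in query_set)

def classify(query_set, class_sets):
    """Return the label whose feature set is closest to the query set."""
    return min(class_sets, key=lambda label: sum_min_distance(query_set, class_sets[label]))

# Toy 2-D feature sets (illustrative values only, not real encoder outputs).
class_sets = {
    "cat": [(0.0, 0.0), (0.0, 1.0)],
    "dog": [(5.0, 5.0), (6.0, 5.0)],
}
query = [(0.1, 0.2), (0.0, 0.9)]
predicted = classify(query, class_sets)
```

Here every vector in the query set votes through its nearest neighbour in each class set, so a class wins only if it explains the whole set reasonably well rather than a single vector.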
GCNFusion: An efficient graph convolutional network based model for information diffusion
Bahare Fatemi
Soheila Mehr Molaei
Shirui Pan
Mortality trends and length of stays among hospitalized patients with COVID-19 in Ontario and Québec (Canada): a population-based cohort study of the first three epidemic waves
Yiqing Xia
Huiting Ma
Marc Brisson
Beate Sander
Adrienne Chan
Aman Verma
Iris Ganser
Nadine Kronfli
Sharmistha Mishra
Mathieu Maheu-Giroux
Multivariate, Transgenerational Associations of the COVID-19 Pandemic Across Minoritized and Marginalized Communities.
Sarah W. Yip
Ayana Jordan
Robert J. Kohler
Avram J. Holmes
Importance The experienced consequences of the COVID-19 pandemic have diverged across individuals, families, and communities, resulting in inequity within a host of factors. There is a gap of quantitative evidence about the transgenerational impacts of these experiences and factors. Objective To identify baseline predictors of COVID-19 experiences, as defined by child and parent report, using a multivariate pattern-learning framework from the Adolescent Brain and Cognitive Development (ABCD) cohort. Design, Setting, and Participants ABCD is an ongoing prospective longitudinal study of child and adolescent development in the United States including 11 875 youths, enrolled at age 9 to 10 years. Using nationally collected longitudinal profiling data from 9267 families, a multivariate pattern-learning strategy was developed to identify factor combinations associated with transgenerational costs of the ongoing COVID-19 pandemic. ABCD data (release 3.0) collected from 2016 to 2020 and released between 2019 and 2021 were analyzed in combination with ABCD COVID-19 rapid response data from the first 3 collection points (May-August 2020). Exposures Social distancing and other response measures imposed by COVID-19, including school closures and shutdown of many childhood recreational activities. Main Outcomes and Measures Mid-COVID-19 experiences as defined by the ABCD's parent and child COVID-19 assessments. Results Deep profiles from 9267 youth (5681 female [47.8%]; mean [SD] age, 119.0 [7.5] months) and their caregivers were quantitatively examined. Enabled by a pattern-learning analysis, social determinants of inequity, including family structure, socioeconomic status, and the experience of racism, were found to be primarily associated with transgenerational impacts of COVID-19, above and beyond other candidate predictors such as preexisting medical or psychiatric conditions.
Pooling information across more than 17 000 baseline pre-COVID-19 family indicators and more than 280 measures of day-to-day COVID-19 experiences, non-White (ie, families who reported being Asian, Black, Hispanic, other, or a combination of those choices) and/or Spanish-speaking families were found to have decreased resources (mode 1, canonical vector weight [CVW] = 0.19; rank 5 of 281), escalated likelihoods of financial worry (mode 1, CVW = -0.20; rank 4), and food insecurity (mode 1, CVW = 0.21; rank 2), yet were more likely to have parent-child discussions regarding COVID-19-associated health and prevention issues, such as handwashing (mode 1, CVW = 0.14; rank 9), conserving food or other items (mode 1, CVW = 0.21; rank 1), protecting elderly individuals (mode 1, CVW = 0.11; rank 21), and isolating from others (mode 1, CVW = 0.11; rank 23). In contrast, White families (mode 1, CVW = -0.07; rank 3), those with higher pre-COVID-19 income (mode 1, CVW = -0.07; rank 5), and presence of a parent with a postgraduate degree (mode 1, CVW = -0.06; rank 14) experienced reduced COVID-19-associated impact. In turn, children from families experiencing reduced COVID-19 impacts reported longer nighttime sleep durations (mode 1, CVW = 0.13; rank 14), less difficulties with remote learning (mode 2, CVW = 0.14; rank 7), and decreased worry about the impact of COVID-19 on their family's financial stability (mode 1, CVW = 0.134; rank 13). Conclusions and Relevance The findings of this study indicate that community-level, transgenerational intervention strategies may be needed to combat the disproportionate burden of pandemics on minoritized and marginalized racial and ethnic populations.
Population heterogeneity in clinical cohorts affects the predictive accuracy of brain imaging
O. Benkarim
Casey Paquola
Bo-yong Park
Valeria Kebets
Seokjun Hong
Reinder Vos de Wael
Shaoshi Zhang
B.T. Thomas Yeo
Michael Eickenberg
Tian Ge
Jean-Baptiste Poline
B. Bernhardt
Brain imaging research enjoys increasing adoption of supervised machine learning for single-participant disease classification. Yet, the success of these algorithms likely depends on population diversity, including demographic differences and other factors that may be outside of primary scientific interest. Here, we capitalize on propensity scores as a composite confound index to quantify diversity due to major sources of population variation. We delineate the impact of population heterogeneity on the predictive accuracy and pattern stability in 2 separate clinical cohorts: the Autism Brain Imaging Data Exchange (ABIDE, n = 297) and the Healthy Brain Network (HBN, n = 551). Across various analysis scenarios, our results uncover the extent to which cross-validated prediction performances are interlocked with diversity. The instability of extracted brain patterns attributable to diversity is located preferentially in regions part of the default mode network. Collectively, our findings highlight the limitations of prevailing deconfounding practices in mitigating the full consequences of population diversity.
Predicting Visual Improvement After Macular Hole Surgery: A Combined Model Using Deep Learning and Clinical Features
Alexandre Lachance
Mathieu Godbout
Fares Antaki
Mélanie Hébert
Serge Bourgault
Mathieu Caissie
Éric Tourville
A. Dirani
Purpose The purpose of this study was to assess the feasibility of deep learning (DL) methods to enhance the prediction of visual acuity (VA) improvement after macular hole (MH) surgery from a combined model using DL on high-definition optical coherence tomography (HD-OCT) B-scans and clinical features. Methods We trained a DL convolutional neural network (CNN) using pre-operative HD-OCT B-scans of the macula and combined with a logistic regression model of pre-operative clinical features to predict VA increase ≥15 Early Treatment Diabetic Retinopathy Study (ETDRS) letters at 6 months post-vitrectomy in closed MHs. A total of 121 MHs with 242 HD-OCT B-scans and 484 clinical data points were used to train, validate, and test the model. Prediction of VA increase was evaluated using the area under the receiver operating characteristic curve (AUROC) and F1 scores. We also extracted the weight of each input feature in the hybrid model. Results All performances are reported on the held-out test set, matching results obtained with cross-validation. Using a regression on clinical features, the AUROC was 80.6, with an F1 score of 79.7. For the CNN, relying solely on the HD-OCT B-scans, the AUROC was 72.8 ± 14.6, with an F1 score of 61.5 ± 23.7. For our hybrid regression model using clinical features and CNN prediction, the AUROC was 81.9 ± 5.2, with an F1 score of 80.4 ± 7.7. In the hybrid model, the baseline VA was the most important feature (weight = 59.1 ± 6.9%), while the weight of HD-OCT prediction was 9.6 ± 4.2%. Conclusions Both the clinical data and HD-OCT models can predict postoperative VA improvement in patients undergoing vitrectomy for a MH with good discriminative performances. Combining them into a hybrid model did not significantly improve performance.
Translational Relevance OCT-based DL models can predict postoperative VA improvement following vitrectomy for MH but fusing those models with clinical data might not provide improved predictive performance.
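For readers unfamiliar with the two metrics named in the abstract above, the following is a minimal sketch of how AUROC and F1 can be computed for a binary classifier. The function names, the 0.5 decision threshold, and the toy labels and scores are assumptions for illustration; they are not the study's data or code. The values here lie in [0, 1], whereas the abstract appears to report them scaled by 100.

```python
def auroc(labels, scores):
    """AUROC as the probability that a randomly chosen positive is scored
    above a randomly chosen negative (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def f1_score(labels, scores, threshold=0.5):
    """F1 = 2*TP / (2*TP + FP + FN) after thresholding the scores."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

# Toy example (illustrative values only).
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
toy_auroc = auroc(labels, scores)   # 3 of 4 positive/negative pairs are ranked correctly
toy_f1 = f1_score(labels, scores)   # one true positive, one false negative
```

AUROC depends only on how the scores rank cases, while F1 depends on the chosen threshold, which is one reason the two can move differently across models.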
Image Retrieval from Contextual Descriptions
Benno Krojer
Vaibhav Adlakha
Vibhav Vineet
Yash Goyal
Edoardo Ponti
The ability to integrate context, including perceptual and temporal cues, plays a pivotal role in grounding the meaning of a linguistic utterance. In order to measure to what extent current vision-and-language models master this ability, we devise a new multimodal challenge, Image Retrieval from Contextual Descriptions (ImageCoDe). In particular, models are tasked with retrieving the correct image from a set of 10 minimally contrastive candidates based on a contextual description. As such, each description contains only the details that help distinguish between images. Because of this, descriptions tend to be complex in terms of syntax and discourse and require drawing pragmatic inferences. Images are sourced from both static pictures and video frames. We benchmark several state-of-the-art models, including both cross-encoders such as ViLBERT and bi-encoders such as CLIP, on ImageCoDe. Our results reveal that these models dramatically lag behind human performance: the best variant achieves an accuracy of 20.9 on video frames and 59.4 on static pictures, compared with 90.8 in humans. Furthermore, we experiment with new model variants that are better equipped to incorporate visual and temporal context into their representations, which achieve modest gains. Our hope is that ImageCoDe will foster progress in grounded language understanding by encouraging models to focus on fine-grained visual differences.
Fast-Converging Simulated Annealing for Ising Models Based on Integral Stochastic Computing
Naoya Onizawa
Kota Katsuki
Duckgyu Shin
Takahiro Hanyu
Probabilistic bits (p-bits) have recently been presented as a spin (basic computing element) for the simulated annealing (SA) of Ising models. In this brief, we introduce fast-converging SA based on p-bits designed using integral stochastic computing. The stochastic implementation approximates a p-bit function, which can search for a solution to a combinatorial optimization problem at lower energy than conventional p-bits. Searching around the global minimum energy can increase the probability of finding a solution. The proposed stochastic computing-based SA method is compared with conventional SA and quantum annealing (QA) with a D-Wave Two quantum annealer on the traveling salesman, maximum cut (MAX-CUT), and graph isomorphism (GI) problems. The proposed method achieves a convergence speed a few orders of magnitude faster while dealing with an order of magnitude larger number of spins than the other methods.
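To make the baseline concrete, here is a minimal sketch of conventional simulated annealing on an Ising formulation of unweighted MAX-CUT. It uses plain Metropolis spin flips with a geometric cooling schedule, not the p-bit or integral stochastic computing design proposed in the paper; the function names, schedule, and parameter values are assumptions chosen for illustration.

```python
import math
import random

def cut_value(edges, spins):
    """Number of edges whose endpoints carry opposite spins."""
    return sum(1 for i, j in edges if spins[i] != spins[j])

def anneal_max_cut(n, edges, steps=2000, t_start=2.0, t_end=0.05, seed=0):
    """Conventional Metropolis SA; returns the best spins and cut seen."""
    rng = random.Random(seed)
    neighbors = [[] for _ in range(n)]
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)
    spins = [rng.choice((-1, 1)) for _ in range(n)]
    best_spins, best_cut = list(spins), cut_value(edges, spins)
    for step in range(steps):
        # Geometric cooling from t_start down to t_end (an assumed schedule).
        t = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        k = rng.randrange(n)
        # Ising energy E = sum over edges of s_i * s_j, so a larger cut means
        # lower energy. Flipping spin k changes E by the amount below.
        delta_e = -2 * spins[k] * sum(spins[j] for j in neighbors[k])
        if delta_e <= 0 or rng.random() < math.exp(-delta_e / t):
            spins[k] = -spins[k]
            current = cut_value(edges, spins)
            if current > best_cut:
                best_spins, best_cut = list(spins), current
    return best_spins, best_cut

# A 4-node cycle is bipartite, so all 4 edges can be cut.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
best_spins, best_cut = anneal_max_cut(4, edges)
```

At high temperature the sampler accepts energy-raising flips freely, and as the temperature falls it settles toward low-energy states, which correspond to large cuts.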
From Points to Functions: Infinite-dimensional Representations in Diffusion Models
Sarthak Mittal
Stefan Bauer
Arash Mehrjou
Diffusion-based generative models learn to iteratively transfer unstructured noise to a complex target distribution, as opposed to Generative Adversarial Networks (GANs) or the decoder of Variational Autoencoders (VAEs), which produce samples from the target distribution in a single step. Thus, in diffusion models every sample is naturally connected to a random trajectory which is a solution to a learned stochastic differential equation (SDE). Generative models are only concerned with the final state of this trajectory that delivers samples from the desired distribution. Abstreiter et al. showed that these stochastic trajectories can be seen as continuous filters that wash out information along the way. Consequently, it is reasonable to ask if there is an intermediate time step at which the preserved information is optimal for a given downstream task. In this work, we show that a combination of information content from different time steps gives a strictly better representation for the downstream task. We introduce attention- and recurrence-based modules that "learn to mix" the information content of various time steps such that the resultant representation leads to superior performance in downstream tasks.