Publications

Parametric Scattering Networks
Shanel Gauthier
Benjamin Thérien
Laurent Alséne-Racicot
Muawiz Chaudhary
Michael Eickenberg
The wavelet scattering transform creates geometric invariants and deformation stability. In multiple signal domains, it has been shown to yield more discriminative representations compared to other non-learned representations and to outperform learned representations in certain tasks, particularly on limited labeled data and highly structured signals. The wavelet filters used in the scattering transform are typically selected to create a tight frame via a parameterized mother wavelet. In this work, we investigate whether this standard wavelet filterbank construction is optimal. Focusing on Morlet wavelets, we propose to learn the scales, orientations, and aspect ratios of the filters to produce problem-specific parameterizations of the scattering transform. We show that our learned versions of the scattering transform yield significant performance gains in small-sample classification settings over the standard scattering transform. Moreover, our empirical results suggest that traditional filterbank constructions may not always be necessary for scattering transforms to extract effective representations.
Probing Representation Forgetting in Supervised and Unsupervised Continual Learning
MohammadReza Davari
Nader Asadi
Sudhir Mudur
Rahaf Aljundi
Continual Learning (CL) research typically focuses on tackling the phenomenon of catastrophic forgetting in neural networks. Catastrophic forgetting is associated with an abrupt loss of knowledge previously learned by a model when the task, or more broadly the data distribution, being trained on changes. In supervised learning problems this forgetting, resulting from a change in the model's representation, is typically measured or observed by evaluating the decrease in old task performance. However, a model's representation can change without losing knowledge about prior tasks. In this work we consider the concept of representation forgetting, observed by using the difference in performance of an optimal linear classifier before and after a new task is introduced. Using this tool we revisit a number of standard continual learning benchmarks and observe that, through this lens, model representations trained without any explicit control for forgetting often experience small representation forgetting and can sometimes be comparable to methods which explicitly control for forgetting, especially in longer task sequences. We also show that representation forgetting can lead to new insights on the effect of model capacity and loss function used in continual learning. Based on our results, we show that a simple yet competitive approach is to learn representations continually with standard supervised contrastive learning while constructing prototypes of class samples when queried on old samples. The code to reproduce our results is publicly available at: https://github.com/rezazzr/Probing-Representation-Forgetting
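The measurement described in this abstract (probe accuracy on an old task before versus after a new task is learned) can be sketched as follows. This is an illustrative sketch only, not the authors' code: it uses a nearest-class-mean probe as a simple linear stand-in for the optimal linear classifier, and the function names and toy features are hypothetical.

```python
def class_means(feats, labels):
    """Average the frozen feature vectors of each class."""
    sums, counts = {}, {}
    for x, y in zip(feats, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
            counts[y] = 0
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def probe_accuracy(train_feats, train_labels, test_feats, test_labels):
    """Fit a nearest-class-mean probe on frozen features; return test accuracy.

    Assumption: this probe is a stand-in for the optimal linear classifier.
    """
    means = class_means(train_feats, train_labels)
    correct = 0
    for x, y in zip(test_feats, test_labels):
        pred = min(
            means,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(x, means[c])),
        )
        correct += int(pred == y)
    return correct / len(test_labels)


def representation_forgetting(acc_before, acc_after):
    """Drop in old-task probe accuracy after training on a new task."""
    return acc_before - acc_after


# Toy old-task features from the encoder before and after a new task.
labels_train, labels_test = [0, 0, 1, 1], [0, 1]
before = probe_accuracy([[-1.0], [-0.9], [1.0], [0.9]], labels_train,
                        [[-0.8], [0.8]], labels_test)
after = probe_accuracy([[0.0]] * 4, labels_train,
                       [[0.0], [0.0]], labels_test)
print(representation_forgetting(before, after))
```

In the toy data the features collapse to a single point after the new task, so the probe can no longer separate the old classes and the measured forgetting is large.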
Revisiting Learnable Affines for Batch Norm in Few-Shot Transfer Learning
Moslem Yazdanpanah
Aamer Abdul Rahman
Muawiz Chaudhary
Christian Desrosiers
Mohammad Havaei
Batch normalization is a staple of computer vision models, including those employed in few-shot learning. Batch normalization layers in convolutional neural networks are composed of a normalization step, followed by a shift and scale of these normalized features applied via the per-channel trainable affine parameters
Robust Contrastive Learning against Noisy Views
Ching-Yao Chuang
Xin Wang
Vibhav Vineet
Neel Joshi
Antonio Torralba
Stefanie Jegelka
Yale Song
Contrastive learning relies on an assumption that positive pairs contain related views that share certain underlying information about an instance, e.g., patches of an image or co-occurring multimodal signals of a video. What if this assumption is violated? The literature suggests that contrastive learning produces suboptimal representations in the presence of noisy views, e.g., false positive pairs with no apparent shared information. In this work, we propose a new contrastive loss function that is robust against noisy views. We provide rigorous theoretical justifications by showing connections to robust symmetric losses for noisy binary classification and by establishing a new contrastive bound for mutual information maximization based on the Wasserstein distance measure. The proposed loss is completely modality-agnostic and a simple drop-in replacement for the InfoNCE loss, which makes it easy to apply to existing contrastive frameworks. We show that our approach provides consistent improvements over the state-of-the-art on image, video, and graph contrastive learning benchmarks that exhibit a variety of real-world noise patterns.
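For context, the standard InfoNCE loss that this abstract refers to can be sketched as below. This shows only the baseline loss, not the robust loss proposed in the paper; the function name, the dot-product similarity, and the toy vectors are assumptions made for illustration.

```python
import math


def dot(u, v):
    """Dot-product similarity between two embedding vectors."""
    return sum(a * b for a, b in zip(u, v))


def info_nce(anchor, positive, negatives, temperature=0.1):
    """Standard InfoNCE for one anchor:
    -log( exp(s_pos / t) / (exp(s_pos / t) + sum_i exp(s_neg_i / t)) ).
    """
    logits = [dot(anchor, positive) / temperature]
    logits += [dot(anchor, n) / temperature for n in negatives]
    # Log-sum-exp with max subtraction for numerical stability.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denominator - logits[0]


# A clean positive (aligned with the anchor) versus a noisy "false positive"
# that shares nothing with the anchor.
clean = info_nce([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]], temperature=1.0)
noisy = info_nce([1.0, 0.0], [0.0, 1.0], [[0.0, 1.0]], temperature=1.0)
print(clean, noisy)
```

The noisy pair yields the larger loss, which is the situation in which the baseline keeps pulling unrelated views together.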
Heterogeneous Supervised Topic Models
Hal Daumé III
David Blei
NeoRS: A Neonatal Resting State fMRI Data Preprocessing Pipeline
Vicente Enguix
Jeanette Kenley
David Luck
Gregory Anton Lodygensky
Resting state functional MRI (rsfMRI) has been shown to be a promising tool to study intrinsic brain functional connectivity and assess its integrity in cerebral development. In neonates, where functional MRI is limited to very few paradigms, rsfMRI was shown to be a relevant tool to explore regional interactions of brain networks. However, to identify the resting state networks, data needs to be carefully processed to reduce artifacts compromising the interpretation of results. Because of the non-collaborative nature of the neonates, the differences in brain size and the reversed contrast compared to adults due to myelination, neonates can’t be processed with the existing adult pipelines, as they are not adapted. Therefore, we developed NeoRS, a rsfMRI pipeline for neonates. The pipeline relies on popular neuroimaging tools (FSL, AFNI, and SPM) and is optimized for the neonatal brain. The main processing steps include image registration to an atlas, skull stripping, tissue segmentation, slice timing and head motion correction and regression of confounds which compromise functional data interpretation. To address the specificity of neonatal brain imaging, particular attention was given to registration including neonatal atlas type and parameters, such as brain size variations, and contrast differences compared to adults. Furthermore, head motion was scrutinized, and motion management optimized, as it is a major issue when processing neonatal rsfMRI data. The pipeline includes quality control using visual assessment checkpoints. To assess the effectiveness of NeoRS processing steps we used the neonatal data from the Baby Connectome Project dataset including a total of 10 neonates. NeoRS was designed to work on both multi-band and single-band acquisitions and is applicable on smaller datasets. NeoRS also includes popular functional connectivity analysis features such as seed-to-seed or seed-to-voxel correlations.
Language, default mode, dorsal attention, visual, ventral attention, motor and fronto-parietal networks were evaluated. Topology found the different analyzed networks were in agreement with previously published studies in the neonate. NeoRS is coded in Matlab and allows parallel computing to reduce computational times; it is open-source and available on GitHub (https://github.com/venguix/NeoRS). NeoRS allows robust image processing of the neonatal rsfMRI data that can be readily customized to different datasets.
Survival Modelling for Data From Combined Cohorts: Opening the Door to Meta Survival Analyses and Survival Analysis Using Electronic Health Records
James H. McVittie
Ana F. Best
David B. Wolfson
David A. Stephens
Julian Wolfson
Shahinaz M. Gadalla
Non‐parametric estimation of the survival function using observed failure time data depends on the underlying data generating mechanism, including the ways in which the data may be censored and/or truncated. For data arising from a single source or collected from a single cohort, a wide range of estimators have been proposed and compared in the literature. Often, however, it may be possible, and indeed advantageous, to combine and then analyse survival data that have been collected under different study designs. We review non‐parametric survival analysis for data obtained by combining the most common types of cohort. We have two main goals: (i) to clarify the differences in the model assumptions and (ii) to provide a single lens through which some of the proposed estimators may be viewed. Our discussion is relevant to the meta‐analysis of survival data obtained from different types of study, and to the modern era of electronic health records.
Popular and Scientific Discourse on Autism: Representational Cross-Cultural Analysis of Epistemic Communities to Inform Policy and Practice
Christophe Gauld
Julien Maquet
Jean‐Arthur Micoulaud‐Franchi
Background: Social media provide a window onto the circulation of ideas in everyday folk psychiatry, revealing the themes and issues discussed both by the public and by various scientific communities. Objective: This study explores the trends in health information about autism spectrum disorder within popular and scientific communities through the systematic semantic exploration of big data gathered from Twitter and PubMed. Methods: First, we performed a natural language processing by text-mining analysis and with unsupervised (machine learning) topic modeling on a sample of the last 10,000 tweets in English posted with the term #autism (January 2021). We built a network of words to visualize the main dimensions representing these data. Second, we performed precisely the same analysis with all the articles using the term “autism” in PubMed without time restriction. Lastly, we compared the results of the 2 databases. Results: We retrieved 121,556 terms related to autism in 10,000 tweets and 5.7 × 10^9 terms in 57,121 biomedical scientific articles. The 4 main dimensions extracted from Twitter were as follows: integration and social support, understanding and mental health, child welfare, and daily challenges and difficulties. The 4 main dimensions extracted from PubMed were as follows: diagnostic and skills, research challenges, clinical and therapeutical challenges, and neuropsychology and behavior. Conclusions: This study provides the first systematic and rigorous comparison between 2 corpora of interests, in terms of lay representations and scientific research, regarding the significant increase in information available on autism spectrum disorder and of the difficulty to connect fragments of knowledge from the general population. The results suggest a clear distinction between the focus of topics used in the social media and that of scientific communities.
This distinction highlights the importance of knowledge mobilization and exchange to better align research priorities with personal concerns and to address dimensions of well-being, adaptation, and resilience. Health care professionals and researchers can use these dimensions as a framework in their consultations to engage in discussions on issues that matter to beneficiaries and develop clinical approaches and research policies in line with these interests. Finally, our study can inform policy makers on the health and social needs and concerns of individuals with autism and their caregivers, especially to define health indicators based on important issues for beneficiaries.
Conditions for indexability of restless bandits and an $\mathcal{O}(K^3)$ algorithm to compute Whittle index
Nima Akbarzadeh
Restless bandits are a class of sequential resource allocation problems concerned with allocating one or more resources among several alternative processes where the evolution of the processes depends on the resources allocated to them. Such models capture the fundamental trade-offs between exploration and exploitation. In 1988, Whittle developed an index heuristic for restless bandit problems which has emerged as a popular solution approach because of its simplicity and strong empirical performance. The Whittle index heuristic is applicable if the model satisfies a technical condition known as indexability. In this paper, we present two general sufficient conditions for indexability and identify simpler-to-verify refinements of these conditions. We then revisit a previously proposed algorithm called the adaptive greedy algorithm which is known to compute the Whittle index for a sub-class of restless bandits. We show that a generalization of the adaptive greedy algorithm computes the Whittle index for all indexable restless bandits. We present an efficient implementation of this algorithm which can compute the Whittle index of a restless bandit with K states in
Clustering units in neural networks: upstream vs downstream information
Richard D Lange
Konrad Paul Kording
It has been hypothesized that some form of "modular" structure in artificial neural networks should be useful for learning, compositionality, and generalization. However, defining and quantifying modularity remains an open problem. We cast the problem of detecting functional modules into the problem of detecting clusters of similar-functioning units. This begs the question of what makes two units functionally similar. For this, we consider two broad families of methods: those that define similarity based on how units respond to structured variations in inputs ("upstream"), and those based on how variations in hidden unit activations affect outputs ("downstream"). We conduct an empirical study quantifying modularity of hidden layer representations of simple feedforward, fully connected networks, across a range of hyperparameters. For each model, we quantify pairwise associations between hidden units in each layer using a variety of both upstream and downstream measures, then cluster them by maximizing their "modularity score" using established tools from network science. We find two surprising results: first, dropout dramatically increased modularity, while other forms of weight regularization had more modest effects. Second, although we observe that there is usually good agreement about clusters within both upstream methods and downstream methods, there is little agreement about the cluster assignments across these two families of methods. This has important implications for representation-learning, as it suggests that finding modular representations that reflect structure in inputs (e.g. disentanglement) may be a distinct goal from learning modular representations that reflect structure in outputs (e.g. compositionality).
Studying the Practices of Deploying Machine Learning Projects on Docker
Moses Openja
Forough Majidi
Bhagya Chembakottu
Heng Li
A Unified Approach to Reinforcement Learning, Quantal Response Equilibria, and Two-Player Zero-Sum Games
Samuel Sokota
Ryan D’orazio
J. Z. Kolter
Nicolas Loizou
Marc Lanctot
Noam Brown
Christian Kroer
This work studies an algorithm, which we call magnetic mirror descent, that is inspired by mirror descent and the non-Euclidean proximal gradient algorithm. Our contribution is demonstrating the virtues of magnetic mirror descent as both an equilibrium solver and as an approach to reinforcement learning in two-player zero-sum games. These virtues include: 1) Being the first quantal response equilibria solver to achieve linear convergence for extensive-form games with first order feedback; 2) Being the first standard reinforcement learning algorithm to achieve empirically competitive results with CFR in tabular settings; 3) Achieving favorable performance in 3x3 Dark Hex and Phantom Tic-Tac-Toe as a self-play deep reinforcement learning algorithm.