Publications

Probing Representation Forgetting in Supervised and Unsupervised Continual Learning
MohammadReza Davari
Sudhir Mudur
Continual Learning research typically focuses on tackling the phenomenon of catastrophic forgetting in neural networks. Catastrophic forgetting is associated with an abrupt loss of knowledge previously learned by a model when the task, or more broadly the data distribution, being trained on changes. In supervised learning problems this forgetting, resulting from a change in the model's representation, is typically measured or observed by evaluating the decrease in old task performance. However, a model's representation can change without losing knowledge about prior tasks. In this work we consider the concept of representation forgetting, observed by using the difference in performance of an optimal linear classifier before and after a new task is introduced. Using this tool we revisit a number of standard continual learning benchmarks and observe that, through this lens, model representations trained without any explicit control for forgetting often experience small representation forgetting and can sometimes be comparable to methods which explicitly control for forgetting, especially in longer task sequences. We also show that representation forgetting can lead to new insights on the effect of model capacity and loss function used in continual learning. Based on our results, we show that a simple yet competitive approach is to learn representations continually with standard supervised contrastive learning while constructing prototypes of class samples when queried on old samples.
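The measurement described in this abstract (probe accuracy on old-task data before a new task, minus probe accuracy after it) can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' code: it uses a nearest-class-mean probe, which is a linear classifier but not the optimal one the abstract refers to, and all function names and toy features are hypothetical.

```python
def class_means(features, labels):
    """Average the feature vectors of each class."""
    sums, counts = {}, {}
    for x, y in zip(features, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in s] for y, s in sums.items()}


def probe_accuracy(train_x, train_y, test_x, test_y):
    """Fit a nearest-class-mean probe on frozen features and score it."""
    means = class_means(train_x, train_y)

    def predict(x):
        # Pick the class whose mean is closest in squared Euclidean distance.
        return min(means, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, means[c])))

    correct = sum(predict(x) == y for x, y in zip(test_x, test_y))
    return correct / len(test_y)


def representation_forgetting(feats_before, feats_after, labels, split):
    """Old-task probe accuracy before the new task minus accuracy after it."""
    def acc(feats):
        return probe_accuracy(feats[:split], labels[:split], feats[split:], labels[split:])
    return acc(feats_before) - acc(feats_after)


# Hypothetical old-task features from the encoder before and after a new task.
labels = [0, 1, 0, 1, 0, 1, 0, 1]
before = [[0, 0], [1, 1], [0.1, 0], [1, 0.9], [0, 0.1], [0.9, 1], [0.1, 0.1], [1, 1]]
after = [[0.5, 0.5] for _ in labels]  # a fully collapsed representation
forgetting = representation_forgetting(before, after, labels, split=4)
```

In this toy case the collapsed representation leaves the probe at chance on two balanced classes, so the measured forgetting is the gap between perfect and chance accuracy.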
Revisiting Learnable Affines for Batch Norm in Few-Shot Transfer Learning
Muawiz Sajjad Chaudhary
Christian Desrosiers
S Ebrahimi Kahou
Batch normalization is a staple of computer vision models, including those employed in few-shot learning. Batch normalization layers in convolutional neural networks are composed of a normalization step, followed by a shift and scale of these normalized features applied via the per-channel trainable affine parameters
Robust Contrastive Learning against Noisy Views
Ching-Yao Chuang
R Devon Hjelm
Vibhav Vineet
Neel Joshi
Antonio Torralba
Stefanie Jegelka
Yale Song
Contrastive learning relies on an assumption that positive pairs contain related views that share certain underlying information about an instance, e.g., patches of an image or co-occurring multimodal signals of a video. What if this assumption is violated? The literature suggests that contrastive learning produces suboptimal representations in the presence of noisy views, e.g., false positive pairs with no apparent shared information. In this work, we propose a new contrastive loss function that is robust against noisy views. We provide rigorous theoretical justifications by showing connections to robust symmetric losses for noisy binary classification and by establishing a new contrastive bound for mutual information maximization based on the Wasserstein distance measure. The proposed loss is completely modality-agnostic and a simple drop-in replacement for the InfoNCE loss, which makes it easy to apply to existing contrastive frameworks. We show that our approach provides consistent improvements over the state-of-the-art on image, video, and graph contrastive learning benchmarks that exhibit a variety of real-world noise patterns.
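For reference, the standard InfoNCE loss that the abstract names as the baseline can be sketched as below for a single anchor. This is the ordinary InfoNCE objective, not the robust loss proposed in the paper; the function name, the precomputed similarity scores, and the temperature value are assumptions made for illustration.

```python
import math


def info_nce(pos_score, neg_scores, temperature=0.1):
    """InfoNCE for one anchor: negative log-softmax of the positive score.

    pos_score  -- similarity between the anchor and its positive view
    neg_scores -- similarities between the anchor and the negative views
    """
    logits = [pos_score / temperature] + [s / temperature for s in neg_scores]
    m = max(logits)  # subtract the max before exponentiating, for stability
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum - logits[0]


# A well-aligned positive gives a small loss; a "positive" that looks like
# a negative (as a false positive pair would) gives a large one.
clean = info_nce(0.9, [0.1, 0.1])
noisy = info_nce(0.1, [0.9, 0.9])
```

When the positive is indistinguishable from the K negatives, the loss equals log(K + 1), which is a convenient sanity check for any implementation.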
Distinct trajectories in low-dimensional neural oscillation state space track dynamic decision-making in humans
Thomas Thiery
Pierre Rainville
Paul Cisek
The brain evolved to govern behavior in a dynamic world, in which pertinent information about choices is often in flux. Thus, the commitment to an action choice must reflect a balance between monitoring that information and the necessity to act before opportunities are lost. Here, we investigate the mechanisms of dynamic decision-making in humans using low dimensional space representation of brain wide magnetoencephalography recordings. We show that the principal components (PCs) of alpha (9-13 Hz) and beta power (16-24 Hz) are involved in tracking sensory information evolving over time in the sensorimotor and visual cortex. We also found that alpha PCs reflect the commitment to a particular choice, while beta PCs reflect motor execution. Finally, higher frequency components in subcortical areas reflect the adjustment of speed-accuracy tradeoff policies. These results provide a new detailed characterization of the distributed oscillatory brain processes underlying dynamic decision-making in humans.
Survival Modelling for Data From Combined Cohorts: Opening the Door to Meta Survival Analyses and Survival Analysis Using Electronic Health Records
James H. McVittie
Ana F. Best
David B. Wolfson
David A. Stephens
Julian Wolfson
David L. Buckeridge
Shahinaz M. Gadalla
Non‐parametric estimation of the survival function using observed failure time data depends on the underlying data generating mechanism, including the ways in which the data may be censored and/or truncated. For data arising from a single source or collected from a single cohort, a wide range of estimators have been proposed and compared in the literature. Often, however, it may be possible, and indeed advantageous, to combine and then analyse survival data that have been collected under different study designs. We review non‐parametric survival analysis for data obtained by combining the most common types of cohort. We have two main goals: (i) to clarify the differences in the model assumptions and (ii) to provide a single lens through which some of the proposed estimators may be viewed. Our discussion is relevant to the meta‐analysis of survival data obtained from different types of study, and to the modern era of electronic health records.
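As one concrete instance of a non-parametric survival estimator, the Kaplan-Meier product-limit estimator for right-censored data from a single cohort can be sketched as below. It is a standard estimator chosen here for illustration, not one of the combined-cohort estimators reviewed in the paper; the function name and the toy data are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with an observed failure.

    times  -- observed follow-up times
    events -- 1 if the failure was observed, 0 if right-censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all subjects that share the same observed time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            # Multiply by the conditional probability of surviving past t.
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        # Failures and censored subjects both leave the risk set.
        at_risk -= leaving
    return curve


# Toy cohort: five subjects, two of them censored.
km_curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
```

Censored subjects never trigger a drop in the curve, but they shrink the risk set for later failure times, which is why the estimate depends on how the data were censored.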
Popular and Scientific Discourse on Autism: Representational Cross-Cultural Analysis of Epistemic Communities to Inform Policy and Practice
Christophe Gauld
Julien Maquet
Jean‐Arthur Micoulaud‐Franchi
Background Social media provide a window onto the circulation of ideas in everyday folk psychiatry, revealing the themes and issues discussed both by the public and by various scientific communities. Objective This study explores the trends in health information about autism spectrum disorder within popular and scientific communities through the systematic semantic exploration of big data gathered from Twitter and PubMed. Methods First, we performed a natural language processing by text-mining analysis and with unsupervised (machine learning) topic modeling on a sample of the last 10,000 tweets in English posted with the term #autism (January 2021). We built a network of words to visualize the main dimensions representing these data. Second, we performed precisely the same analysis with all the articles using the term “autism” in PubMed without time restriction. Lastly, we compared the results of the 2 databases. Results We retrieved 121,556 terms related to autism in 10,000 tweets and 5.7 × 10^9 terms in 57,121 biomedical scientific articles. The 4 main dimensions extracted from Twitter were as follows: integration and social support, understanding and mental health, child welfare, and daily challenges and difficulties. The 4 main dimensions extracted from PubMed were as follows: diagnostic and skills, research challenges, clinical and therapeutical challenges, and neuropsychology and behavior. Conclusions This study provides the first systematic and rigorous comparison between 2 corpora of interests, in terms of lay representations and scientific research, regarding the significant increase in information available on autism spectrum disorder and of the difficulty to connect fragments of knowledge from the general population. The results suggest a clear distinction between the focus of topics used in the social media and that of scientific communities.
This distinction highlights the importance of knowledge mobilization and exchange to better align research priorities with personal concerns and to address dimensions of well-being, adaptation, and resilience. Health care professionals and researchers can use these dimensions as a framework in their consultations to engage in discussions on issues that matter to beneficiaries and develop clinical approaches and research policies in line with these interests. Finally, our study can inform policy makers on the health and social needs and concerns of individuals with autism and their caregivers, especially to define health indicators based on important issues for beneficiaries.
Conditions for indexability of restless bandits and an $\mathcal{O}(K^3)$ algorithm to compute Whittle index
Abstract Restless bandits are a class of sequential resource allocation problems concerned with allocating one or more resources among several alternative processes where the evolution of the processes depends on the resources allocated to them. Such models capture the fundamental trade-offs between exploration and exploitation. In 1988, Whittle developed an index heuristic for restless bandit problems which has emerged as a popular solution approach because of its simplicity and strong empirical performance. The Whittle index heuristic is applicable if the model satisfies a technical condition known as indexability. In this paper, we present two general sufficient conditions for indexability and identify simpler-to-verify refinements of these conditions. We then revisit a previously proposed algorithm called the adaptive greedy algorithm which is known to compute the Whittle index for a sub-class of restless bandits. We show that a generalization of the adaptive greedy algorithm computes the Whittle index for all indexable restless bandits. We present an efficient implementation of this algorithm which can compute the Whittle index of a restless bandit with K states in
Unsupervised Model-based Pre-training for Data-efficient Reinforcement Learning from Pixels
Sai Rajeswar
Tim Verbelen
Bart Dhoedt
Alexandre Lacoste
Reinforcement learning (RL) aims at autonomously performing complex tasks. To this end, a reward signal is used to steer the learning process. While successful in many circumstances, the approach is typically data hungry, requiring large amounts of task-specific interaction between agent and environment to learn efficient behaviors. To alleviate this, unsupervised RL proposes to collect data through self-supervised interaction to accelerate task-specific adaptation. However, whether current unsupervised strategies lead to improved generalization capabilities is still unclear, more so when the input observations are high-dimensional. In this work, we advance the field by closing the performance gap in the Unsupervised RL Benchmark, a collection of tasks to be solved in a data-efficient manner, after interacting with the environment in a self-supervised way. Our approach uses unsupervised exploration for collecting experience to pre-train a world model. Then, when fine-tuning for downstream tasks, the agent leverages the learned model and a hybrid planner to efficiently adapt for the given tasks, achieving comparable results to task-specific baselines, while using 20x less data. We extensively evaluate our work, comparing several exploration methods and improving the fine-tuning process by studying the interactions between the learned components. Furthermore, we investigate the limitations of the pre-trained agent, gaining insights into how these influence the decision process and shedding light on new research directions.
Clustering units in neural networks: upstream vs downstream information
Richard D Lange
Konrad Paul Kording
It has been hypothesized that some form of "modular" structure in artificial neural networks should be useful for learning, compositionality, and generalization. However, defining and quantifying modularity remains an open problem. We cast the problem of detecting functional modules into the problem of detecting clusters of similar-functioning units. This begs the question of what makes two units functionally similar. For this, we consider two broad families of methods: those that define similarity based on how units respond to structured variations in inputs ("upstream"), and those based on how variations in hidden unit activations affect outputs ("downstream"). We conduct an empirical study quantifying modularity of hidden layer representations of simple feedforward, fully connected networks, across a range of hyperparameters. For each model, we quantify pairwise associations between hidden units in each layer using a variety of both upstream and downstream measures, then cluster them by maximizing their "modularity score" using established tools from network science. We find two surprising results: first, dropout dramatically increased modularity, while other forms of weight regularization had more modest effects. Second, although we observe that there is usually good agreement about clusters within both upstream methods and downstream methods, there is little agreement about the cluster assignments across these two families of methods. This has important implications for representation-learning, as it suggests that finding modular representations that reflect structure in inputs (e.g. disentanglement) may be a distinct goal from learning modular representations that reflect structure in outputs (e.g. compositionality).
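The abstract does not say which modularity score is maximized. Assuming it is Newman's modularity Q, a common choice in network science, the score of a given cluster assignment over a matrix of pairwise unit associations could be computed along these lines. The function name and the toy association matrix are hypothetical, and this sketch only evaluates a partition; it does not search for the best one.

```python
def modularity(adj, clusters):
    """Newman modularity Q of a partition of an undirected weighted graph.

    adj      -- symmetric matrix of non-negative pairwise association weights
    clusters -- cluster label for each unit (same order as the rows of adj)
    """
    n = len(adj)
    degree = [sum(row) for row in adj]
    two_m = sum(degree)  # twice the total edge weight
    q = 0.0
    for i in range(n):
        for j in range(n):
            if clusters[i] == clusters[j]:
                # Observed weight minus the weight expected by chance.
                q += adj[i][j] - degree[i] * degree[j] / two_m
    return q / two_m


# Four hypothetical hidden units forming two strongly associated pairs.
assoc = [
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
]
q_pairs = modularity(assoc, [0, 0, 1, 1])
q_single = modularity(assoc, [0, 0, 0, 0])
```

Grouping the two associated pairs separately scores higher than lumping all units into one cluster, which is the behavior a modularity-maximizing clustering relies on.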
Studying the Practices of Deploying Machine Learning Projects on Docker
Moses Openja
Bhagya Chembakottu
Heng Li
Docker is a containerization service that allows for convenient deployment of websites, databases, applications' APIs, and machine learning (ML) models with a few lines of code. Studies have recently explored the use of Docker for deploying general software projects with no specific focus on how Docker is used to deploy ML-based projects. In this study, we conducted an exploratory study to understand how Docker is being used to deploy ML-based projects. As the initial step, we examined the categories of ML-based projects that use Docker. We then examined why and how these projects use Docker, and the characteristics of the resulting Docker images. Our results indicate that six categories of ML-based projects use Docker for deployment, including ML Applications, MLOps/AIOps, Toolkits, DL Frameworks, Models, and Documentation. We derived the taxonomy of 21 major categories representing the purposes of using Docker, including those specific to models such as model management tasks (e.g., testing, training). We then showed that ML engineers use Docker images mostly to help with the platform portability, such as transferring the software across the operating systems, runtimes such as GPU, and language constraints. However, we also found that more resources may be required to run the Docker images for building ML-based software projects due to the large number of files contained in the image layers with deeply nested directories. We hope to shed light on the emerging practices of deploying ML software projects using containers and highlight aspects that should be improved.
Leveraging Integer Linear Programming to Learn Optimal Fair Rule Lists
Ulrich Matchi Aïvodji
Julien Ferry
Sébastien Gambs
Marie-José Huguet
Mohamed Siala
Ageism and Artificial Intelligence: Protocol for a Scoping Review
Charlene H Chu
Kathleen Leslie
Jiamin Shi
Rune Nyrup
Andria Bianchi
Shehroz S Khan
S. A. Rahimi
Alexandra Lyn
Amanda Grenier
Background Artificial intelligence (AI) has emerged as a major driver of technological development in the 21st century, yet little attention has been paid to algorithmic biases toward older adults. Objective This paper documents the search strategy and process for a scoping review exploring how age-related bias is encoded or amplified in AI systems as well as the corresponding legal and ethical implications. Methods The scoping review follows a 6-stage methodology framework developed by Arksey and O’Malley. The search strategy has been established in 6 databases. We will investigate the legal implications of ageism in AI by searching grey literature databases, targeted websites, and popular search engines and using an iterative search strategy. Studies meet the inclusion criteria if they are in English, peer-reviewed, available electronically in full text, and meet one of the following two additional criteria: (1) include “bias” related to AI in any application (eg, facial recognition) and (2) discuss bias related to the concept of old age or ageism. At least two reviewers will independently conduct the title, abstract, and full-text screening. Search results will be reported using the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews) reporting guideline. We will chart data on a structured form and conduct a thematic analysis to highlight the societal, legal, and ethical implications reported in the literature. Results The database searches resulted in 7595 records when the searches were piloted in November 2021. The scoping review will be completed by December 2022. Conclusions The findings will provide interdisciplinary insights into the extent of age-related bias in AI systems.
The results will contribute foundational knowledge that can encourage multisectoral cooperation to ensure that AI is developed and deployed in a manner consistent with ethical values and human rights legislation as it relates to an older and aging population. We will publish the review findings in peer-reviewed journals and disseminate the key results with stakeholders via workshops and webinars. Trial Registration OSF Registries AMG5P; https://osf.io/amg5p International Registered Report Identifier (IRRID) DERR1-10.2196/33211