Publications

Robust Contrastive Learning against Noisy Views
Ching-Yao Chuang
R Devon Hjelm
Vibhav Vineet
Neel Joshi
Antonio Torralba
Stefanie Jegelka
Yale Song
Contrastive learning relies on an assumption that positive pairs contain related views that share certain underlying information about an instance, e.g., patches of an image or co-occurring multimodal signals of a video. What if this assumption is violated? The literature suggests that contrastive learning produces suboptimal representations in the presence of noisy views, e.g., false positive pairs with no apparent shared information. In this work, we propose a new contrastive loss function that is robust against noisy views. We provide rigorous theoretical justifications by showing connections to robust symmetric losses for noisy binary classification and by establishing a new contrastive bound for mutual information maximization based on the Wasserstein distance measure. The proposed loss is completely modality-agnostic and a simple drop-in replacement for the InfoNCE loss, which makes it easy to apply to existing contrastive frameworks. We show that our approach provides consistent improvements over the state-of-the-art on image, video, and graph contrastive learning benchmarks that exhibit a variety of real-world noise patterns.
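The proposed loss is a drop-in replacement for InfoNCE. For context, here is a minimal NumPy sketch of the standard InfoNCE loss for a single anchor — the baseline being replaced, not the paper's robust variant; the embedding dimension, temperature, and toy data below are illustrative assumptions:

```python
import numpy as np

def info_nce(anchor, positive, negatives, temperature=0.1):
    """Standard InfoNCE loss for one anchor (baseline, not the robust variant).

    anchor, positive: (d,) unit-normalised views; negatives: (n, d).
    """
    pos = anchor @ positive / temperature
    neg = negatives @ anchor / temperature
    logits = np.concatenate([[pos], neg])
    # negative log-probability of picking the positive among all candidates
    return float(np.log(np.exp(logits).sum()) - pos)

rng = np.random.default_rng(0)
unit = lambda v: v / np.linalg.norm(v)
a = unit(rng.normal(size=8))
p = unit(a + 0.05 * rng.normal(size=8))            # a "clean" positive view
negs = np.stack([unit(rng.normal(size=8)) for _ in range(16)])
loss = info_nce(a, p, negs)
```

A noisy view in the paper's sense would be a `positive` drawn independently of `a`, which inflates this loss; the paper's contribution is a loss whose gradients remain well behaved in that regime.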
Survival Modelling for Data From Combined Cohorts: Opening the Door to Meta Survival Analyses and Survival Analysis Using Electronic Health Records
James H. McVittie
Ana F. Best
David B. Wolfson
David A. Stephens
Julian Wolfson
David L. Buckeridge
Shahinaz M. Gadalla
Non‐parametric estimation of the survival function using observed failure time data depends on the underlying data generating mechanism, including the ways in which the data may be censored and/or truncated. For data arising from a single source or collected from a single cohort, a wide range of estimators have been proposed and compared in the literature. Often, however, it may be possible, and indeed advantageous, to combine and then analyse survival data that have been collected under different study designs. We review non‐parametric survival analysis for data obtained by combining the most common types of cohort. We have two main goals: (i) to clarify the differences in the model assumptions and (ii) to provide a single lens through which some of the proposed estimators may be viewed. Our discussion is relevant to the meta‐analysis of survival data obtained from different types of study, and to the modern era of electronic health records.
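As a concrete baseline for the single-cohort, right-censored case that the combined-cohort estimators generalise, here is a minimal Kaplan–Meier sketch in NumPy (the toy failure/censoring data are illustrative assumptions):

```python
import numpy as np

def kaplan_meier(times, events):
    """Kaplan-Meier estimator for right-censored data from a single cohort.

    times: observed times; events: 1 if failure observed, 0 if censored.
    Returns (distinct failure times, survival probabilities).
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    fail_times = np.unique(times[events == 1])
    surv, s = [], 1.0
    for t in fail_times:
        n_at_risk = np.sum(times >= t)               # still under observation
        d = np.sum((times == t) & (events == 1))     # failures at time t
        s *= 1.0 - d / n_at_risk
        surv.append(s)
    return fail_times, np.array(surv)

t, s = kaplan_meier([2, 3, 3, 5, 8, 8], [1, 1, 0, 1, 0, 1])
```

Truncated or combined-design data change the risk-set computation (`n_at_risk`), which is precisely the mechanism-dependence the review is about.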
Popular and Scientific Discourse on Autism: Representational Cross-Cultural Analysis of Epistemic Communities to Inform Policy and Practice
Christophe Gauld
Julien Maquet
Jean‐Arthur Micoulaud‐Franchi
Background Social media provide a window onto the circulation of ideas in everyday folk psychiatry, revealing the themes and issues discussed both by the public and by various scientific communities. Objective This study explores the trends in health information about autism spectrum disorder within popular and scientific communities through the systematic semantic exploration of big data gathered from Twitter and PubMed. Methods First, we performed a natural language processing text-mining analysis with unsupervised (machine learning) topic modeling on a sample of the last 10,000 tweets in English posted with the term #autism (January 2021). We built a network of words to visualize the main dimensions representing these data. Second, we performed precisely the same analysis with all the articles using the term “autism” in PubMed without time restriction. Lastly, we compared the results of the 2 databases. Results We retrieved 121,556 terms related to autism in 10,000 tweets and 5.7×10⁹ terms in 57,121 biomedical scientific articles. The 4 main dimensions extracted from Twitter were as follows: integration and social support, understanding and mental health, child welfare, and daily challenges and difficulties. The 4 main dimensions extracted from PubMed were as follows: diagnostic and skills, research challenges, clinical and therapeutical challenges, and neuropsychology and behavior. Conclusions This study provides the first systematic and rigorous comparison between these 2 corpora, in terms of lay representations and scientific research, given the significant increase in information available on autism spectrum disorder and the difficulty of connecting fragments of knowledge from the general population. The results suggest a clear distinction between the focus of topics used in the social media and that of scientific communities.
This distinction highlights the importance of knowledge mobilization and exchange to better align research priorities with personal concerns and to address dimensions of well-being, adaptation, and resilience. Health care professionals and researchers can use these dimensions as a framework in their consultations to engage in discussions on issues that matter to beneficiaries and develop clinical approaches and research policies in line with these interests. Finally, our study can inform policy makers on the health and social needs and concerns of individuals with autism and their caregivers, especially to define health indicators based on important issues for beneficiaries.
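The word-network step of such a text-mining pipeline can be sketched in a few lines: count within-document word co-occurrences to obtain weighted edges between terms. The tiny corpus below is invented for illustration and stands in for the study's tweet sample:

```python
from collections import Counter
from itertools import combinations

# Toy corpus standing in for the ~10,000 #autism tweets analysed in the study.
tweets = [
    "autism support community inclusion",
    "autism awareness mental health support",
    "school support for autism and inclusion",
]

def cooccurrence_network(docs):
    """Count within-document word-pair co-occurrences (edge weights)."""
    edges = Counter()
    for doc in docs:
        words = sorted(set(doc.lower().split()))
        for a, b in combinations(words, 2):
            edges[(a, b)] += 1
    return edges

net = cooccurrence_network(tweets)
```

Clustering this weighted graph (or applying topic modeling to the underlying term-document matrix) yields the "main dimensions" reported in the abstract.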
Conditions for indexability of restless bandits and an $\mathcal{O}\!\left(K^3\right)$ algorithm to compute Whittle index
Restless bandits are a class of sequential resource allocation problems concerned with allocating one or more resources among several alternative processes where the evolution of the processes depends on the resources allocated to them. Such models capture the fundamental trade-offs between exploration and exploitation. In 1988, Whittle developed an index heuristic for restless bandit problems, which has emerged as a popular solution approach because of its simplicity and strong empirical performance. The Whittle index heuristic is applicable if the model satisfies a technical condition known as indexability. In this paper, we present two general sufficient conditions for indexability and identify simpler-to-verify refinements of these conditions. We then revisit a previously proposed algorithm called the adaptive greedy algorithm which is known to compute the Whittle index for a sub-class of restless bandits. We show that a generalization of the adaptive greedy algorithm computes the Whittle index for all indexable restless bandits. We present an efficient implementation of this algorithm which can compute the Whittle index of a restless bandit with K states in $\mathcal{O}\!\left(K^3\right)$ computations.
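The Whittle index of a state is the passive-action subsidy at which that state is indifferent between acting and not acting. That definition can be illustrated numerically with plain value iteration plus bisection on a made-up two-state arm — note this brute-force sketch is not the paper's adaptive greedy algorithm, and the discount factor, rewards, and transition matrices are all invented:

```python
import numpy as np

def value_iteration(P0, P1, r0, r1, lam, beta=0.9, iters=2000):
    """Solve a single arm's MDP where the passive action earns subsidy lam."""
    V = np.zeros(len(r0))
    for _ in range(iters):
        Qp = r0 + lam + beta * P0 @ V   # passive action (with subsidy)
        Qa = r1 + beta * P1 @ V         # active action
        V = np.maximum(Qp, Qa)
    return Qp, Qa

def whittle_index(P0, P1, r0, r1, s, lo=-10.0, hi=10.0, tol=1e-4):
    """Bisect on the subsidy until state s is indifferent between actions."""
    while hi - lo > tol:
        lam = 0.5 * (lo + hi)
        Qp, Qa = value_iteration(P0, P1, r0, r1, lam)
        if Qa[s] > Qp[s]:
            lo = lam    # subsidy too small: the arm still prefers "active"
        else:
            hi = lam
    return 0.5 * (lo + hi)

# Hypothetical two-state arm: only acting in state 1 pays; staying passive
# retains state 1 with probability 0.9, acting resets it to a coin flip.
P0 = np.array([[0.9, 0.1], [0.1, 0.9]])
P1 = np.array([[0.5, 0.5], [0.5, 0.5]])
r0 = np.array([0.0, 0.0])
r1 = np.array([0.0, 1.0])
w0 = whittle_index(P0, P1, r0, r1, s=0)
w1 = whittle_index(P0, P1, r0, r1, s=1)
```

The indices order the states by how attractive activating them is (here `w0 < w1`); the paper's contribution is conditions under which such indices exist and a far cheaper way to compute them.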
Unsupervised Model-based Pre-training for Data-efficient Reinforcement Learning from Pixels
Sai Rajeswar
Tim Verbelen
Bart Dhoedt
Alexandre Lacoste
Reinforcement learning (RL) aims at autonomously performing complex tasks. To this end, a reward signal is used to steer the learning proces… (see more)s. While successful in many circumstances, the approach is typically data hungry, requiring large amounts of task-specific interaction between agent and environment to learn efficient behaviors. To alleviate this, unsupervised RL proposes to collect data through self-supervised interaction to accelerate task-specific adaptation. However, whether current unsupervised strategies lead to improved generalization capabilities is still unclear, more so when the input observations are high-dimensional. In this work, we advance the field by closing the performance gap in the Unsupervised RL Benchmark, a collection of tasks to be solved in a data-efficient manner, after interacting with the environment in a self-supervised way. Our approach uses unsupervised exploration for collecting experience to pre-train a world model. Then, when fine-tuning for downstream tasks, the agent leverages the learned model and a hybrid planner to efficiently adapt for the given tasks, achieving comparable results to task-specific base-lines, while using 20x less data. We extensively evaluate our work, comparing several exploration methods and improving the fine-tuning process by studying the interactions between the learned components. Furthermore, we investigate the limitations of the pre-trained agent, gaining insights into how these influence the decision process and shedding light on new research directions.
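The core pre-training idea — explore where a learned world model is still wrong, so that model and data improve together — can be caricatured in a few lines. The toy 1-D environment, linear dynamics model, and random exploration policy below are all illustrative assumptions, not the paper's world-model agent:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D environment with stable linear dynamics plus noise (made up).
def step(s, a):
    return 0.9 * s + a + 0.1 * rng.normal()

# One-step dynamics model s' ≈ w0*s + w1*a + w2, fit online by SGD.
w = np.zeros(3)
def predict(s, a):
    return w[0] * s + w[1] * a + w[2]

s = 0.0
errors = []
for t in range(500):
    a = float(rng.choice([-1.0, 1.0]))              # exploratory action
    s_next = step(s, a)
    err = s_next - predict(s, a)
    errors.append(abs(err))                         # "surprise" of the model
    w += 0.01 * 2 * err * np.array([s, a, 1.0])     # SGD on squared error
    s = s_next

early, late = np.mean(errors[:50]), np.mean(errors[-50:])
```

As the model improves, surprise decays toward the noise floor; in the paper this self-supervised phase yields a world model that a planner then reuses for fast task-specific fine-tuning.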
Clustering units in neural networks: upstream vs downstream information
Richard D Lange
Konrad Paul Kording
It has been hypothesized that some form of "modular" structure in artificial neural networks should be useful for learning, compositionality, and generalization. However, defining and quantifying modularity remains an open problem. We cast the problem of detecting functional modules into the problem of detecting clusters of similar-functioning units. This raises the question of what makes two units functionally similar. For this, we consider two broad families of methods: those that define similarity based on how units respond to structured variations in inputs ("upstream"), and those based on how variations in hidden unit activations affect outputs ("downstream"). We conduct an empirical study quantifying modularity of hidden layer representations of simple feedforward, fully connected networks, across a range of hyperparameters. For each model, we quantify pairwise associations between hidden units in each layer using a variety of both upstream and downstream measures, then cluster them by maximizing their "modularity score" using established tools from network science. We find two surprising results: first, dropout dramatically increased modularity, while other forms of weight regularization had more modest effects. Second, although we observe that there is usually good agreement about clusters within both upstream methods and downstream methods, there is little agreement about the cluster assignments across these two families of methods. This has important implications for representation-learning, as it suggests that finding modular representations that reflect structure in inputs (e.g. disentanglement) may be a distinct goal from learning modular representations that reflect structure in outputs (e.g. compositionality).
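The "modularity score" maximized here is standard Newman modularity of a weighted graph whose nodes are units and whose edge weights are the pairwise association measures. A minimal sketch on a hypothetical similarity matrix with two planted modules (the matrix and partitions are invented for illustration):

```python
import numpy as np

def modularity(A, labels):
    """Newman modularity Q of a partition of a weighted undirected graph.

    A: symmetric nonnegative similarity matrix (zero diagonal).
    labels: cluster assignment per node.
    """
    k = A.sum(axis=1)                    # weighted degrees
    two_m = A.sum()                      # 2m for an undirected graph
    same = labels[:, None] == labels[None, :]
    return ((A - np.outer(k, k) / two_m) * same).sum() / two_m

# Two planted modules of three units each, joined by a single bridge edge.
A = np.array([
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
], dtype=float)
good = modularity(A, np.array([0, 0, 0, 1, 1, 1]))  # respects the modules
bad = modularity(A, np.array([0, 1, 0, 1, 0, 1]))   # cuts across them
```

In the study, `A` would be built from an upstream or downstream association measure between hidden units, and the partition found by a community-detection algorithm rather than fixed by hand.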
Studying the Practices of Deploying Machine Learning Projects on Docker
Moses Openja
Bhagya Chembakottu
Heng Li
Docker is a containerization service that allows for convenient deployment of websites, databases, applications' APIs, and machine learning (ML) models with a few lines of code. Studies have recently explored the use of Docker for deploying general software projects with no specific focus on how Docker is used to deploy ML-based projects. In this study, we conducted an exploratory study to understand how Docker is being used to deploy ML-based projects. As the initial step, we examined the categories of ML-based projects that use Docker. We then examined why and how these projects use Docker, and the characteristics of the resulting Docker images. Our results indicate that six categories of ML-based projects use Docker for deployment, including ML Applications, MLOps/AIOps, Toolkits, DL Frameworks, Models, and Documentation. We derived a taxonomy of 21 major categories representing the purposes of using Docker, including those specific to models such as model management tasks (e.g., testing, training). We then showed that ML engineers use Docker images mostly to help with platform portability, such as transferring the software across operating systems, runtimes such as GPU, and language constraints. However, we also found that more resources may be required to run the Docker images for building ML-based software projects due to the large number of files contained in the image layers with deeply nested directories. We hope to shed light on the emerging practices of deploying ML software projects using containers and highlight aspects that should be improved.
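The image-layer observation can be made concrete: in a classic Docker build, each `RUN`, `COPY`, and `ADD` instruction adds a filesystem layer. A toy sketch that counts such instructions in a hypothetical Dockerfile (a crude proxy for image complexity only; real layer semantics depend on the builder):

```python
# Instructions that create filesystem layers in a classic Docker build.
LAYER_INSTRUCTIONS = {"RUN", "COPY", "ADD"}

# Hypothetical ML-serving Dockerfile, invented for illustration.
dockerfile = """\
FROM python:3.10-slim
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . /app
CMD ["python", "/app/serve.py"]
"""

def count_layers(text):
    count = 0
    for line in text.splitlines():
        stripped = line.strip()
        word = stripped.split(" ", 1)[0].upper() if stripped else ""
        if word in LAYER_INSTRUCTIONS:
            count += 1
    return count

n_layers = count_layers(dockerfile)
```

The study's point is that for ML projects these layers tend to contain very many deeply nested files (framework installs, model artifacts), inflating the resources needed to pull and run the image.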
The distribution, ecology and predicted habitat use of the Critically Endangered angelshark (Squatina squatina) in coastal waters of Wales and the central Irish Sea
Joanna Barker
Jake Davies
Monika Goralczyk
Surshti Patel
John O'Connor
Jim Evans
Jackson Wesley Evans
Rowland Sharp
Matthew Gollock
Fenella R. Wood
Frank N. Wood
James Rosindell
Charlie Bartlett
Brett J. Garner
Dafydd Jones
D. J. Jones
Declan Quigley
Ben Wray
Billy Wray
The angelshark (Squatina squatina) has the northernmost range of any angel shark species, but there is limited information on its distribution, habitat use and ecology at higher latitudes. To address this, Angel Shark Project: Wales gathered 2231 S. squatina records and 142 anecdotal resources from fishers, coastal communities and archives. These spanned the coastal waters of Wales and the central Irish Sea and were dated from 1812 to 2020, with 97.62% of records within 11.1 km (6 nm) of the coast. Commercial, recreational and charter boat fishers provided the majority of S. squatina records (97.18%), with significantly more sightings from three decades (1970s, 1980s and 1990s) and in the months of September, June, August and July (in descending order). The coastal area between Bardsey Island and Strumble Head had the most S. squatina records (n = 1279), with notable concentrations also found in Carmarthen Bay, Conwy Bay and the Outer Severn Estuary. Species distribution models (SDM) identified four environmental variables that had significant influence on S. squatina distribution: depth, chlorophyll‐a concentration, sea surface temperature (SST) and salinity; these varied between the quarters (Q) of the year. SDM outputs predicted a larger congruous area of suitable habitat in Q3 (3176 km²) than in Q2 (2051 km²), with suitability along the three glacial moraines (Sarn Badrig, Sarn‐y‐Bwch and Sarn Cynfelyn) strongly represented. Comparison of modelled environmental variables at the location of S. squatina records for each Q identified reductions in depth and salinity, and increases in chlorophyll‐a and SST, when comparing Q2 or Q3 with Q1 or Q4. This shift may suggest S. squatina are making seasonal movements to shallow coastal waters in Q2 and Q3. This is supported by 23 anecdotal resources and may be driven by reproductive behaviour, as there were 85 records of S. squatina individuals ≤60 cm in the dataset, inferred as recently born or juvenile life‐history stages. The results have helped fill significant evidence gaps identified in the Wales Angelshark Action Plan and immediate next research steps are suggested.
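A species distribution model of this kind relates presence records to environmental covariates. As a hedged sketch, the snippet below fits a logistic habitat-suitability model to synthetic depth/SST data by gradient descent — the covariates, coefficients and data are invented for illustration, and the study's actual SDM method may differ:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic presence/absence data over two covariates (depth in m, SST in °C).
n = 400
depth = rng.uniform(0, 100, n)
sst = rng.uniform(8, 18, n)
true_logit = 2.0 - 0.05 * depth + 0.3 * (sst - 13.0)  # shallow + warm preferred
presence = (rng.uniform(size=n) < 1 / (1 + np.exp(-true_logit))).astype(float)

# Logistic regression on standardised covariates, fit by gradient descent.
Z = np.column_stack([np.ones(n),
                     (depth - depth.mean()) / depth.std(),
                     (sst - sst.mean()) / sst.std()])
w = np.zeros(3)
for _ in range(3000):
    p = 1 / (1 + np.exp(-Z @ w))
    w -= 0.5 * Z.T @ (p - presence) / n   # gradient of mean log-loss

def suitability(d, t):
    """Predicted habitat suitability at depth d (m) and SST t (°C)."""
    z = np.array([1.0, (d - depth.mean()) / depth.std(),
                  (t - sst.mean()) / sst.std()])
    return float(1 / (1 + np.exp(-z @ w)))

shallow_warm = suitability(10.0, 16.0)
deep_cold = suitability(90.0, 9.0)
```

Predicting `suitability` over a spatial grid of covariates, per quarter, yields suitable-habitat maps like those summarised in the abstract.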
Leveraging Integer Linear Programming to Learn Optimal Fair Rule Lists
Ulrich Matchi Aïvodji
Julien Ferry
Sébastien Gambs
Marie-José Huguet
Mohamed Siala
On Neural Architecture Inductive Biases for Relational Tasks
Current deep learning approaches have shown good in-distribution generalization performance, but struggle with out-of-distribution generalization. This is especially true in the case of tasks involving abstract relations like recognizing rules in sequences, as we find in many intelligence tests. Recent work has explored how forcing relational representations to remain distinct from sensory representations, as it seems to be the case in the brain, can help artificial systems. Building on this work, we further explore and formalize the advantages afforded by 'partitioned' representations of relations and sensory details, and how this inductive bias can help recompose learned relational structure in newly encountered settings. We introduce a simple architecture based on similarity scores which we name Compositional Relational Network (CoRelNet). Using this model, we investigate a series of inductive biases that ensure abstract relations are learned and represented distinctly from sensory data, and explore their effects on out-of-distribution generalization for a series of relational psychophysics tasks. We find that simple architectural choices can outperform existing models in out-of-distribution generalization. Together, these results show that partitioning relational representations from other information streams may be a simple way to augment existing network architectures' robustness when performing out-of-distribution relational computations.
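The similarity-score idea can be sketched directly: object encodings are reduced to a row-normalised matrix of pairwise inner products, and only that relation matrix (not the encodings themselves) is passed downstream. A minimal NumPy illustration — the dimensions and data are assumptions, and the full model adds a learned encoder and decoder around this core:

```python
import numpy as np

def relation_matrix(X):
    """Row-softmaxed pairwise similarity scores between object encodings.

    Passing only this matrix downstream separates relational structure
    from the sensory content of the encodings themselves.
    """
    S = X @ X.T
    S = S - S.max(axis=1, keepdims=True)   # numerically stable softmax
    E = np.exp(S)
    return E / E.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 16))   # 4 objects, 16-dim encodings (illustrative)
R = relation_matrix(X)
```

Because `R` discards everything about `X` except how its rows relate to one another, a decoder trained on `R` cannot shortcut through sensory features — the partitioning inductive bias the paper studies.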
Few-shot Question Generation for Personalized Feedback in Intelligent Tutoring Systems
Muhammad Shayan
Robert Belfer
Iulian V. Serban
Ekaterina Kochmar
Sequential Density Estimation via Nonlinear Continuous Weighted Finite Automata
Weighted finite automata (WFAs) have been widely applied in many fields. One of the classic problems for WFAs is probability distribution estimation over sequences of discrete symbols. Although WFAs have been extended to deal with continuous input data, namely continuous WFAs (CWFAs), it is still unclear how to approximate density functions over sequences of continuous random variables using WFA-based models, due to the limitation on the expressiveness of the model as well as the tractability of approximating density functions via CWFAs. In this paper, we propose a nonlinear extension to the CWFA model, which we refer to as nonlinear continuous WFAs (NCWFAs), to improve its expressiveness. We then leverage the so-called RNADE method, a well-known density estimator based on neural networks, to propose the RNADE-NCWFA model. The RNADE-NCWFA model computes a density function by design. We show that this model is strictly more expressive than the Gaussian HMM model, which CWFAs cannot approximate. Empirically, we conduct a synthetic experiment using Gaussian HMM generated data, focusing on evaluating the model's ability to estimate densities for sequences of varying lengths (including lengths longer than those seen during training). We observe that our model performs best among the compared baseline methods.
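For readers unfamiliar with WFAs, the object being extended computes $f(x_1 \dots x_n) = \boldsymbol{\alpha}^\top \mathbf{A}_{x_1} \cdots \mathbf{A}_{x_n} \boldsymbol{\omega}$: a linear map per symbol, composed along the sequence. A minimal discrete example (the NCWFA replaces these linear maps with nonlinear ones over continuous inputs; the toy automaton below simply counts the 1s in a binary string):

```python
import numpy as np

def wfa_score(alpha, As, omega, seq):
    """Classic discrete WFA: f(x1..xn) = alpha^T A_{x1} ... A_{xn} omega."""
    v = alpha
    for sym in seq:
        v = v @ As[sym]
    return float(v @ omega)

# Toy 2-state WFA over alphabet {0, 1} that counts occurrences of 1,
# using an upper-triangular "accumulator" matrix for the symbol 1.
alpha = np.array([1.0, 0.0])
omega = np.array([0.0, 1.0])
As = {
    0: np.eye(2),
    1: np.array([[1.0, 1.0],
                 [0.0, 1.0]]),
}
score = wfa_score(alpha, As, omega, [1, 0, 1, 1])
```

In the density-estimation setting, the per-symbol matrices become functions of a continuous observation, and the paper's nonlinear extension plus RNADE conditionals turn the computed score into a valid sequence density.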