Publications

Universal Equivariant Multilayer Perceptrons
Group invariant and equivariant Multilayer Perceptrons (MLPs), also known as Equivariant Networks, have achieved remarkable success in learning on a variety of data structures, such as sequences, images, sets, and graphs. Using tools from group theory, this paper proves the universality of a broad class of equivariant MLPs with a single hidden layer. In particular, it is shown that having a hidden layer on which the group acts regularly is sufficient for universal equivariance (invariance). A corollary is unconditional universality of equivariant MLPs for Abelian groups, such as CNNs with a single hidden layer. A second corollary is the universality of equivariant MLPs with a high-order hidden layer, where we give both group-agnostic bounds and means for calculating group-specific bounds on the order of hidden layer that guarantees universal equivariance (invariance).
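To make the notion of equivariance concrete, here is a minimal sketch, not the paper's construction. It assumes the cyclic group acting on a vector by rotation (an Abelian group that acts regularly on a hidden layer with one unit per group element) and a one-hidden-layer network built from circular convolutions with a ReLU. All function names and weights are hypothetical.

```python
def cyclic_shift(x, s):
    """Act on x with the cyclic group element s (rotate right by s positions)."""
    n = len(x)
    return [x[(i - s) % n] for i in range(n)]


def circular_conv(x, w):
    """Linear map that commutes with cyclic shifts: y[i] = sum_k w[k] * x[(i - k) % n]."""
    n = len(x)
    return [sum(w[k] * x[(i - k) % n] for k in range(len(w))) for i in range(n)]


def relu(v):
    """Pointwise nonlinearity; pointwise maps commute with permutations of the units."""
    return [max(0, a) for a in v]


def equivariant_mlp(x, w1, w2):
    """One hidden layer: equivariant linear map, ReLU, equivariant linear map."""
    return circular_conv(relu(circular_conv(x, w1)), w2)


# Hypothetical input and weights, chosen only for illustration.
x = [1, -2, 3, 0, 5]
w1 = [1, -1, 2]
w2 = [2, 0, -1]

shift_then_apply = equivariant_mlp(cyclic_shift(x, 2), w1, w2)
apply_then_shift = cyclic_shift(equivariant_mlp(x, w1, w2), 2)
print(shift_then_apply == apply_then_shift)
```

Shifting the input and then applying the network gives the same result as applying the network and then shifting the output, for any shift `s`; this is the equivariance property the abstract refers to.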
What can I do here? A Theory of Affordances in Reinforcement Learning
Gheorghe Comanici
David Abel
Reinforcement learning algorithms usually assume that all actions are always available to an agent. However, both people and animals understand the general link between the features of their environment and the actions that are feasible. Gibson (1977) coined the term "affordances" to describe the fact that certain states enable an agent to do certain actions, in the context of embodied agents. In this paper, we develop a theory of affordances for agents who learn and plan in Markov Decision Processes. Affordances play a dual role in this case. On one hand, they allow faster planning, by reducing the number of actions available in any given situation. On the other hand, they facilitate more efficient and precise learning of transition models from data, especially when such models require function approximation. We establish these properties through theoretical results as well as illustrative examples. We also propose an approach to learn affordances and use it to estimate transition models that are simpler and generalize better.
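The planning benefit described above can be sketched on a toy problem. The following is an illustrative example under stated assumptions, not the paper's algorithm: a hypothetical four-state chain with a goal at the right end, a hand-written affordance that rules out moving left at the wall, and plain value iteration that only backs up the actions it is given.

```python
STATES = [0, 1, 2, 3]
GOAL = 3


def step(s, a):
    """Deterministic chain dynamics; moving left at state 0 leaves the agent in place."""
    s2 = s + 1 if a == "right" else max(s - 1, 0)
    return {s2: 1.0}


def reward(s, a):
    """Reward of 1 for the move that enters the goal, 0 otherwise."""
    return 1.0 if a == "right" and s + 1 == GOAL else 0.0


def all_actions(s):
    """Standard assumption: every action is available in every non-terminal state."""
    return [] if s == GOAL else ["left", "right"]


def afforded_actions(s):
    """Hypothetical affordance: 'left' is not feasible against the wall at state 0."""
    return [a for a in all_actions(s) if not (s == 0 and a == "left")]


def value_iteration(states, actions_in, transition, reward_fn, gamma=0.9, sweeps=100):
    """Synchronous value iteration that only backs up the actions returned by actions_in(s)."""
    values = {s: 0.0 for s in states}
    for _ in range(sweeps):
        new_values = {}
        for s in states:
            backups = [
                reward_fn(s, a)
                + gamma * sum(p * values[s2] for s2, p in transition(s, a).items())
                for a in actions_in(s)
            ]
            # States with no available action (here, the goal) keep value 0.
            new_values[s] = max(backups, default=0.0)
        values = new_values
    return values


v_all = value_iteration(STATES, all_actions, step, reward)
v_afforded = value_iteration(STATES, afforded_actions, step, reward)
print(v_all == v_afforded)
```

In this toy case the pruned action is never optimal, so both runs reach the same values while the afforded version evaluates fewer actions per sweep. In general, an affordance that removes a useful action would change the solution.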
An Effective Anti-Aliasing Approach for Residual Networks
Cristina Vasconcelos
Nicolas Roux
Image pre-processing in the frequency domain has traditionally played a vital role in computer vision and was even part of the standard pipeline in the early days of deep learning. However, with the advent of large datasets, many practitioners concluded that this was unnecessary due to the belief that these priors can be learned from the data itself. Frequency aliasing is a phenomenon that may occur when sub-sampling any signal, such as an image or feature map, causing distortion in the sub-sampled output. We show that we can mitigate this effect by placing non-trainable blur filters and using smooth activation functions at key locations, particularly where networks lack the capacity to learn them. These simple architectural changes lead to substantial improvements in out-of-distribution generalization on both image classification under natural corruptions on ImageNet-C [10] and few-shot learning on Meta-Dataset [17], without introducing additional trainable parameters and using the default hyper-parameters of open source codebases.
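The aliasing effect and the role of a fixed blur can be illustrated in one dimension. This is a minimal sketch under assumptions that are not taken from the paper: a binomial kernel `(0.25, 0.5, 0.25)`, circular padding, and a stride of 2. The function names are hypothetical.

```python
def blur(x, kernel=(0.25, 0.5, 0.25)):
    """Fixed (non-trainable) low-pass filter with circular padding."""
    n = len(x)
    half = len(kernel) // 2
    return [
        sum(kernel[j] * x[(i + j - half) % n] for j in range(len(kernel)))
        for i in range(n)
    ]


def subsample(x, stride=2):
    """Keep every stride-th sample, as a strided layer would."""
    return x[::stride]


# A signal oscillating at the highest representable frequency.
signal = [1.0, -1.0] * 4

naive = subsample(signal)              # aliased: the oscillation looks like a constant
anti_aliased = subsample(blur(signal))  # the blur removes the oscillation first

print(naive)
print(anti_aliased)
```

Without the blur, the sub-sampled output misrepresents a rapidly oscillating signal as a constant one. With the blur applied first, the high-frequency component is attenuated before sub-sampling, so it cannot masquerade as a low-frequency one.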
Multiscale PHATE Exploration of SARS-CoV-2 Data Reveals Multimodal Signatures of Disease
Manik Kuchroo
Patrick Wong
Jean-Christophe Grenier
Dennis Shung
Carolina Lucas
Jon Klein
Daniel B. Burkhardt
Scott Gigante
Abhinav Godavarthi
Benjamin Israelow
Tianyang Mao
Ji Eun Oh
Julio Silva
Takehiro Takahashi
Camila D. Odio
Arnau Casanovas-Massana
John Fournier
Shelli Farhadian … (see 7 more)
Charles S. Dela Cruz
Albert I. Ko
F. Perry Wilson
Akiko Iwasaki
The biomedical community is producing increasingly high dimensional datasets, integrated from hundreds of patient samples, which current computational techniques struggle to explore. To uncover biological meaning from these complex datasets, we present an approach called Multiscale PHATE, which learns abstracted biological features from data that can be directly predictive of disease. Built on a coarse graining process called diffusion condensation, Multiscale PHATE learns a data topology that can be analyzed at coarse levels for high level summarizations of data, as well as at fine levels for detailed representations on subsets. We apply Multiscale PHATE to study the immune response to COVID-19 in 54 million cells from 168 hospitalized patients. Through our analysis of patient samples, we identify CD16-hi,CD66b-lo neutrophil and IFNγ+,GranzymeB+ Th17 cell responses enriched in patients who die. Furthermore, we show that population groupings Multiscale PHATE discovers can be directly fed into a classifier to predict disease outcome. We also use Multiscale PHATE-derived features to construct two different manifolds of patients, one from abstracted flow cytometry features and another directly on patient clinical features, both associating immune subsets and clinical markers with outcome.

Using Open Source Licensing to Regulate the Assembly of LAWS: A Preliminary Analysis
Cheng Lin
Lethal autonomous weapons (LAWS) are an emerging technology capable of automatically targeting and exercising lethal force. Many scholars and advocates have petitioned to ban the technology internationally for a myriad of reasons. However, there are practical challenges to implementing a ban. One such challenge is posed by the “intangible” nature of the software that LAWS depends on, which is incompatible with implementation mechanisms such as export control. Given the dual-use nature of software, and the fact that software is developed by teams of individuals, a number of soft governance mechanisms have been proposed to regulate this technology. In this paper, we investigate the feasibility of one particular approach: leveraging open source licenses as a means to prohibit the use of certain software in LAWS. This approach is largely motivated by the fact that open source software underpins all of technology, especially AI. Through a review of recent tech activism and open source activism, we evaluate whether open source licenses can feasibly limit the use of open source software to only non-LAWS applications. We distill the current challenges facing “ethics-driven” open source licensing efforts into three main obstacles: the need for clarity of licensing language, the lack of enforceability of licenses, and the lack of cohesiveness of the open source community. We propose that addressing these factors is also a success criterion for future anti-LAWS open source initiatives. We find that open source licenses provide more theoretical than practical promise in regulating LAWS, and conclude that cohesion in the open source community is the key to their potential practical success in the future.
Global Surveillance of COVID-19 by mining news media using a multi-source dynamic embedded topic model.
Zhi Wen
Imane Chafi
Anya Okhmatovskaia
Guido Powell
David L. Buckeridge
As the COVID-19 pandemic continues to unfold, understanding the global impact of non-pharmacological interventions (NPI) is important for formulating effective intervention strategies, particularly as many countries prepare for future waves. We used a machine learning approach to distill latent topics related to NPI from large-scale international news media. We hypothesize that these topics are informative about the timing and nature of implemented NPI, dependent on the source of the information (e.g., local news versus official government announcements) and the target countries. Given a set of latent topics associated with NPI (e.g., self-quarantine, social distancing, online education, etc.), we assume that countries and media sources have different prior distributions over these topics, which are sampled to generate the news articles. To model the source-specific topic priors, we developed a semi-supervised, multi-source, dynamic, embedded topic model. Our model is able to simultaneously infer latent topics and learn a linear classifier to predict NPI labels using the topic mixtures as input for each news article. To learn these models, we developed an efficient end-to-end amortized variational inference algorithm. We applied our models to news data collected and labelled by the World Health Organization (WHO) and the Global Public Health Intelligence Network (GPHIN). Through comprehensive experiments, we observed superior topic quality and intervention prediction accuracy, compared to the baseline embedded topic models, which ignore information on media source and intervention labels. The inferred latent topics reveal distinct policies and media framing in different countries and media sources, and also characterize reaction to COVID-19 and NPI in a semantically meaningful manner. Our PyTorch code is available on GitHub (https://github.com/li-lab-mcgill/covid19_media).
On Posterior Collapse and Encoder Feature Dispersion in Sequence VAEs.
Teng Long
Yanshuai Cao
Jackie CK Cheung
Variational autoencoders (VAEs) hold great potential for modelling text, as they could in theory separate high-level semantic and syntactic properties from local regularities of natural language. Practically, however, VAEs with autoregressive decoders often suffer from posterior collapse, a phenomenon where the model learns to ignore the latent variables, causing the sequence VAE to degenerate into a language model. In this paper, we argue that posterior collapse is in part caused by the lack of dispersion in encoder features. We provide empirical evidence to verify this hypothesis, and propose a straightforward fix using pooling. This simple technique effectively prevents posterior collapse, allowing the model to achieve significantly better data log-likelihood than standard sequence VAEs. Compared to existing work, our proposed method is able to achieve comparable or superior performance while being more computationally efficient.
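The abstract does not say which pooling operator is used or where it is applied. As a rough sketch only, assuming the pooling is taken over the encoder's per-token hidden states instead of using the final state alone, the candidate feature extractors could look like this (all names are hypothetical):

```python
def last_state(hidden_states):
    """Common baseline: summarize the sequence by the final hidden state only."""
    return hidden_states[-1]


def mean_pool(hidden_states):
    """Average each feature dimension over all time steps."""
    steps = len(hidden_states)
    return [sum(column) / steps for column in zip(*hidden_states)]


def max_pool(hidden_states):
    """Take the maximum of each feature dimension over all time steps."""
    return [max(column) for column in zip(*hidden_states)]


# Hypothetical encoder outputs: three time steps, two features each.
hidden = [[1.0, -2.0], [3.0, 0.0], [2.0, 5.0]]

print(last_state(hidden))
print(mean_pool(hidden))
print(max_pool(hidden))
```

Either pooled vector would then feed the layers that produce the posterior parameters, so every token can contribute to the latent code. Whether this matches the paper's exact setup is an assumption.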
Approximate Planning and Learning for Partially Observed Systems
DiVA: Diverse Visual Feature Aggregation for Deep Metric Learning
Timo Milbich
Samarth Sinha
Björn Ommer
Effectiveness of quarantine and testing to prevent COVID-19 transmission from arriving travelers
Russell Wa
David L Buckeridge
Explainability and Interpretability: Keys to Deep Medicine
Arash Shaban-Nejad
Martin Michalowski
David L Buckeridge
A Study of Policy Gradient on a Class of Exactly Solvable Models
Colin Daniels
Anna M. Brandenberger
Policy gradient methods are extensively used in reinforcement learning as a way to optimize expected return. In this paper, we explore the evolution of the policy parameters, for a special class of exactly solvable POMDPs, as a continuous-state Markov chain, whose transition probabilities are determined by the gradient of the distribution of the policy's value. Our approach relies heavily on random walk theory, specifically on affine Weyl groups. We construct a class of novel partially observable environments with controllable exploration difficulty, in which the value distribution, and hence the policy parameter evolution, can be derived analytically. Using these environments, we analyze the probabilistic convergence of policy gradient to different local maxima of the value function. To our knowledge, this is the first approach developed to analytically compute the landscape of policy gradient in POMDPs for a class of such environments, leading to interesting insights into the difficulty of this problem.