Publications

Privacy-Aware Compression for Federated Data Analysis
Kamalika Chaudhuri
Chuan Guo
Federated data analytics is a framework for distributed data analysis where a server compiles noisy responses from a group of distributed low-bandwidth user devices to estimate aggregate statistics. Two major challenges in this framework are privacy, since user data is often sensitive, and compression, since the user devices have low network bandwidth. Prior work has addressed these challenges separately by combining standard compression algorithms with known privacy mechanisms. In this work, we take a holistic look at the problem and design a family of privacy-aware compression mechanisms that work for any given communication budget. We first propose a mechanism for transmitting a single real number that has optimal variance under certain conditions. We then show how to extend it to metric differential privacy for location privacy use-cases, as well as vectors, for application to federated learning. Our experiments illustrate that our mechanism can lead to better utility vs. compression trade-offs for the same privacy loss in a number of settings.
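The abstract contrasts its mechanism with prior work that combines standard compression with known privacy mechanisms. As a hedged illustration of that separate-stages baseline (not the paper's own mechanism), the Python sketch below compresses a value in [0, 1] to one bit by stochastic rounding and then applies binary randomized response, which satisfies ε-local differential privacy. The server debiases each bit so that the average over users is an unbiased estimate of the mean. The function names, the [0, 1] input range, the requirement ε > 0, and the choice of randomized response are assumptions made for illustration.

```python
import math
import random


def keep_probability(epsilon: float) -> float:
    """Probability that binary randomized response reports the true bit."""
    return math.exp(epsilon) / (math.exp(epsilon) + 1.0)


def one_bit_randomized_response(x: float, epsilon: float, rng: random.Random) -> int:
    """Hypothetical client step: compress x in [0, 1] to one bit, then privatize it."""
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must lie in [0, 1]")
    bit = 1 if rng.random() < x else 0  # stochastic rounding: E[bit] = x
    return bit if rng.random() < keep_probability(epsilon) else 1 - bit


def debias(report: int, epsilon: float) -> float:
    """Hypothetical server step: unbiased estimate of x from one report.

    E[report] = keep * x + (1 - keep) * (1 - x), solved here for x.
    Requires epsilon > 0, otherwise keep = 0.5 and the report carries no signal.
    """
    keep = keep_probability(epsilon)
    return (report - (1.0 - keep)) / (2.0 * keep - 1.0)


# Usage: 20,000 simulated users, each sending a single privatized bit.
rng = random.Random(0)
values = [rng.random() for _ in range(20_000)]
true_mean = sum(values) / len(values)
epsilon = 1.0
estimate = sum(debias(one_bit_randomized_response(v, epsilon, rng), epsilon)
               for v in values) / len(values)
print(f"true mean {true_mean:.3f}, estimate {estimate:.3f}")
```

The debiasing factor 1 / (2·keep − 1) grows as ε shrinks, so each user's estimate becomes noisier; this variance cost of stacking compression and privatization is the kind of trade-off a jointly designed mechanism aims to improve.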
Temporal Abstractions-Augmented Temporally Contrastive Learning: An Alternative to the Laplacian in RL
Akram Erraqabi
Marlos C. Machado
Harry Zhao
Mingde Zhao
Sainbayar Sukhbaatar
Alessandro Lazaric
Ludovic Denoyer
In reinforcement learning, the graph Laplacian has proved to be a valuable tool in the task-agnostic setting, with applications ranging from skill discovery to reward shaping. Recently, learning the Laplacian representation has been framed as the optimization of a temporally-contrastive objective to overcome its computational limitations in large (or continuous) state spaces. However, this approach requires uniform access to all states in the state space, overlooking the exploration problem that emerges during the representation learning process. In this work, we propose an alternative method that is able to recover, in a non-uniform-prior setting, the expressiveness and the desired properties of the Laplacian representation. We do so by combining the representation learning with a skill-based covering policy, which provides a better training distribution to extend and refine the representation. We also show that a simple augmentation of the representation objective with the learned temporal abstractions improves dynamics-awareness and helps exploration. We find that our method succeeds as an alternative to the Laplacian in the non-uniform setting and scales to challenging continuous control environments. Finally, even though our method is not optimized for skill discovery, the learned skills can successfully solve difficult continuous navigation tasks with sparse rewards, where standard skill discovery approaches are not as effective.
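The abstract describes learning the Laplacian representation through a temporally-contrastive objective. One standard way to see that connection, given here as an assumption-labelled sketch rather than this paper's augmented objective: on an undirected, unit-weight state graph, the attractive term that sums squared representation differences over transitions equals the Laplacian quadratic form φᵀLφ with L = D − A. The chain graph, the representation values, and all function names below are hypothetical.

```python
def laplacian(n, edges):
    """Unnormalized Laplacian L = D - A of an undirected, unit-weight graph."""
    L = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        L[u][u] += 1.0
        L[v][v] += 1.0
        L[u][v] -= 1.0
        L[v][u] -= 1.0
    return L


def quadratic_form(L, phi):
    """phi^T L phi for a list-of-lists matrix L and a 1-D representation phi."""
    n = len(phi)
    return sum(phi[i] * L[i][j] * phi[j] for i in range(n) for j in range(n))


def attractive_loss(edges, phi):
    """Contrastive 'attractive' term: squared representation gap per transition."""
    return sum((phi[u] - phi[v]) ** 2 for u, v in edges)


edges = [(0, 1), (1, 2), (2, 3)]  # hypothetical 4-state chain
phi = [0.0, 1.0, 3.0, 6.0]        # hypothetical 1-D state representation
print(attractive_loss(edges, phi), quadratic_form(laplacian(4, edges), phi))  # 14.0 14.0
```

In practice the sum over transitions is estimated from sampled experience, which is where the distribution over visited states enters. A constant φ also scores zero, so a full objective needs an additional term (for example, an orthonormality constraint) to rule out that trivial solution.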
Universal antigen encoding of T cell activation from high-dimensional cytokine dynamics
Sooraj R. Achar
François X. P. Bourassa
Thomas J. Rademaker
Angela Lee
Taisuke Kondo
Emanuel Salazar-Cavazos
John S. Davies
Naomi Taylor
Grégoire Altan-Bonnet
Human brain anatomy reflects separable genetic and environmental components of socioeconomic status
Hyeokmoon Kweon
Gökhan Aydogan
Alain Dagher
Christian C. Ruff
Gideon Nave
Martha J Farah
Philipp Koellinger
Recent studies report that socioeconomic status (SES) correlates with brain structure. Yet, such findings are variable and little is known about underlying causes. We present a well-powered voxel-based analysis of grey matter volume (GMV) across levels of SES, finding many small SES effects widely distributed across the brain, including cortical, subcortical and cerebellar regions. We also construct a polygenic index of SES to control for the additive effects of common genetic variation related to SES, which attenuates observed SES-GMV relations, to different degrees in different areas. Remaining variance, which may be attributable to environmental factors, is substantially accounted for by body mass index, a marker for lifestyle related to SES. In sum, SES affects multiple brain regions through measurable genetic and environmental effects. One-sentence summary: Socioeconomic status is linked with brain anatomy through a varying balance of genetic and environmental influences.
Multi-tract multi-symptom relationships in pediatric concussion
Guido I Guberman
Sonja Stojanovski
Eman Nishat
Alain Ptito
Anne L Wheeler
Maxime Descoteaux
The heterogeneity of white matter damage and symptoms in concussions has been identified as a major obstacle to therapeutic innovation. In contrast, the vast majority of diffusion MRI studies on concussion have traditionally employed group-comparison approaches. Such studies do not consider heterogeneity of damage and symptoms in concussion. To parse concussion heterogeneity, the present study combines diffusion MRI (dMRI) and multivariate statistics to investigate multi-tract multi-symptom relationships. Using dMRI data from a sample of 306 children aged 9 and 10 with a history of concussion from the Adolescent Brain Cognitive Development Study (ABCD study), we built connectomes weighted by classical and emerging diffusion measures. These measures were combined into two informative indices, the first capturing a mixture of patterns suggestive of microstructural complexity, the second representing almost exclusively axonal density. We deployed pattern-learning algorithms to jointly decompose these connectivity features and 19 behavioural measures that capture well-known symptoms of concussions. We found idiosyncratic symptom-specific multi-tract connectivity features, which would not be captured in traditional univariate analyses. Multivariable connectome-symptom correspondences were stronger than all single-tract/single-symptom associations. Multi-tract connectivity features were also expressed equally across different sociodemographic strata and their expression was not accounted for by injury-related variables. In a replication dataset, the expression of multi-tract connectivity features predicted adverse psychiatric outcomes after accounting for other psychopathology-related variables.
By defining cross-demographic multi-tract multi-symptom relationships to parse concussion heterogeneity, the present study can pave the way for the development of improved stratification strategies that may contribute to the success of future clinical trials and the improvement of concussion management.
Learning Representations for New Sound Classes With Continual Self-Supervised Learning
Zhepei Wang
Xilin Jiang
Junkai Wu
Efthymios Tzinis
Paris Smaragdis
In this article, we work on a sound recognition system that continually incorporates new sound classes. Our main goal is to develop a framework where the model can be updated without relying on labeled data. For this purpose, we propose adopting representation learning, where an encoder is trained using unlabeled data. This learning framework enables the study and implementation of a practically relevant use case where only a small fraction of the labels is available in a continual learning context. We also make the empirical observation that a similarity-based representation learning method within this framework is robust to forgetting even if no explicit mechanism against forgetting is employed. We show that this approach performs similarly to several distillation-based continual learning methods when employed on self-supervised representation learning methods.
Homogenization of SGD in high-dimensions: Exact dynamics and generalization properties
Elliot Paquette
Ben Adlam
Jeffrey Pennington
We develop a stochastic differential equation, called homogenized SGD, for analyzing the dynamics of stochastic gradient descent (SGD) on a high-dimensional random least squares problem with …
Learning active tactile perception through belief-space control
Jean-François Tremblay
Johanna Hansen
Francois Hogan
Robots operating in an open world can encounter novel objects with unknown physical properties, such as mass, friction, or size. It is desirable to be able to sense those properties through contact-rich interaction before performing downstream tasks with the objects. We propose a method for autonomously learning active tactile perception policies, by learning a generative world model leveraging a differentiable Bayesian filtering algorithm, and designing an information-gathering model predictive controller. We test the method on three simulated tasks: mass estimation, height estimation and toppling height estimation. Our method is able to discover policies which gather information about the desired property in an intuitive manner.
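To make "information-gathering" concrete, here is a toy belief-space loop under heavily simplified assumptions that are not the paper's. The unknown property is represented as θ = 1/mass, and each push with force f yields an acceleration reading a = f·θ + Gaussian noise. The belief is a scalar Gaussian updated by a hand-written Kalman filter (standing in for the learned differentiable filter), and a greedy one-step rule picks the force that most shrinks the posterior variance (standing in for model predictive control). All names, forces, and noise levels are hypothetical.

```python
import random


def posterior_var(var: float, force: float, noise_var: float) -> float:
    """Belief variance after one measurement taken with the given force."""
    return var * noise_var / (force * force * var + noise_var)


def kalman_update(mean, var, force, accel, noise_var):
    """Scalar Kalman update of the Gaussian belief over theta = 1/mass,
    assuming the linear measurement model accel = force * theta + noise."""
    gain = var * force / (force * force * var + noise_var)
    new_mean = mean + gain * (accel - force * mean)
    new_var = (1.0 - gain * force) * var  # algebraically equals posterior_var(...)
    return new_mean, new_var


def choose_force(var, noise_var, candidates):
    """Greedy one-step information gathering: pick the action whose
    measurement is expected to shrink the belief variance the most."""
    return min(candidates, key=lambda f: posterior_var(var, f, noise_var))


# Usage: a hypothetical object of mass 2.0 (theta = 0.5) and a vague prior.
rng = random.Random(0)
true_theta = 0.5
mean, var, noise_var = 1.0, 1.0, 0.01
for _ in range(50):
    force = choose_force(var, noise_var, [0.5, 1.0, 5.0])
    accel = force * true_theta + rng.gauss(0.0, noise_var ** 0.5)
    mean, var = kalman_update(mean, var, force, accel, noise_var)
print(f"estimated mass {1.0 / mean:.3f}, belief std {var ** 0.5:.4f}")
```

Because the posterior variance var·σ² / (f²·var + σ²) falls as |f| grows, this greedy rule always pushes as hard as allowed; a realistic controller would weigh that information gain against action costs or task constraints.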
Reconstruction of full-length LINE-1 progenitors from ancestral genomes
Laura F Campitelli
Isaac Yellan
Mihai Albu
Marjan Barazandeh
Zain M Patel
Timothy R Hughes
Sequences derived from the Long INterspersed Element-1 (L1) family of retrotransposons occupy at least 17% of the human genome, with 67 distinct subfamilies representing successive waves of expansion and extinction in mammalian lineages. L1s contribute extensively to gene regulation, but their molecular history is difficult to trace, because most are present only as truncated and highly mutated fossils. Consequently, L1 entries in current databases of repeat sequences are composed mainly of short diagnostic subsequences, rather than full functional progenitor sequences for each subfamily. Here, we have coupled 2 levels of sequence reconstruction (at the level of whole genomes and L1 subfamilies) to reconstruct progenitor sequences for all human L1 subfamilies that are more functionally and phylogenetically plausible than existing models. Most of the reconstructed sequences are at or near the canonical length of L1s and encode uninterrupted ORFs with expected protein domains. We also show that the presence or absence of binding sites for KRAB-C2H2 Zinc Finger Proteins, even in reconstructed ancient progenitor L1s, mirrors binding observed in human ChIP-exo experiments, thus extending the arms race and domestication model. RepeatMasker searches of the modern human genome suggest that the new models may be able to assign subfamily-resolution identities to previously ambiguous L1 instances. The reconstructed L1 sequences will be useful for genome annotation and functional study of both L1 evolution and L1 contributions to host regulatory networks.
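The idea of inferring an ancestral sequence from its descendants can be illustrated, under loud assumptions, with Fitch parsimony, a classic and much simpler technique for reconstructing ancestral characters on a fixed tree. The abstract does not specify the two-level reconstruction pipeline it uses, and this sketch should not be read as that pipeline. It assumes pre-aligned, equal-length, gap-free sequences and a rooted binary tree written as nested tuples; the tree, leaf names, sequences, and function names are hypothetical.

```python
def fitch_sets(tree, seqs, site):
    """Bottom-up Fitch pass for one alignment column.

    Returns (set of candidate states at this node, parsimony cost of subtree).
    A leaf is a string key into seqs; an internal node is a (left, right) tuple.
    """
    if isinstance(tree, str):
        return {seqs[tree][site]}, 0
    left_set, left_cost = fitch_sets(tree[0], seqs, site)
    right_set, right_cost = fitch_sets(tree[1], seqs, site)
    shared = left_set & right_set
    if shared:
        return shared, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1  # one substitution


def fitch_root(tree, seqs):
    """Most-parsimonious root sequence and total parsimony score.

    Ties among candidate root states are broken alphabetically.
    """
    length = len(next(iter(seqs.values())))
    root, score = [], 0
    for site in range(length):
        states, cost = fitch_sets(tree, seqs, site)
        root.append(min(states))
        score += cost
    return "".join(root), score


# Hypothetical four-leaf tree and toy alignment.
tree = (("s1", "s2"), ("s3", "s4"))
seqs = {"s1": "ACGT", "s2": "ACGA", "s3": "TCGT", "s4": "ACGT"}
print(fitch_root(tree, seqs))  # ('ACGT', 2)
```

Breaking root ties alphabetically is arbitrary; probabilistic reconstruction methods instead weight candidate states using a substitution model and branch lengths.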