Publications

Your head is there to move you around: Goal-driven models of the primate dorsal pathway
Patrick J Mineault
Blake A Richards
Christopher C Pack
Neurons in the dorsal visual pathway of the mammalian brain are selective for motion stimuli, with the complexity of stimulus representations increasing along the hierarchy. This progression is similar to that of the ventral visual pathway, which is well characterized by artificial neural networks (ANNs) optimized for object recognition. In contrast, there are no image-computable models of the dorsal stream with comparable explanatory power. We hypothesized that the properties of dorsal stream neurons could be explained by a simple learning objective: the need for an organism to orient itself during self-motion. To test this hypothesis, we trained a 3D ResNet to predict an agent’s self-motion parameters from visual stimuli in a simulated environment. We found that the responses in this network accounted well for the selectivity of neurons in a large database of single-neuron recordings from the dorsal visual stream of non-human primates. In contrast, ANNs trained on an action recognition dataset through supervised or self-supervised learning could not explain responses in the dorsal stream, despite also being trained on naturalistic videos with moving objects. These results demonstrate that an ecologically relevant cost function can account for dorsal stream properties in the primate brain.
Bijective-Contrastive Estimation
In this work, we propose Bijective-Contrastive Estimation (BCE), a classification-based learning criterion for energy-based models. We generate a collection of contrasting distributions using bijections, and solve all the classification problems between the original data distribution and the distributions induced by the bijections using a classifier parameterized by an energy model. We show that if the classification objective is minimized, the energy function will uniquely recover the data density up to a normalizing constant. This has the benefit of not having to explicitly specify a contrasting distribution, as is required in noise contrastive estimation. Experimentally, we demonstrate that the proposed method works well on 2D synthetic datasets. We discuss the difficulty in high-dimensional cases, and propose potential directions to explore for future work.
Functional idiosyncrasy has a shared topography with group-level connectivity alterations in autism
Oualid Benkarim
Casey Paquola
Bo-yong Park
Seok-Jun Hong
Jessica Royer
Reinder Vos de Wael
Sara Larivière
Sofie Valk
Laurent Mottron
Boris Bernhardt
Autism spectrum disorder (ASD) is commonly understood as a network disorder, yet case-control analyses against typically-developing controls (TD) have yielded somewhat inconsistent patterns of results. The current work was centered on a novel approach to profile functional network idiosyncrasy, the inter-individual variability in the association between functional network organization and brain anatomy, and we tested the hypothesis that idiosyncrasy contributes to connectivity alterations in ASD. Studying functional network idiosyncrasy in a multi-centric dataset with 157 ASD and 172 TD, our approach revealed higher idiosyncrasy in ASD in the default mode, somatomotor and attention networks together with reduced idiosyncrasy in the lateral temporal lobe. Idiosyncrasy was found to increase with age in both ASD and TD, and was significantly correlated with symptom severity in the former group. Association analysis with structural and molecular brain features indicated that patterns of functional network idiosyncrasy were not correlated with ASD-related cortical thickness alterations, but closely with the spatial expression patterns of intracortical ASD risk genes. In line with our main hypothesis, we could demonstrate that idiosyncrasy indeed plays a strong role in the manifestation of connectivity alterations that are measurable with conventional case-control designs and may, thus, be a principal driver of inconsistency in the autism connectomics literature. These findings support important interactions between the heterogeneity of individuals with an autism diagnosis and group-level functional signatures, and help to consolidate prior research findings on the highly variable nature of the functional connectome in ASD. Our study promotes idiosyncrasy as a potential individualized diagnostic marker of atypical brain network development.
Phylogenetic Manifold Regularization: A semi-supervised approach to predict transcription factor binding sites
The computational prediction of transcription factor binding sites remains a challenging problem in bioinformatics, despite significant methodological developments from the field of machine learning. Such computational models are essential to help interpret the non-coding portion of human genomes, and to learn more about the regulatory mechanisms controlling gene expression. In parallel, massive genome sequencing efforts have produced assembled genomes for hundreds of vertebrate species, but this data is underused. We present PhyloReg, a new semi-supervised learning approach that can be used for a wide variety of sequence-to-function prediction problems, and that takes advantage of hundreds of millions of years of evolution to regularize predictors and improve accuracy. We demonstrate that PhyloReg can be used to better train a previously proposed deep learning model of transcription factor binding. Simulation studies further help delineate the benefits of the approach. Gains in prediction accuracy are obtained over a broad set of transcription factors and cell types.
Graph Neural Networks Learn Twitter Bot Behaviour
Albert M. Orozco Camacho
Sacha Lévy
Social media trends are playing an increasingly significant role in the understanding of modern social dynamics. In this work, we take a look at how the Twitter landscape is constantly shaped by automatically generated content. Twitter bot activity can be traced via network abstractions which, we hypothesize, can be learned through state-of-the-art graph neural network techniques. We employ a large bot database, continuously updated by Twitter, to learn how likely it is that a user is mentioned by a bot, and likewise for a hashtag. Thus, we model this likelihood as a link prediction task between the set of users and hashtags. Moreover, we contrast our results by performing similar experiments on a crawled data set of real users.
Extendable and invertible manifold learning with geometry regularized autoencoders
Andrés F. Duque
Kevin Moon
A fundamental task in data exploration is to extract simplified low dimensional representations that capture intrinsic geometry in data, especially for faithfully visualizing data in two or three dimensions. Common approaches to this task use kernel methods for manifold learning. However, these methods typically only provide an embedding of fixed input data and cannot extend to new data points. Autoencoders have also recently become popular for representation learning. But while they naturally compute feature extractors that are both extendable to new data and invertible (i.e., reconstructing original features from latent representation), they have limited capabilities to follow global intrinsic geometry compared to kernel-based manifold learning. We present a new method for integrating both approaches by incorporating a geometric regularization term in the bottleneck of the autoencoder. Our regularization, based on the diffusion potential distances from the recently-proposed PHATE visualization method, encourages the learned latent representation to follow intrinsic data geometry, similar to manifold learning algorithms, while still enabling faithful extension to new data and reconstruction of data in the original feature space from latent coordinates. We compare our approach with leading kernel methods and autoencoder models for manifold learning to provide qualitative and quantitative evidence of our advantages in preserving intrinsic structure, out of sample extension, and reconstruction. Our method is easily implemented for big-data applications, whereas other methods are limited in this regard.
Machine Learning for Glacier Monitoring in the Hindu Kush Himalaya
Benjamin Akera
Bibek Aryal
Tenzing Chogyal Sherpa
Finu Shresta
Anthony Ortiz
Juan Lavista Ferres
M. Matin
Team Optimal Control of Coupled Major-Minor Subsystems with Mean-Field Sharing
Jalal Arabneydi
Historical and cross-disciplinary trends in the biological and social sciences reveal an accelerating adoption of advanced analytics
Taylor Bolt
Jason S. Nomi
Lucina Q. Uddin
Methods for data analysis in the biomedical, life and social sciences are developing at a rapid pace. At the same time, there is increasing concern that education in quantitative methods is failing to adequately prepare students for contemporary research. These trends have led to calls for educational reform to undergraduate and graduate quantitative research method curricula. We argue that such reform should be based on data-driven insights into within- and cross-disciplinary use of research methods. Our survey of peer-reviewed literature screened ∼3.5 million openly available research articles to monitor the cross-disciplinary usage of research methods in the past decade. We applied data-driven text-mining analyses to the methods and materials section of a large subset of this corpus to identify method trends shared across disciplines, as well as those unique to each discipline. As a whole, usage of t-test, analysis of variance, and other classical regression-based methods has declined in the published literature over the past 10 years. Machine-learning approaches, such as artificial neural networks, have seen a significant increase in the total share of scientific publications. We find unique groupings of research methods associated with each biomedical, life and social science discipline, such as the use of structural equation modeling in psychology, survival models in oncology, and manifold learning in ecology. We discuss the implications of these findings for education in statistics and research methods, as well as within- and cross-disciplinary collaboration.
Horizontal gene transfer and recombination analysis of SARS-CoV-2 genes helps discover its close relatives and shed light on its origin
The SARS-CoV-2 pandemic is one of the greatest global medical and social challenges that have emerged in recent history. Human coronavirus strains discovered during previous SARS outbreaks have been hypothesized to pass from bats to humans using intermediate hosts, e.g. civets for SARS-CoV and camels for MERS-CoV. The discovery of an intermediate host of SARS-CoV-2 and the identification of the specific mechanism of its emergence in humans are topics of primary evolutionary importance. In this study we investigate the evolutionary patterns of 11 main genes of SARS-CoV-2. Previous studies suggested that the genome of SARS-CoV-2 is highly similar to the horseshoe bat coronavirus RaTG13 for most of the genes and to some Malayan pangolin coronavirus (CoV) strains for the receptor binding (RB) domain of the spike protein. We provide a detailed list of statistically significant horizontal gene transfer and recombination events (both intergenic and intragenic) inferred for each of the 11 main genes of the SARS-CoV-2 genome. Our analysis reveals that two continuous regions of genes S and N of SARS-CoV-2 may result from intragenic recombination between RaTG13 and Guangdong (GD) Pangolin CoVs. Statistically significant gene transfer-recombination events between RaTG13 and GD Pangolin CoV have been identified in region [1215–1425] of gene S and region [534–727] of gene N. Moreover, some statistically significant recombination events between the ancestors of SARS-CoV-2, RaTG13, GD Pangolin CoV and bat CoV ZC45-ZXC21 coronaviruses have been identified in genes ORF1ab, S, ORF3a, ORF7a, ORF8 and N. Furthermore, topology-based clustering of gene trees inferred for 25 CoV organisms revealed a three-way evolution of coronavirus genes, with gene phylogenies of ORF1ab, S and N forming the first cluster, gene phylogenies of ORF3a, E, M, ORF6, ORF7a, ORF7b and ORF8 forming the second cluster, and phylogeny of gene ORF10 forming the third cluster.
The results of our horizontal gene transfer and recombination analysis suggest that SARS-CoV-2 could not only be a chimera virus resulting from recombination of the bat RaTG13 and Guangdong pangolin coronaviruses but also a close relative of the bat CoV ZC45 and ZXC21 strains. They also indicate that a GD pangolin may be an intermediate host of this dangerous virus. 
Intervention Design for Effective Sim2Real Transfer
The goal of this work is to address the recent success of domain randomization and data augmentation for the sim2real setting. We explain this success through the lens of causal inference, positioning domain randomization and data augmentation as interventions on the environment which encourage invariance to irrelevant features. Such interventions include visual perturbations that have no effect on reward and dynamics. This encourages the learning algorithm to be robust to these types of variations and learn to attend to the true causal mechanisms for solving the task. This connection leads to two key findings: (1) perturbations to the environment do not have to be realistic, but merely show variation along dimensions that also vary in the real world, and (2) use of an explicit invariance-inducing objective improves generalization in sim2sim and sim2real transfer settings over just data augmentation or domain randomization alone. We demonstrate the capability of our method by performing zero-shot transfer of a robot arm reach task on a 7DoF Jaco arm learning from pixel observations.
Quantitative Equational Reasoning
Giorgio Bacci
Radu Mardare
Gordon Plotkin
Equational logic is central to reasoning about programs. What is the right equational setting for reasoning about probabilistic programs? It has been understood that instead of equivalence relations one should work with (pseudo)metrics in a probabilistic setting. However, it is not clear how this relates to equational reasoning. In recent work the notion of a quantitative equational logic was introduced and developed. This retains many of the features of ordinary logic but fits naturally with metric reasoning. The present chapter is an elementary introduction to this topic. In this setting one can define analogues of algebras and free algebras. It turns out that the Kantorovich (Wasserstein) metric emerges as a free construction from a simple quantitative equational theory. We give a couple of examples of quantitative analogues of familiar effects from programming language theory. We do not assume any background in equational logic or advanced category theory.