The Mila AI Policy Fellowship translates deep AI expertise into rigorous, public-interest policy. Read the newest publication Bridging the Expertise Gap: Knowledge Transfer Mechanisms for AI Regulation by Moritz von Knebel
This program supports AI startups at any time of the year. Benefit from cutting-edge resources and tailored support to accelerate your technology's development.
We use cookies to analyze the browsing and usage of our website and to personalize your experience. You can disable these technologies at any time, but this may limit certain functionalities of the site. Read our Privacy Policy for more information.
Setting cookies
You can enable and disable the types of cookies you wish to accept. However certain choices you make could affect the services offered on our sites (e.g. suggestions, personalised ads, etc.).
Essential cookies
These cookies are necessary for the operation of the site and cannot be deactivated. (Still active)
Analytics cookies
Do you accept the use of cookies to measure the audience of our sites?
Multimedia Player
Do you accept the use of cookies to display and allow you to watch the video content hosted by our partners (YouTube, etc.)?
OBJECTIVE
Copy number variants (CNVs) are well-known genetic pleiotropic risk factors for multiple neurodevelopmental and psychiatric disord… (see more)ers (NPDs), including autism (ASD) and schizophrenia. Little is known about how different CNVs conferring risk for the same condition may affect subcortical brain structures and how these alterations relate to the level of disease risk conferred by CNVs. To fill this gap, the authors investigated gross volume, vertex-level thickness, and surface maps of subcortical structures in 11 CNVs and six NPDs.
METHODS
Subcortical structures were characterized using harmonized ENIGMA protocols in 675 CNV carriers (CNVs at 1q21.1, TAR, 13q12.12, 15q11.2, 16p11.2, 16p13.11, and 22q11.2; age range, 6-80 years; 340 males) and 782 control subjects (age range, 6-80 years; 387 males) as well as ENIGMA summary statistics for ASD, schizophrenia, attention deficit hyperactivity disorder, obsessive-compulsive disorder, bipolar disorder, and major depression.
RESULTS
All CNVs showed alterations in at least one subcortical measure. Each structure was affected by at least two CNVs, and the hippocampus and amygdala were affected by five. Shape analyses detected subregional alterations that were averaged out in volume analyses. A common latent dimension was identified, characterized by opposing effects on the hippocampus/amygdala and putamen/pallidum, across CNVs and across NPDs. Effect sizes of CNVs on subcortical volume, thickness, and local surface area were correlated with their previously reported effect sizes on cognition and risk for ASD and schizophrenia.
CONCLUSIONS
The findings demonstrate that subcortical alterations associated with CNVs show varying levels of similarities with those associated with neuropsychiatric conditions, as well distinct effects, with some CNVs clustering with adult-onset conditions and others with ASD. These findings provide insight into the long-standing questions of why CNVs at different genomic loci increase the risk for the same NPD and why a single CNV increases the risk for a diverse set of NPDs.
Although data diffusion embeddings are ubiquitous in unsupervised learning and have proven to be a viable technique for uncovering the under… (see more)lying intrinsic geometry of data, diffusion embeddings are inherently limited due to their discrete nature. To this end, we propose neural FIM, a method for computing the Fisher information metric (FIM) from point cloud data - allowing for a continuous manifold model for the data. Neural FIM creates an extensible metric space from discrete point cloud data such that information from the metric can inform us of manifold characteristics such as volume and geodesics. We demonstrate Neural FIM's utility in selecting parameters for the PHATE visualization method as well as its ability to obtain information pertaining to local volume illuminating branching points and cluster centers embeddings of a toy dataset and two single-cell datasets of IPSC reprogramming and PBMCs (immune cells).
2023-07-02
Proceedings of the 40th International Conference on Machine Learning (published)
While numerous methods have been proposed for computing distances between probability distributions in Euclidean space, relatively little at… (see more)tention has been given to computing such distances for distributions on graphs. However, there has been a marked increase in data that either lies on graph (such as protein interaction networks) or can be modeled as a graph (single cell data), particularly in the biomedical sciences. Thus, it becomes important to find ways to compare signals defined on such graphs. Here, we propose Graph Fourier MMD (GFMMD), a novel distance between distributions and signals on graphs. GFMMD is defined via an optimal witness function that is both smooth on the graph and maximizes difference in expectation between the pair of distributions on the graph. We find an analytical solution to this optimization problem as well as an embedding of distributions that results from this method. We also prove several properties of this method including scale invariance and applicability to disconnected graphs. We showcase it on graph benchmark datasets as well on single cell RNA-sequencing data analysis. In the latter, we use the GFMMD-based gene embeddings to find meaningful gene clusters. We also propose a novel type of score for gene selection called "gene localization score" which helps select genes for cellular state space characterization.
Due to commonalities in pathophysiology, age-related macular degeneration (AMD) represents a uniquely accessible model to investigate thera… (see more)pies for neurodegenerative diseases, leading us to examine whether pathways of disease progression are shared across neurodegenerative conditions. Here we use single-nucleus RNA sequencing to profile lesions from 11 postmortem human retinas with age-related macular degeneration and 6 control retinas with no history of retinal disease. We create a machine-learning pipeline based on recent advances in data geometry and topology and identify activated glial populations enriched in the early phase of disease. Examining single-cell data from Alzheimer’s disease and progressive multiple sclerosis with our pipeline, we find a similar glial activation profile enriched in the early phase of these neurodegenerative diseases. In late-stage age-related macular degeneration, we identify a microglia-to-astrocyte signaling axis mediated by interleukin-1β which drives angiogenesis characteristic of disease pathogenesis. We validated this mechanism using in vitro and in vivo assays in mouse, identifying a possible new therapeutic target for AMD and possibly other neurodegenerative conditions. Thus, due to shared glial states, the retina provides a potential system for investigating therapeutic approaches in neurodegenerative diseases.
Asymmetry between the left and right brain is a key feature of brain organization. Hemispheric functional specialization underlies some of t… (see more)he most advanced human-defining cognitive operations, such as articulated language, perspective taking, or rapid detection of facial cues. Yet, genetic investigations into brain asymmetry have mostly relied on common variant studies, which typically exert small effects on brain phenotypes. Here, we leverage rare genomic deletions and duplications to study how genetic alterations reverberate in human brain and behavior. We quantitatively dissected the impact of eight high-effect-size copy number variations (CNVs) on brain asymmetry in a multi-site cohort of 552 CNV carriers and 290 non-carriers. Isolated multivariate brain asymmetry patterns spotlighted regions typically thought to subserve lateralized functions, including language, hearing, as well as visual, face and word recognition. Planum temporale asymmetry emerged as especially susceptible to deletions and duplications of specific gene sets. Targeted analysis of common variants through genome-wide association study (GWAS) consolidated partly diverging genetic influences on the right versus left planum temporale structure. In conclusion, our gene-brain-behavior mapping highlights the consequences of genetically controlled brain lateralization on human-defining cognitive traits.
Copy number variations (CNVs) are rare genomic deletions and duplications that can affect brain and behaviour. Previous reports of CNV pleio… (see more)tropy imply that they converge on shared mechanisms at some level of pathway cascades, from genes to large-scale neural circuits to the phenome. However, existing studies have primarily examined single CNV loci in small clinical cohorts. It remains unknown, for example, how distinct CNVs escalate vulnerability for the same developmental and psychiatric disorders. Here we quantitatively dissect the associations between brain organization and behavioural differentiation across 8 key CNVs. In 534 CNV carriers, we explored CNV-specific brain morphology patterns. CNVs were characteristic of disparate morphological changes involving multiple large-scale networks. We extensively annotated these CNV-associated patterns with ~1,000 lifestyle indicators through the UK Biobank resource. The resulting phenotypic profiles largely overlap and have body-wide implications, including the cardiovascular, endocrine, skeletal and nervous systems. Our population-level investigation established brain structural divergences and phenotypical convergences of CNVs, with direct relevance to major brain disorders.
Efficient computation of optimal transport distance between distributions is of growing importance in data science. Sinkhorn-based methods a… (see more)re currently the state-of-the-art for such computations, but require
2022-12-31
2023 IEEE 33rd International Workshop on Machine Learning for Signal Processing (MLSP) (published)
Diffusion-based manifold learning methods have proven useful in representation learning and dimensionality reduction of modern high dimensio… (see more)nal, high throughput, noisy datasets. Such datasets are especially present in fields like biology and physics. While it is thought that these methods preserve underlying manifold structure of data by learning a proxy for geodesic distances, no specific theoretical links have been established. Here, we establish such a link via results in Riemannian geometry explicitly connecting heat diffusion to manifold distances. In this process, we also formulate a more general heat kernel based manifold embedding method that we call heat geodesic embeddings. This novel perspective makes clearer the choices available in manifold learning and denoising. Results show that our method outperforms existing state of the art in preserving ground truth manifold distances, and preserving cluster structure in toy datasets. We also showcase our method on single cell RNA-sequencing datasets with both continuum and cluster structure, where our method enables interpolation of withheld timepoints of data. Finally, we show that parameters of our more general method can be configured to give results similar to PHATE (a state-of-the-art diffusion based manifold learning method) as well as SNE (an attraction/repulsion neighborhood based method that forms the basis of t-SNE).
2022-12-31
Advances in Neural Information Processing Systems (published)
In modern relational machine learning it is common to encounter large graphs that arise via interactions or similarities between observation… (see more)s in many domains. Further, in many cases the target entities for analysis are actually signals on such graphs. We propose to compare and organize such datasets of graph signals by using an earth mover’s distance (EMD) with a geodesic cost over the underlying graph. Typically, EMD is computed by optimizing over the cost of transporting one probability distribution to another over an underlying metric space. However, this is inefficient when computing the EMD between many signals. Here, we propose an unbalanced graph EMD that efficiently embeds the unbalanced EMD on an underlying graph into an L(1) space, whose metric we call unbalanced diffusion earth mover’s distance (UDEMD). Next, we show how this gives distances between graph signals that are robust to noise. Finally, we apply this to organizing patients based on clinical notes, embedding cells modeled as signals on a gene graph, and organizing genes modeled as signals over a large cell graph. In each case, we show that UDEMD-based embeddings find accurate distances that are highly efficient compared to other methods.
2022-04-26
ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (published)
In modern relational machine learning it is common to encounter large graphs that arise via interactions or similarities between observation… (see more)s in many domains. Further, in many cases the target entities for analysis are actually signals on such graphs. We propose to compare and organize such datasets of graph signals by using an earth mover's distance (EMD) with a geodesic cost over the underlying graph. Typically, EMD is computed by optimizing over the cost of transporting one probability distribution to another over an underlying metric space. However, this is inefficient when computing the EMD between many signals. Here, we propose an unbalanced graph EMD that efficiently embeds the unbalanced EMD on an underlying graph into an
2021-12-31
IEEE International Conference on Acoustics, Speech and Signal Processing (unknown)