Nous utilisons des témoins pour analyser le trafic et l’utilisation de notre site web, afin de personnaliser votre expérience. Vous pouvez désactiver ces technologies à tout moment, mais cela peut restreindre certaines fonctionnalités du site. Consultez notre Politique de protection de la vie privée pour en savoir plus.
Paramètre des cookies
Vous pouvez activer et désactiver les types de cookies que vous souhaitez accepter. Cependant certains choix que vous ferez pourraient affecter les services proposés sur nos sites (ex : suggestions, annonces personnalisées, etc.).
Cookies essentiels
Ces cookies sont nécessaires au fonctionnement du site et ne peuvent être désactivés. (Toujours actif)
Cookies analyse
Acceptez-vous l'utilisation de cookies pour mesurer l'audience de nos sites ?
Multimedia Player
Acceptez-vous l'utilisation de cookies pour afficher et vous permettre de regarder les contenus vidéo hébergés par nos partenaires (YouTube, etc.) ?
In this work, we propose Bijective-Contrastive Estimation (BCE), a classification-based learning criterion for energy-based models. We gener… (voir plus)ate a collection of contrasting distributions using bijections, and solve all the classification problems between the original data distribution and the distributions induced by the bijections using a classifier parameterized by an energy model. We show that if the classification objective is minimized, the energy function will uniquely recover the data density up to a normalizing constant. This has the benefit of not having to explicitly specify a contrasting distribution, like noise contrastive estimation. Experimentally, we demonstrate that the proposed method works well on 2D synthetic datasets. We discuss the difficulty in high dimensional cases, and propose potential directions to explore for future work.
Autism spectrum disorder (ASD) is commonly understood as a network disorder, yet case-control analyses against typically-developing controls… (voir plus) (TD) have yielded somewhat inconsistent patterns of results. The current work was centered on a novel approach to profile functional network idiosyncrasy, the inter-individual variability in the association between functional network organization and brain anatomy, and we tested the hypothesis that idiosyncrasy contributes to connectivity alterations in ASD. Studying functional network idiosyncrasy in a multi-centric dataset with 157 ASD and 172 TD, our approach revealed higher idiosyncrasy in ASD in the default mode, somatomotor and attention networks together with reduced idiosyncrasy in the lateral temporal lobe. Idiosyncrasy was found to increase with age in both ASD and TD, and was significantly correlated with symptom severity in the former group. Association analysis with structural and molecular brain features indicated that patterns of functional network idiosyncrasy were not correlated with ASD-related cortical thickness alterations, but closely with the spatial expression patterns of intracortical ASD risk genes. In line with our main hypothesis, we could demonstrate that idiosyncrasy indeed plays a strong role in the manifestation of connectivity alterations that are measurable with conventional case-control designs and may, thus, be a principal driver of inconsistency in the autism connectomics literature. These findings support important interactions between the heterogeneity of individuals with an autism diagnosis and group-level functional signatures, and help to consolidate prior research findings on the highly variable nature of the functional connectome in ASD. Our study promotes idiosyncrasy as a potential individualized diagnostic marker of atypical brain network development.
The computational prediction of transcription factor binding sites remains a challenging problems in bioinformatics, despite significant met… (voir plus)hodological developments from the field of machine learning. Such computational models are essential to help interpret the non-coding portion of human genomes, and to learn more about the regulatory mechanisms controlling gene expression. In parallel, massive genome sequencing efforts have produced assembled genomes for hundred of vertebrate species, but this data is underused. We present PhyloReg, a new semi-supervised learning approach that can be used for a wide variety of sequence-to-function prediction problems, and that takes advantage of hundreds of millions of years of evolution to regularize predictors and improve accuracy. We demonstrate that PhyloReg can be used to better train a previously proposed deep learning model of transcription factor binding. Simulation studies further help delineate the benefits of the a pproach. G ains in prediction accuracy are obtained over a broad set of transcription factors and cell types.
2020-12-16
2020 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) (publié)
BACKGROUND
True evidence-informed decision making in public health relies on incorporating evidence from a number of sources in addition to… (voir plus) traditional scientific evidence. Lack of access to these types of data, as well as ease of use and interpretability of scientific evidence contribute to limited uptake of evidence-informed decision making in practice. An electronic evidence system that includes multiple sources of evidence and potentially novel computational processing approaches or artificial intelligence holds promise as a solution to overcoming barriers to evidence-informed decision making in public health.
OBJECTIVE
To understand the needs and preferences for an electronic evidence system among public health professionals in Canada.
METHODS
An invitation to participate in an anonymous online survey was distributed via listservs of two Canadian public health organizations. Eligible participants were English or French speaking individuals currently working in public health. The survey contained both multiple choice and open-ended questions about needs and preferences relevant to an electronic evidence system. Quantitative responses were analyzed to explore differences by public health role. Inductive and deductive analysis methods were used to code and interpret the qualitative data. Ethics review was not required by the host institution.
RESULTS
Respondents (n = 371) were heterogeneous, spanning organizations, positions, and areas of practice within public health. Nearly all (98.0%) respondents indicated that an electronic evidence system would support their work. Respondents had high preferences for local contextual data, research and intervention evidence, and information about human and financial resources. Qualitative analyses identified a number of concerns, needs, and suggestions for development of such a system. Concerns ranged from personal use of such a system, to the ability of their organization to use such a system. Identified needs spanned the different sources of evidence including local context, research and intervention evidence, and resources and tools. Additional suggestions were identified to improve system usability.
CONCLUSIONS
Canadian public health professionals have positive perceptions towards an electronic evidence system that would bring together evidence from the local context, scientific research, and resources. Elements were also identified to increase the usability of an electronic evidence system.
Social media trends are increasingly taking a significant role for the understanding of modern social dynamics. In this work, we take a look… (voir plus) at how the Twitter landscape gets constantly shaped by automatically generated content. Twitter bot activity can be traced via network abstractions which, we hypothesize, can be learned through state-of-the-art graph neural network techniques. We employ a large bot database, continuously updated by Twitter, to learn how likely is that a user is mentioned by a bot, as well as, for a hashtag. Thus, we model this likelihood as a link prediction task between the set of users and hashtags. Moreover, we contrast our results by performing similar experiments on a crawled data set of real users.
2020-12-12
LatinX in AI at Neural Information Processing Systems Conference 2020 (publié)
A fundamental task in data exploration is to extract simplified low dimensional representations that capture intrinsic geometry in data, esp… (voir plus)ecially for faithfully visualizing data in two or three dimensions. Common approaches to this task use kernel methods for manifold learning. However, these methods typically only provide an embedding of fixed input data and cannot extend to new data points. Autoencoders have also recently become popular for representation learning. But while they naturally compute feature extractors that are both extendable to new data and invertible (i.e., reconstructing original features from latent representation), they have limited capabilities to follow global intrinsic geometry compared to kernel-based manifold learning. We present a new method for integrating both approaches by incorporating a geometric regularization term in the bottleneck of the autoencoder. Our regularization, based on the diffusion potential distances from the recently-proposed PHATE visualization method, encourages the learned latent representation to follow intrinsic data geometry, similar to manifold learning algorithms, while still enabling faithful extension to new data and reconstruction of data in the original feature space from latent coordinates. We compare our approach with leading kernel methods and autoencoder models for manifold learning to provide qualitative and quantitative evidence of our advantages in preserving intrinsic structure, out of sample extension, and reconstruction. Our method is easily implemented for big-data applications, whereas other methods are limited in this regard.
2020-12-10
2020 IEEE International Conference on Big Data (Big Data) (publié)
The study of first-order optimization algorithms (FOA) typically starts with assumptions on the objective functions, most commonly smoothnes… (voir plus)s and strong convexity. These metrics are used to tune the hyperparameters of FOA. We introduce a class of perturbations quantified via a new norm, called *-norm. We show that adding a small perturbation to the objective function has an equivalently small impact on the behavior of any FOA, which suggests that it should have a minor impact on the tuning of the algorithm. However, we show that smoothness and strong convexity can be heavily impacted by arbitrarily small perturbations, leading to excessively conservative tunings and convergence issues. In view of these observations, we propose a notion of continuity of the metrics, which is essential for a robust tuning strategy. Since smoothness and strong convexity are not continuous, we propose a comprehensive study of existing alternative metrics which we prove to be continuous. We describe their mutual relations and provide their guaranteed convergence rates for the Gradient Descent algorithm accordingly tuned. Finally we discuss how our work impacts the theoretical understanding of FOA and their performances.
The SARS-CoV-2 pandemic is one of the greatest global medical and social challenges that have emerged in recent history. Human coronavirus s… (voir plus)trains discovered during previous SARS outbreaks have been hypothesized to pass from bats to humans using intermediate hosts, e.g. civets for SARS-CoV and camels for MERS-CoV. The discovery of an intermediate host of SARS-CoV-2 and the identification of specific mechanism of its emergence in humans are topics of primary evolutionary importance. In this study we investigate the evolutionary patterns of 11 main genes of SARS-CoV-2. Previous studies suggested that the genome of SARS-CoV-2 is highly similar to the horseshoe bat coronavirus RaTG13 for most of the genes and to some Malayan pangolin coronavirus (CoV) strains for the receptor binding (RB) domain of the spike protein. We provide a detailed list of statistically significant horizontal gene transfer and recombination events (both intergenic and intragenic) inferred for each of 11 main genes of the SARS-CoV-2 genome. Our analysis reveals that two continuous regions of genes S and N of SARS-CoV-2 may result from intragenic recombination between RaTG13 and Guangdong (GD) Pangolin CoVs. Statistically significant gene transfer-recombination events between RaTG13 and GD Pangolin CoV have been identified in region [1215–1425] of gene S and region [534–727] of gene N. Moreover, some statistically significant recombination events between the ancestors of SARS-CoV-2, RaTG13, GD Pangolin CoV and bat CoV ZC45-ZXC21 coronaviruses have been identified in genes ORF1ab, S, ORF3a, ORF7a, ORF8 and N. Furthermore, topology-based clustering of gene trees inferred for 25 CoV organisms revealed a three-way evolution of coronavirus genes, with gene phylogenies of ORF1ab, S and N forming the first cluster, gene phylogenies of ORF3a, E, M, ORF6, ORF7a, ORF7b and ORF8 forming the second cluster, and phylogeny of gene ORF10 forming the third cluster. The results of our horizontal gene transfer and recombination analysis suggest that SARS-CoV-2 could not only be a chimera virus resulting from recombination of the bat RaTG13 and Guangdong pangolin coronaviruses but also a close relative of the bat CoV ZC45 and ZXC21 strains. They also indicate that a GD pangolin may be an intermediate host of this dangerous virus.
The Winograd Schema Challenge (WSC) and variants inspired by it have become important benchmarks for common-sense reasoning (CSR). Model per… (voir plus)formance on the WSC has quickly progressed from chance-level to near-human using neural language models trained on massive corpora. In this paper, we analyze the effects of varying degrees of overlaps that occur between these corpora and the test instances in WSC-style tasks. We find that a large number of test instances overlap considerably with the pretraining corpora on which state-of-the-art models are trained, and that a significant drop in classification accuracy occurs when models are evaluated on instances with minimal overlap. Based on these results, we provide the WSC-Web dataset, consisting of over 60k pronoun disambiguation problems scraped from web data, being both the largest corpus to date, and having a significantly lower proportion of overlaps with current pretraining corpora.
2020-12-01
Proceedings of the 28th International Conference on Computational Linguistics (publié)