Size doesn't matter: democratizing protein discovery with AI
Mila researchers have created a powerful open-source protein language model that is more compact and efficient, with the aim of democratizing protein discovery.
The next cohort of our program, designed to give participants a foundational understanding of AI technologies, will take place in Ottawa on November 28 and 29.
Publications
GaMPEN: A Machine-learning Framework for Estimating Bayesian Posteriors of Galaxy Morphological Parameters
We introduce a novel machine-learning framework for estimating the Bayesian posteriors of morphological parameters for arbitrarily large numbers of galaxies. The Galaxy Morphology Posterior Estimation Network (GaMPEN) estimates values and uncertainties for a galaxy's bulge-to-total-light ratio (L_B/L_T), effective radius (R_e), and flux (F). To estimate posteriors, GaMPEN uses the Monte Carlo Dropout technique and incorporates the full covariance matrix between the output parameters in its loss function. GaMPEN also uses a spatial transformer network (STN) to automatically crop input galaxy frames to an optimal size before determining their morphology. This will allow it to be applied to new data without prior knowledge of galaxy size. Training and testing GaMPEN on galaxies simulated to match z < 0.25 galaxies in Hyper Suprime-Cam Wide g-band images, we demonstrate that GaMPEN achieves typical errors of 0.1 in L_B/L_T, 0.17″ (∼7%) in R_e, and 6.3 × 10^4 nJy (∼1%) in F. GaMPEN's predicted uncertainties are well calibrated and accurate (<5% deviation); for regions of the parameter space with high residuals, GaMPEN correctly predicts correspondingly large uncertainties. We a…
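As a rough illustration of the Monte Carlo Dropout idea mentioned in the abstract above, the sketch below keeps dropout active at prediction time, repeats the forward pass many times, and reads the spread of the outputs as an uncertainty estimate. The toy linear model, the function names, and all parameter values are assumptions made for this example; this is not GaMPEN's network, and it leaves out the covariance-aware loss the abstract describes.

```python
import random
import statistics


def forward_with_dropout(x, weights, drop_prob, rng):
    """One stochastic forward pass of a toy linear model with input dropout."""
    keep = 1.0 - drop_prob
    total = 0.0
    for xi, wi in zip(x, weights):
        if rng.random() < keep:
            # Inverted-dropout scaling keeps the expected output unchanged.
            total += (xi / keep) * wi
    return total


def mc_dropout_predict(x, weights, drop_prob=0.2, n_samples=500, seed=0):
    """Return the mean and standard deviation over repeated stochastic passes."""
    rng = random.Random(seed)
    samples = [
        forward_with_dropout(x, weights, drop_prob, rng) for _ in range(n_samples)
    ]
    return statistics.mean(samples), statistics.stdev(samples)


# Hypothetical input and weights, chosen only for illustration.
mean, std = mc_dropout_predict([1.0, 2.0, 3.0], [0.5, -0.25, 1.0])
print(f"prediction = {mean:.2f} +/- {std:.2f}")
```

With `drop_prob=0.0` every pass is identical, so the reported spread collapses to zero; the uncertainty comes entirely from the randomness of the dropout masks.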
Functional magnetic resonance imaging (fMRI) data is collected in millions of noisy, redundant dimensions. To understand how different brains process the same stimulus, we aim to denoise the fMRI signal via a meaningful embedding space that captures the data's intrinsic structure as shared across brains. We assume that stimulus-driven responses share latent features common across subjects that are jointly discoverable. Previous approaches to this problem have relied on linear methods like principal component analysis and shared response modeling. We propose a neural network called MRMD-AE (manifold-regularized multiple-decoder autoencoder) that learns a common embedding from multi-subject fMRI data while retaining the ability to decode individual responses. Our latent common space represents an extensible manifold (where untrained data can be mapped) and improves classification accuracy of stimulus features of unseen timepoints, as well as cross-subject translation of fMRI signals.
2022-08-22
2022 IEEE 32nd International Workshop on Machine Learning for Signal Processing (MLSP) (published)
Autism spectrum disorder (ASD) is a neurodevelopmental condition characterized by difficulties in social communication, but also great heterogeneity. To offer individualized medicine approaches, we need to better target interventions by stratifying autistic people into subgroups with different biological profiles and/or prognoses. We sought to validate neural responses to faces as a potential stratification factor in ASD by measuring neural (electroencephalography) responses to faces (critical in social interaction) in N = 436 children and adults with and without ASD. The speed of early-stage face processing (N170 latency) was on average slower in ASD than in age-matched controls. In addition, N170 latency was associated with responses to faces in the fusiform gyrus, measured with functional magnetic resonance imaging, and polygenic scores for ASD. Within the ASD group, N170 latency predicted change in adaptive socialization skills over an 18-month follow-up period; data-driven clustering identified a subgroup with slower brain responses and poor social prognosis. Use of a distributional data-driven cutoff was associated with predicted improvements of power in simulated clinical trials targeting social functioning. Together, the data provide converging evidence for the utility of the N170 as a stratification factor to identify biologically and prognostically defined subgroups in ASD.
Description: N170 latency to faces relates to fusiform activity and ASD genetics, predicts social prognosis, and could improve power in clinical trials.
Exploiting face processing in patients with ASD
The heterogeneity observed in patients with autism spectrum disorder (ASD) highlights the need for better patient stratification methods. Here, Mason et al. evaluated the use of the speed of early-stage face processing (N170 latency) for patient stratification and prognosis in subjects with ASD and age-matched healthy individuals. N170 latency was slower in individuals with ASD and correlated with response to faces measured with fMRI and with polygenic risk score. Among subjects with ASD, the N170 values stratified patients according to socialization prognosis and improved power in a simulated clinical trial. The results suggest that including N170 evaluation in patient stratification might help the design and development of patient-specific therapies for ASD.
Discovering what is learned by neural networks remains a challenge. In self-supervised learning, classification is the most common task used to evaluate how good a representation is. However, relying only on such a downstream task can limit our understanding of what information is retained in the representation of a given input. In this work, we showcase the use of a Representation Conditional Diffusion Model (RCDM) to visualize in data space the representations learned by self-supervised models. The use of RCDM is motivated by its ability to generate high-quality samples, on par with state-of-the-art generative models, while ensuring that the representations of those samples are faithful, i.e. close to the one used for conditioning. By using RCDM to analyze self-supervised models, we are able to clearly show visually that i) SSL (backbone) representations are not invariant to the data augmentations they were trained with, thus debunking an often restated but mistaken belief; ii) SSL post-projector embeddings appear indeed invariant to these data augmentations, along with many other data symmetries; iii) SSL representations appear more robust to small adversarial perturbations of their inputs than representations trained in a supervised manner; and iv) SSL-trained representations exhibit an inherent structure that can be explored thanks to RCDM visualization and enables image manipulation.
Electronic health records (EHRs) provide rich clinical information and the opportunities to extract epidemiological patterns to understand and predict patient disease risks with suitable machine learning methods such as topic models. However, existing topic models do not generate identifiable topics each predicting a unique phenotype. One promising direction is to use known phenotype concepts to guide topic inference. We present a seed-guided Bayesian topic model called MixEHR-Seed with 3 contributions: (1) for each phenotype, we infer a dual-form of topic distribution: a seed-topic distribution over a small set of key EHR codes and a regular topic distribution over the entire EHR vocabulary; (2) we model age-dependent disease progression as Markovian dynamic topic priors; (3) we infer seed-guided multi-modal topics over distinct EHR data types. For inference, we developed a variational inference algorithm. Using MixEHR-Seed, we inferred 1569 PheCode-guided phenotype topics from an EHR database in Quebec, Canada covering 1.3 million patients for up to 20-year follow-up with 122 million records for 8539 and 1126 unique diagnostic and drug codes, respectively. We observed (1) accurate phenotype prediction by the guided topics, (2) clinically relevant PheCode-guided disease topics, (3) meaningful age-dependent disease prevalence. Source code is available at GitHub: https://github.com/li-lab-mcgill/MixEHR-Seed.
In many clinical contexts, detecting all lesions is imperative for evaluating disease activity. Standard approaches pose lesion detection as a segmentation problem despite the time-consuming nature of acquiring segmentation labels. In this paper, we present a lesion detection method which relies only on point labels. Our model, which is trained via heatmap regression, can detect a variable number of lesions in a probabilistic manner. In fact, our proposed post-processing method offers a reliable way of directly estimating the lesion existence uncertainty. Experimental results on Gad lesion detection show our point-based method performs competitively compared to training on expensive segmentation labels. Finally, our detection model provides a suitable pre-training for segmentation. When fine-tuning on only 17 segmentation samples, we achieve comparable performance to training with the full dataset.
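To make the point-label and heatmap-regression idea in the abstract above more concrete, here is a minimal sketch of two generic pieces: building a regression target with one Gaussian bump per point label, and reading detections back out as local maxima above a threshold. The function names, `sigma`, the threshold, and the use of peak height as a score are assumptions made for illustration; the paper's actual model and its post-processing for lesion existence uncertainty are not reproduced here.

```python
import math


def gaussian_heatmap(height, width, points, sigma=1.5):
    """Build a regression target with one Gaussian bump per (row, col) point label."""
    heatmap = [[0.0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            for pr, pc in points:
                d2 = (r - pr) ** 2 + (c - pc) ** 2
                value = math.exp(-d2 / (2 * sigma ** 2))
                # Overlapping bumps are combined with a max, not a sum.
                heatmap[r][c] = max(heatmap[r][c], value)
    return heatmap


def detect_peaks(heatmap, threshold=0.5):
    """Return (row, col, score) for strict 8-neighbourhood maxima above threshold."""
    h, w = len(heatmap), len(heatmap[0])
    peaks = []
    for r in range(h):
        for c in range(w):
            v = heatmap[r][c]
            if v < threshold:
                continue
            neighbours = [
                heatmap[rr][cc]
                for rr in range(max(0, r - 1), min(h, r + 2))
                for cc in range(max(0, c - 1), min(w, c + 2))
                if (rr, cc) != (r, c)
            ]
            if all(v > n for n in neighbours):
                peaks.append((r, c, v))
    return peaks


# Two hypothetical point labels on a 16 x 16 grid.
target = gaussian_heatmap(16, 16, [(3, 4), (10, 12)])
peaks = detect_peaks(target)
print(peaks)
```

In a trained model the heatmap would be the network's prediction, not the target itself, so the number of peaks found can vary from image to image.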
Background: Understanding the relationship between the omics and the phenotype is a central problem in precision medicine. The high dimensionality of metabolomics data challenges learning algorithms in terms of scalability and generalization. Most learning algorithms do not produce interpretable models. Method: We propose an ensemble learning algorithm based on conjunctions or disjunctions of decision rules. Results: Applications to metabolomics data show that it produces models that achieve high predictive performance. The interpretability of the models makes them useful for biomarker discovery and pattern discovery in high-dimensional data.
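The abstract above does not spell out how the rules are learned or combined, so the following is only a sketch of what a model built from conjunctions or disjunctions of threshold rules can look like at prediction time. The `Rule` class, the `"and"`/`"or"` tags, and the majority vote across models are all assumptions introduced for this example, not the authors' algorithm.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Rule:
    """A single threshold test on one feature (hypothetical representation)."""
    feature: int
    threshold: float
    greater: bool = True

    def holds(self, x: Sequence[float]) -> bool:
        value = x[self.feature]
        return value > self.threshold if self.greater else value <= self.threshold


def conjunction(rules: Sequence[Rule], x: Sequence[float]) -> bool:
    """Positive only if every rule holds."""
    return all(rule.holds(x) for rule in rules)


def disjunction(rules: Sequence[Rule], x: Sequence[float]) -> bool:
    """Positive if at least one rule holds."""
    return any(rule.holds(x) for rule in rules)


def ensemble_predict(models: List[Tuple[str, List[Rule]]], x: Sequence[float]) -> bool:
    """Majority vote over models tagged "and" or "or" (the vote is an assumption)."""
    votes = 0
    for kind, rules in models:
        combine = conjunction if kind == "and" else disjunction
        if combine(rules, x):
            votes += 1
    return votes * 2 > len(models)


# Hypothetical ensemble over three features.
models = [
    ("and", [Rule(0, 1.0), Rule(1, 0.5, greater=False)]),
    ("or", [Rule(2, 3.0), Rule(0, 5.0)]),
    ("and", [Rule(2, 2.0)]),
]
print(ensemble_predict(models, [2.0, 0.2, 2.5]))
```

Each model stays readable as a short list of feature thresholds, which is the kind of interpretability the abstract points to for biomarker discovery.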