Publications

The Canadian VirusSeq Data Portal and Duotang: open resources for SARS-CoV-2 viral sequences and genomic epidemiology

Erin E. Gill

Baofeng Jia

Carmen Lia Murall

Raphaël Poujol

Muhammad Zohaib Anwar

Nithu Sara John

Justin Richardsson

Ashley Hobb

Abayomi S. Olabode

Alexandru Lepsa

Ana T. Duggan

Andrea D. Tyler

Arnaud N'Guessan

Atul Kachru

Brandon Chan

Catherine Yoshida

Christina K. Yung

David Bujold

Dusan Andric

Edmund Su

Emma J. Griffiths

Gary Van Domselaar

Gordon W. Jolly

Heather K. E. Ward

Henrich Feher

Jared Baker

Jared T. Simpson

Jaser Uddin

Jiannis Ragoussis

Jon Eubank

Jörg H. Fritz

José Héctor Gálvez

Karen Fang

Kim Cullion

Leonardo Rivera

Linda Xiang

Matthew A. Croxen

Mitchell Shiell

Natalie Prystajecky

Pierre-Olivier Quirion

Rosita Bajari

Samantha Rich

Samira Mubareka

Sandrine Moreira

Scott Cain

Steven G. Sutcliffe

Susanne A. Kraemer

Yelizar Alturmessov

Yann Joly

Marc Fiume

Terrance P. Snutch

Cindy Bell

Catalina López-Correa

Julie G. Hussin

Jeffrey B. Joy

Caroline Colijn

Paul M. K. Gordon

William W. L. Hsiao

Art F. Y. Poon

Natalie C. Knox

Mélanie Courtot

Lincoln Stein

Sarah P. Otto

Guillaume Bourque

B. Jesse Shapiro

Fiona S. L. Brinkman

The COVID-19 pandemic led to a large global effort to sequence SARS-CoV-2 genomes from patient samples to track viral evolution and inform the public health response. Millions of SARS-CoV-2 genome sequences have been deposited in global public repositories. The Canadian COVID-19 Genomics Network (CanCOGeN – VirusSeq), a consortium tasked with coordinating expanded sequencing of SARS-CoV-2 genomes across Canada early in the pandemic, created the Canadian VirusSeq Data Portal, with associated data pipelines and procedures, to support these efforts. The goal of VirusSeq was to allow open access to Canadian SARS-CoV-2 genomic sequences and enhanced, standardized contextual data that were unavailable in other repositories and that meet FAIR standards (Findable, Accessible, Interoperable and Reusable). In addition, the portal data submission pipeline contains data quality checking procedures and appropriate acknowledgement of data generators that encourages collaboration. From inception to execution, the portal was developed with a conscientious focus on strong data governance principles and practices. Extensive efforts ensured a commitment to Canadian privacy laws, data security standards, and organizational processes. This portal has been coupled with other resources, such as Viral AI, and was further leveraged by the Coronavirus Variants Rapid Response Network (CoVaRR-Net) to produce a suite of continually updated analytical tools and notebooks. Here we highlight this portal (https://virusseq-dataportal.ca/), including its contextual data not available elsewhere, and the Duotang (https://covarr-net.github.io/duotang/duotang.html), a web platform that presents key genomic epidemiology and modelling analyses on circulating and emerging SARS-CoV-2 variants in Canada.
Duotang presents dynamic changes in variant composition of SARS-CoV-2 in Canada and by province, estimates variant growth, and displays complementary interactive visualizations, with a text overview of the current situation. The VirusSeq Data Portal and Duotang resources, alongside additional analyses and resources computed from the portal (COVID-MVP, CoVizu), are all open source and freely available. Together, they provide an updated picture of SARS-CoV-2 evolution to spur scientific discussions, inform public discourse, and support communication with and within public health authorities. They also serve as a framework for other jurisdictions interested in open, collaborative sequence data sharing and analyses.

2024-09-30

Microbial Genomics (published)

doi.org

The oneirogen hypothesis: modeling the hallucinatory effects of classical psychedelics in terms of replay-dependent plasticity mechanisms

Classical psychedelics induce complex visual hallucinations in humans, generating percepts that are coherent at a low level, but which have surreal, dream-like qualities at a high level. While there are many hypotheses as to how classical psychedelics could induce these effects, there are no concrete mechanistic models that capture the variety of observed effects in humans, while remaining consistent with the known pharmacological effects of classical psychedelics on neural circuits. In this work, we propose the “oneirogen hypothesis”, which posits that the perceptual effects of classical psychedelics are a result of their pharmacological actions inducing neural activity states that truly are more similar to dream-like states. We simulate classical psychedelics’ effects via manipulating neural network models trained on perceptual tasks with the Wake-Sleep algorithm. This established machine learning algorithm leverages two activity phases, a perceptual phase (wake) where sensory inputs are encoded, and a generative phase (dream) where the network internally generates activity consistent with stimulus-evoked responses. We simulate the action of psychedelics by partially shifting the model to the ‘Sleep’ state, which entails a greater influence of top-down connections, in line with the impact of psychedelics on apical dendrites. The effects resulting from this manipulation capture a number of experimentally observed phenomena including the emergence of hallucinations, increases in stimulus-conditioned variability, and large increases in synaptic plasticity. We further provide a number of testable predictions which could be used to validate or invalidate our oneirogen hypothesis.
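
As a rough illustration of the "partial shift" toward the Sleep state described in the abstract above, the toy sketch below blends a bottom-up (wake) drive with a top-down (sleep) drive using a single mixing weight. The linear interpolation, the parameter name `alpha`, and the function `mixed_drive` are assumptions made for illustration only; they are not taken from the paper and may differ from the authors' model.

```python
def mixed_drive(bottom_up, top_down, alpha):
    """Blend two per-unit input vectors (hypothetical toy model).

    alpha = 0.0 gives the pure bottom-up (wake) drive,
    alpha = 1.0 gives the pure top-down (sleep) drive,
    and values in between shift the unit partially toward sleep.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if len(bottom_up) != len(top_down):
        raise ValueError("input vectors must have the same length")
    # Convex combination of the two drives, unit by unit.
    return [(1.0 - alpha) * b + alpha * t for b, t in zip(bottom_up, top_down)]


if __name__ == "__main__":
    sensory = [1.0, 0.0]      # made-up stimulus-evoked drive
    generated = [0.0, 1.0]    # made-up internally generated drive
    for alpha in (0.0, 0.25, 1.0):
        print(alpha, mixed_drive(sensory, generated, alpha))
```

In this toy picture, raising `alpha` increases the influence of the internally generated signal on each unit, which is the qualitative effect the abstract attributes to top-down connections.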

2024-09-29

bioRxiv (accepted)

doi.org

Automating MedSAM by Learning Prompts with Weak Few-Shot Supervision

Melanie Gaillochet

Christian Desrosiers

Hervé Lombaert

2024-09-27

Lecture Notes in Computer Science (published)

doi.org

arxiv.org

Genetic Interplay Between White Matter Hyperintensities and Alzheimer's Disease: A Brain-Body Perspective

Manpreet Singh

Kimia Shafighi

Flavie E. Detcheverry

Fanta Dabo

Ikrame Housni

Sridar Narayanan

Sarah A. Gagliano Taliun

Danilo Bzdok

AmanPreet Badhwar

MRI-detected white matter hyperintensities (WMHs) are often recognized as markers of cerebrovascular abnormalities and an index of vascular brain injury, and are frequently present in individuals with Alzheimer’s disease (AD). Given the emerging bidirectional communication along the brain-body axis in both WMHs and AD, it is important to understand their genetic underpinnings across the whole body. However, literature on this is scarce. We investigated the brain-body axis by breaking down heritability estimates of these phenotypes across the whole body, i.e., partitioning heritability. Our aims were to identify genetic underpinnings specific to WMHs, and common between WMHs and AD, by assessing (a) the partitioned heritability of WMHs and AD across the brain-body axis with tissue-specific annotations, (b) the partitioned heritability of WMHs and AD across the brain-body axis with cell-specific annotations, and (c) the genes associated with WMHs and AD, and verifying their expression levels across the whole body. Our tissue-specific analysis revealed that WMH-associated SNPs were significantly enriched in tissues beyond the brain, namely liver, cardiovascular, and kidney, with liver being a common tissue enriched for both WMHs and AD. Our cell-specific analysis showed enrichment of vascular endothelial cells across the tissue types enriched for WMHs, highlighting their central role in the development of WMHs. Additionally, our gene-level analysis highlighted overlapping patterns of tissue enrichment for both WMHs and AD, and showed interactions between WMH- and AD-associated genes. Our findings provide new insights into the systemic influences potentially contributing to WMH pathology, in particular, multi-system endothelial disorder. We hope that our multisystemic genetic findings will stimulate future WMH research into specific pathways across the brain-body axis.

2024-09-27

medRxiv (preprint)

doi.org

Refining SARS-CoV-2 intra-host variation by leveraging large-scale sequencing data

Fatima Mostefai

Jean-Christophe Grenier

Raphaël Poujol

Julie Hussin

Understanding viral genome evolution during host infection is crucial for grasping viral diversity and evolution. Analyzing intra-host single nucleotide variants (iSNVs) offers insights into new lineage emergence, which is important for predicting and mitigating future viral threats. Despite next-generation sequencing’s potential, challenges persist, notably sequencing artifacts leading to false iSNVs. We developed a workflow to enhance iSNV detection in large NGS libraries, using over 130 000 SARS-CoV-2 libraries to distinguish mutations from errors. Our approach integrates bioinformatics protocols, stringent quality control, and dimensionality reduction to tackle batch effects and improve mutation detection reliability. Additionally, we pioneer the application of the PHATE visualization approach to genomic data and introduce a methodology that quantifies how related groups of data points are represented within a two-dimensional space, enhancing clustering structure explanation based on genetic similarities. This workflow advances accurate intra-host mutation detection, facilitating a deeper understanding of viral diversity and evolution.

2024-09-27

NAR Genomics and Bioinformatics (published)

doi.org

Longitudinal bi-criteria framework for assessing national healthcare responses to pandemic outbreaks

Adel Guitouni

Nabil Belacel

Loubna Benabbou

Belaid Moa

Munire Erman

Halim Abdul

2024-09-26

Scientific Reports (published)

doi.org

Replication of a GWAS signal near HLA-DQA2 with acute myeloid leukemia using a disease-only cohort and external population-based controls

Rose Laflamme

Véronique Lisi

Josée Hébert

Guy Sauvageau

Sébastien Lemieux

Vincent-Philippe Lavallee

Guillaume Lettre

Acute myeloid leukemia (AML) is the most common type of acute leukemia in adults. Its risk factors include rare and highly penetrant somatic mutations. Genome-wide association studies (GWAS) have also identified four common inherited variants associated with AML risk, but these findings have not yet been confirmed in many independent datasets. Here, we performed a replication study with 567 AML cases from the Leucegene cohort and 1,865 controls from the population-based cohort CARTaGENE (CaG). Because genotypes were generated using different technologies in the two datasets (e.g. low- vs. high-coverage whole-genome sequencing), we applied stringent quality-control filters to minimize type I errors. We showed using data reduction methods (e.g. principal component analysis [PCA] and uniform manifold approximation and projection [UMAP]) that our approach successfully integrated the Leucegene and CaG genetic data. We replicated the association between cytogenetically normal (CN)-AML and rs3916765, a variant located near HLA-DQA2 (odds ratio [95% confidence interval] = 1.88 [1.21-2.93], P-value = 0.005). The effect size of this association was stronger when we restricted the analyses to AML patients with NPM1 mutations (odds ratios >2.35). We found HLA-DOB to be the most significantly upregulated gene in Leucegene participants with the CN-AML protective A-allele at rs3916765. We further found that several HLA class II genes are also differentially expressed albeit at lower statistical significance. Our results confirm that a common genetic variant at the HLA locus associates with AML risk, providing new opportunities to improve disease prognosis and treatment.

2024-09-26

medRxiv (preprint)

doi.org

CALE: Continuous Arcade Learning Environment

Jesse Farebrother

Pablo Samuel Castro

We introduce the Continuous Arcade Learning Environment (CALE), an extension of the well-known Arcade Learning Environment (ALE) [Bellemare et al., 2013]. The CALE uses the same underlying emulator of the Atari 2600 gaming system (Stella), but adds support for continuous actions. This enables the benchmarking and evaluation of continuous-control agents (such as PPO [Schulman et al., 2017] and SAC [Haarnoja et al., 2018]) and value-based agents (such as DQN [Mnih et al., 2015] and Rainbow [Hessel et al., 2018]) on the same environment suite. We provide a series of open questions and research directions that CALE enables, as well as initial baseline results using Soft Actor-Critic. CALE is available as part of the ALE at https://github.com/Farama-Foundation/Arcade-Learning-Environment.

2024-09-25

NeurIPS.cc/2024/Datasets_and_Benchmarks_Track (poster)

doi.org

openreview.net

CVQA: Culturally-diverse Multilingual Visual Question Answering Benchmark

David LE MEUR

David Orlando Romero Mogrovejo

Chenyang Lyu

Haryo Akbarianto Wibowo

Teresa Lynn

Injy Hamed

Aditya Nanda Kishore Khandavally

Aishik Mandal

Alina Dragonetti

Artem Abzaliev

Atnafu Lambebo Tonja

Bontu Fufa Balcha

Chenxi Whitehouse

Christian Salamea-Palacios

Dan John Velasco

David Ifeoluwa Adelani

D. Meur

Emilio Villa Cueva

Fajri Koto

Fauzan Farooqui

Frederico Belcavello

Ganzorig Batnasan

Gisela Vallejo

Gráinne Caulfield

Guido Ivetta

Haiyue Song

Henok Biadglign Ademtew

Hernán Maina

Holy Lovenia

Israel Abebe Azime

Jan Christian Blaise Cruz

Jay Gala

Jiahui Geng

Jesus-German Ortiz-Barajas

Jinheon Baek

Jocelyn Dunstan

Laura Alonso Alemany

Teresa Clifford

Kumaranage Ravindu Yasas Nagasinghe

Luciana Benotti

Luis Fernando D'Haro

Marcelo Viridiano

Marcos Estecha-Garitagoitia

Maria Camila Buitrago Cabrera

Mario Rodríguez-Cantelar

Mélanie Jouitteau

Mihail Minkov Mihaylov

Mohamed Fazli Mohamed Imam

Muhammad Farid Adilazuarda

Munkhjargal Gochoo

Munkh-Erdene Otgonbold

Naome Etori

Olivier NIYOMUGISHA

Paula Mónica Silva

Pranjal A Chitale

Raj Dabre

Rendi Chevi

Ruochen Zhang

Ryandito Diandaru

Samuel Cahyawijaya

Santiago Góngora

Soyeong Jeong

Sukannya Purkayastha

Tatsuki Kuribayashi

Thanmay Jayakumar

Tiago Timponi Torrent

Toqeer Ehsan

Vladimir Araujo

Yova Kementchedjhieva

Zara Burzo

Zheng Wei Lim

Zheng Xin Yong

Oana Ignat

Joan Nwatu

Rada Mihalcea

Thamar Solorio

Alham Fikri Aji

2024-09-25

NeurIPS.cc/2024/Datasets_and_Benchmarks_Track (oral presentation)

doi.org

openreview.net

Expecting The Unexpected: Towards Broad Out-Of-Distribution Detection

Charles Guille-escuret

Pierre-Andre Noel

Ioannis Mitliagkas

David Vázquez

Joao Monteiro

Improving the reliability of deployed machine learning systems often involves developing methods to detect out-of-distribution (OOD) inputs. However, existing research often narrowly focuses on samples from classes that are absent from the training set, neglecting other types of plausible distribution shifts. This limitation reduces the applicability of these methods in real-world scenarios, where systems encounter a wide variety of anomalous inputs. In this study, we categorize five distinct types of distribution shifts and critically evaluate the performance of recent OOD detection methods on each of them. We publicly release our benchmark under the name BROAD (Benchmarking Resilience Over Anomaly Diversity). Our findings reveal that while these methods excel in detecting unknown classes, their performance is inconsistent when encountering other types of distribution shifts. In other words, they only reliably detect unexpected inputs that they have been specifically designed to expect. As a first step toward broad OOD detection, we learn a generative model of existing detection scores with a Gaussian mixture. By doing so, we present an ensemble approach that offers a more consistent and comprehensive solution for broad OOD detection, demonstrating superior performance compared to existing methods. Our code to download BROAD and reproduce our experiments is publicly available.
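
The idea of modelling existing detection scores with a Gaussian mixture can be sketched as follows. This is a minimal standard-library illustration, not the authors' released code: it assumes each input is summarized by a vector of scores from several detectors, that the mixture is fit on in-distribution score vectors, and that covariances are diagonal. The function names, the synthetic scores, and every hyperparameter are hypothetical.

```python
import math
import random


def log_gauss_diag(x, mean, var):
    """Log-density of a diagonal-covariance Gaussian at point x."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a list of floats."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def fit_gmm(data, k, iters=50, seed=0, min_var=1e-6):
    """Fit a k-component diagonal Gaussian mixture with plain EM."""
    rng = random.Random(seed)
    dim = len(data[0])
    means = [list(p) for p in rng.sample(data, k)]
    variances = [[1.0] * dim for _ in range(k)]
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in data:
            logs = [math.log(weights[j]) + log_gauss_diag(x, means[j], variances[j])
                    for j in range(k)]
            norm = logsumexp(logs)
            resp.append([math.exp(l - norm) for l in logs])
        # M-step: re-estimate weights, means, and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12
            weights[j] = nj / len(data)
            means[j] = [sum(r[j] * x[d] for r, x in zip(resp, data)) / nj
                        for d in range(dim)]
            variances[j] = [max(min_var,
                                sum(r[j] * (x[d] - means[j][d]) ** 2
                                    for r, x in zip(resp, data)) / nj)
                            for d in range(dim)]
    return weights, means, variances


def ood_score(x, model):
    """Negative mixture log-likelihood: higher means more anomalous."""
    weights, means, variances = model
    return -logsumexp([math.log(w) + log_gauss_diag(x, m, v)
                       for w, m, v in zip(weights, means, variances)])


if __name__ == "__main__":
    rng = random.Random(1)
    # Synthetic in-distribution score vectors from two made-up detectors.
    scores = ([[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(100)]
              + [[rng.gauss(5, 1), rng.gauss(5, 1)] for _ in range(100)])
    model = fit_gmm(scores, k=2)
    print(ood_score([0.0, 0.0], model), ood_score([20.0, -20.0], model))
```

A score vector far from anything seen during fitting receives a low likelihood under the mixture and therefore a high `ood_score`, which is how combining several detectors' scores can flag shifts that no single detector was designed for.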

2024-09-25

NeurIPS.cc/2024/Datasets_and_Benchmarks_Track (poster)

doi.org

openreview.net

Learning Action and Reasoning-Centric Image Editing from Videos and Simulation

Dheeraj Vattikonda

Varun Jampani

Christopher Pal

2024-09-25

NeurIPS.cc/2024/Datasets_and_Benchmarks_Track (spotlight)

openreview.net

LogiCity: Advancing Neuro-Symbolic AI with Abstract Urban Simulation

Bowen Li

Zhaoyu Li

Qiwei Du

Jinqi Luo

Wenshan Wang

Yaqi Xie

Simon Stepputtis

Chen Wang

Katia P. Sycara

Pradeep Kumar Ravikumar

Alexander G. Gray

Xujie Si

Sebastian Scherer

Recent years have witnessed the rapid development of Neuro-Symbolic (NeSy) AI systems, which integrate symbolic reasoning into deep neural networks. However, most of the existing benchmarks for NeSy AI fail to provide long-horizon reasoning tasks with complex multi-agent interactions. Furthermore, they are usually constrained by fixed and simplistic logical rules over limited entities, making them far from real-world complexities. To address these crucial gaps, we introduce LogiCity, the first simulator based on customizable first-order logic (FOL) for an urban-like environment with multiple dynamic agents. LogiCity models diverse urban elements using semantic and spatial concepts, such as

2024-09-25

Datasets and Benchmarks Track @ Neural Information Processing Systems (poster)

doi.org

openreview.net
