Publications

Single-Shot Learning of Stable Dynamical Systems for Long-Horizon Manipulation Tasks

Mastering complex sequential tasks continues to pose a significant challenge in robotics. While there has been progress in learning long-horizon manipulation tasks, most existing approaches lack rigorous mathematical guarantees for ensuring reliable and successful execution. In this paper, we extend previous work on learning long-horizon tasks and stable policies, focusing on improving task success rates while reducing the amount of training data needed. Our approach introduces a novel method that (1) segments long-horizon demonstrations into discrete steps defined by waypoints and subgoals, and (2) learns globally stable dynamical system policies to guide the robot to each subgoal, even in the face of sensory noise and random disturbances. We validate our approach through both simulation and real-world experiments, demonstrating effective transfer from simulation to physical robotic platforms. Code is available at https://github.com/Alestaubin/stable-imitation-policy-with-waypoints

2024-09-30

arXiv (preprint)

doi.org

arxiv.org

SOAK: Same/Other/All K-fold cross-validation for estimating similarity of patterns in data subsets

Toby Dylan Hocking

Gabrielle Thibault

C. S. Bodine

Paul Nelson Arellano

Alexander F. Shenkin

Olivia Jasmine Lindly

In many real-world applications of machine learning, we are interested to know if it is possible to train on the data that we have gathered so far, and obtain accurate predictions on a new test data subset that is qualitatively different in some respect (time period, geographic region, etc). Another question is whether data subsets are similar enough so that it is beneficial to combine subsets during model training. We propose SOAK, Same/Other/All K-fold cross-validation, a new method which can be used to answer both questions. SOAK systematically compares models which are trained on different subsets of data, and then used for prediction on a fixed test subset, to estimate the similarity of learnable/predictable patterns in data subsets. We show results of using SOAK on six new real data sets (with geographic/temporal subsets, to check if predictions are accurate on new subsets), 3 image pair data sets (subsets are different image types, to check that we get smaller prediction error on similar images), and 11 benchmark data sets with predefined train/test splits (to check similarity of predefined splits).

2024-09-30

arXiv (published)

doi.org

arxiv.org

Spatial Action Unit Cues for Interpretable Deep Facial Expression Recognition

Soufiane Belharbi

Marco Pedersoli

Alessandro Lameiras Koerich

Simon Bacon

Eric Granger

Although state-of-the-art classifiers for facial expression recognition (FER) can achieve a high level of accuracy, they lack interpretability, an important feature for end-users. Experts typically associate spatial action units (AUs) from a codebook to facial regions for the visual interpretation of expressions. In this paper, the same expert steps are followed. A new learning strategy is proposed to explicitly incorporate AU cues into classifier training, allowing deep interpretable models to be trained. During training, this AU codebook is used, along with the input image expression label and facial landmarks, to construct an AU heatmap that indicates the most discriminative image regions of interest w.r.t. the facial expression. This valuable spatial cue is leveraged to train a deep interpretable classifier for FER. This is achieved by constraining the spatial layer features of a classifier to be correlated with AU heatmaps. Using a composite loss, the classifier is trained to correctly classify an image while yielding interpretable visual layer-wise attention correlated with AU maps, simulating the expert decision process. Our strategy only relies on image class expression for supervision, without additional manual annotations. Our new strategy is generic, and can be applied to any deep CNN- or transformer-based classifier without requiring any architectural change or significant additional training time. Our extensive evaluation on two public benchmarks, RAF-DB and AffectNet, shows that our proposed strategy can improve layer-wise interpretability without degrading classification performance. In addition, we explore a common type of interpretable classifiers that rely on class activation mapping (CAM) methods, and show that our approach can also improve CAM interpretability.

2024-09-30

arXiv (published)

doi.org

arxiv.org

A Survey of Diversification Techniques in Search and Recommendation

Haolun Wu

Yansen Zhang

Chen Ma

Fuyuan Lyu

Bowei He

Fernando Diaz

Bhaskar Mitra

Xue Liu

Diversifying search results is an important research topic in retrieval systems in order to satisfy both the various interests of customers and the equal market exposure of providers. There has been growing attention to diversity-aware research during recent years, accompanied by a proliferation of literature on methods to promote diversity in search and recommendation. However, the diversity-aware studies in retrieval systems lack a systematic organization and are rather fragmented. In this survey, we are the first to propose a unified taxonomy for classifying the metrics and approaches of diversification in both search and recommendation, which are two of the most extensively researched fields of retrieval systems. We begin the survey with a brief discussion of why diversity is important in retrieval systems, followed by a summary of the various diversity concerns in search and recommendation, highlighting their relationship and differences. For the survey’s main body, we present a unified taxonomy of diversification metrics and approaches in retrieval systems, from both the search and recommendation perspectives. In the later part of the survey, we discuss the open research questions of diversity-aware research in search and recommendation in an effort to inspire future innovations and encourage the implementation of diversity in real-world systems.

2024-09-30

IEEE Transactions on Knowledge and Data Engineering (published)

doi.org

arxiv.org

The Canadian VirusSeq Data Portal and Duotang: open resources for SARS-CoV-2 viral sequences and genomic epidemiology

Erin E. Gill

Baofeng Jia

Carmen Lia Murall

Raphaël Poujol

Muhammad Zohaib Anwar

Nithu Sara John

Justin Richardsson

Ashley Hobb

Abayomi S. Olabode

Alexandru Lepsa

Ana T. Duggan

Andrea D. Tyler

Arnaud N'Guessan

Atul Kachru

Brandon Chan

Catherine Yoshida

Christina K. Yung

David Bujold

Dusan Andric

Edmund Su

Emma J. Griffiths

Gary Van Domselaar

Gordon W. Jolly

Heather K. E. Ward

Henrich Feher

Jared Baker

Jared T. Simpson

Jaser Uddin

Jiannis Ragoussis

Jon Eubank

Jörg H. Fritz

José Héctor Gálvez

Karen Fang

Kim Cullion

Leonardo Rivera

Linda Xiang

Matthew A. Croxen

Mitchell Shiell

Natalie Prystajecky

Pierre-Olivier Quirion

Rosita Bajari

Samantha Rich

Samira Mubareka

Sandrine Moreira

Scott Cain

Steven G. Sutcliffe

Susanne A. Kraemer

Yelizar Alturmessov

Yann Joly

Marc Fiume

Terrance P. Snutch

Cindy Bell

Catalina Lopez-Correa

Julie G. Hussin

Jeffrey B. Joy

Caroline Colijn

Paul M. K. Gordon

William W. L. Hsiao

Art F. Y. Poon

Natalie C. Knox

Mélanie Courtot

Lincoln Stein

Sarah P. Otto

Guillaume Bourque

B. Jesse Shapiro

Fiona S. L. Brinkman

The COVID-19 pandemic led to a large global effort to sequence SARS-CoV-2 genomes from patient samples to track viral evolution and inform the public health response. Millions of SARS-CoV-2 genome sequences have been deposited in global public repositories. The Canadian COVID-19 Genomics Network (CanCOGeN – VirusSeq), a consortium tasked with coordinating expanded sequencing of SARS-CoV-2 genomes across Canada early in the pandemic, created the Canadian VirusSeq Data Portal, with associated data pipelines and procedures, to support these efforts. The goal of VirusSeq was to allow open access to Canadian SARS-CoV-2 genomic sequences and enhanced, standardized contextual data that were unavailable in other repositories and that meet FAIR standards (Findable, Accessible, Interoperable and Reusable). In addition, the portal data submission pipeline contains data quality checking procedures and appropriate acknowledgement of data generators that encourages collaboration. From inception to execution, the portal was developed with a conscientious focus on strong data governance principles and practices. Extensive efforts ensured a commitment to Canadian privacy laws, data security standards, and organizational processes. This portal has been coupled with other resources, such as Viral AI, and was further leveraged by the Coronavirus Variants Rapid Response Network (CoVaRR-Net) to produce a suite of continually updated analytical tools and notebooks. Here we highlight this portal (https://virusseq-dataportal.ca/), including its contextual data not available elsewhere, and the Duotang (https://covarr-net.github.io/duotang/duotang.html), a web platform that presents key genomic epidemiology and modelling analyses on circulating and emerging SARS-CoV-2 variants in Canada.
Duotang presents dynamic changes in variant composition of SARS-CoV-2 in Canada and by province, estimates variant growth, and displays complementary interactive visualizations, with a text overview of the current situation. The VirusSeq Data Portal and Duotang resources, alongside additional analyses and resources computed from the portal (COVID-MVP, CoVizu), are all open source and freely available. Together, they provide an updated picture of SARS-CoV-2 evolution to spur scientific discussions, inform public discourse, and support communication with and within public health authorities. They also serve as a framework for other jurisdictions interested in open, collaborative sequence data sharing and analyses.

2024-09-30

Microbial Genomics (published)

doi.org

The oneirogen hypothesis: modeling the hallucinatory effects of classical psychedelics in terms of replay-dependent plasticity mechanisms

Classical psychedelics induce complex visual hallucinations in humans, generating percepts that are coherent at a low level, but which have surreal, dream-like qualities at a high level. While there are many hypotheses as to how classical psychedelics could induce these effects, there are no concrete mechanistic models that capture the variety of observed effects in humans, while remaining consistent with the known pharmacological effects of classical psychedelics on neural circuits. In this work, we propose the “oneirogen hypothesis”, which posits that the perceptual effects of classical psychedelics are a result of their pharmacological actions inducing neural activity states that truly are more similar to dream-like states. We simulate classical psychedelics’ effects via manipulating neural network models trained on perceptual tasks with the Wake-Sleep algorithm. This established machine learning algorithm leverages two activity phases, a perceptual phase (wake) where sensory inputs are encoded, and a generative phase (dream) where the network internally generates activity consistent with stimulus-evoked responses. We simulate the action of psychedelics by partially shifting the model to the ‘Sleep’ state, which entails a greater influence of top-down connections, in line with the impact of psychedelics on apical dendrites. The effects resulting from this manipulation capture a number of experimentally observed phenomena including the emergence of hallucinations, increases in stimulus-conditioned variability, and large increases in synaptic plasticity. We further provide a number of testable predictions which could be used to validate or invalidate our oneirogen hypothesis.

2024-09-29

bioRxiv (preprint)

doi.org

Automating MedSAM by Learning Prompts with Weak Few-Shot Supervision

Melanie Gaillochet

Christian Desrosiers

Hervé Lombaert

2024-09-27

Lecture Notes in Computer Science (published)

doi.org

arxiv.org

Genetic Interplay Between White Matter Hyperintensities and Alzheimer's Disease: A Brain-Body Perspective

Manpreet Singh

Kimia Shafighi

Flavie E. Detcheverry

Fanta Dabo

Ikrame Housni

Sridar Narayanan

Sarah A. Gagliano Taliun

Danilo Bzdok

AmanPreet Badhwar

MRI-detected white matter hyperintensities (WMH) are often recognized as markers of cerebrovascular abnormalities and an index of vascular brain injury, and are frequently present in individuals with Alzheimer’s disease (AD). Given the emerging bidirectional communication between the brain-body axis in both WMHs and AD, it is important to understand their genetic underpinnings across the whole body. However, literature on this is scarce. We investigated the brain-body axis by breaking down heritability estimates of these phenotypes across the whole body, i.e., partitioning heritability. Our aims were to identify genetic underpinnings specific to WMHs, and common between WMHs and AD, by assessing (a) the partitioned heritability of WMHs and AD across the brain-body axis with tissue-specific annotations, (b) the partitioned heritability of WMHs and AD across the brain-body axis with cell-specific annotations, and (c) the genes associated with WMHs and AD, and verifying their expression levels across the whole body. Our tissue-specific analysis revealed that WMH-associated SNPs were significantly enriched in tissues beyond the brain, namely liver, cardiovascular, and kidney, with liver being a common tissue enriched for both WMHs and AD. Our cell-specific analysis showed enrichment of vascular endothelial cells across the tissue types enriched for WMHs, highlighting their central role in the development of WMHs. Additionally, our gene-level analysis highlighted overlapping patterns of tissue enrichment for both WMHs and AD, and showed interactions between WMH- and AD-associated genes. Our findings provide new insights into the systemic influences potentially contributing to WMH pathology, in particular, multi-system endothelial disorder. We hope that our multisystemic genetic findings will stimulate future WMH research into specific pathways across the brain-body axis.

2024-09-27

medRxiv (preprint)

doi.org

Refining SARS-CoV-2 intra-host variation by leveraging large-scale sequencing data

Fatima Mostefai

Jean-Christophe Grenier

Raphaël Poujol

Julie Hussin

Understanding viral genome evolution during host infection is crucial for grasping viral diversity and evolution. Analyzing intra-host single nucleotide variants (iSNVs) offers insights into new lineage emergence, which is important for predicting and mitigating future viral threats. Despite next-generation sequencing’s potential, challenges persist, notably sequencing artifacts leading to false iSNVs. We developed a workflow to enhance iSNV detection in large NGS libraries, using over 130 000 SARS-CoV-2 libraries to distinguish mutations from errors. Our approach integrates bioinformatics protocols, stringent quality control, and dimensionality reduction to tackle batch effects and improve mutation detection reliability. Additionally, we pioneer the application of the PHATE visualization approach to genomic data and introduce a methodology that quantifies how related groups of data points are represented within a two-dimensional space, enhancing clustering structure explanation based on genetic similarities. This workflow advances accurate intra-host mutation detection, facilitating a deeper understanding of viral diversity and evolution.

2024-09-27

NAR Genomics and Bioinformatics (published)

doi.org

Longitudinal bi-criteria framework for assessing national healthcare responses to pandemic outbreaks

Adel Guitouni

Nabil Belacel

Loubna Benabbou

Belaid Moa

Munire Erman

Halim Abdul

2024-09-26

Scientific Reports (published)

doi.org

Replication of a GWAS signal near HLA-DQA2 with acute myeloid leukemia using a disease-only cohort and external population-based controls

Rose Laflamme

Véronique Lisi

Josée Hébert

Guy Sauvageau

Sébastien Lemieux

Vincent-Philippe Lavallee

Guillaume Lettre

Acute myeloid leukemia (AML) is the most common type of acute leukemia in adults. Its risk factors include rare and highly penetrant somatic mutations. Genome-wide association studies (GWAS) have also identified four common inherited variants associated with AML risk, but these findings have not yet been confirmed in many independent datasets. Here, we performed a replication study with 567 AML cases from the Leucegene cohort and 1,865 controls from the population-based cohort CARTaGENE (CaG). Because genotypes were generated using different technologies in the two datasets (e.g. low- vs. high-coverage whole-genome sequencing), we applied stringent quality-control filters to minimize type I errors. We showed using data reduction methods (e.g. principal component analysis [PCA] and uniform manifold approximation and projection [UMAP]) that our approach successfully integrated the Leucegene and CaG genetic data. We replicated the association between cytogenetically normal (CN)-AML and rs3916765, a variant located near HLA-DQA2 (odds ratio [95% confidence interval] = 1.88 [1.21-2.93], P-value = 0.005). The effect size of this association was stronger when we restricted the analyses to AML patients with NPM1 mutations (odds ratios >2.35). We found HLA-DOB to be the most significantly upregulated gene in Leucegene participants with the CN-AML protective A-allele at rs3916765. We further found that several HLA class II genes are also differentially expressed albeit at lower statistical significance. Our results confirm that a common genetic variant at the HLA locus associates with AML risk, providing new opportunities to improve disease prognosis and treatment.

2024-09-26

medRxiv (preprint)

doi.org

CALE: Continuous Arcade Learning Environment

Jesse Farebrother

Pablo Samuel Castro

We introduce the Continuous Arcade Learning Environment (CALE), an extension of the well-known Arcade Learning Environment (ALE) [Bellemare et al., 2013]. The CALE uses the same underlying emulator of the Atari 2600 gaming system (Stella), but adds support for continuous actions. This enables the benchmarking and evaluation of continuous-control agents (such as PPO [Schulman et al., 2017] and SAC [Haarnoja et al., 2018]) and value-based agents (such as DQN [Mnih et al., 2015] and Rainbow [Hessel et al., 2018]) on the same environment suite. We provide a series of open questions and research directions that CALE enables, as well as initial baseline results using Soft Actor-Critic. CALE is available as part of the ALE at https://github.com/Farama-Foundation/Arcade-Learning-Environment.

2024-09-25

NeurIPS.cc/2024/Datasets_and_Benchmarks_Track (poster)

doi.org

openreview.net

Disinformation 2.0: When AI Blurs the Lines

AI Policy Fellowship Publications

Mila on Udemy
