Publications

SemEval-2023 Task 12: Sentiment Analysis for African Languages (AfriSenti-SemEval)
Shamsuddeen Hassan Muhammad
Idris Abdulmumin
Seid Muhie Yimam
Ibrahim Ahmad
Nedjma Ousidhoum
Abinew Ayele
Saif Mohammad
Meriem Beloucif
Structure-aware protein self-supervised learning
Can Chen
Jingbo Zhou
Fan Wang
Dejing Dou
Adaptive patch foraging in deep reinforcement learning agents
Nathan Wispinski
Andrew Butcher
Craig S Chapman
Matthew Botvinick
Patrick M. Pilarski
Patch foraging is one of the most heavily studied behavioral optimization challenges in biology. However, despite its importance to biological intelligence, this behavioral optimization problem is understudied in artificial intelligence research. Patch foraging is especially amenable to study given that it has a known optimal solution, which may be difficult to discover given current techniques in deep reinforcement learning. Here, we investigate deep reinforcement learning agents in an ecological patch foraging task. For the first time, we show that machine learning agents can learn to patch forage adaptively in patterns similar to biological foragers, and approach optimal patch foraging behavior when accounting for temporal discounting. Finally, we show emergent internal dynamics in these agents that resemble single-cell recordings from foraging non-human primates, which complements experimental and theoretical work on the neural mechanisms of biological foraging. This work suggests that agents interacting in complex environments with ecologically valid pressures arrive at common solutions, suggesting the emergence of foundational computations behind adaptive, intelligent behavior in both biological and artificial agents.
Autonomous optimization of neuroprosthetic stimulation parameters that drive the motor cortex and spinal cord outputs in rats and monkeys
Rose Guay Hottin
Sandrine L. Côté
Elena Massai
Léo Choinière
Uzay Macar
Samuel Laferrière
Parikshat Sirpal
Stephan Quessy
Marina Martinez
Numa Dancause
A Novel Stochastic Gradient Descent Algorithm for Learning Principal Subspaces
Charline Le Lan
Joshua Greaves
Jesse Farebrother
Mark Rowland
Fabian Pedregosa
Rishabh Agarwal
In this paper, we derive an algorithm that learns a principal subspace from sample entries, can be applied when the approximate subspace is represented by a neural network, and hence can be scaled to datasets with an effectively infinite number of rows and columns. Our method consists in defining a loss function whose minimizer is the desired principal subspace, and constructing a gradient estimate of this loss whose bias can be controlled.
A surprisingly simple technique to control the pretraining bias for better transfer: Expand or Narrow your representation
Florian Bordes
Samuel Lavoie
Randall Balestriero
Nicolas Ballas
Conservative objective models are a special kind of contrastive divergence-based energy model
Christopher Beckham
In this work we theoretically show that conservative objective models (COMs) for offline model-based optimisation (MBO) are a special kind of contrastive divergence-based energy model, one where the energy function represents both the unconditional probability of the input and the conditional probability of the reward variable. While the initial formulation only samples modes from its learned distribution, we propose a simple fix that replaces its gradient ascent sampler with a Langevin MCMC sampler. This gives rise to a special probabilistic model where the probability of sampling an input is proportional to its predicted reward. Lastly, we show that better samples can be obtained if the model is decoupled so that the unconditional and conditional probabilities are modelled separately.
Approach Intelligent Writing Assistants Usability with Seven Stages of Action
Avinash Bhat
Disha Shrivastava
MARSY: a multitask deep-learning framework for prediction of drug combination synergy scores
Mohamed Reda El Khili
Safyan Aman Memon
Motivation: Combination therapies have emerged as a treatment strategy for cancers to reduce the probability of drug resistance and to improve outcome. Large databases curating the results of many drug screening studies on preclinical cancer cell lines have been developed, capturing the synergistic and antagonistic effects of combination of drugs in different cell lines. However, due to the high cost of drug screening experiments and the sheer size of possible drug combinations, these databases are quite sparse. This necessitates the development of transductive computational models to accurately impute these missing values. Results: Here, we developed MARSY, a deep learning multi-task model that incorporates information on gene expression profile of cancer cell lines, as well as the differential expression signature induced by each drug to predict drug-pair synergy scores. By utilizing two encoders to capture the interplay between the drug-pairs, as well as the drug-pairs and cell lines, and by adding auxiliary tasks in the predictor, MARSY learns latent embeddings that improve the prediction performance compared to state-of-the-art and traditional machine learning models. Using MARSY, we then predicted the synergy scores of 133,722 new drug-pair cell line combinations, which we have made available to the community as part of this study. Moreover, we validated various insights obtained from these novel predictions using independent studies, confirming the ability of MARSY in making accurate novel predictions. Availability and Implementation: An implementation of the algorithms in Python and cleaned input datasets are provided in https://github.com/Emad-COMBINE-lab/MARSY. Contact: amin.emad@mcgill.ca Supplementary Information: Online-only supplementary data is available at the journal’s website.
Source-free Domain Adaptation Requires Penalized Diversity
Laya Rafiee Sevyeri
Ivaxi Sheth
Farhood Farahnak
Alexandre See
Thomas Fevens
Mohammad Havaei
While neural networks are capable of achieving human-like performance in many tasks such as image classification, the impressive performance of each model is limited to its own dataset. Source-free domain adaptation (SFDA) was introduced to address knowledge transfer between different domains in the absence of source data, thus increasing data privacy. Diversity in representation space can be vital to a model's adaptability in varied and difficult domains. In unsupervised SFDA, the diversity is limited to learning a single hypothesis on the source or learning multiple hypotheses with a shared feature extractor. Motivated by the improved predictive performance of ensembles, we propose a novel unsupervised SFDA algorithm that promotes representational diversity through the use of separate feature extractors with Distinct Backbone Architectures (DBA). Although diversity in feature space is increased, the unconstrained mutual information (MI) maximization may potentially introduce amplification of weak hypotheses. Thus we introduce the Weak Hypothesis Penalization (WHP) regularizer as a mitigation strategy. Our work proposes Penalized Diversity (PD) where the synergy of DBA and WHP is applied to unsupervised source-free domain adaptation for covariate shift. In addition, PD is augmented with a weighted MI maximization objective for label distribution shift. Empirical results on natural, synthetic, and medical domains demonstrate the effectiveness of PD under different distributional shifts.
Bugs in machine learning-based systems: a faultload benchmark
Mohammad Mehdi Morovati
Amin Nikanjam
Z. Jiang
Abstract 2987: BamQuery: a new proteogenomic tool to explore the immunopeptidome and prioritize actionable tumor antigens
Maria-Virginia Ruiz Cuevas
Marie-Pierre Hardy
Jean-David Larouche
Anca Apavaloaei
Eralda Kina
Krystel Vincent
Patrick Gendron
Jean-Philippe Laverdure
Chantal Durette
Pierre Thibault
Claude Perreault
Grégory Ehx
MHC class I-associated peptides (MAPs), collectively referred to as the immunopeptidome, have a pivotal role in cancer immunosurveillance. While MAPs were long thought to be solely generated by the degradation of canonical proteins, recent advances in the field of proteogenomics (genomically-informed proteomics) evidenced that ∼10% of them originate from allegedly noncoding genomic sequences. Among these sequences, endogenous retroelements (EREs) are under intense scrutiny as a possible source of actionable tumor antigens (TAs). With the increasing number of cancer-oriented immunopeptidomic and proteogenomic studies comes the need to accurately attribute an RNA expression level to each MAP identified by mass-spectrometry. Here, we introduce BamQuery (BQ), a computational tool to attribute an exhaustive RNA expression to MAPs of any genomic origin (exon, intron, UTR, intergenic) from bulk and single-cell RNA-sequencing data. By using BQ on large datasets of published MAPs identified by mass spectrometry, we show that many of them can arise from more than one genomic region. Indeed, 27% of MAPs reported as deriving from protein-coding exons (canonical MAPs) could also arise from non-canonical genomic regions, sometimes with greater probability, and 61% of non-canonical MAPs could arise from more than a single genomic origin (334 possible regions on average per non-canonical MAP; up to 35,343 for EREs). The consideration of all these origins evidenced an unsuspected high RNA expression in normal human tissues of (i) published neoantigens/TAs (mutated or not); (ii) MAPs derived from proteasomal splicing, supposedly not genomically templated, and (iii) MAPs derived from viruses. In particular, the high expression of candidate immunotherapeutic targets such as TAs highlights the relevance of BamQuery and the necessity of using it to validate such antigens before translating their usage in clinical trials.
We also demonstrate that BamQuery can be used to directly identify safe and actionable TAs as well as to predict their immunogenicity through our freely accessible web portal (https://bamquery.iric.ca/search). Therefore, BQ could become an essential tool in any TA prioritization pipeline in the near future. Citation Format: Maria-Virginia Ruiz Cuevas, Marie-Pierre Hardy, Jean-David Larouche, Anca Apavaloaei, Eralda Kina, Krystel Vincent, Patrick Gendron, Jean-Philippe Laverdure, Chantal Durette, Pierre Thibault, Sebastien Lemieux, Claude Perreault, Gregory Ehx. BamQuery: a new proteogenomic tool to explore the immunopeptidome and prioritize actionable tumor antigens [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2023; Part 1 (Regular and Invited Abstracts); 2023 Apr 14-19; Orlando, FL. Philadelphia (PA): AACR; Cancer Res 2023;83(7_Suppl):Abstract nr 2987.