Publications

Induced pluripotent stem cells display a distinct set of MHC I-associated peptides shared by human cancers
Anca Apavaloaei
Leslie Hesnard
Marie-Pierre Hardy
Basma Benabdallah
Grégory Ehx
Catherine Thériault
Jean-Philippe Laverdure
Chantal Durette
Joël Lanoix
Mathieu Courcelles
Nandita Noronha
Kapil Dev Chauhan
Christian Beauséjour
Mick Bhatia
Pierre Thibault
Claude Perreault
Information Gain Sampling for Active Learning in Medical Image Classification
Raghav Mehta
Changjian Shui
Brennan Nichyporuk
A portrait of the different configurations between digitally-enabled innovations and climate governance
Pierre J. C. Chuard
Jennifer Garard
Karsten A. Schulz
Nilushi Kumarasinghe
Damon Matthews
The generalizability of pre-processing techniques on the accuracy and fairness of data-driven building models: a case study
Ying Sun
Fariborz Haghighat
Single‐pass stratified importance resampling
Ege Ciklabakkal
Adrien Gruson
Iliyan Georgiev
Toshiya Hachisuka
Resampling is the process of selecting from a set of candidate samples to achieve a distribution (approximately) proportional to a desired target. Recent work has revisited its application to Monte Carlo integration, yielding powerful and practical importance sampling methods. One drawback of existing resampling methods is that they cannot generate stratified samples. We propose two complementary techniques to achieve efficient stratified resampling. We first introduce bidirectional CDF sampling which yields the same result as conventional inverse CDF sampling but in a single pass over the candidates, without needing to store them, similarly to reservoir sampling. We then order the candidates along a space‐filling curve to ensure that stratified CDF sampling of candidate indices yields stratified samples in the integration domain. We showcase our method on various resampling‐based rendering problems.
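The two baselines the abstract contrasts can be sketched as follows. This is a minimal illustration, not the paper's bidirectional CDF sampling: it shows conventional inverse CDF resampling, which stores the candidate weights, next to single-pass weighted reservoir sampling, which does not. The function names and the example weights are hypothetical.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def inverse_cdf_resample(weights, u):
    """Baseline: store all weights, build the CDF, then invert it at u in [0, 1)."""
    cdf = list(accumulate(weights))
    total = cdf[-1]
    # First index whose cumulative weight exceeds u * total.
    index = bisect_right(cdf, u * total)
    # Guard against floating-point round-off pushing u * total up to total.
    return min(index, len(cdf) - 1)


def reservoir_resample(weights, rng):
    """Single pass: keep one candidate index and a running weight sum."""
    chosen, total = None, 0.0
    for i, w in enumerate(weights):
        total += w
        # Replace the kept candidate with probability w / total.
        if w > 0 and rng.random() * total < w:
            chosen = i
    return chosen


if __name__ == "__main__":
    weights = [1.0, 2.0, 3.0, 4.0]  # hypothetical candidate weights
    # Stratified inputs: one u at the centre of each of 10 equal strata.
    strata = [(k + 0.5) / 10 for k in range(10)]
    print([inverse_cdf_resample(weights, u) for u in strata])
    print(reservoir_resample(weights, random.Random(0)))
```

Both functions select index i with probability proportional to weights[i]. The inverse CDF version accepts a stratified u directly, while the reservoir version draws a fresh random number per candidate, which is why it cannot be stratified in the same way.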
BioCaster in 2021: automatic disease outbreaks detection from global news media
Zaiqiao Meng
Anya Okhmatovskaia
Maxime Polleri
Yannan Shen
Guido Powell
Zihao Fu
Iris Ganser
Meiru Zhang
Nicholas B King
Nigel Collier
A parsimonious description of global functional brain organization in three spatiotemporal patterns
Taylor Bolt
Jason S. Nomi
Jorge A. Salas
Catie Chang
B.T. Thomas Yeo
Lucina Q. Uddin
Shella Keilholz
Global fMRI signal topography differs systematically across the lifespan
Jason S. Nomi
Jingwei Li
Taylor Bolt
Catie Chang
Salome Kornfeld
Zachary T. Goodman
B.T. Thomas Yeo
R. Nathan Spreng
Lucina Q. Uddin
H4rm0ny: A Competitive Zero-Sum Two-Player Markov Game for Multi-Agent Learning on Evasive Malware Generation and Detection
Christopher Molloy
Steven H. H. Ding
Philippe Charland
To combat the increasingly versatile and mutable modern malware, Machine Learning (ML) is now a popular and effective complement to the existing signature-based techniques for malware triage and identification. However, ML is also a readily available tool for adversaries. Recent studies have shown that malware can be modified by deep Reinforcement Learning (RL) techniques to bypass AI-based and signature-based anti-virus systems without altering their original malicious functionalities. These studies only focus on generating evasive samples and assume a static detection system as the enemy. Malware detection and evasion essentially form a two-party cat-and-mouse game. Simulating the real-life scenarios, in this paper we present the first two-player competitive game for evasive malware detection and generation, following the zero-sum Multi-Agent Reinforcement Learning (MARL) paradigm. Our experiments on recent malware show that the produced malware detection agent is more robust against adversarial attacks. Also, the produced malware modification agent is able to generate more evasive samples fooling both AI-based and other anti-malware techniques.
Implications of Topological Imbalance for Representation Learning on Biomedical Knowledge Graphs
Stephen Bonner
Ufuk Kirik
Ola Engkvist
Ian P Barrett
Adoption of recently developed methods from machine learning has given rise to creation of drug-discovery knowledge graphs (KGs) that utilize the interconnected nature of the domain. Graph-based modelling of the data, combined with KG embedding (KGE) methods, are promising as they provide a more intuitive representation and are suitable for inference tasks such as predicting missing links. One common application is to produce ranked lists of genes for a given disease, where the rank is based on the perceived likelihood of association between the gene and the disease. It is thus critical that these predictions are not only pertinent but also biologically meaningful. However, KGs can be biased either directly due to the underlying data sources that are integrated or due to modelling choices in the construction of the graph, one consequence of which is that certain entities can get topologically overrepresented. We demonstrate the effect of these inherent structural imbalances, resulting in densely connected entities being highly ranked no matter the context. We provide support for this observation across different datasets, models as well as predictive tasks. Further, we present various graph perturbation experiments which yield more support to the observation that KGE models can be more influenced by the frequency of entities rather than any biological information encoded within the relations. Our results highlight the importance of data modelling choices, and emphasize the need for practitioners to be mindful of these issues when interpreting model outputs and during KG composition.
Revisiting Hotels-50K and Hotel-ID
Aarash Feizi
Arantxa Casanova
In this paper, we propose revisited versions for two recent hotel recognition datasets: Hotels-50K and Hotel-ID. The revisited versions provide evaluation setups with different levels of difficulty to better align with the intended real-world application, i.e. countering human trafficking. Real-world scenarios involve hotels and locations that are not captured in the current data sets, therefore it is important to consider evaluation settings where classes are truly unseen. We test this setup using multiple state-of-the-art image retrieval models and show that as expected, the models’ performances decrease as the evaluation gets closer to the real-world unseen settings. The rankings of the best performing models also change across the different evaluation settings, which further motivates using the proposed revisited datasets.
Characterizing User Behaviors in Open-Source Software User Forums: An Empirical Study
Jazlyn Hellman
Jiahao Chen
Md. Sami Uddin
Jinghui Cheng
User forums of Open Source Software (OSS) enable end-users to collaboratively discuss problems concerning the OSS applications. Despite decades of research on OSS, we know very little about how end-users engage with OSS communities on these forums, in particular, the challenges that hinder their continuous and meaningful participation in the OSS community. Many previous works are developer-centric and overlook the importance of end-user forums. As a result, end-users' expectations are seldom reflected in OSS development. To better understand user behaviors in OSS user forums, we carried out an empirical study analyzing about 1.3 million posts from user forums of four popular OSS applications: Zotero, Audacity, VLC, and RStudio. Through analyzing the contribution patterns of three common user types (end-users, developers, and organizers), we observed that end-users not only initiated most of the threads (above 96% of threads in three projects, 86% in the other), but also acted as the significant contributors for responding to other users' posts, even though they tended to lack confidence in their activities as indicated by psycho-linguistic analyses. Moreover, we found end-users more open, reflecting a more positive emotion in communication than organizers and developers in the forums. Our work contributes new knowledge about end-users' activities and behaviors in OSS user forums that the vital OSS stakeholders can leverage to improve end-user engagement in the OSS development process.