Publications

Pioneering women in nuclear and radiation sciences
Mirta Dumancic
A responsible framework for applying artificial intelligence on medical images and signals at the point-of-care: the PACS-AI platform.
Pascal Thériault-Lauzier
Denis Cobin
Olivier Tastet
Élodie Labrecque Langlais
B. Taji
Guson Kang
A. Chong
Derek So
An Tang
J. W. Gichoya
Pierre-Luc Deziel
Julie G. Hussin
Samuel Kadoury
Robert Avram
Revisiting the 2023 wildfire season in Canada
Flavie Pelletier
Michael A. Wulder
Joanne C. White
Txomin Hermosilla
Amortizing intractable inference in diffusion models for vision, language, and control
Siddarth Venkatraman
Moksh J. Jain
Luca Scimeca
Minsu Kim
Marcin Sendera
Mohsin Hasan
Luke Rowe
Sarthak Mittal
Pablo Lemos
Emmanuel Bengio
Alexandre Adam
Jarrid Rector-Brooks
Nikolay Malkin
Diffusion models have emerged as effective distribution estimators in vision, language, and reinforcement learning, but their use as priors … (voir plus)in downstream tasks poses an intractable posterior inference problem. This paper studies amortized sampling of the posterior over data,
$\mu$LO: Compute-Efficient Meta-Generalization of Learned Optimizers
Benjamin Thérien
Charles-Étienne Joseph
Boris Knyazev
Edouard Oyallon
How well do models of visual cortex generalize to out of distribution samples?
Yifei Ren
On shallow planning under partial observability
Randy Lefebvre
On the Costs and Benefits of Adopting Lifelong Learning for Software Analytics -- Empirical Study on Brown Build and Risk Prediction
Doriane Olewicki
Sarra Habchi
Mathieu Nayrolles
Mojtaba Faramarzi
Bram Adams
Nowadays, software analytics tools using machine learning (ML) models to, for example, predict the risk of a code change are well establishe… (voir plus)d. However, as the goals of a project shift over time, and developers and their habits change, the performance of said models tends to degrade (drift) over time. Current retraining practices typically require retraining a new model from scratch on a large updated dataset when performance decay is observed, thus incurring a computational cost; also there is no continuity between the models as the past model is discarded and ignored during the new model training. Even though the literature has taken interest in online learning approaches, those have rarely been integrated and evaluated in industrial environments. This paper evaluates the use of lifelong learning (LL) for industrial use cases at Ubisoft, evaluating both the performance and the required computational effort in comparison to the retraining-from-scratch approaches commonly used by the industry. LL is used to continuously build and maintain ML-based software analytics tools using an incremental learner that progressively updates the old model using new data. To avoid so-called"catastrophic forgetting"of important older data points, we adopt a replay buffer of older data, which still allows us to drastically reduce the size of the overall training dataset, and hence model training time.
Towards a General GNN Framework for Combinatorial Optimization
Frederik Wenkel
Semih Cantürk
Michael Perlmutter
GraphAny: A Foundation Model for Node Classification on Any Graph
Jianan Zhao
Hesham Mostafa
Mikhail Galkin
Michael Bronstein
Zhaocheng Zhu
Does learning the right latent variables necessarily improve in-context learning?
Sarthak Mittal
Eric Elmoznino
L'eo Gagnon
Sangnie Bhardwaj
Large autoregressive models like Transformers can solve tasks through in-context learning (ICL) without learning new weights, suggesting ave… (voir plus)nues for efficiently solving new tasks. For many tasks, e.g., linear regression, the data factorizes: examples are independent given a task latent that generates the data, e.g., linear coefficients. While an optimal predictor leverages this factorization by inferring task latents, it is unclear if Transformers implicitly do so or if they instead exploit heuristics and statistical shortcuts enabled by attention layers. Both scenarios have inspired active ongoing work. In this paper, we systematically investigate the effect of explicitly inferring task latents. We minimally modify the Transformer architecture with a bottleneck designed to prevent shortcuts in favor of more structured solutions, and then compare performance against standard Transformers across various ICL tasks. Contrary to intuition and some recent works, we find little discernible difference between the two; biasing towards task-relevant latent variables does not lead to better out-of-distribution performance, in general. Curiously, we find that while the bottleneck effectively learns to extract latent task variables from context, downstream processing struggles to utilize them for robust prediction. Our study highlights the intrinsic limitations of Transformers in achieving structured ICL solutions that generalize, and shows that while inferring the right latents aids interpretability, it is not sufficient to alleviate this problem.
Forward-Backward Knowledge Distillation for Continual Clustering
Mohammadreza Sadeghi
Zihan Wang
Unsupervised Continual Learning (UCL) is a burgeoning field in machine learning, focusing on enabling neural networks to sequentially learn … (voir plus)tasks without explicit label information. Catastrophic Forgetting (CF), where models forget previously learned tasks upon learning new ones, poses a significant challenge in continual learning, especially in UCL, where labeled information of data is not accessible. CF mitigation strategies, such as knowledge distillation and replay buffers, often face memory inefficiency and privacy issues. Although current research in UCL has endeavored to refine data representations and address CF in streaming data contexts, there is a noticeable lack of algorithms specifically designed for unsupervised clustering. To fill this gap, in this paper, we introduce the concept of Unsupervised Continual Clustering (UCC). We propose Forward-Backward Knowledge Distillation for unsupervised Continual Clustering (FBCC) to counteract CF within the context of UCC. FBCC employs a single continual learner (the ``teacher'') with a cluster projector, along with multiple student models, to address the CF issue. The proposed method consists of two phases: Forward Knowledge Distillation, where the teacher learns new clusters while retaining knowledge from previous tasks with guidance from specialized student models, and Backward Knowledge Distillation, where a student model mimics the teacher's behavior to retain task-specific knowledge, aiding the teacher in subsequent tasks. FBCC marks a pioneering approach to UCC, demonstrating enhanced performance and memory efficiency in clustering across various tasks, outperforming the application of clustering algorithms to the latent space of state-of-the-art UCL algorithms.