Publications

Beyond Naïve Prompting: Strategies for Improved Zero-shot Context-aided Forecasting with LLMs

Arjun Ashok

Andrew Robert Williams

Vincent Zhihao Zheng

Irina Rish

Nicolas Chapados

Étienne Marcotte

Valentina Zantedeschi

Alexandre Drouin

Forecasting in real-world settings requires models to integrate not only historical data but also relevant contextual information, often available in textual form. While recent work has shown that large language models (LLMs) can be effective context-aided forecasters via naïve direct prompting, their full potential remains underexplored. We address this gap with 4 strategies, providing new insights into the zero-shot capabilities of LLMs in this setting. ReDP improves interpretability by eliciting explicit reasoning traces, allowing us to assess the model's reasoning over the context independently from its forecast accuracy. CorDP leverages LLMs solely to refine existing forecasts with context, enhancing their applicability in real-world forecasting pipelines. IC-DP proposes embedding historical examples of context-aided forecasting tasks in the prompt, substantially improving accuracy even for the largest models. Finally, RouteDP optimizes resource efficiency by using LLMs to estimate task difficulty, and routing the most challenging tasks to larger models. Evaluated on different kinds of context-aided forecasting tasks from the CiK benchmark, our strategies demonstrate distinct benefits over naïve prompting across LLMs of different sizes and families. These results open the door to further simple yet effective improvements in LLM-based context-aided forecasting.

2025-08-13

ArXiv (preprint)

arxiv.org

Nested-ReFT: Efficient Reinforcement Learning for Large Language Model Fine-Tuning via Off-Policy Rollouts

Maxime Heuillet

Yufei Cui

Boxing Chen

Audrey Durand

Prasanna Parthasarathi

2025-08-13

ArXiv (preprint)

arxiv.org

Pathfinding: a neurodynamical account of intuition

Steven Kotler

Michael Mannino

Karl Friston

György Buzsáki

J. A. Scott Kelso

Guillaume Dumas

2025-08-13

Communications Biology (published)

doi.org

WeDesign: Generative AI-Facilitated Community Consultations for Urban Public Space Design

Rashid A. Mushkani

Hugo Berard

Shin (Alexandre) Koseki

2025-08-13

ArXiv (preprint)

arxiv.org

A Guide to Robust Generalization: The Impact of Architecture, Pre-training, and Optimization Strategy

Maxime Heuillet

Rishika Bhagwatkar

Jonas Ngnawe

Yann Batiste Pequignot

Ola Ahmad

Deep learning models operating in the image domain are vulnerable to small input perturbations. For years, robustness to such perturbations was pursued by training models from scratch (i.e., with random initializations) using specialized loss objectives. Recently, robust fine-tuning has emerged as a more efficient alternative: instead of training from scratch, pretrained models are adapted to maximize predictive performance and robustness. To conduct robust fine-tuning, practitioners design an optimization strategy that includes the model update protocol (e.g., full or partial) and the specialized loss objective. Additional design choices include the architecture type and size, and the pretrained representation. These design choices affect robust generalization, which is the model's ability to maintain performance when exposed to new and unseen perturbations at test time. Understanding how these design choices influence generalization remains an open question with significant practical implications. In response, we present an empirical study spanning 6 datasets, 40 pretrained architectures, 2 specialized losses, and 3 adaptation protocols, yielding 1,440 training configurations and 7,200 robustness measurements across five perturbation types. To our knowledge, this is the most diverse and comprehensive benchmark of robust fine-tuning to date. While attention-based architectures and robust pretrained representations are increasingly popular, we find that convolutional neural networks pretrained in a supervised manner on large datasets often perform best. Our analysis both confirms and challenges prior design assumptions, highlighting promising research directions and offering practical guidance.

2025-08-12

ArXiv (preprint)

arxiv.org

The Impact of a Pediatric Surgery Fundamentals Boot Camp on New Surgical Trainees' Perceived Knowledge and Confidence Levels.

Julia Ferreira

Simon Rahman

Fabio Botelho

Farhan Banji

W. A. Igrine

Gianluca Bertolizio

Sam Daniel

Thomas Engelhardt

Chantal Frigon

Lily H P Nguyen

Catherine Paquet

Dan Poenaru

Pramod Puligandla

Hussein Wissanji

Davinia Withington

Yasmine Yousef

Sherif Emil

2025-08-12

Journal of Pediatric Surgery (published)

doi.org

FairFLRep: Fairness aware fault localization and repair of Deep Neural Networks

Moses Openja

Paolo Arcaini

Foutse Khomh

Fuyuki Ishikawa

Deep neural networks (DNNs) are being utilized in various aspects of our daily lives, including high-stakes decision-making applications that impact individuals. However, these systems reflect and amplify bias from the data used during training and testing, potentially resulting in biased behavior and inaccurate decisions. For instance, having different misclassification rates between white and black sub-populations. However, effectively and efficiently identifying and correcting biased behavior in DNNs is a challenge. This paper introduces FairFLRep, an automated fairness-aware fault localization and repair technique that identifies and corrects potentially bias-inducing neurons in DNN classifiers. FairFLRep focuses on adjusting neuron weights associated with sensitive attributes, such as race or gender, that contribute to unfair decisions. By analyzing the input-output relationships within the network, FairFLRep corrects neurons responsible for disparities in predictive quality parity. We evaluate FairFLRep on four image classification datasets using two DNN classifiers, and four tabular datasets with a DNN model. The results show that FairFLRep consistently outperforms existing methods in improving fairness while preserving accuracy. An ablation study confirms the importance of considering fairness during both fault localization and repair stages. Our findings also show that FairFLRep is more efficient than the baseline approaches in repairing the network.

2025-08-11

ArXiv (preprint)

arxiv.org

Untold stories: A qualitative investigation of patient and family experiences with congenital diaphragmatic hernia.

Alexandra Dimmer

Zanib Nafees

Sabrina Beauseigle

Franco A Carnevale

Elena Guadagno

Dan Poenaru

Pramod Puligandla

2025-08-11

Journal of Pediatric Surgery (published)

doi.org

Parity Requires Unified Input Dependence and Negative Eigenvalues in SSMs

Behnoush Khavari

Jayesh Khullar

François Rivest

Recent work has shown that LRNN models such as S4D, Mamba, and DeltaNet lack state-tracking capability due to either time-invariant transition matrices or restricted eigenvalue ranges. To address this, input-dependent transition matrices, particularly those that are complex or non-triangular, have been proposed to enhance SSM performance on such tasks. While existing theorems demonstrate that both input-independent and non-negative SSMs are incapable of solving simple state-tracking tasks, such as parity, regardless of depth, they do not explore whether combining these two types in a multilayer SSM could help. We investigate this question for efficient SSMs with diagonal transition matrices and show that such combinations still fail to solve parity. This implies that a recurrence layer must both be input-dependent and include negative eigenvalues. Our experiments support this conclusion by analyzing an SSM model that combines S4D and Mamba layers.

2025-08-10

ArXiv (preprint)

arxiv.org

An Empirical Study on Method-Level Performance Evolution in Open-Source Java Projects

Kaveh Shahedi

Nana Gyambrah

Heng Li

Maxime Lamothe

Foutse Khomh

Performance is a critical quality attribute in software development, yet the impact of method-level code changes on performance evolution remains poorly understood. While developers often make intuitive assumptions about which types of modifications are likely to cause performance regressions or improvements, these beliefs lack empirical validation at a fine-grained level. We conducted a large-scale empirical study analyzing performance evolution in 15 mature open-source Java projects hosted on GitHub. Our analysis encompassed 739 commits containing 1,499 method-level code changes, using Java Microbenchmark Harness (JMH) for precise performance measurement and rigorous statistical analysis to quantify both the significance and magnitude of performance variations. We employed bytecode instrumentation to capture method-specific execution metrics and systematically analyzed four key aspects: temporal performance patterns, code change type correlations, developer and complexity factors, and domain-size interactions. Our findings reveal that 32.7% of method-level changes result in measurable performance impacts, with regressions occurring 1.3 times more frequently than improvements. Contrary to conventional wisdom, we found no significant differences in performance impact distributions across code change categories, challenging risk-stratified development strategies. Algorithmic changes demonstrate the highest improvement potential but carry substantial regression risk. Senior developers produce more stable changes with fewer extreme variations, while code complexity correlates with increased regression likelihood. Domain-size interactions reveal significant patterns, with web server + small projects exhibiting the highest performance instability. Our study provides empirical evidence for integrating automated performance testing into continuous integration pipelines.

2025-08-09

ArXiv (preprint)

arxiv.org
