Publications

Pregnancy AI: Development and Internal Validation of an Artificial Intelligence Tool to Predict Live Births in ICSI and IVF Cycles Using Clinical Features and Embryo Images
Penelope Borduas
Isaac-Jacques Kadoch
Simon Phillips
Daniel Dufort
Stabilizing Native Low-Rank LLM Pretraining
Foundation models have achieved remarkable success, yet their growing parameter counts pose significant computational and memory challenges. Low-rank factorization offers a promising route to reduce training and inference costs, but the community lacks a stable recipe for training models from scratch using exclusively low-rank weights while matching the performance of the dense model. We demonstrate that Large Language Models (LLMs) can be trained from scratch using exclusively low-rank factorized weights for all non-embedding matrices, without the auxiliary "full-rank" guidance required by prior methods. While native low-rank training often suffers from instability and loss spikes, we identify uncontrolled growth in the spectral norm (largest singular value) of the weight matrix update as the dominant factor. To address this, we introduce Spectron: Spectral renormalization with orthogonalization, which dynamically bounds the resultant weight updates based on the current spectral norms of the factors. Our method enables stable, end-to-end factorized training with negligible overhead. Finally, we establish compute-optimal scaling laws for natively low-rank transformers, demonstrating predictable power-law behavior and improved inference efficiency relative to dense models.
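A minimal sketch of the core stabilization idea described in the abstract above: rescaling a low-rank factorized update so its spectral norm stays bounded. The function name, the fixed clip threshold, and splitting the rescaling evenly between the two factors are illustrative assumptions; the paper's Spectron additionally involves orthogonalization and dynamic bounds, which are omitted here.

```python
import numpy as np

def spectral_clip_update(dA, dB, max_norm=1.0):
    """Rescale a factorized update dW = dA @ dB so that its spectral
    norm (largest singular value) does not exceed max_norm."""
    sigma = np.linalg.norm(dA @ dB, ord=2)  # largest singular value of dW
    if sigma > max_norm:
        scale = max_norm / sigma
        # Split the rescaling evenly across the two factors.
        dA = dA * np.sqrt(scale)
        dB = dB * np.sqrt(scale)
    return dA, dB

rng = np.random.default_rng(0)
dA = rng.normal(size=(64, 8))   # rank-8 factorization of a 64x64 update
dB = rng.normal(size=(8, 64))
dA_c, dB_c = spectral_clip_update(dA, dB, max_norm=1.0)
print(np.linalg.norm(dA_c @ dB_c, ord=2))  # bounded by 1.0
```

Clipping the product's spectral norm, rather than each factor's Frobenius norm, directly targets the quantity the abstract identifies as the source of instability.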
Affordances Enable Partial World Modeling with LLMs
Gheorghe Comanici
Jonathan Richens
Jeremy Shar
Fei Xia
Laurent Orseau
Aleksandra Faust
Improving the Robustness of Large Language Models for Code Tasks via Fine-tuning with Perturbed Data
Yang Liu
Armstrong Foundjem
Xingfang Wu
Heng Li
Context: In the fast-paced evolution of software development, Large Language Models (LLMs) have become indispensable tools for tasks such as code generation, completion, analysis, and bug fixing. Ensuring the robustness of these models against potential vulnerabilities from handling diverse inputs is critical, as variations in input can lead to incorrect or insecure code outputs. Objective: This work aims to improve the robustness of LLMs for coding-related tasks against potential adversarial inputs. Specifically, we investigate how fine-tuning LLMs with perturbed datasets impacts their robustness against input perturbations. Method: We systematically evaluated LLM robustness by fine-tuning models using datasets perturbed at the character, word, and sentence levels, comparing results against base models and models fine-tuned on unperturbed datasets. Results: Fine-tuning LLMs with perturbed datasets significantly improves model robustness (RD usually drops around 4%-6%), especially for models with relatively weak robustness. However, this fine-tuning process typically results in a slight performance decrease (pass@1 usually drops around 1%-3%) compared to fine-tuning with unperturbed datasets, although occasional performance improvements are observed. Conclusion & Implications: Fine-tuning LLMs for coding tasks with perturbed data effectively enhances their robustness at the cost of a minor performance reduction, emphasizing the importance of balancing the robustness and performance of LLMs for coding applications.
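As a toy illustration of the character-level perturbations the abstract mentions, the sketch below randomly swaps adjacent characters in a code string. The function name, swap-based scheme, and perturbation rate are illustrative assumptions, not the paper's actual dataset-construction procedure.

```python
import random

def perturb_chars(text, rate=0.05, seed=0):
    """Swap adjacent characters at random: a simple character-level
    perturbation of the kind used to build robustness-training data."""
    rng = random.Random(seed)
    chars = list(text)
    for i in range(len(chars) - 1):
        if rng.random() < rate:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)

print(perturb_chars("def add(a, b): return a + b", rate=0.2))
```

Because swaps only reorder characters, the perturbed string keeps the same character multiset, which makes the corruption easy to control and audit.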
What Makes Value Learning Efficient in Residual Reinforcement Learning?
Guozheng Ma
Li Li
Haoyu Wang
Zixuan Liu
Dacheng Tao
Residual reinforcement learning (RL) enables stable online refinement of expressive pretrained policies by freezing the base and learning only bounded corrections. However, value learning in residual RL poses unique challenges that remain poorly understood. In this work, we identify two key bottlenecks: cold start pathology, where the critic lacks knowledge of the value landscape around the base policy, and structural scale mismatch, where the residual contribution is dwarfed by the base action. Through systematic investigation, we uncover the mechanisms underlying these bottlenecks, revealing that simple yet principled solutions suffice: base-policy transitions serve as an essential value anchor for implicit warmup, and critic normalization effectively restores representation sensitivity for discerning value differences. Based on these insights, we propose DAWN (Data-Anchored Warmup and Normalization), a minimal approach targeting efficient value learning in residual RL. By addressing these bottlenecks, DAWN demonstrates substantial efficiency gains across diverse benchmarks, policy architectures, and observation modalities.
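The residual-RL action composition the abstract builds on can be sketched as follows: a frozen base action plus a small, bounded learned correction. The tanh squashing and the alpha scale are illustrative assumptions, not the paper's DAWN method; they show why the residual contribution can be "dwarfed by the base action".

```python
import numpy as np

def residual_action(base_action, residual, alpha=0.1):
    """Compose a frozen base-policy action with a bounded learned
    correction: a = a_base + alpha * tanh(residual).  The tanh keeps
    the correction within [-alpha, alpha] per dimension."""
    return base_action + alpha * np.tanh(residual)

base = np.array([0.5, -0.2])
corrected = residual_action(base, np.array([10.0, -10.0]), alpha=0.1)
print(corrected)  # stays within 0.1 of the base action per dimension
```

With alpha small relative to the base action's scale, value differences induced by the residual are correspondingly small, which is the scale-mismatch problem the critic normalization in DAWN is aimed at.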
What do people want to fact-check?
Bijean Ghafouri
Luca Luceri
Emilio Ferrara
Squeezing More from the Stream: Learning Representation Online for Streaming Reinforcement Learning
Nilaksh
François Rivest
A. Chandar
AI Institute
Polytechnique Montréal
The Untapped Potential of Food Webs in Systematic Conservation Planning
Louise M. J. O'Connor
Wilfried Thuiller
Ulrich Brose
Éléonore Chenevois
Carla Freund
Benoit Gauzens
Pierre Gaüzere
Catherine Graham
Michael Harfoot
Myriam R. Hirt
Sébastien Lavergne
Luigi Maiorano
Atte Moilanen
Peter H. Verburg
Piero Visconti
International conservation policy includes the dual aims of protecting biodiversity and nature's contributions to people (NCP). Achieving these goals requires protecting not only species and habitats but also the networks of biotic interactions that sustain them. Food webs, which represent predator‐prey interactions between species, are increasingly recognised as a link between ecosystem structure, function, and resilience, which are concepts that are frequently cited in conservation policy. Yet, conservation planning and policy typically focus on individual species and habitats and overlook the interactions that support their persistence. We review the literature at the intersection of food web ecology and conservation, and highlight how food webs can inform three conservation goals: preventing species extinctions, maintaining ecosystem functions and NCP, and fostering ecosystem resilience. Food web data and metrics, such as interaction diversity, trophic diversity, connectance, or modularity, can be used to prioritize species that are key to ecosystem structure and functioning, and to guide spatial prioritization to protect functionally diverse and resilient communities. Given the growing availability of food web data, incorporating food webs in conservation planning can lead to more effective and resilient conservation outcomes that sustain biodiversity and ecosystem functions in the long term.
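One of the food-web metrics named in the abstract, connectance, has a standard closed form: C = L / S², the realized fraction of possible directed predator-prey links among S species. A one-line sketch (the function name is illustrative):

```python
def connectance(n_species, n_links):
    """Connectance C = L / S^2: the fraction of all possible directed
    predator-prey links (S^2, allowing cannibalism) that are realized."""
    return n_links / n_species ** 2

# A 10-species web with 20 observed feeding links:
print(connectance(10, 20))  # 0.2
```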
Inverting Data Transformations via Diffusion Sampling
Jinwoo Kim
Sékou-Oumar Kaba
Jiyun Park
Seunghoon Hong
HypRAG: Hyperbolic Dense Retrieval for Retrieval Augmented Generation
Hiren Madhu
Ngoc Bui
Ali Maatouk
Leandros Tassiulas
Menglin Yang 0001
Sukanta Ganguly
Kiran Srinivasan
Rex Ying
Embedding geometry plays a fundamental role in retrieval quality, yet dense retrievers for retrieval-augmented generation (RAG) remain largely confined to Euclidean space. However, natural language exhibits hierarchical structure from broad topics to specific entities that Euclidean embeddings fail to preserve, causing semantically distant documents to appear spuriously similar and increasing hallucination risk. To address these limitations, we introduce hyperbolic dense retrieval, developing two model variants in the Lorentz model of hyperbolic space: HyTE-FH, a fully hyperbolic transformer, and HyTE-H, a hybrid architecture projecting pre-trained Euclidean embeddings into hyperbolic space. To prevent representational collapse during sequence aggregation, we introduce the Outward Einstein Midpoint, a geometry-aware pooling operator that provably preserves hierarchical structure. On MTEB, HyTE-FH outperforms equivalent Euclidean baselines, while on RAGBench, HyTE-H achieves up to 29% gains over Euclidean baselines in context relevance and answer relevance using substantially smaller models than current state-of-the-art retrievers. Our analysis also reveals that hyperbolic representations encode document specificity through norm-based separation, with over 20% radial increase from general to specific concepts, a property absent in Euclidean embeddings, underscoring the critical role of geometric inductive bias in faithful RAG systems.
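For readers unfamiliar with the Lorentz model mentioned in the abstract, here is a minimal sketch of midpoint pooling on the hyperboloid: the weighted Euclidean sum of points is renormalized back onto the manifold using the Lorentzian inner product. This is the standard Lorentzian centroid, not the paper's specific Outward Einstein Midpoint operator; function names are illustrative.

```python
import numpy as np

def lorentz_inner(x, y):
    """Lorentzian inner product <x, y>_L = -x0*y0 + sum_i xi*yi."""
    return -x[0] * y[0] + np.dot(x[1:], y[1:])

def lift(v):
    """Lift a Euclidean vector v onto the hyperboloid <x, x>_L = -1
    (curvature -1) by solving for the time coordinate x0."""
    return np.concatenate(([np.sqrt(1.0 + np.dot(v, v))], v))

def lorentz_midpoint(points, weights=None):
    """Weighted centroid on the hyperboloid: take the weighted sum in
    the ambient space, then rescale so <m, m>_L = -1 again."""
    pts = np.asarray(points, dtype=float)
    w = np.ones(len(pts)) if weights is None else np.asarray(weights, float)
    m = (w[:, None] * pts).sum(axis=0)
    return m / np.sqrt(-lorentz_inner(m, m))

a, b = lift(np.array([0.3, -0.1])), lift(np.array([-0.2, 0.4]))
mid = lorentz_midpoint([a, b])
print(lorentz_inner(mid, mid))  # ~ -1: the midpoint is back on the manifold
```

The renormalization step is what distinguishes this from naive mean pooling, which would pull aggregated embeddings off the manifold and toward the origin.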
BRIDGE: Predicting Human Task Completion Time From Model Performance
Mila - Québec AI Institute
McGill University
Polytechnique Montréal
Periodic Labs
ServiceNow Research
Canada CIFAR AI Chair
Evaluating the real-world capabilities of AI systems requires grounding benchmark performance in human-interpretable measures of task difficulty. Existing approaches that rely on direct human task completion time annotations are costly, noisy, and difficult to scale across benchmarks. In this work, we propose BRIDGE, a unified psychometric framework that learns the latent difficulty scale from model responses and anchors it to human task completion time. Using a two-parameter logistic Item Response Theory model, we jointly estimate latent task difficulty and model capability from model performance data across multiple benchmarks. We demonstrate that latent task difficulty varies linearly with the logarithm of human completion time, allowing human task completion time to be inferred for new benchmarks from model performance alone. Leveraging this alignment, we forecast frontier model capabilities in terms of human task length and independently reproduce METR's exponential scaling results, with the 50% solvable task horizon doubling approximately every 6 months.
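The two-parameter logistic (2PL) IRT model named in the abstract has a simple closed form: the probability that a model with capability theta solves a task with difficulty b and discrimination a is sigmoid(a * (theta - b)). A minimal sketch (parameter values are illustrative, not fitted):

```python
import math

def p_correct(theta, a, b):
    """Two-parameter logistic IRT model: P(correct) =
    sigmoid(a * (theta - b)), with model capability theta,
    task discrimination a, and latent task difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# A model whose capability equals the task's difficulty
# solves it exactly half the time:
print(p_correct(1.5, a=1.0, b=1.5))  # 0.5
```

Under the linear relation the abstract reports, b scales with log(human completion time), so a fitted difficulty can be mapped back to an estimated human task length for a new benchmark.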
Constrained Group Relative Policy Optimization
Azalée Robitaille
Christopher Pal