Publications
TRUTH: Teaching LLMs to Rerank for Truth in Misinformation Detection
Final-answer-based metrics are commonly used for evaluating large language models (LLMs) on math word problems, often taken as proxies for reasoning ability. However, such metrics conflate two distinct sub-skills: abstract formulation (capturing mathematical relationships using expressions) and arithmetic computation (executing the calculations). Through a disentangled evaluation on GSM8K and SVAMP, we find that the final-answer accuracy of Llama-3 and Qwen2.5 (1B-32B) without CoT is overwhelmingly bottlenecked by the arithmetic computation step, not by the abstract formulation step. Contrary to common belief, we show that CoT primarily aids computation, with limited impact on abstract formulation. Mechanistically, we show that these two skills are composed conjunctively even in a single forward pass without any reasoning steps, via an abstract-then-compute mechanism: models first capture problem abstractions, then handle computation. Causal patching confirms these abstractions are present, transferable, composable, and precede computation. These behavioural and mechanistic findings highlight the need for disentangled evaluation to accurately assess LLM reasoning and to guide future improvements.
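The disentangled evaluation described above can be illustrated with a short sketch: score whether the model writes a correct expression (abstract formulation) separately from whether it evaluates a given expression correctly (arithmetic computation). This is a minimal illustration of the idea, not the paper's protocol; the prompts, the dataset fields, and the model_generate helper are hypothetical.

import sympy

def eval_disentangled(problem, gold_expr, gold_answer, model_generate):
    # Abstract formulation: ask only for the expression, no evaluation.
    pred_expr = model_generate(f"{problem}\nWrite the expression only:")
    try:
        formulation_ok = sympy.simplify(
            sympy.sympify(pred_expr) - sympy.sympify(gold_expr)) == 0
    except (sympy.SympifyError, TypeError):
        formulation_ok = False

    # Arithmetic computation: give the gold expression, ask for its value.
    pred_val = model_generate(f"Compute: {gold_expr} =")
    try:
        computation_ok = abs(float(pred_val) - float(gold_answer)) < 1e-6
    except ValueError:
        computation_ok = False

    return formulation_ok, computation_ok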
Software performance modeling plays a crucial role in developing and maintaining software systems. A performance model analytically describes the relationship between the performance of a system and its runtime activities. This process typically examines various aspects of a system's runtime behavior, such as the execution frequency of functions or methods, to forecast performance metrics like program execution time. By using performance models, developers can predict expected performance and thereby effectively identify and address unexpected performance regressions when actual performance deviates from the model's predictions. One common and precise method for capturing performance behavior is software tracing, which involves instrumenting the execution of a program, either at the kernel level (e.g., system calls) or the application level (e.g., function calls). However, by its nature, tracing can be highly resource-intensive, making it impractical for production environments where resources are limited. In this work, we propose statistical approaches to reduce tracing overhead by identifying and excluding performance-insensitive code regions, particularly application-level functions, from tracing while still building accurate performance models that can capture performance degradations. By selecting an optimal set of functions to be traced, we can construct optimized performance models that achieve an R² score of up to 99% and sometimes outperform full tracing models (models using non-optimized tracing data), while significantly reducing the tracing overhead, by more than 80% in most cases. Our optimized performance models can also effectively capture performance regressions in our studied programs, demonstrating their usefulness in real-world scenarios. Our approach is fully automated, making it ready for use in production environments with minimal human effort.
2025-07-21
ACM Transactions on Software Engineering and Methodology (published)
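A minimal sketch of the modeling step in the abstract above: fit a performance model from per-function call counts to execution time, and use a sparse regression (here Lasso, standing in for the paper's statistical selection) to flag performance-insensitive functions that could be excluded from tracing. The synthetic data layout is an assumption for illustration.

import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
n_runs, n_funcs = 200, 50
X = rng.poisson(lam=20, size=(n_runs, n_funcs)).astype(float)  # per-function call counts
true_cost = np.zeros(n_funcs)
true_cost[:5] = rng.uniform(0.5, 2.0, 5)           # only 5 functions affect timing
y = X @ true_cost + rng.normal(0, 1.0, n_runs)     # observed execution time

model = LassoCV(cv=5).fit(X, y)
keep = np.flatnonzero(model.coef_ != 0)            # functions worth tracing
print(f"traced functions: {len(keep)}/{n_funcs}, "
      f"R^2 = {r2_score(y, model.predict(X)):.3f}")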
Corrigendum to "Child- and Proxy-reported Differences in Patient-reported Outcome and Experience Measures in Pediatric Surgery: Systematic Review and Meta-analysis" [Journal of Pediatric Surgery 60 (2025) 162172].
Corrigendum to "Virtual Reality for Pediatric Trauma Education - A Preliminary Face and Content Validation Study" [Journal of Pediatric Surgery 60 (2025) 161951].
The impact of statistical adjustment for assay performance on inferences from SARS-CoV-2 serological surveillance studies
Jiacheng Chen
Yuan Yu
Sheila F O’Brien
Carmen L Charlton
Steven J Drews
Jane M Heffernan
Amber M Smith
Yu Nakagama
Yasutoshi Kido
David L Buckeridge
W Alton Russell
Choice of immunoassay influences population seroprevalence estimates. Post hoc adjustments for assay performance could improve comparability of estimates across studies and enable pooled analyses. We assessed post hoc adjustment methods using data from 2021–2023 SARS-CoV-2 serosurveillance studies in Alberta, Canada: one that tested 124 008 blood donations using Roche immunoassays (SARS-CoV-2 nucleocapsid total antibody and anti–SARS-CoV-2 S) and another that tested 214 780 patient samples using Abbott immunoassays (SARS-CoV-2 IgG and anti–SARS-CoV-2 S). Comparing datasets, seropositivity for antibodies against nucleocapsid (anti-N) diverged after May 2022 due to differential loss of sensitivity as a function of time since infection. The commonly used Rogan-Gladen adjustment did not reduce this divergence. Regression-based adjustments using the assays’ semiquantitative results produced more similar estimates of anti-N seroprevalence and rolling incidence proportion (the proportion of individuals infected in recent months). Seropositivity for antibodies targeting the SARS-CoV-2 spike protein was similar without adjustment, and concordance was not improved by applying an alternative, functional threshold. These findings suggest that assay performance substantially impacted population inferences from SARS-CoV-2 serosurveillance studies in the Omicron period. Unlike methods that ignore time-varying assay sensitivity, regression-based methods using the semiquantitative assay results increased concordance in estimated anti-N seropositivity and rolling incidence between cohorts using different assays.
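For reference, the Rogan-Gladen adjustment mentioned above corrects apparent seroprevalence for fixed assay sensitivity and specificity. Because it assumes constant sensitivity, it cannot account for sensitivity that decays with time since infection, which is the failure mode the abstract describes. The numbers in the example are made up.

def rogan_gladen(apparent_prev, sensitivity, specificity):
    # true prevalence = (apparent + Sp - 1) / (Se + Sp - 1), clipped to [0, 1]
    adj = (apparent_prev + specificity - 1) / (sensitivity + specificity - 1)
    return min(max(adj, 0.0), 1.0)

# Example: 40% of samples test positive on an assay with Se = 0.85, Sp = 0.99.
print(rogan_gladen(0.40, 0.85, 0.99))  # ~0.464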
Single-cell spatial transcriptomics, such as in-situ hybridization or sequencing technologies, can provide subcellular resolution that enables the identification of individual cell identities and locations, and a deep understanding of subcellular mechanisms. However, accurate segmentation and annotation that allow individual cell boundaries to be determined remain a major challenge that limits all of the above and downstream insights. Current machine learning methods rely heavily on nuclei or cell body staining, resulting in significant loss of transcriptome depth and a limited ability to learn latent representations of spatial colocalization relationships. Here, we propose Bering, a graph deep learning model that leverages transcript colocalization relationships for joint noise-aware cell segmentation and molecular annotation in 2D and 3D spatial transcriptomics data. Graph embeddings for cell annotation are transferred as a component of multi-modal input for cell segmentation, which enriches gene relationships throughout the process. To evaluate performance, we benchmarked Bering against state-of-the-art methods and observed significant improvements in cell segmentation accuracy and the number of detected transcripts across various spatial technologies and tissues. To streamline segmentation, we constructed expansive pre-trained models that yield high segmentation accuracy on new data through transfer learning and self-distillation, demonstrating the generalizability of Bering.
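To make the graph formulation above concrete, here is a schematic in plain NumPy: transcripts are nodes, spatial nearest neighbors define colocalization edges, one round of message passing produces node embeddings, and an edge score asks whether two transcripts belong to the same cell. All sizes, the random features, and the single aggregation step are illustrative assumptions, not the Bering architecture.

import numpy as np

rng = np.random.default_rng(0)
n_transcripts, n_genes, d = 100, 30, 16
gene_id = rng.integers(0, n_genes, n_transcripts)   # gene label per transcript
xy = rng.uniform(0, 50, size=(n_transcripts, 2))    # spatial coordinates

# Colocalization graph: each transcript connects to its 5 nearest neighbors.
dists = np.linalg.norm(xy[:, None] - xy[None, :], axis=-1)
knn = np.argsort(dists, axis=1)[:, 1:6]

# One round of message passing over learned (here random) gene embeddings.
embed = rng.normal(size=(n_genes, d))
h = embed[gene_id]
h = np.maximum(0, h + h[knn].mean(axis=1))          # aggregate neighbors + ReLU

# Embeddings feed two heads: per-node annotation (cell type / noise) and an
# edge score for segmentation: do transcripts i and j share a cell?
i, j = 0, knn[0, 0]
same_cell_score = h[i] @ h[j]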
As AI systems take on collaborative roles, they must reason about shared goals and beliefs, not just generate fluent language. The Rational Speech Act (RSA) framework offers a principled approach to pragmatic reasoning, but existing extensions face challenges in scaling to multi-turn, collaborative scenarios. In this paper, we introduce Collaborative Rational Speech Act (CRSA), an information-theoretic (IT) extension of RSA that models multi-turn dialog by optimizing a gain function adapted from rate-distortion theory. This gain extends the one maximized in the original RSA model to the setting in which both agents in a conversation hold private information and produce utterances conditioned on the dialog. We demonstrate the effectiveness of CRSA on referential games and template-based doctor-patient dialogs in the medical domain. Empirical results show that CRSA yields more consistent, interpretable, and collaborative behavior than existing baselines, paving the way for more pragmatic and socially aware language agents.
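Since CRSA builds directly on RSA, a minimal single-turn RSA recursion helps fix ideas: a literal listener is normalized into a speaker, which is normalized into a pragmatic listener. The toy lexicon and uniform prior below are assumptions for illustration; this is the standard base model, not CRSA's multi-turn, private-information extension.

import numpy as np

lexicon = np.array([[1., 1., 0.],   # utterance u0 is true of worlds w0, w1
                    [0., 1., 1.]])  # utterance u1 is true of worlds w1, w2
prior = np.array([1/3, 1/3, 1/3])   # listener's prior over worlds
alpha = 1.0                         # speaker rationality

L0 = lexicon * prior                       # literal listener P(w | u)
L0 /= L0.sum(axis=1, keepdims=True)

with np.errstate(divide="ignore"):         # log(0) -> -inf is intended here
    S1 = np.exp(alpha * np.log(L0.T))      # speaker P(u | w)
S1 /= S1.sum(axis=1, keepdims=True)

L1 = S1.T * prior                          # pragmatic listener P(w | u)
L1 /= L1.sum(axis=1, keepdims=True)
print(L1)  # hearing u0 now favors w0: the listener reasons about the speaker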
The immune system’s most basic task is to decide what is “self” and “non-self”, but a precise definition of self versus non-self remains challenging. According to the discontinuity theory of immunity, effector responses depend on how quickly an antigenic stimulus changes: rapid change triggers an immune response, whereas gradual change fosters tolerance. We present a model of adaptive immune dynamics including T cells, Tregs, and cytokines that reproduces the hallmarks of the discontinuity theory. The model allows for sharp discrimination between acute and chronic infections based on the growth rate of the immune challenge, and produces vaccination-like acute dynamics upon presentation of a bolus of immune challenge. We further show that the model's behavior depends only on a handful of testable assumptions that we map to geometric constraints in phase space, suggesting that its properties are generic and robust across alternative mechanistic details. We also examine the impact of multiple concurrent immune challenges in this model and demonstrate the occurrence of dynamical antagonism, wherein, in some parameter regimes, slow-growing challenges hinder acute responses to fast-growing ones, with further counter-intuitive behaviors for sequential co-infections. Together, these results place the discontinuity theory on firm mathematical footing and encourage further investigation of interference among multi-agent immune challenges, from chronic viral co-infections to cancer immunoediting.
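The growth-rate dependence at the heart of the discontinuity theory can be illustrated with a toy ODE, in which the effector response is driven by the rate of change of the challenge while a slowly accumulating regulatory population enforces tolerance. The equations and parameters below are illustrative assumptions, not the paper's model.

from scipy.integrate import solve_ivp

def dynamics(t, y, r):
    A, E, Treg = y                       # antigen, effector cells, Tregs
    dA = r * A - 0.5 * E * A             # challenge grows, is cleared by effectors
    dE = 2.0 * max(dA, 0.0) - 0.1 * E - 0.5 * Treg * E  # driven by rate of change
    dTreg = 0.05 * A - 0.02 * Treg       # slow, level-driven tolerance
    return [dA, dE, dTreg]

for r, label in [(1.0, "fast-growing (acute)"), (0.05, "slow-growing (chronic)")]:
    sol = solve_ivp(dynamics, (0, 40), [0.01, 0.0, 0.0], args=(r,), max_step=0.1)
    print(label, "-> peak effector level:", float(sol.y[1].max()))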