Publications

Clarifying a working definition for ‘precision communication’: a scoping review of medical literature on communication

Bao-Lam Pham

Brigitte N. Durieux

Amanda Bianco

Corinne Cécyre-Chartrand

Elena Guadagno

Amalia M. Issa

Dan Poenaru

2025-06-04

Personalized Medicine (published)

doi.org

DIMCIM: A Quantitative Evaluation Framework for Default-mode Diversity and Generalization in Text-to-Image Generative Models

Revant Teotia

Candace Ross

Karen Ullrich

Sumit Chopra

Adriana Romero

Melissa Hall

Matthew J. Muckley

Recent advances in text-to-image (T2I) models have achieved impressive quality and consistency. However, this has come at the cost of representation diversity. While automatic evaluation methods exist for benchmarking model diversity, they either require reference image datasets or lack specificity about the kind of diversity measured, limiting their adaptability and interpretability. To address this gap, we introduce the Does-it/Can-it framework, DIM-CIM, a reference-free measurement of default-mode diversity ("Does" the model generate images with expected attributes?) and generalization capacity ("Can" the model generate diverse attributes for a particular concept?). We construct the COCO-DIMCIM benchmark, which is seeded with COCO concepts and captions and augmented by a large language model. With COCO-DIMCIM, we find that widely-used models improve in generalization at the cost of default-mode diversity when scaling from 1.5B to 8.1B parameters. DIMCIM also identifies fine-grained failure cases, such as attributes that are generated with generic prompts but are rarely generated when explicitly requested. Finally, we use DIMCIM to evaluate the training data of a T2I model and observe a correlation of 0.85 between diversity in training images and default-mode diversity. Our work provides a flexible and interpretable framework for assessing T2I model diversity and generalization, enabling a more comprehensive understanding of model performance.

2025-06-04

ArXiv (preprint)

doi.org

arxiv.org

MesaNet: Sequence Modeling by Locally Optimal Test-Time Training

Johannes Von Oswald

Nino Scherrer

Seijin Kobayashi

Luca Versari

Songlin Yang

Maximilian Schlegel

Kaitlin Maile

Yanick Schimpf

Oliver Sieberling

Alexander Meulemans

Rif A. Saurous

Guillaume Lajoie

Charlotte Frenkel

Razvan Pascanu

Blaise Agüera y Arcas

João Sacramento

Sequence modeling is currently dominated by causal transformer architectures that use softmax self-attention. Although widely adopted, transformers require scaling memory and compute linearly during inference. A recent stream of work linearized the softmax operation, resulting in powerful recurrent neural network (RNN) models with constant memory and compute costs such as DeltaNet, Mamba or xLSTM. These models can be unified by noting that their recurrent layer dynamics can all be derived from an in-context regression objective, approximately optimized through an online learning rule. Here, we join this line of work and introduce a numerically stable, chunkwise parallelizable version of the recently proposed Mesa layer (von Oswald et al., 2024), and study it in language modeling at the billion-parameter scale. This layer again stems from an in-context loss, but which is now minimized to optimality at every time point using a fast conjugate gradient solver. Through an extensive suite of experiments, we show that optimal test-time training enables reaching lower language modeling perplexity and higher downstream benchmark performance than previous RNNs, especially on tasks requiring long context understanding. This performance gain comes at the cost of additional flops spent during inference time. Our results are therefore intriguingly related to recent trends of increasing test-time compute to improve performance -- here by spending compute to solve sequential optimization problems within the neural network itself.

2025-06-04

ArXiv (preprint)

doi.org

arxiv.org

Policy context and digital development: a comparative study of trajectories in 4 Canadian academic health centers over 30 years

Aude Motulsky

Susan Usher

Pascale Lehoux

Catherine Régis

Trish Reay

Paul Hebert

Lise Gauvin

Alain Biron

G. Ross Baker

Marie-Pierre Moreault

Johanne Préval

Jean-Louis Denis

The digitalization of health records stands to improve decision-making at clinical, administrative, and policy level. Efforts follow various paths and are closely intertwined with health system and organizational configurations. Problems persist in both uptake and use. This study explores the digitalization trajectories of academic health centers (AHCs) to understand tensions between organizational and government strategies and their impact on digital development. AHCs play a leadership role within health systems in data-driven improvement. This retrospective case study draws on documentary, observational, and interview data to compare digitalization efforts over 3 decades in 4 AHCs in the province of Quebec (Canada). At system level, strategy shifted from supporting multilayered development that encouraged bottom-up initiatives in the first decade of the 2000s, to harmonizing clinical information systems in a highly prescriptive manner after 2010. AHCs experienced the shift differently according to concurrent impacts of health system restructuring, and internal choices around electronic health record (EHR) systems and implementation priorities. Digital maturity remained low in all 4 AHCs. Coordination between system strategies and organizational strategies in AHCs was neglected in early digital development in Québec and improved only after an intense period of prescription and resistance. Confrontation highlighted tensions around different objectives at AHC and system level, competing missions within AHCs, and trade-offs between relying on commercial EHRs and developing publicly owned systems, all of which ultimately influence EHR implementation. The different experiences of focal organizations with digitalization underline the importance of adapting national strategies and providing support to implementers, building on acquired strengths, and arriving at the right balance of guidance from the top and autonomy to develop innovative capacities.

2025-06-04

Journal of the American Medical Informatics Association : JAMIA (published)

doi.org

SSA-COMET: Do LLMs Outperform Learned Metrics in Evaluating MT for Under-Resourced African Languages?

Senyu Li

Jiayi Wang

Felermino Dario Mario Ali

Colin Cherry

Daniel Deutsch

Eleftheria Briakou

Rui Sousa-Silva

Henrique Lopes Cardoso

Pontus Stenetorp

David Ifeoluwa Adelani

Evaluating machine translation (MT) quality for under-resourced African languages remains a significant challenge, as existing metrics often suffer from limited language coverage and poor performance in low-resource settings. While recent efforts, such as AfriCOMET, have addressed some of the issues, they are still constrained by small evaluation sets, a lack of publicly available training data tailored to African languages, and inconsistent performance in extremely low-resource scenarios. In this work, we introduce SSA-MTE, a large-scale human-annotated MT evaluation (MTE) dataset covering 13 African language pairs from the News domain, with over 63,000 sentence-level annotations from a diverse set of MT systems. Based on this data, we develop SSA-COMET and SSA-COMET-QE, improved reference-based and reference-free evaluation metrics. We also benchmark prompting-based approaches using state-of-the-art LLMs like GPT-4o and Claude. Our experimental results show that SSA-COMET models significantly outperform AfriCOMET and are competitive with the strongest LLM (Gemini 2.5 Pro) evaluated in our study, particularly on low-resource languages such as Twi, Luo, and Yoruba. All resources are released under open licenses to support future research.

2025-06-04

ArXiv (preprint)

doi.org

arxiv.org

Multi-timescale reinforcement learning in the brain

Paul Masset

Pablo Tano

HyungGoo R. Kim

Athar N. Malik

Alexandre Pouget

Naoshige Uchida

To thrive in complex environments, animals and artificial agents must learn to act adaptively to maximize fitness and rewards. Such adaptive behavior can be learned through reinforcement learning [1], a class of algorithms that has been successful at training artificial agents [2–6] and at characterizing the firing of dopamine neurons in the midbrain [7–9]. In classical reinforcement learning, agents discount future rewards exponentially according to a single time scale, controlled by the discount factor. Here, we explore the presence of multiple timescales in biological reinforcement learning. We first show that reinforcement agents learning at a multitude of timescales possess distinct computational benefits. Next, we report that dopamine neurons in mice performing two behavioral tasks encode reward prediction error with a diversity of discount time constants. Our model explains the heterogeneity of temporal discounting in both cue-evoked transient responses and slower timescale fluctuations known as dopamine ramps. Crucially, the measured discount factor of individual neurons is correlated across the two tasks, suggesting that it is a cell-specific property. Together, our results provide a new paradigm to understand functional heterogeneity in dopamine neurons, a mechanistic basis for the empirical observation that humans and animals use non-exponential discounts in many situations [10–14], and open new avenues for the design of more efficient reinforcement learning algorithms.

2025-06-03

Nature (published)

doi.org

Trophic Interactions Are Key to Understanding the Effects of Global Change on the Distribution and Functional Role of the Brown Bear

Pablo M. Lucas

Wilfried Thuiller

Lauren Talluto

Ester Polaina

Jörg Albrecht

Nuria Selva

Marta De Barba

Vincenzo Penteriani

Maya Guéguen

Niko Balkenhol

Trishna Dutta

Ancuta Fedorca

Shane C. Frank

Andreas Zedrosser

Ivan Afonso‐Jordana

Hüseyin Ambarlı

Fernando Ballesteros

Andriy‐Taras Bashta

Cemal Can Bilgin

Neda Bogdanović

Edgars Bojārs

Katarzyna Bojarska

Natalia Bragalanti

Henrik Brøseth

Mark W. Chynoweth

Duško Ćirović

Paolo Ciucci

Andrea Corradini

Daniele De Angelis

Miguel de Gabriel Hernando

Csaba Domokos

Aleksander Dutsov

Alper Ertürk

Stefano Filacorda

Lorenzo Frangini

Claudio Groff

Samuli Heikkinen

Bledi Hoxha

Djuro Huber

Otso Huitu

Georgeta Ionescu

Ovidiu Ionescu

Klemen Jerina

Ramon Jurj

Alexandros A. Karamanlidis

Jonas Kindberg

Ilpo Kojola

José Vicente López‐Bao

Peep Männil

Dime Melovski

Yorgos Mertzanis

Paolo Molinari

Anja Molinari‐Jobin

Andrea Mustoni

Javier Naves

Sergey Ogurtsov

Deniz Özüt

Santiago Palazón

Luca Pedrotti

Aleksandar Perović

Vladimir N. Piminov

Ioan‐Mihai Pop

Marius Popa

Maria Psaralexi

Pierre‐Yves Quenette

Georg Rauer

Slaven Reljic

Eloy Revilla

Urmas Saarma

Alexander P. Saveljev

Ali Onur Sayar

Çagan H. Şekercioğlu

Agnieszka Sergiel

George Sîrbu

Tomaž Skrbinšek

Michaela Skuban

Anil Soyumert

Aleksandar Stojanov

Egle Tammeleht

Konstantin Tirronen

Aleksandër Trajçe

Igor Trbojević

Tijana Trbojević

Filip Zięba

Diana Zlatanova

Tomasz Zwijacz‐Kozica

Laura J. Pollock

Biotic interactions are expected to influence species' responses to global changes, but they are rarely considered across broad spatial extents. Abiotic factors are thought to operate at larger spatial scales, while biotic factors, such as species interactions, are considered more important at local scales within communities, in part because of the knowledge gap on species interactions at large spatial scales (i.e., the Eltonian shortfall). We assessed, at a continental scale, (i) the importance of biotic interactions, through food webs, on species distributions, and (ii) how biotic interactions under scenarios of climate and land‐use change may affect the distribution of the brown bear (Ursus arctos). We built a highly detailed, spatially dynamic, and empirically sampled food web based on the energy contribution of 276 brown bear food species from different taxa (plants, vertebrates, and invertebrates) and their ensemble habitat models at high resolution across Europe. Then, combining energy contribution and predicted habitat of food species, we modelled energy contribution across space and included these layers within Bayesian‐based models of the brown bear distribution in Europe. The inclusion of biotic interactions considerably improved our understanding of brown bear distribution at large (continental) scales compared with Bayesian models including only abiotic factors (climate and land use). Predicted future range shifts, which included changes in the distribution of food species, varied greatly when considering various scenarios of change in biotic factors, providing a warning that future indirect climate and land‐use change are likely to have strong but highly uncertain impacts on species biogeography.
Our study confirmed that advancing our understanding of ecological networks of species interactions will improve future projections of biodiversity change, especially for modelling species distributions and their functional role under climate and land‐use change scenarios, which is key for effective conservation of biodiversity and ecosystem services.

2025-06-03

Global Change Biology (published)

doi.org

Galaxy cluster characterization with machine learning techniques

M. Sadikov

J. Hlavacek-Larrondo

L. Perreault-Levasseur

C. L. Rhea

M. McDonald

M. Ntampaka

J. Zuhone

We present an analysis of the X-ray properties of the galaxy cluster population in the z=0 snapshot of the IllustrisTNG simulations, utilizing machine learning techniques to perform clustering and regression tasks. We examine five properties of the hot gas (the central cooling time, the central electron density, the central entropy excess, the concentration parameter, and the cuspiness) which are commonly used as classification metrics to identify cool core (CC), weak cool core (WCC) and non cool core (NCC) clusters of galaxies. Using mock Chandra X-ray images as inputs, we first explore an unsupervised clustering scheme to see how the resulting groups correlate with the CC/WCC/NCC classification based on the different criteria. We observe that the groups replicate almost exactly the separation of the galaxy cluster images when classifying them based on the concentration parameter. We then move on to a regression task, utilizing a ResNet model to predict the value of all five properties. The network is able to achieve a mean percentage error of 1.8% for the central cooling time, and a balanced accuracy of 0.83 on the concentration parameter, making them the best-performing metrics. Finally, we use simulation-based inference (SBI) to extract posterior distributions for the network predictions. Our neural network simultaneously predicts all five classification metrics using only mock Chandra X-ray images. This study demonstrates that machine learning is a viable approach for analyzing and classifying the large galaxy cluster datasets that will soon become available through current and upcoming X-ray surveys, such as eROSITA.

2025-06-02

The Astrophysical Journal (published)

doi.org

arxiv.org

It's the Thought that Counts: Evaluating the Attempts of Frontier LLMs to Persuade on Harmful Topics

Matthew Kowal

Jasper Timm

Jean-François Godbout

Thomas H Costello

Antonio A. Arechar

Gordon Pennycook

David G. Rand

Adam Gleave

Kellin Pelrine

Persuasion is a powerful capability of large language models (LLMs) that both enables beneficial applications (e.g. helping people quit smoking) and raises significant risks (e.g. large-scale, targeted political manipulation). Prior work has found models possess a significant and growing persuasive capability, measured by belief changes in simulated or real users. However, these benchmarks overlook a crucial risk factor: the propensity of a model to attempt to persuade in harmful contexts. Understanding whether a model will blindly "follow orders" to persuade on harmful topics (e.g. glorifying joining a terrorist group) is key to understanding the efficacy of safety guardrails. Moreover, understanding if and when a model will engage in persuasive behavior in pursuit of some goal is essential to understanding the risks from agentic AI systems. We propose the Attempt to Persuade Eval (APE) benchmark, which shifts the focus from persuasion success to persuasion attempts, operationalized as a model's willingness to generate content aimed at shaping beliefs or behavior. Our evaluation framework probes frontier LLMs using a multi-turn conversational setup between simulated persuader and persuadee agents. APE explores a diverse spectrum of topics including conspiracies, controversial issues, and non-controversially harmful content. We introduce an automated evaluator model to identify willingness to persuade and measure the frequency and context of persuasive attempts. We find that many open and closed-weight models are frequently willing to attempt persuasion on harmful topics and that jailbreaking can increase willingness to engage in such behavior. Our results highlight gaps in current safety guardrails and underscore the importance of evaluating willingness to persuade as a key dimension of LLM risk. APE is available at github.com/AlignmentResearch/AttemptPersuadeEval

2025-06-02

ArXiv (preprint)

doi.org

arxiv.org

ToothForge: Automatic Dental Shape Generation using Synchronized Spectral Embeddings

Tibor Kubík

François Guibault

Michal Španěl

Hervé Lombaert

We introduce ToothForge, a spectral approach for automatically generating novel 3D teeth, effectively addressing the sparsity of dental shape datasets. By operating in the spectral domain, our method enables compact machine learning modeling, allowing the generation of high-resolution tooth meshes in milliseconds. However, generating shape spectra comes with the instability of the decomposed harmonics. To address this, we propose modeling the latent manifold on synchronized frequential embeddings. Spectra of all data samples are aligned to a common basis prior to the training procedure, effectively eliminating biases introduced by the decomposition instability. Furthermore, synchronized modeling removes the limiting factor imposed by previous methods, which require all shapes to share a common fixed connectivity. Using a private dataset of real dental crowns, we observe a greater reconstruction quality of the synthesized shapes, exceeding that of models trained on unaligned embeddings. We also explore additional applications of spectral analysis in digital dentistry, such as shape compression and interpolation. ToothForge facilitates a range of approaches at the intersection of spectral analysis and machine learning, with fewer restrictions on mesh structure. This makes it applicable for shape analysis not only in dentistry, but also in broader medical applications, where guaranteeing consistent connectivity across shapes from various clinics is unrealistic. The code is available at https://github.com/tiborkubik/toothForge.

2025-06-02

ArXiv (preprint)

doi.org

arxiv.org

Weak Supervision for Real World Graphs

Pratheeksha Nair

Reihaneh Rabbany

2025-06-02

ArXiv (preprint)

doi.org

arxiv.org

What Information Contributes to Log-based Anomaly Detection? Insights from a Configurable Transformer-Based Approach

Xingfang Wu

Heng Li

Foutse Khomh

Log data are generated from logging statements in the source code, providing insights into the execution processes of software applications and systems. State-of-the-art log-based anomaly detection approaches typically leverage deep learning models to capture the semantic or sequential information in the log data and detect anomalous runtime behaviors. However, the impacts of these different types of information are not clear. In addition, existing approaches have not captured the timestamps in the log data, which can potentially provide more fine-grained temporal information than sequential information. In this work, we propose a configurable transformer-based anomaly detection model that can capture the semantic, sequential, and temporal information in the log data and allows us to configure the different types of information as the model's features. Additionally, we train and evaluate the proposed model using log sequences of different lengths, thus overcoming the constraint of existing methods that rely on fixed-length or time-windowed log sequences as inputs. With the proposed model, we conduct a series of experiments with different combinations of input features to evaluate the roles of different types of information in anomaly detection. When presented with log sequences of varying lengths, the model can attain competitive and consistently stable performance compared to the baselines. The results indicate that the event occurrence information plays a key role in identifying anomalies, while the impact of the sequential and temporal information is not significant for anomaly detection in the studied public datasets. On the other hand, the findings also reveal the simplicity of the studied public datasets and highlight the importance of constructing new datasets that contain different types of anomalies to better evaluate the performance of anomaly detection models.

2025-06-02

Automated Software Engineering (published)

doi.org

arxiv.org
