Publications

MS-SSM: A Multi-Scale State Space Model for Efficient Sequence Modeling
Mahdi Karami
Ali Behrouz
Peilin Zhong
Seyed Vahab Mirrokni
State-space models (SSMs) have recently gained attention as an efficient alternative to computationally expensive attention-based models for sequence modeling. They rely on linear recurrences to integrate information over time, enabling fast inference, parallelizable training, and control over recurrence stability. However, traditional SSMs often suffer from limited effective memory, requiring larger state sizes for improved recall. Moreover, existing SSMs struggle to capture multi-scale dependencies, which are essential for modeling complex structures in time series, images, and natural language. This paper introduces a multi-scale SSM framework that addresses these limitations by representing sequence dynamics across multiple resolutions and processing each resolution with specialized state-space dynamics. By capturing both fine-grained, high-frequency patterns and coarse, global trends, MS-SSM enhances memory efficiency and long-range modeling. We further introduce an input-dependent scale-mixer, enabling dynamic information fusion across resolutions. The proposed approach significantly improves sequence modeling, particularly in long-range and hierarchical tasks, while maintaining computational efficiency. Extensive experiments on benchmarks, including Long Range Arena, hierarchical reasoning, time series classification, and image recognition, demonstrate that MS-SSM consistently outperforms prior SSM-based models, highlighting the benefits of multi-resolution processing in state-space architectures.
Multi-Agent Framework for Threat Mitigation and Resilience in AI-Based Systems
Armstrong Foundjem
Lionel Nganyewou Tidjon
Leuson Da Silva
Foundation models for electrocardiogram interpretation: clinical implications
Achille Sowa
Jacques Delfrate
Olivier Tastet
Denis Corbin
Merve Kulbay
Derman Ozdemir
Marie-Jeanne Noël
François-Christophe Marois-Blanchet
François Harvey
Surbhi Sharma
Minhaj Ansari
I-Min Chiu
Valentina D'souza
Sam F. Friedman
Michael Chassé
Brian J. Potter
Jonathan Afilalo
Pierre Adil Elias
Gilbert Jabbour
Mourad Bahani
Marie-Pierre Dubé
Patrick M. Boyle
Neal A. Chatterjee
Joshua Barrios
Geoffrey H. Tison
David Ouyang
Mahnaz Maddah
Shaan Khurshid
Julia Cadrin-Tourigny
Rafik Tadros
Robert Avram
The 12-lead electrocardiogram (ECG) remains a cornerstone of cardiac diagnostics, yet existing artificial intelligence (AI) solutions for automated interpretation often lack generalizability, remain closed source, and are primarily trained using supervised learning (SL), which requires extensive labelled datasets and may limit adaptability across diverse clinical settings. Self-supervised learning (SSL) can potentially overcome these limitations by learning robust representations from unlabelled data. To address these challenges, this study developed and compared two open-source foundational ECG models: DeepECG-SL, a supervised multilabel ECG model, and DeepECG-SSL, a self-supervised model. Both models were trained on over 1 million ECGs using a standardized preprocessing pipeline and automated free-text extraction from ECG reports to predict 77 cardiac conditions. DeepECG-SSL leveraged unlabelled data through self-supervised contrastive learning and masked lead modelling before fine-tuning for downstream tasks, while DeepECG-SL was trained directly on labelled diagnostic data in an end-to-end fashion. Performance was evaluated across seven private, multilingual healthcare systems and four public ECG repositories, with assessment of fairness by age and sex, and investigation of privacy vulnerabilities as well as memory and compute requirements. DeepECG-SSL achieved micro-averaged area under the receiver operating characteristic curves (AUROCs) across all 77 cardiac conditions for ECG interpretation of 0.990 (95% confidence interval (CI): 0.990, 0.990) on the internal dataset (MHI-ds), 0.981 (95% CI: 0.981, 0.981) on external public datasets (UKB, CLSA, MIMIC-IV and PTB), and 0.983 (95% CI: 0.983, 0.983) on external private datasets (UW, UCSF, JGH, NYP, MGH, CSH and CHUM), while DeepECG-SL demonstrated AUROCs of 0.992 (95% CI: 0.992, 0.992), 0.980 (95% CI: 0.980, 0.980), and 0.983 (95% CI: 0.983, 0.984), respectively.
Fairness analyses revealed minimal disparities (true-positive rate and false-positive rate difference <0.1) across age and sex groups for both models. DeepECG-SSL demonstrated superior performance on limited-data digital biomarker tasks, with the largest improvements in long QT syndrome (LQTS) genotype classification (AUROC 0.931 vs 0.850, P = 0.026, n = 127 ECGs) and 5-year atrial fibrillation risk prediction (AUROC 0.742 vs 0.734, P < 0.001, n = 132 050 ECGs), while achieving superior performance in left ventricular ejection fraction ≤40% classification (AUROC 0.926 vs 0.917, P < 0.001, n = 25 252 ECGs) and comparable performance in LQTS detection (AUROC 0.767 vs 0.735, P = 0.117, n = 934 ECGs). This study establishes SSL as a promising paradigm for ECG analysis, particularly in settings with limited annotated data, enhancing accessibility, generalizability, and fairness in AI-driven cardiac diagnostics. By releasing model weights, preprocessing tools, and validation code, this work aims to support robust, data-efficient AI diagnostics across diverse clinical environments and questions.
Now is the time: operationalizing generative neurophenomenology through interpersonal methods
Anne Monnier
Lena Adel

Lived experience is shaped by intersubjective, social, cultural, and historical dimensions. For the past 30 years, the pragmatic approach of neurophenomenology has adopted an embodied perspective of the mind by integrating first-person experiential and third-person neurobehavioral perspectives. Neurophenomenology reveals mutual constraints between both, as they co-constitute a person’s lived experience. This article emphasizes the intersubjective and social facets of lived experience as well as the readiness of the scientific community to use a “generative neurophenomenology” approach, envisioned in the 1990s by Francisco Varela. For this endeavour, we clarify three meanings of “generative” as it applies distinctly to generative phenomenology, generative passages, and generative models. Then, we propose to combine existing methods to update the neurophenomenology program: first, by transitioning from individual to multi-person phenomenological methods that include intersubjective experience; second, by expanding traditional neuroscience to include measures of multimodal interpersonal synchrony; and third, by leveraging multiple computational tools to integrate different viewpoints, thereby enriching our understanding of lived experience. We also underscore the potential of diverse mathematical formalisms to capture aspects of human experience, while stressing that using computational approaches to model neurophenomenology does not entail endorsing computationalism as a grounding hypothesis of human experience. Finally, we illustrate the clinical relevance of this paradigm through two case studies in psychiatry, (1) with interactive dyads in autism and (2) with multiple members in family therapy sessions, demonstrating its translational potential.

Causally informed, multifactorial pathways linking cognition and personality to adolescent mental health
Jiadong Yan
Bin Wan
Paule Joanne Toussaint
Judy Chen
Gleb Bezgin
Yasser Iturria-Medina
Alan Evans
Sherif Karama
Adolescence is a sensitive period for the emergence of psychopathology. During this time, physiological changes and environmental exposures jointly shape brain development and influence cognitive and personality maturation, collectively heightening vulnerability to mental disorders. However, the complexity of interactions between these factors has hindered a systems-level understanding of mental health and the causal roles of cognition and personality in psychopathology. In this study, we proposed a multifactorial causal framework integrating brain, pubertal, environmental, and behavioral factors to characterize heterogeneity in adolescent mental health trajectories at the individual level. We then investigated latent causal pathways linking cognition and personality to mental health outcomes and identified potential personalized intervention targets. Leveraging the Adolescent Brain Cognitive Development (ABCD) dataset (N = 4,501), we analyzed 165 behavioral pairs connecting cognition and personality traits to mental health symptoms. Using cross-sectional multivariate mediation and longitudinal interaction-inclusive analyses, we identified 68 behavioral pairs showing significant causal relationships, with brain and environmental exposures contributing to most pathways, while pubertal factors exhibited limited involvement. Individualized interpretive analyses further revealed 23 pairs suggesting potential interventions with response rates exceeding 50%. Among these, behavioral inhibition, negative urgency, and processing speed emerged as the most common intervention targets, whereas psychosis symptoms and attention problems were the most likely issues to improve.
Overall, our study advances a comprehensive framework capturing the multifactorial and heterogeneous nature of adolescent mental health, delineates specific causal pathways from cognitive and personality traits to psychopathology, and provides a principled basis for potential individualized intervention strategies.
A Comedy of Estimators: On KL Regularization in RL Training of LLMs
The reasoning performance of large language models (LLMs) can be substantially improved by training them with reinforcement learning (RL). The RL objective for LLM training involves a regularization term, which is the reverse Kullback-Leibler (KL) divergence between the trained policy and the reference policy. Since computing the KL divergence exactly is intractable, various estimators are used in practice to estimate it from on-policy samples. Despite its wide adoption, including in several open-source libraries, there is no systematic study analyzing the numerous ways of incorporating KL estimators in the objective and their effect on the downstream performance of RL-trained models. Recent works show that prevailing practices for incorporating KL regularization do not provide correct gradients for stated objectives, creating a discrepancy between the objective and its implementation. In this paper, we further analyze these practices and study the gradients of several estimator configurations, revealing how design choices shape gradient bias. We substantiate these findings with empirical observations by RL fine-tuning Qwen2.5-7B, Llama-3.1-8B-Instruct and Qwen3-4B-Instruct-2507 with different configurations and evaluating their performance on both in- and out-of-distribution tasks. Through our analysis, we observe that, in on-policy settings: (1) estimator configurations with biased gradients can result in training instabilities; and (2) using estimator configurations resulting in unbiased gradients leads to better performance on in-domain as well as out-of-domain tasks. We also investigate the performance resulting from different KL configurations in off-policy settings and observe that KL regularization can help stabilize off-policy RL training resulting from asynchronous setups.
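To make the idea of estimating a reverse KL divergence from on-policy samples concrete, here is a minimal sketch. The abstract does not name the estimators it studies, so the three below (commonly labelled k1, k2 and k3 in practice) are an assumption, as are the toy categorical "policies" and the function names. The sketch only compares estimated values; it says nothing about the gradient behaviour the paper analyzes.

```python
import math
import random


def exact_kl(pi, ref):
    """Exact KL(pi || ref) for two categorical distributions given as lists."""
    return sum(p * math.log(p / q) for p, q in zip(pi, ref) if p > 0)


def kl_estimates(pi, ref, n_samples=100_000, seed=0):
    """Monte Carlo estimates of KL(pi || ref) from samples drawn from pi.

    With r = ref(x) / pi(x) for a sample x ~ pi (hypothetical toy setup):
      k1 = -log r            (unbiased, can be negative per sample)
      k2 = 0.5 * (log r)^2   (biased, always non-negative)
      k3 = (r - 1) - log r   (unbiased, always non-negative)
    """
    rng = random.Random(seed)
    samples = rng.choices(range(len(pi)), weights=pi, k=n_samples)
    k1 = k2 = k3 = 0.0
    for x in samples:
        log_r = math.log(ref[x]) - math.log(pi[x])
        k1 += -log_r
        k2 += 0.5 * log_r ** 2
        k3 += math.expm1(log_r) - log_r  # expm1(log r) equals r - 1
    return k1 / n_samples, k2 / n_samples, k3 / n_samples
```

Calling `kl_estimates([0.5, 0.3, 0.2], [0.4, 0.4, 0.2])` and comparing against `exact_kl` on the same inputs shows k1 and k3 converging to the exact value, while k2 converges to a nearby but different number.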
Emergent temporal abstractions in autoregressive models enable hierarchical reinforcement learning
Seijin Kobayashi
Yanick Schimpf
Maximilian Schlegel
Angelika Steger
Maciej Wolczyk
Johannes Von Oswald
Kaitlin Maile
Blake Aaron Richards
Rif A. Saurous
James Manyika
Blaise Agüera y Arcas
Alexander Meulemans
João Sacramento
Large-scale autoregressive models pretrained on next-token prediction and finetuned with reinforcement learning (RL) have achieved unprecedented success on many problem domains. During RL, these models explore by generating new outputs, one token at a time. However, sampling actions token-by-token can result in highly inefficient learning, particularly when rewards are sparse. Here, we show that it is possible to overcome this problem by acting and exploring within the internal representations of an autoregressive model. Specifically, to discover temporally abstract actions, we introduce a higher-order, non-causal sequence model whose outputs control the residual stream activations of a base autoregressive model. On grid world and MuJoCo-based tasks with hierarchical structure, we find that the higher-order model learns to compress long activation sequence chunks onto internal controllers. Critically, each controller executes a sequence of behaviorally meaningful actions that unfold over long timescales and are accompanied by a learned termination condition, such that composing multiple controllers over time leads to efficient exploration on novel tasks. We show that direct internal controller reinforcement, a process we term "internal RL", enables learning from sparse rewards in cases where standard RL finetuning fails. Our results demonstrate the benefits of latent action generation and reinforcement in autoregressive models, suggesting internal RL as a promising avenue for realizing hierarchical RL within foundation models.
Energy-Efficient Multi-LLM Reasoning for Binary-Free Zero-Day Detection in IoT Firmware
Saeid Jamshidi
Omar Abdul-Wahab
Martine Bellaiche
Securing Internet of Things (IoT) firmware remains difficult due to proprietary binaries, stripped symbols, heterogeneous architectures, and limited access to executable code. Existing analysis methods, such as static analysis, symbolic execution, and fuzzing, depend on binary visibility and functional emulation, making them unreliable when firmware is encrypted or inaccessible. To address this limitation, we propose a binary-free, architecture-agnostic solution that estimates the likelihood of conceptual zero-day vulnerabilities using only high-level descriptors. The approach integrates a tri-LLM reasoning architecture combining a LLaMA-based configuration interpreter, a DeepSeek-based structural abstraction analyzer, and a GPT-4o semantic fusion model. The solution also incorporates LLM computational signatures, including latency patterns, uncertainty markers, and reasoning depth indicators, as well as an energy-aware symbolic load model, to enhance interpretability and operational feasibility. In addition, we formally derive the mathematical foundations of the reasoning pipeline, establishing monotonicity, divergence, and energy-risk coupling properties that theoretically justify the model's behavior. Simulation-based evaluation reveals that high-exposure conditions increase the predicted zero-day likelihood by 20 to 35 percent across models, with GPT-4o demonstrating the strongest cross-layer correlations and the highest sensitivity. Energy and divergence metrics significantly predict elevated risk (p < 0.01), reinforcing the effectiveness of the proposed reasoning framework.
Hidden sampling biases inflate performance in gene regulatory network inference
Florin Ratajczak
Eva Hoermanseder
Jason Hartford
Pascal Falter-Braun
Matthias Heinig
Antonio Scialdone
Accurate reconstruction of gene regulatory networks (GRNs) from single-cell transcriptomic data remains a major methodological challenge. Recent machine learning approaches, particularly graph neural networks and graph autoencoders, have reported improved performance, yet these gains do not consistently translate to realistic biological settings. Here, we show that a key reason for this is the way negative regulatory interactions are sampled for supervised training and evaluation. We find that widely used sampling strategies introduce node-degree biases that allow models to exploit trivial graph-structural cues rather than biological signals. Across multiple benchmarks, simple degree-based heuristics match or exceed state-of-the-art graph neural network models under these biased evaluation protocols. We further introduce a degree-aware sampling approach that eliminates these artifacts and provides more reliable assessments of GRN inference methods. Our results call for standardized, bias-aware benchmarking practices to ensure meaningful progress in supervised GRN inference from single-cell RNA-seq data.
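The degree bias described above can be illustrated on a toy graph. The sketch below is not the paper's benchmark or its degree-aware sampler: the synthetic network, the out-degree score, and the source-matched negative sampling scheme are all assumptions chosen for illustration. It scores each candidate edge by the training out-degree of its source node and compares the AUROC under uniform negative sampling with the AUROC when every negative shares its source with a positive.

```python
import random


def auroc(pos_scores, neg_scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos_scores) * len(neg_scores))


def degree_baseline_aurocs(n_genes=200, n_tfs=10, targets_per_tf=30, seed=0):
    """Return (AUROC with uniform negatives, AUROC with source-matched negatives)."""
    rng = random.Random(seed)
    genes = list(range(n_genes))
    # Hypothetical toy GRN: the first n_tfs genes are regulators with many targets.
    edges = {(tf, t) for tf in genes[:n_tfs]
             for t in rng.sample(genes, targets_per_tf) if t != tf}
    edge_list = sorted(edges)
    rng.shuffle(edge_list)
    half = len(edge_list) // 2
    train, test_pos = edge_list[:half], edge_list[half:]

    # Degree heuristic: score an edge by its source's out-degree in the training set.
    out_deg = {}
    for src, _ in train:
        out_deg[src] = out_deg.get(src, 0) + 1

    def score(edge):
        return out_deg.get(edge[0], 0)

    # Uniform negatives: random ordered gene pairs that are not known edges.
    uniform_neg = []
    while len(uniform_neg) < len(test_pos):
        pair = (rng.choice(genes), rng.choice(genes))
        if pair[0] != pair[1] and pair not in edges:
            uniform_neg.append(pair)

    # Source-matched negatives: keep each positive's source, resample the target.
    matched_neg = []
    for src, _ in test_pos:
        while True:
            tgt = rng.choice(genes)
            if tgt != src and (src, tgt) not in edges:
                matched_neg.append((src, tgt))
                break

    pos_scores = [score(e) for e in test_pos]
    return (auroc(pos_scores, [score(e) for e in uniform_neg]),
            auroc(pos_scores, [score(e) for e in matched_neg]))
```

Under uniform sampling most negatives start at non-regulator genes, so the degree score alone separates the classes; once negatives share the positives' sources, the same score carries no information and the AUROC falls to chance.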
Fine-Tuned In-Context Learners for Efficient Adaptation
Clare Lyle
Yazhe Li
Amal Rannen-Triki
When adapting large language models (LLMs) to a specific downstream task, two primary approaches are commonly employed: (1) prompt engineering, often with in-context few-shot learning, leveraging the model's inherent generalization abilities, and (2) fine-tuning on task-specific data, directly optimizing the model's parameters. While prompt-based methods excel in few-shot scenarios, their effectiveness often plateaus as more data becomes available. Conversely, fine-tuning scales well with data but may underperform when training examples are scarce. We investigate a unified approach that bridges these two paradigms by incorporating in-context learning directly into the fine-tuning process. Specifically, we fine-tune the model on task-specific data augmented with in-context examples, mimicking the structure of k-shot prompts. This approach, while requiring per-task fine-tuning, combines the sample efficiency of in-context learning with the performance gains of fine-tuning, leading to a method that consistently matches and often significantly exceeds both these baselines. To perform hyperparameter selection in the low-data regime, we propose to use prequential evaluation, which eliminates the need for expensive cross-validation and leverages all available data for training while simultaneously providing a robust validation signal. We conduct an extensive empirical study to determine which adaptation paradigm (fine-tuning, in-context learning, or our proposed unified approach) offers the best predictive performance on concrete downstream tasks.
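Prequential evaluation, as mentioned above, scores a learner by predicting each example from the examples that came before it and only then adding it to the training data. The sketch below shows the mechanism on a deliberately tiny stand-in: a smoothed label-frequency predictor replaces the fine-tuned LLM, and the smoothing strength `alpha` plays the role of the hyperparameter being selected. Both are assumptions for illustration, not the paper's setup.

```python
import math


def prequential_log_loss(labels, alpha, n_classes=2):
    """Cumulative log-loss where each label is predicted from the labels seen before it."""
    counts = [0] * n_classes
    total = 0.0
    for y in labels:
        seen = sum(counts)
        # Predict first, using only past examples (add-alpha smoothed frequencies).
        prob = (counts[y] + alpha) / (seen + alpha * n_classes)
        total += -math.log(prob)
        # Only now does the example join the training data.
        counts[y] += 1
    return total


def select_alpha(labels, candidates):
    """Pick the candidate hyperparameter with the lowest prequential loss."""
    return min(candidates, key=lambda a: prequential_log_loss(labels, a))
```

Every example serves as both a test point and a training point, so no separate validation split is held out; `select_alpha([1, 1, 0, 1, 1, 1, 0, 1], [0.1, 0.5, 1.0, 2.0])` returns whichever candidate gives the lowest cumulative loss on that sequence.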
Latent brain subtypes of chronotype reveal unique behavioral and health profiles across population cohorts
Julie Carrier
Kai-Florian Storch
Robin I. M. Dunbar
Chronotype is shaped by the complex interplay of endogenous and exogenous factors. This time-enduring trait ties into societal behaviors and is linked to psychiatric and metabolic conditions. Despite its multifaceted nature, prior research has treated chronotype as a monolithic trait across the population, at the risk of overlooking substantial heterogeneity in neural and behavioral fingerprints. To uncover hidden subgroups, we develop a supervised pattern-learning framework integrating three complementary brain-imaging modalities with deep behavioral and health profiling from 27,030 UK Biobank participants. We identify five distinct, biologically valid chronotype subtypes. Each demonstrates unique patterns across brain, behavioral and health profiles. External validation in 10,550 US children from the ABCD Study cohort reveals reversed age distributions and replicates sex-associated brain-behavioral patterns, suggesting that potential divergences between chronotype traits observed throughout adulthood may begin to emerge early in life. These findings highlight underappreciated sources of population variation that echo the rhythm of people’s inner clock.