Publications

Active Attacks: Red-teaming LLMs via Adaptive Environments

Taeyoung YUN

Pierre-Luc St-Charles

Jinkyoo Park

Yoshua Bengio

Minsu Kim

2025-09-26

ArXiv (preprint)

Continual Pre-training of MoEs: How robust is your router?

Benjamin Therien

Charles-Etienne Joseph

Zain Sarwar

Ashwinee Panda

Anirban Das

Shi-Xiong Zhang

Stephen Rawls

Sambit Sahu

Eugene Belilovsky

Irina Rish

Sparsely-activated Mixture of Experts (MoE) transformers are promising architectures for foundation models. Compared to dense transformers that require the same amount of floating-point operations (FLOPs) per forward pass, MoEs benefit from improved sample efficiency at training time and achieve much stronger performance. Many closed-source and open-source frontier language models have thus adopted an MoE architecture. Naturally, practitioners will want to extend the capabilities of these models with large amounts of newly collected data without completely re-training them. Prior work has shown that a simple combination of replay, learning rate re-warming, and re-decaying can enable the continual pre-training (CPT) of dense decoder-only transformers with minimal performance degradation compared to full re-training. In the case of decoder-only MoE transformers, however, it is unclear how the routing algorithm will impact continual pre-training performance: 1) *do the MoE transformer's routers exacerbate forgetting relative to a dense model?*; 2) *do the routers maintain a balanced load on previous distributions after CPT?*; 3) *are the same strategies applied to dense models sufficient to continually pre-train MoE LLMs?* In what follows, we conduct a large-scale study training a 500M parameter dense transformer and four 500M-active/2B-total parameter MoE transformers, following the Switch Transformer architecture and a granular DeepSeek-inspired architecture. Each model is trained for 600B tokens. Our results establish a surprising robustness to distribution shifts for MoEs using both Sinkhorn-Balanced and Z-and-Aux-loss-balanced routing algorithms, even in MoEs continually pre-trained without replay. Moreover, we show that MoE LLMs maintain their sample efficiency (relative to a FLOP-matched dense model) during CPT and that they can match the performance of a fully re-trained MoE at a fraction of the cost.

2025-09-26

TMLR (accepted)

openreview.net

Investigating Faithfulness in Large Audio Language Models

Lovenya Jain

Pooneh Mousavi

Mirco Ravanelli

Cem Subakan

Faithfulness measures whether chain-of-thought (CoT) representations accurately reflect a model's decision process and can be used as reliable explanations. Prior work has shown that CoTs from text-based LLMs are often unfaithful. This question has not been explored for large audio-language models (LALMs), where faithfulness is critical for safety-sensitive applications. Reasoning in LALMs is also more challenging, as models must first extract relevant clues from audio before reasoning over them. In this paper, we investigate the faithfulness of CoTs produced by several LALMs by applying targeted interventions, including paraphrasing, filler token injection, early answering, and introducing mistakes, on two challenging reasoning datasets: SAKURA and MMAR. Across the aforementioned interventions, datasets, and tasks, our experiments suggest that LALMs generally produce CoTs that appear to be faithful to their underlying decision processes.

2025-09-26

ArXiv (preprint)

TRUST: Test-Time Refinement using Uncertainty-Guided SSM Traverses

Sahar Dastani

Ali Bahri

Gustavo Adolfo Vargas Hakim

Moslem Yazdanpanah

Mehrdad Noori

David Osowiechi

Samuel Barbeau

Ismail Ben Ayed

Hervé Lombaert

Christian Desrosiers

State Space Models (SSMs) have emerged as efficient alternatives to Vision Transformers (ViTs), with VMamba standing out as a pioneering architecture designed for vision tasks. However, their generalization performance degrades significantly under distribution shifts. To address this limitation, we propose TRUST (Test-Time Refinement using Uncertainty-Guided SSM Traverses), a novel test-time adaptation (TTA) method that leverages diverse traversal permutations to generate multiple causal perspectives of the input image. Model predictions serve as pseudo-labels to guide updates of the Mamba-specific parameters, and the adapted weights are averaged to integrate the learned information across traversal scans. Altogether, TRUST is the first approach that explicitly leverages the unique architectural properties of SSMs for adaptation. Experiments on seven benchmarks show that TRUST consistently improves robustness and outperforms existing TTA methods.

2025-09-26

ArXiv (preprint)

Acute respiratory distress syndrome in patients with cancer: the YELENNA prospective multinational observational cohort study.

Peter Schellongowski

Michael Darmon

Philipp Eller

Laveena Munshi

Tobias Liebregts

Victoria Metaxa

Luca Montini

Tobias Lahmer

F. Taccone

Andry Van de Louw

Martin Balik

P. Pickkers

Pleun Hemelaar

Hemang Yadav

Andreas Barratt-Due

T. Karvunidis

Jordi Riera

G. Martucci

I. Martín-Loeches

Pedro Castro

Nina Buchtele

Virginie Lemiale

Stefan Hatzl

T. Staudinger

Elie Azoulay

2025-09-25

Intensive Care Medicine (published)

Acute respiratory distress syndrome in patients with cancer: the YELENNA prospective multinational observational cohort study

Peter Schellongowski

Michael Darmon

Philipp Eller

Laveena Munshi

Tobias Liebregts

Victoria Metaxa

Luca Montini

Tobias Lahmer

F. Taccone

Andry Van de Louw

Martin Balik

P. Pickkers

Pleun Hemelaar

Hemang Yadav

Andreas Barratt-Due

T. Karvunidis

Jordi Riera

G. Martucci

I. Martín-Loeches

Pedro Castro

Nina Buchtele

Virginie Lemiale

Stefan Hatzl

T. Staudinger

Elie Azoulay

Purpose: Acute respiratory failure is the leading reason for intensive care unit (ICU) admission among critically ill patients with cancer. We aimed to describe the clinical characteristics, risk factors, and outcomes of patients with cancer and acute respiratory distress syndrome (ARDS) and to evaluate associations of venovenous extracorporeal membrane oxygenation (ECMO) with outcomes in the subgroup with severe ARDS. Methods: We conducted a multinational, prospective, observational cohort study of patients with cancer and ARDS in 13 countries in Europe and North America. The primary endpoint was 90-day mortality. Results: Among 715 included patients, 73.4% had hematologic malignancies and 26.6% solid tumors; 31.2% had undergone hematopoietic stem-cell transplantation (168 allogeneic). ICU, hospital, and 90-day mortality rates were 55.3%, 70.9%, and 73.2%, respectively. By multivariate analysis, independent predictors of higher 90-day mortality were older age, peripheral vascular disease, severe ARDS at inclusion, acute kidney injury, and ICU admission as a time-limited trial (vs. full code). Conversely, lymphoma was associated with lower 90-day mortality. Among the 322 patients (45.7%) with severe ARDS at inclusion, 90-day mortality was 82.2%, with no difference between patients who received ECMO (n = 58, 18%) and those who did not (82.6% vs. 80.7%, P = 0.89). This finding remained unchanged in a double-adjusted overlap- and propensity-weighted Cox mixed-effects model (adjusted hazard ratio, 1.12; 95% confidence interval 0.65–1.94; P = 0.69). Conclusion: Patients with cancer and ARDS, particularly severe forms, experience high 90-day mortality, irrespective of ECMO use. These findings suggest a need for nuanced ICU goals-of-care discussions and raise concerns about the generalizability of ECMO guidelines to this population. Supplementary Information: The online version contains supplementary material available at 10.1007/s00134-025-08113-7.

2025-09-25

Intensive Care Medicine (published)

Acute respiratory distress syndrome in patients with cancer: the YELENNA prospective multinational observational cohort study

Peter Schellongowski

Michael Darmon

Philipp Eller

Laveena Munshi

Tobias Liebregts

Victoria Metaxa

Luca Montini

Tobias Lahmer

Fabio S. Taccone

Andry Van de Louw

Martin Balik

Peter Pickkers

Pleun Hemelaar

Hemang Yadav

Andreas Barratt-Due

Thomas Karvunidis

Jordi Riera

Gennaro Martucci

Ignacio Martin-Loeches

Pedro Castro

Nina Buchtele

Virginie Lemiale

Stefan Hatzl