Publications

Active Attacks: Red-teaming LLMs via Adaptive Environments
We address the challenge of generating diverse attack prompts for large language models (LLMs) that elicit harmful behaviors (e.g., insults, sexual content) and are used for safety fine-tuning. Rather than relying on manual prompt engineering, attacker LLMs can be trained with reinforcement learning (RL) to automatically generate such prompts using only a toxicity classifier as a reward. However, capturing a wide range of harmful behaviors is a significant challenge that requires explicit diversity objectives. Existing diversity-seeking RL methods often collapse to limited modes: once high-reward prompts are found, exploration of new regions is discouraged. Inspired by the active learning paradigm that encourages adaptive exploration, we introduce *Active Attacks*, a novel RL-based red-teaming algorithm that adapts its attacks as the victim evolves. By periodically safety fine-tuning the victim LLM with collected attack prompts, rewards in exploited regions diminish, which forces the attacker to seek unexplored vulnerabilities. This process naturally induces an easy-to-hard exploration curriculum, where the attacker progresses beyond easy modes toward increasingly difficult ones. As a result, Active Attacks uncovers a wide range of local attack modes step by step, and their combination achieves wide coverage of the multi-mode distribution. Active Attacks, a simple plug-and-play module that seamlessly integrates into existing RL objectives, unexpectedly outperformed prior RL-based methods -- including GFlowNets, PPO, and REINFORCE -- by improving cross-attack success rates against GFlowNets, the previous state of the art, from 0.07% to 31.28% (a relative gain greater than 400×).
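A minimal sketch of the alternating attack/defense loop the abstract describes may help. All callables here (train_attacker, safety_finetune, victim_respond, toxicity_score) are hypothetical placeholders for the reader's own attacker-RL trainer, victim fine-tuner, victim model, and toxicity classifier, not the authors' code:

```python
from typing import Callable, List

def active_attacks_loop(
    train_attacker: Callable[[Callable[[str], float], int], List[str]],
    safety_finetune: Callable[[List[str]], None],
    victim_respond: Callable[[str], str],
    toxicity_score: Callable[[str], float],
    n_rounds: int = 10,
    steps_per_round: int = 1000,
) -> List[str]:
    """Alternate attacker RL training with victim safety fine-tuning.

    Each round, the attacker is trained (with any RL objective, e.g.
    GFlowNets or PPO) against the *current* victim, whose toxicity-scored
    responses define the reward. The victim is then safety fine-tuned on
    all prompts collected so far, which lowers rewards in already-exploited
    regions and pushes the attacker toward unexplored vulnerabilities.
    """
    collected: List[str] = []
    for _ in range(n_rounds):
        # Reward for an attack prompt = toxicity of the victim's response.
        def reward(prompt: str) -> float:
            return toxicity_score(victim_respond(prompt))

        collected += train_attacker(reward, steps_per_round)
        # Periodic safety fine-tuning is what makes the environment adaptive.
        safety_finetune(collected)
    return collected
```

Because the defense step only changes the reward landscape, any existing RL red-teaming objective can be dropped into the `train_attacker` slot unmodified, which is the plug-and-play property the abstract claims.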
Continual Pre-training of MoEs: How robust is your router?
Zain Sarwar
Ashwinee Panda
Anirban Das
Shi-Xiong Zhang
Stephen Rawls
Sambit Sahu
Sparsely-activated Mixture of Experts (MoE) transformers are promising architectures for foundation models. Compared to dense transformers that require the same amount of floating-point operations (FLOPs) per forward pass, MoEs benefit from improved sample efficiency at training time and achieve much stronger performance. Many closed-source and open-source frontier language models have thus adopted an MoE architecture. Naturally, practitioners will want to extend the capabilities of these models with large amounts of newly collected data without completely re-training them. Prior work has shown that a simple combination of replay, learning rate re-warming, and re-decaying can enable the continual pre-training (CPT) of dense decoder-only transformers with minimal performance degradation compared to full re-training. In the case of decoder-only MoE transformers, however, it is unclear how the routing algorithm will impact continual pre-training performance: 1) *do the MoE transformer's routers exacerbate forgetting relative to a dense model?*; 2) *do the routers maintain a balanced load on previous distributions after CPT?*; 3) *are the same strategies applied to dense models sufficient to continually pre-train MoE LLMs?* In what follows, we conduct a large-scale study training a 500M parameter dense transformer and four 500M-active/2B-total parameter MoE transformers, following the Switch Transformer architecture and a granular DeepSeek-inspired architecture. Each model is trained for 600B tokens. Our results establish a surprising robustness to distribution shifts for MoEs using both Sinkhorn-Balanced and Z-and-Aux-loss-balanced routing algorithms, even in MoEs continually pre-trained without replay. Moreover, we show that MoE LLMs maintain their sample efficiency (relative to a FLOP-matched dense model) during CPT and that they can match the performance of a fully re-trained MoE at a fraction of the cost.
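The CPT recipe the abstract refers to has two parts: replay (mixing a fraction of the original pre-training data into each CPT batch) and a learning-rate schedule that is re-warmed and re-decayed when the new data arrives. A sketch of the schedule part, with an illustrative function name, warmup shape, and constants rather than the paper's exact settings:

```python
import math

def rewarmed_cosine_lr(step: int, phase_start: int, phase_len: int,
                       warmup_steps: int, max_lr: float, min_lr: float) -> float:
    """LR re-warming + cosine re-decay for one continual pre-training phase.

    At the start of a CPT phase the learning rate is linearly re-warmed
    from min_lr back up to max_lr over `warmup_steps`, then cosine
    re-decayed to min_lr over the remainder of the phase (hypothetical
    shapes; the paper may use different warmup/decay forms).
    """
    t = step - phase_start
    if t < warmup_steps:
        # Linear re-warming from the decayed LR back to the peak.
        return min_lr + (max_lr - min_lr) * t / warmup_steps
    # Cosine re-decay over the rest of the phase.
    progress = min(1.0, (t - warmup_steps) / max(1, phase_len - warmup_steps))
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

The question the paper studies is whether this dense-model recipe transfers to MoEs, where the re-warmed updates also perturb the routers and could unbalance expert load on the old distribution.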
Investigating Faithfulness in Large Audio Language Models
Faithfulness measures whether chain-of-thought (CoT) representations accurately reflect a model's decision process and can be used as reliab… (see more)le explanations. Prior work has shown that CoTs from text-based LLMs are often unfaithful. This question has not been explored for large audio-language models (LALMs), where faithfulness is critical for safety-sensitive applications. Reasoning in LALMs is also more challenging, as models must first extract relevant clues from audio before reasoning over them. In this paper, we investigate the faithfulness of CoTs produced by several LALMs by applying targeted interventions, including paraphrasing, filler token injection, early answering, and introducing mistakes, on two challenging reasoning datasets: SAKURA and MMAR. After going through the aforementioned interventions across several datasets and tasks, our experiments suggest that, LALMs generally produce CoTs that appear to be faithful to their underlying decision processes.
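Of the four interventions, early answering is the simplest to illustrate. Below is a hypothetical harness (the `answer_with_cot` callable and the exact matching criterion are placeholders, not the paper's protocol): the model is forced to answer after each CoT prefix, and if short prefixes already reproduce the full-CoT answer, the later reasoning steps are unlikely to be load-bearing, which is evidence against faithfulness.

```python
from typing import Callable, List

def early_answering_test(
    answer_with_cot: Callable[[str, str], str],  # (question, partial_cot) -> answer
    question: str,
    cot_steps: List[str],
) -> List[bool]:
    """Early-answering probe: force an answer after each CoT prefix.

    Returns, for each prefix length k = 0..len(cot_steps), whether the
    answer given only the first k reasoning steps matches the answer
    given the full chain of thought.
    """
    full_answer = answer_with_cot(question, " ".join(cot_steps))
    return [
        answer_with_cot(question, " ".join(cot_steps[:k])) == full_answer
        for k in range(len(cot_steps) + 1)
    ]
```

The other interventions follow the same template: perturb the CoT (paraphrase it, pad it with filler tokens, or corrupt a step) and check whether the final answer moves with the perturbation.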
TRUST: Test-Time Refinement using Uncertainty-Guided SSM Traverses
Sahar Dastani
Ali Bahri
Gustavo Adolfo Vargas Hakim
Mehrdad Noori
David Osowiechi
Samuel Barbeau
Ismail Ben Ayed
Christian Desrosiers
State Space Models (SSMs) have emerged as efficient alternatives to Vision Transformers (ViTs), with VMamba standing out as a pioneering architecture designed for vision tasks. However, their generalization performance degrades significantly under distribution shifts. To address this limitation, we propose TRUST (Test-Time Refinement using Uncertainty-Guided SSM Traverses), a novel test-time adaptation (TTA) method that leverages diverse traversal permutations to generate multiple causal perspectives of the input image. Model predictions serve as pseudo-labels to guide updates of the Mamba-specific parameters, and the adapted weights are averaged to integrate the learned information across traversal scans. Altogether, TRUST is the first approach that explicitly leverages the unique architectural properties of SSMs for adaptation. Experiments on seven benchmarks show that TRUST consistently improves robustness and outperforms existing TTA methods.
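A minimal sketch of that recipe (adapt once per traversal permutation, then average the adapted weights) is below. It is a hypothetical harness, not the authors' code: `forward(model, x, t)` is assumed to run the SSM with scan order `t` and return logits, the caller is assumed to have set requires_grad=True only on the Mamba-specific parameters, and the uncertainty-guided weighting in TRUST is omitted in favor of a plain average.

```python
import copy
from typing import Callable, List
import torch
import torch.nn.functional as F

def trust_adapt(
    make_model: Callable[[], torch.nn.Module],  # fresh copy of source weights
    forward: Callable[[torch.nn.Module, torch.Tensor, int], torch.Tensor],
    x: torch.Tensor,
    traversals: List[int],
    lr: float = 1e-4,
) -> torch.nn.Module:
    """One pseudo-labeled update per traversal, then weight averaging."""
    states = []
    for t in traversals:
        model = make_model()
        # Only the (caller-selected) Mamba-specific parameters are trainable.
        params = [p for p in model.parameters() if p.requires_grad]
        opt = torch.optim.SGD(params, lr=lr)
        logits = forward(model, x, t)            # one causal view of the image
        pseudo = logits.argmax(dim=-1).detach()  # prediction as pseudo-label
        loss = F.cross_entropy(logits, pseudo)
        opt.zero_grad()
        loss.backward()
        opt.step()
        states.append(copy.deepcopy(model.state_dict()))
    # Integrate the information learned across scans by averaging weights.
    avg = {k: torch.stack([s[k].float() for s in states]).mean(0)
           for k in states[0]}
    model = make_model()
    model.load_state_dict(avg, strict=False)
    return model
```

Averaging in weight space rather than ensembling predictions keeps inference cost identical to the unadapted model, which fits the TTA setting.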
Acute respiratory distress syndrome in patients with cancer: the YELENNA prospective multinational observational cohort study
Peter Schellongowski
Michael Darmon
Philipp Eller
Laveena Munshi
Tobias Liebregts
Victoria Metaxa
Luca Montini
Tobias Lahmer
Fabio S. Taccone
Andry Van de Louw
Martin Balik
Peter Pickkers
Pleun Hemelaar
Hemang Yadav
Andreas Barratt-Due
Thomas Karvunidis
Jordi Riera
Gennaro Martucci
Ignacio Martín-Loeches
Pedro Castro
Nina Buchtele
Virginie Lemiale
Stefan Hatzl
Thomas Staudinger
Elie Azoulay
Purpose: Acute respiratory failure is the leading reason for intensive care unit (ICU) admission among critically ill patients with cancer. We aimed to describe the clinical characteristics, risk factors, and outcomes of patients with cancer and acute respiratory distress syndrome (ARDS) and to evaluate associations of venovenous extracorporeal membrane oxygenation (ECMO) with outcomes in the subgroup with severe ARDS. Methods: We conducted a multinational, prospective, observational cohort study of patients with cancer and ARDS in 13 countries in Europe and North America. The primary endpoint was 90-day mortality. Results: Among 715 included patients, 73.4% had hematologic malignancies and 26.6% solid tumors; 31.2% had undergone hematopoietic stem-cell transplantation (168 allogeneic). ICU, hospital, and 90-day mortality rates were 55.3%, 70.9%, and 73.2%, respectively. By multivariate analysis, independent predictors of higher 90-day mortality were older age, peripheral vascular disease, severe ARDS at inclusion, acute kidney injury, and ICU admission as a time-limited trial (vs. full code). Conversely, lymphoma was associated with lower 90-day mortality. Among the 322 patients (45.7%) with severe ARDS at inclusion, 90-day mortality was 82.2%, with no difference between patients who received ECMO (n = 58, 18%) and those who did not (82.6% vs. 80.7%, P = 0.89). This finding remained unchanged in a double-adjusted overlap- and propensity-weighted Cox mixed-effects model (adjusted hazard ratio, 1.12; 95% confidence interval 0.65–1.94; P = 0.69). Conclusion: Patients with cancer and ARDS, particularly severe forms, experience high 90-day mortality, irrespective of ECMO use. These findings suggest a need for nuanced ICU goals-of-care discussions and raise concerns about the generalizability of ECMO guidelines to this population.
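The ECMO comparison rests on an overlap- and propensity-weighted Cox model. A minimal sketch of that style of analysis, with hypothetical column names ("ecmo" as a 0/1 exposure flag, "time"/"death90" as 90-day follow-up and event) and without the mixed effects used in the study (which lifelines does not fit), might look like:

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from lifelines import CoxPHFitter

def overlap_weighted_cox(df: pd.DataFrame, confounders: list) -> CoxPHFitter:
    """Overlap-weighted Cox model for a binary exposure (here: ECMO)."""
    # Propensity of receiving ECMO given measured confounders.
    ps = LogisticRegression(max_iter=1000).fit(
        df[confounders], df["ecmo"]).predict_proba(df[confounders])[:, 1]
    # Overlap weights: 1 - PS for treated patients, PS for controls,
    # which emphasizes patients who plausibly could have gone either way.
    df = df.assign(ow=df["ecmo"] * (1 - ps) + (1 - df["ecmo"]) * ps)
    cph = CoxPHFitter()
    cph.fit(df[["time", "death90", "ecmo", "ow"] + confounders],
            duration_col="time", event_col="death90",
            weights_col="ow", robust=True)
    return cph  # exp(coef) on "ecmo" is the weighted hazard ratio
```

Overlap weighting down-weights patients with near-certain treatment assignment, which is one way to compare ECMO and non-ECMO patients of similar severity in an observational cohort like this one.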