Publications

Rethinking Safety in LLM Fine-tuning: An Optimization Perspective

Minseon Kim

Jin Myung Kwak

Lama Alssum

Bernard Ghanem

Philip Torr

David M. Krueger

Fazl Barez

Adel Bibi

Fine-tuning language models is commonly believed to inevitably harm their safety, i.e., refusing to respond to harmful user requests, even w… (see more)hen using harmless datasets, thus requiring additional safety measures. We challenge this belief through systematic testing, showing that poor optimization choices, rather than inherent trade-offs, often cause safety problems, measured as harmful responses to adversarial prompts. By properly selecting key training hyper-parameters, e.g., learning rate, batch size, and gradient steps, we reduce unsafe model responses from 16\% to approximately 5\%, as measured by keyword matching, while maintaining utility performance. Based on this observation, we propose a simple exponential moving average (EMA) momentum technique in parameter space that preserves safety performance by creating a stable optimization path and retains the original pre-trained model's safety properties. Our experiments on the Llama families across multiple datasets (Dolly, Alpaca, ORCA) demonstrate that safety problems during fine-tuning can largely be avoided without specialized interventions, outperforming existing approaches that require additional safety data while offering practical guidelines for maintaining both model performance and safety during adaptation.

2025-07-06

colmweb.org/COLM/2025/Conference (accepted)

doi.org

openreview.net

Statistical or Embodied? Comparing Colorseeing, Colorblind, Painters, and Large Language Models in Their Processing of Color Metaphors

Ethan O. Nadler

Douglas Guilbeault

Sofronia M. Ringold

T. R. Williamson

Antoine Bellemare‐Pepin

Iulia M. Comșa

Karim Jerbi CoCo Lab

Srini Narayanan

Lisa Aziz‐Zadeh

2025-07-06

Cognitive Science (published)

doi.org

Steering Large Language Model Activations in Sparse Spaces

Reza Bayat

Ali Rahimi-Kalahroudi

Mohammad Pezeshki

A. Chandar

P Vincent

2025-07-06

colmweb.org/COLM/2025/Conference (accepted)

doi.org

openreview.net

Training Plug-and-Play Knowledge Modules with Deep Context Distillation

Lucas Caccia

Alan Ansell

Edoardo Ponti

Ivan Vulić

Alessandro Sordoni

Dynamically integrating new or rapidly evolving information after (Large) Language Model pre-training remains challenging, particularly in l… (see more)ow-data scenarios or when dealing with private and specialized documents. In-context learning and retrieval-augmented generation (RAG) face limitations, including their high inference costs and their inability to capture global document information. In this paper, we propose a way of modularizing knowledge by training document-level Knowledge Modules (KMs). KMs are lightweight components implemented as parameter-efficient LoRA modules, which are trained to store information about new documents and can be easily plugged into models on demand. We show that next-token prediction performs poorly as the training objective for KMs. We instead propose Deep Context Distillation: we learn KMs parameters such as to simulate hidden states and logits of a teacher that takes the document in context. Our method outperforms standard next-token prediction and pre-instruction training techniques, across two datasets. Finally, we highlight synergies between KMs and retrieval-augmented generation.

2025-07-06

colmweb.org/COLM/2025/Conference (accepted)

openreview.net

RAT: Bridging RNN Efficiency and Attention Accuracy in Language Modeling

Xiuying Wei

Anunay Yadav

Razvan Pascanu

Caglar Gulçehre

Transformers have become the cornerstone of modern large-scale language models; however, their dependence on softmax attention poses a major… (see more) computational bottleneck, particularly in long-context settings. In this work, rather than following prevalent approaches such as linear attention (or SSMs) and local attention, we introduce an intermediate design called \rat between recurrence and attention mechanisms. It partitions the input into chunks, applies a simple linear recurrence within each chunk to capture local dependencies, and then performs softmax attention across chunks to model long-range interactions. By adjusting the size of the chunk, \rat enables flexible trade-offs, combining the strengths of RNN and attention. Empirically, with a chunk size of 16, the \rat layer achieves a \(7\times\) improvement in training speed with 100K token sequences and \(9\times\) in generation at 4K sequence length, while maintaining similar or sometimes even better accuracy compared to standard attention. We demonstrate this by training 1.3B parameter models from scratch and performing large-scale evaluations, including short- and long-context benchmarks, as well as supervised fine-tuning~(SFT). We further propose a hybrid architecture that interleaves \rat with local attention. By combining efficient long-range modeling with strong local interactions, this hybrid design not only improves inference speed and reduces cache memory usage compared to attention, but also consistently enhances performance, for example, achieving an average 1 point gain in commonsense reasoning tasks, up to 4 points on code tasks, and a 1 point Rouge-L increase in a summarization SFT task. Code is available at https://github.com/CLAIRE-Labo/RAT

2025-07-05

ArXiv (preprint)

doi.org

arxiv.org

DOLPHIN advances single-cell transcriptomics beyond gene level by leveraging exon and junction reads

Kailu Song

Yumin Zheng

Bowen Zhao

David H. Eidelman

Jian Tang

Jun Ding

The advent of single-cell sequencing has revolutionized the study of cellular dynamics, providing unprecedented resolution into the molecula… (see more)r states and heterogeneity of individual cells. However, the rich potential of exon-level information and junction reads within single cells remains underutilized. Conventional gene-count methods overlook critical exon and junction data, limiting the quality of cell representation and downstream analyses such as subpopulation identification and alternative splicing detection. We introduce DOLPHIN, a deep learning method that integrates exon-level and junction read data, representing genes as graph structures. These graphs are processed by a variational graph autoencoder to improve cell embeddings. DOLPHIN not only demonstrates superior performance in cell clustering, biomarker discovery, and alternative splicing detection but also provides a distinct capability to detect subtle transcriptomic differences at the exon level that are often masked in gene-level analyses. By examining cellular dynamics with enhanced resolution, DOLPHIN provides new insights into disease mechanisms and potential therapeutic targets.

2025-07-03

Nature Communications (published)

doi.org

Extracting and Following Paths for Robust Relational Reasoning with Large Language Models

Ge Zhang

Mohammad Alomrani

Hongjian Gu

Jiaming Zhou

Yaochen Hu

B. Wang

Qun Liu

Mark J. Coates

Yingxue Zhang

Jianye HAO

2025-07-03

KDD.org/2025/Workshop/SKnow-LLM (oral)

openreview.net

Decoding Humor-Induced Amusement via Facial Expression Analysis: Toward Emotion-Aware Applications

Gabrielle Toupin

Arthur Dehgan

Marie Buffo

Clément Feyt

Golnoush Alamian

Karim Jerbi

Anne-Lise Saive

Humor is widely recognized for its positive effects on well-being, including stress reduction, mood enhancement, and cognitive benefits. Yet… (see more), the lack of reliable tools to objectively quantify amusement—particularly its temporal dynamics—has limited progress in this area. Existing measures often rely on self-report or coarse summary ratings, providing little insight into how amusement unfolds over time. To address this gap, we developed a Random Forest model to predict the intensity of amusement evoked by humorous video clips, based on participants’ facial expressions—particularly the co-activation of Facial Action Units 6 and 12 (“% Smile”)—and video features such as motion, saliency, and topic. Our results show that exposure to humorous content significantly increases “% Smile”, with amusement peaking toward the end of videos. Importantly, we observed emotional carry-over effects, suggesting that consecutive humorous stimuli can sustain or amplify positive emotional responses. Even when trained solely on humorous content, the model reliably predicted amusement intensity, underscoring the robustness of our approach. Overall, this study provides a novel, objective method to track amusement on a fine temporal scale, advancing the measurement of nonverbal emotional expression. These findings may inform the design of emotion-aware applications and humor-based therapeutic interventions to promote well-being and emotional health.

2025-07-02

Applied Sciences (published)

doi.org

Personalizing brain stimulation: continual learning for sleep spindle detection

Milo Sobral

Hugo R Jourde

S Ehsan M Bajestani

Emily B J Coffey

Giovanni Beltrame

Objective. Personalized stimulation, in which algorithms used to detect neural events adapt to a user’s unique neural characteristics, may… (see more) be crucial to enable optimized and consistent stimulation quality for both fundamental research and clinical applications. Precise stimulation of sleep spindles-transient patterns of brain activity that occur during non rapid eye movement sleep that are involved in memory consolidation-presents an exciting frontier for studying memory functions; however, this endeavour is challenged by the spindles’ fleeting nature, inter-individual variability, and the necessity of real-time detection. Approach. We tackle these challenges using a novel continual learning framework. Using a pre-trained model capable of both online classification of sleep stages and spindle detection, we implement an algorithm that refines spindle detection, tailoring it to the individual throughout one or more nights without manual intervention. Main results. Our methodology achieves accurate, subject-specific targeting of sleep spindles and enables advanced closed-loop stimulation studies. While fine-tuning alone offers minimal benefits for single nights, our approach combining weight averaging demonstrates significant improvement over multiple nights, effectively mitigating catastrophic forgetting. Significance. This work represents an important step towards signal-level personalization of brain stimulation that can be applied to different brain stimulation paradigms including closed-loop brain stimulation, and to different neural events. Applications in fundamental neuroscience may enhance the investigative potential of brain stimulation to understand cognitive processes such as the role of sleep spindles in memory consolidation, and may lead to novel therapeutic applications.

2025-07-02

Journal of Neural Engineering (published)

doi.org

Toward whole-genome inference of polygenic scores with fast and memory-efficient algorithms.

Shadi Zabad

Chirayu Anant Haryan

Simon Gravel

Sanchit Misra

Yuemei Li

2025-07-02

American Journal of Human Genetics (published)

doi.org