Quentin Fournier

Structure-Aligned Protein Language Model

Can Chen

David Heurtel-Depeiges

Robert M. Vernon

Christopher J. Langmead

Yoshua Bengio

Quentin Fournier

2025-05-01

arXiv (published)

NeoBERT: A Next-Generation BERT

Lola Le Breton

Quentin Fournier

Mariam El Mezouar

Recent innovations in architecture, pre-training, and fine-tuning have led to the remarkable in-context learning and reasoning abilities of … (see more)large auto-regressive language models such as LLaMA and DeepSeek. In contrast, encoders like BERT and RoBERTa have not seen the same level of progress despite being foundational for many downstream NLP applications. To bridge this gap, we introduce NeoBERT, a next-generation encoder that redefines the capabilities of bidirectional models by integrating state-of-the-art advancements in architecture, modern data, and optimized pre-training methodologies. NeoBERT is designed for seamless adoption: it serves as a plug-and-play replacement for existing base models, relies on an optimal depth-to-width ratio, and leverages an extended context length of 4,096 tokens. Despite its compact 250M parameter footprint, it achieves state-of-the-art results on the massive MTEB benchmark, outperforming BERT large, RoBERTa large, NomicBERT, and ModernBERT under identical fine-tuning conditions. In addition, we rigorously evaluate the impact of each modification on GLUE and design a uniform fine-tuning and evaluation framework for MTEB. We release all code, data, checkpoints, and training scripts to accelerate research and real-world adoption.

2025-02-26

ArXiv (preprint)

NeoBERT: A Next-Generation BERT

Lola Le Breton

Quentin Fournier

John Xavier Morris

Mariam El Mezouar

Recent innovations in architecture, pre-training, and fine-tuning have led to the remarkable in-context learning and reasoning abilities of … (see more)large auto-regressive language models such as LLaMA and DeepSeek. In contrast, encoders like BERT and RoBERTa have not seen the same level of progress despite being foundational for many downstream NLP applications. To bridge this gap, we introduce NeoBERT, a next-generation encoder that redefines the capabilities of bidirectional models by integrating state-of-the-art advancements in architecture, modern data, and optimized pre-training methodologies. NeoBERT is designed for seamless adoption: it serves as a plug-and-play replacement for existing base models, relies on an optimal depth-to-width ratio, and leverages an extended context length of 4,096 tokens. Despite its compact 250M parameter footprint, it achieves state-of-the-art results on the massive MTEB benchmark, outperforming BERT large, RoBERTa large, NomicBERT, and ModernBERT under identical fine-tuning conditions. In addition, we rigorously evaluate the impact of each modification on GLUE and design a uniform fine-tuning and evaluation framework for MTEB. We release all code, data, checkpoints, and training scripts to accelerate research and real-world adoption.

2025-01-01

Trans. Mach. Learn. Res. (published)

openreview.net

Combining Domain and Alignment Vectors to Achieve Better Knowledge-Safety Trade-offs in LLMs

Megh Thakkar

Yash More

Quentin Fournier

Matthew D Riemer

Pin-Yu Chen

Payel Das

There is a growing interest in training domain-expert LLMs that excel in specific technical fields compared to their general-purpose instruc… (see more)tion-tuned counterparts. However, these expert models often experience a loss in their safety abilities in the process, making them capable of generating harmful content. As a solution, we introduce an efficient and effective merging-based alignment method called \textsc{MergeAlign} that interpolates the domain and alignment vectors, creating safer domain-specific models while preserving their utility. We apply \textsc{MergeAlign} on Llama3 variants that are experts in medicine and finance, obtaining substantial alignment improvements with minimal to no degradation on domain-specific benchmarks. We study the impact of model merging through model similarity metrics and contributions of individual models being merged. We hope our findings open new research avenues and inspire more efficient development of safe expert LLMs.

2024-11-11

ArXiv (preprint)

Combining Domain and Alignment Vectors to Achieve Better Knowledge-Safety Trade-offs in LLMs

Megh Thakkar

Yash More

Quentin Fournier

Matthew D Riemer

Pin-Yu Chen

Payel Das

There is a growing interest in training domain-expert LLMs that excel in specific technical fields compared to their general-purpose instruc… (see more)tion-tuned counterparts. However, these expert models often experience a loss in their safety abilities in the process, making them capable of generating harmful content. As a solution, we introduce an efficient and effective merging-based alignment method called \textsc{MergeAlign} that interpolates the domain and alignment vectors, creating safer domain-specific models while preserving their utility. We apply \textsc{MergeAlign} on Llama3 variants that are experts in medicine and finance, obtaining substantial alignment improvements with minimal to no degradation on domain-specific benchmarks. We study the impact of model merging through model similarity metrics and contributions of individual models being merged. We hope our findings open new research avenues and inspire more efficient development of safe expert LLMs.

2024-11-11

ArXiv (preprint)

Combining Domain and Alignment Vectors to Achieve Better Knowledge-Safety Trade-offs in LLMs

Megh Thakkar

Yash More

Quentin Fournier

Matthew D Riemer

Pin-Yu Chen

Payel Das

Chandar Research Lab

Mila - Québec

AI Institute

Ibm Research

Polytechnique Montréal

There is a growing interest in training domain-expert LLMs that excel in specific technical fields compared to their general-purpose instruc… (see more)tion-tuned counterparts. However, these expert models often experience a loss in their safety abilities in the process, making them capable of generating harmful content. As a solution, we introduce an efficient and effective merging-based alignment method called \textsc{MergeAlign} that interpolates the domain and alignment vectors, creating safer domain-specific models while preserving their utility. We apply \textsc{MergeAlign} on Llama3 variants that are experts in medicine and finance, obtaining substantial alignment improvements with minimal to no degradation on domain-specific benchmarks. We study the impact of model merging through model similarity metrics and contributions of individual models being merged. We hope our findings open new research avenues and inspire more efficient development of safe expert LLMs.

2024-10-10

NeurIPS.cc/2024/Workshop/AFM (poster)

openreview.net

Protein Language Models: Is Scaling Necessary?

Quentin Fournier

Robert M. Vernon

Almer van der Sloot

Benjamin Schulz

Christopher James Langmead

2024-09-23

bioRxiv (preprint)

Protein Language Models: Is Scaling Necessary?

Quentin Fournier

Robert M. Vernon

Almer van der Sloot

Benjamin Schulz

Christopher James Langmead

Public protein sequence databases contain samples from the fitness landscape explored by nature. Protein language models (pLMs) pre-trained … (see more)on these sequences aim to capture this landscape for tasks like property prediction and protein design. Following the same trend as in natural language processing, pLMs have continuously been scaled up. However, the premise that scale leads to better performance assumes that source databases provide accurate representation of the underlying fitness landscape, which is likely false. By developing an efficient codebase, designing a modern architecture, and addressing data quality concerns such as sample bias, we introduce AMPLIFY, a best-in-class pLM that is orders of magnitude less expensive to train and deploy than previous models. Furthermore, to support the scientific community and democratize the training of pLMs, we have open-sourced AMPLIFY’s pre-training codebase, data, and model checkpoints.

2024-09-23

bioRxiv (preprint)

A Deep Dive into the Trade-Offs of Parameter-Efficient Preference Alignment Techniques

Megh Thakkar

Quentin Fournier

Matthew D Riemer

Pin-Yu Chen

Payel Das

Large language models are first pre-trained on trillions of tokens and then instruction-tuned or aligned to specific preferences. While pre-… (see more)training remains out of reach for most researchers due to the compute required, fine-tuning has become affordable thanks to parameter-efficient methods such as LoRA and QLoRA. Alignment is known to be sensitive to the many factors involved, including the quantity and quality of data, the alignment method, and the adapter rank. However, there has not yet been an extensive study of their effect on downstream performance. To address this gap, we conduct an in-depth investigation of the impact of popular choices for three crucial axes: (i) the alignment dataset (HH-RLHF and BeaverTails), (ii) the alignment technique (SFT and DPO), and (iii) the model (LLaMA-1, Vicuna-v1.3, Mistral-7b, and Mistral-7b-Instruct). Our extensive setup spanning over 300 experiments reveals consistent trends and unexpected findings. We observe how more informative data helps with preference alignment, cases where supervised fine-tuning outperforms preference optimization, and how aligning to a distinct preference boosts performance on downstream tasks. Through our in-depth analyses, we put forward key guidelines to help researchers perform more effective parameter-efficient LLM alignment.

2024-06-07

ArXiv (preprint)

A Deep Dive into the Trade-Offs of Parameter-Efficient Preference Alignment Techniques

Megh Thakkar

Quentin Fournier

Matthew D Riemer

Pin-Yu Chen

Payel Das

Large language models are first pre-trained on trillions of tokens and then instruction-tuned or aligned to specific preferences. While pre-… (see more)training remains out of reach for most researchers due to the compute required, fine-tuning has become affordable thanks to parameter-efficient methods such as LoRA and QLoRA. Alignment is known to be sensitive to the many factors involved, including the quantity and quality of data, the alignment method, and the adapter rank. However, there has not yet been an extensive study of their effect on downstream performance. To address this gap, we conduct an in-depth investigation of the impact of popular choices for three crucial axes: (i) the alignment dataset (HH-RLHF and BeaverTails), (ii) the alignment technique (SFT and DPO), and (iii) the model (LLaMA-1, Vicuna-v1.3, Mistral-7b, and Mistral-7b-Instruct). Our extensive setup spanning over 300 experiments reveals consistent trends and unexpected findings. We observe how more informative data helps with preference alignment, cases where supervised fine-tuning outperforms preference optimization, and how aligning to a distinct preference boosts performance on downstream tasks. Through our in-depth analyses, we put forward key guidelines to help researchers perform more effective parameter-efficient LLM alignment.

2024-06-07

ArXiv (preprint)

A Deep Dive into the Trade-Offs of Parameter-Efficient Preference Alignment Techniques

Megh Thakkar

Quentin Fournier

Matthew D Riemer

Pin-Yu Chen

Payel Das

2024-06-07

ArXiv (preprint)