Portrait of Amal Zouaq

Amal Zouaq

Associate Academic Member
Full Professor, Polytechnique Montréal, Department of Computer Engineering and Software Engineering
Associate Professor, University of Ottawa, School of Electrical Engineering and Computer Science
Research topics
Representation learning
Learning on graphs
Knowledge graphs
Generative models
Information retrieval
Natural language processing

Biography

Amal Zouaq is a Full Professor at Polytechnique Montréal in the Department of Computer Engineering and Software Engineering, and an Associate Academic Member at Mila. She holds an FRQS Dual Chair in AI and Digital Health, is an IVADO professor, a member of the CLIQ-AI consortium (Computational Linguistics in Québec), and an adjunct professor at the University of Ottawa.

Her research interests include artificial intelligence, natural language processing, and the Semantic Web. She is the director of the LAMA-WeST research lab, which specializes in all aspects of natural language processing and artificial intelligence, including large language models (LLMs) with non-parametric memories, modular LLMs, neuro-symbolic models, and the Semantic Web.

Professor Zouaq serves on the program committees of numerous conferences and journals on knowledge and data engineering, natural language processing, data mining, and the Semantic Web.

Current students

Master's Research - Polytechnique
PhD - Polytechnique
Master's Research - Polytechnique
PhD - Polytechnique
Master's Research - Polytechnique
Master's Research - Polytechnique
PhD - Polytechnique
Postdoctorate - Polytechnique
Master's Research - Polytechnique
PhD - Polytechnique
PhD - Polytechnique
PhD - Polytechnique

Publications

LLMs Can't Play Hangman: On the Necessity of a Private Working Memory for Language Agents
Combining Domain and Alignment Vectors to Achieve Better Knowledge-Safety Trade-offs in LLMs
Yash More
Matthew D Riemer
Pin-Yu Chen
Payel Das
A. Chandar
There is a growing interest in training domain-expert LLMs that excel in specific technical fields compared to their general-purpose instruction-tuned counterparts. However, these expert models often experience a loss in their safety abilities in the process, making them capable of generating harmful content. As a solution, we introduce an efficient and effective merging-based alignment method called MergeAlign that interpolates the domain and alignment vectors, creating safer domain-specific models while preserving their utility. We apply MergeAlign on Llama3 variants that are experts in medicine and finance, obtaining substantial alignment improvements with minimal to no degradation on domain-specific benchmarks. We study the impact of model merging through model similarity metrics and contributions of individual models being merged. We hope our findings open new research avenues and inspire more efficient development of safe expert LLMs.
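The interpolation described in this abstract follows the task-vector idea: subtract the shared base model's weights from each fine-tuned model, then mix the resulting difference vectors. Below is a minimal sketch of that operation, assuming hypothetical Hugging Face checkpoint names and an illustrative interpolation weight alpha; it is not the paper's released implementation.

```python
# Minimal sketch of merging a domain vector and an alignment vector.
# Checkpoint names and the weight `alpha` are illustrative placeholders.
import torch
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("base-llm")              # shared pretrained base
domain = AutoModelForCausalLM.from_pretrained("domain-expert-llm")   # e.g. a medical or financial expert
aligned = AutoModelForCausalLM.from_pretrained("safety-aligned-llm") # instruction/safety-tuned variant

base_sd, domain_sd, aligned_sd = base.state_dict(), domain.state_dict(), aligned.state_dict()

alpha = 0.5  # interpolation weight between the two task vectors
merged_sd = {}
with torch.no_grad():
    for name, w_base in base_sd.items():
        domain_vec = domain_sd[name] - w_base   # domain task vector
        align_vec = aligned_sd[name] - w_base   # alignment task vector
        merged_sd[name] = w_base + alpha * domain_vec + (1 - alpha) * align_vec

base.load_state_dict(merged_sd)
base.save_pretrained("merged-domain-aligned-llm")
```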
A Deep Dive into the Trade-Offs of Parameter-Efficient Preference Alignment Techniques
Matthew D Riemer
Pin-Yu Chen
Payel Das
A. Chandar
MVP: Minimal Viable Phrase for Long Text Understanding.
Assessing the Generalization Capabilities of Neural Machine Translation Models for SPARQL Query Generation
Samuel Reyd
SORBET: A Siamese Network for Ontology Embeddings Using a Distance-Based Regression Loss and BERT
Francis Gosselin
SORBETmatcher results for OAEI 2023.
Francis Gosselin
Local Structure Matters Most: Perturbation Study in NLU
Recent research analyzing the sensitivity of natural language understanding models to word-order perturbations has shown that neural models are surprisingly insensitive to the order of words. In this paper, we investigate this phenomenon by developing perturbations that alter the order of words, subwords, and characters to analyze their effect on neural models' performance on language understanding tasks. We measure the impact of perturbations on the local neighborhood of characters and on the global position of characters in the perturbed texts, and observe that perturbation functions found in prior literature only affect the global ordering while the local ordering remains relatively unperturbed. We empirically show that neural models, regardless of their inductive biases, pretraining scheme, or choice of tokenization, mostly rely on the local structure of text to build understanding and make limited use of the global structure.
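As a concrete illustration of the local/global distinction this abstract draws, here is a sketch (not the paper's code) of a character-level perturbation confined to small windows: local neighborhoods get scrambled while each character stays close to its original global position. The window size is an arbitrary choice for illustration.

```python
# Sketch of a locally confined character perturbation: characters are
# shuffled only within small windows, so local order changes while the
# global arrangement of the text is preserved.
import random

def local_shuffle(text: str, window: int = 3, seed: int = 0) -> str:
    rng = random.Random(seed)
    chars = list(text)
    out = []
    for i in range(0, len(chars), window):
        chunk = chars[i:i + window]
        rng.shuffle(chunk)  # permute characters inside this window only
        out.extend(chunk)
    return "".join(out)

print(local_shuffle("local structure matters most"))
```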
Detecting Languages Unintelligible to Multilingual Models through Local Structure Probes
Providing better language tools for low-resource and endangered languages is imperative for equitable growth. Recent progress with massively multilingual pretrained models has proven surprisingly effective at performing zero-shot transfer to a wide variety of languages. However, this transfer is not universal, with many languages not currently understood by multilingual approaches. It is estimated that only 72 languages possess a "small set of labeled datasets" on which a model's performance can be tested; the vast majority of languages lack even the resources needed to evaluate performance. In this work, we attempt to clarify which languages do and do not currently benefit from such transfer. To that end, we develop a general approach that requires only unlabelled text to detect which languages are not well understood by a cross-lingual model. Our approach is derived from the hypothesis that if a model's understanding is insensitive to perturbations of text in a language, it is likely to have a limited understanding of that language. We construct a cross-lingual sentence similarity task to evaluate our approach empirically on 350, primarily low-resource, languages.
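The detection idea in this abstract can be sketched as follows: embed a sentence before and after a local perturbation, and treat a near-zero change in the representation as a sign that the model does not really process the language. The encoder name, pooling strategy, and cosine criterion below are assumptions for illustration, not the paper's exact protocol.

```python
# Sketch of a perturbation-sensitivity probe: if a model's representation of
# a sentence barely moves under a local perturbation, the language is likely
# poorly understood. Encoder, pooling, and the cosine criterion are
# illustrative assumptions, not the paper's exact setup.
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("xlm-roberta-base")  # a multilingual encoder
model = AutoModel.from_pretrained("xlm-roberta-base")

def perturb(text: str, window: int = 3) -> str:
    # Reverse characters within small windows: local order changes,
    # global order is preserved.
    return "".join(text[i:i + window][::-1] for i in range(0, len(text), window))

def embed(sentence: str) -> torch.Tensor:
    inputs = tok(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state
    return hidden.mean(dim=1).squeeze(0)  # mean-pooled sentence vector

original = "an unlabelled sentence in some target language"
sensitivity = 1 - torch.cosine_similarity(embed(original), embed(perturb(original)), dim=0)
print(f"perturbation sensitivity: {sensitivity.item():.3f}")  # near 0 suggests limited understanding
```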
Local Structure Matters Most in Most Languages