
Marius Mosbach

Postdoctorate - McGill University
Principal supervisor

Publications

LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders
Parishad BehnamGhader
Vaibhav Adlakha
Marius Mosbach
Large decoder-only language models (LLMs) are the state-of-the-art models on most of today's NLP tasks and benchmarks. Yet, the community is only slowly adopting these models for text embedding tasks, which require rich contextualized representations. In this work, we introduce LLM2Vec, a simple unsupervised approach that can transform any decoder-only LLM into a strong text encoder. LLM2Vec consists of three simple steps: 1) enabling bidirectional attention, 2) masked next token prediction, and 3) unsupervised contrastive learning. We demonstrate the effectiveness of LLM2Vec by applying it to 3 popular LLMs ranging from 1.3B to 7B parameters and evaluate the transformed models on English word- and sequence-level tasks. We outperform encoder-only models by a large margin on word-level tasks and reach a new unsupervised state-of-the-art performance on the Massive Text Embeddings Benchmark (MTEB). Moreover, when combining LLM2Vec with supervised contrastive learning, we achieve state-of-the-art performance on MTEB among models that train only on publicly available data. Our strong empirical results and extensive analysis demonstrate that LLMs can be effectively transformed into universal text encoders in a parameter-efficient manner without the need for expensive adaptation or synthetic GPT-4 generated data.
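
As a rough illustration of the three steps (not the authors' released code; the function names, tensor shapes, and the SimCSE-style dropout-positive formulation below are assumptions made for this sketch), they can be written down in a few lines of PyTorch:

# Illustrative sketch only, not the LLM2Vec release: the three adaptation
# steps expressed as standalone PyTorch functions.
import torch
import torch.nn.functional as F

def bidirectional_attention_mask(seq_len: int) -> torch.Tensor:
    # Step 1: replace the causal (lower-triangular) mask of a decoder-only
    # model with an all-ones mask so every token can attend to every token.
    return torch.ones(seq_len, seq_len, dtype=torch.bool)

def mntp_loss(logits, labels, mask_positions):
    # Step 2: masked next token prediction. Some input tokens are masked and
    # the token at position i is predicted from the logits produced at
    # position i - 1 (the model's usual next-token head).
    preds = logits[:, :-1, :]          # predictions emitted at positions 0..L-2
    targets = labels[:, 1:]            # tokens actually at positions 1..L-1
    masked = mask_positions[:, 1:]     # which of those positions were masked
    return F.cross_entropy(preds[masked], targets[masked])

def simcse_contrastive_loss(emb_a, emb_b, temperature=0.05):
    # Step 3: unsupervised contrastive learning, SimCSE-style: emb_a and emb_b
    # are embeddings of the same batch of sequences under two different
    # dropout masks; matching rows are positives, the rest are in-batch
    # negatives.
    emb_a = F.normalize(emb_a, dim=-1)
    emb_b = F.normalize(emb_b, dim=-1)
    sim = emb_a @ emb_b.T / temperature
    targets = torch.arange(sim.size(0), device=sim.device)
    return F.cross_entropy(sim, targets)

# Toy usage of the contrastive step on random 4 x 768 "sentence embeddings".
emb = torch.randn(4, 768)
loss = simcse_contrastive_loss(emb, emb + 0.01 * torch.randn_like(emb))

In the paper's pipeline these pieces are applied to a pretrained decoder-only LLM: the causal mask is dropped, the model is briefly adapted with the masked next token prediction objective, and the pooled sequence embeddings are then tuned with the contrastive objective.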
Multilingual Language Model Adaptive Fine-Tuning: A Study on African Languages
Jesujoba Oluwadara Alabi
Marius Mosbach
Dietrich Klakow
… and XLM-R) and three NLP tasks (NER, news topic classification, and sentiment classification) shows that our approach is competitive to applying LAFT on individual languages while requiring significantly less disk space. Finally, we show that our adapted PLM also improves the zero-shot cross-lingual transfer abilities of parameter-efficient fine-tuning methods.
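
The language-adaptive fine-tuning (LAFT) objective the paper builds on is continued masked language modelling on text in the target language(s). A minimal sketch with Hugging Face transformers, assuming xlm-roberta-base as the multilingual PLM, a three-sentence placeholder corpus, and arbitrary hyperparameters (none of these come from the paper), could look as follows:

# Illustrative sketch, not the paper's released code: continued masked
# language modelling of a multilingual PLM on target-language text.
from datasets import Dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_name = "xlm-roberta-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)

# Placeholder corpus: in practice this would be monolingual (or, for the
# multilingual variant, pooled) text covering the target languages.
corpus = Dataset.from_dict(
    {"text": ["Ẹ káàárọ̀.", "Habari ya asubuhi.", "Good morning."]}
)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

tokenized = corpus.map(tokenize, batched=True, remove_columns=["text"])

# Standard MLM collator: randomly masks 15% of tokens in each batch.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="xlmr-adaptive-ft-sketch",
        per_device_train_batch_size=8,
        num_train_epochs=1,
    ),
    train_dataset=tokenized,
    data_collator=collator,
)
trainer.train()

The multilingual variant studied in the paper applies this kind of adaptation once over text from many African languages, rather than producing one adapted model per language.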