
Parishad BehnamGhader

Master's Research - McGill University

Publications

LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders
Parishad BehnamGhader
Vaibhav Adlakha
Marius Mosbach
Large decoder-only language models (LLMs) are the state-of-the-art models on most of today's NLP tasks and benchmarks. Yet, the community is only slowly adopting these models for text embedding tasks, which require rich contextualized representations. In this work, we introduce LLM2Vec, a simple unsupervised approach that can transform any decoder-only LLM into a strong text encoder. LLM2Vec consists of three simple steps: 1) enabling bidirectional attention, 2) masked next token prediction, and 3) unsupervised contrastive learning. We demonstrate the effectiveness of LLM2Vec by applying it to 3 popular LLMs ranging from 1.3B to 7B parameters and evaluating the transformed models on English word- and sequence-level tasks. We outperform encoder-only models by a large margin on word-level tasks and reach a new unsupervised state of the art on the Massive Text Embedding Benchmark (MTEB). Moreover, when combining LLM2Vec with supervised contrastive learning, we achieve state-of-the-art performance on MTEB among models that train only on publicly available data. Our strong empirical results and extensive analysis demonstrate that LLMs can be effectively transformed into universal text encoders in a parameter-efficient manner, without the need for expensive adaptation or synthetic GPT-4-generated data.
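The three steps the abstract names can be illustrated compactly. Below is a minimal PyTorch sketch of each, assuming a generic decoder-only model; the function names, shapes, and hyperparameters here are illustrative placeholders, not the authors' released implementation.

```python
# Sketch of the three LLM2Vec steps, under the assumptions stated above.
import torch
import torch.nn.functional as F

def bidirectional_mask(seq_len: int) -> torch.Tensor:
    # Step 1: replace the causal (lower-triangular) attention mask with a
    # full mask so every token can attend to every other token.
    return torch.ones(seq_len, seq_len, dtype=torch.bool)

def mntp_loss(logits: torch.Tensor, labels: torch.Tensor,
              masked: torch.Tensor) -> torch.Tensor:
    # Step 2: masked next token prediction -- a masked position i is
    # predicted from the hidden state at position i-1, matching the
    # model's original next-token parameterization.
    shifted_logits = logits[:, :-1, :]   # prediction for position i comes from i-1
    targets = labels[:, 1:]
    mask = masked[:, 1:]
    return F.cross_entropy(shifted_logits[mask], targets[mask])

def simcse_loss(emb_a: torch.Tensor, emb_b: torch.Tensor,
                tau: float = 0.05) -> torch.Tensor:
    # Step 3: unsupervised contrastive learning (SimCSE-style) -- two
    # forward passes of the same batch with independent dropout serve as
    # positives; other in-batch sequences act as negatives.
    a = F.normalize(emb_a, dim=-1)
    b = F.normalize(emb_b, dim=-1)
    sims = a @ b.T / tau                            # (batch, batch) similarities
    targets = torch.arange(a.size(0), device=a.device)  # positives on the diagonal
    return F.cross_entropy(sims, targets)
```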
Can Retriever-Augmented Language Models Reason? The Blame Game Between the Retriever and the Language Model
Parishad BehnamGhader
Santiago Miret
Augmenting pretrained language models with retrievers to select the supporting documents has shown promise in effectively solving common NLP problems, including language modeling and question answering, in an interpretable way. In this paper, we first study the strengths and weaknesses of different retriever-augmented language models (REALM,
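The retrieve-then-read setup the abstract refers to can be sketched as follows. This is a generic illustration, not any of the specific systems studied in the paper; `embed` and `generate` are hypothetical stand-ins for an embedding model and a language model.

```python
# Minimal retrieve-then-read sketch, under the assumptions stated above.
import numpy as np

def retrieve(query_vec: np.ndarray, doc_vecs: np.ndarray, k: int = 3) -> list[int]:
    # Score every document by inner product with the query; keep the top-k.
    scores = doc_vecs @ query_vec
    return list(np.argsort(-scores)[:k])

def answer(question: str, corpus: list[str], embed, generate) -> str:
    # Prepend the retrieved supporting documents to the question so the
    # language model conditions on them when producing its answer.
    q_vec = embed(question)
    d_vecs = np.stack([embed(d) for d in corpus])
    context = "\n".join(corpus[i] for i in retrieve(q_vec, d_vecs))
    return generate(f"{context}\n\nQuestion: {question}\nAnswer:")
```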
Evaluating Correctness and Faithfulness of Instruction-Following Models for Question Answering
Vaibhav Adlakha
Parishad BehnamGhader
Xing Han Lu
Nicholas Meade
Retriever-augmented instruction-following models are attractive alternatives to fine-tuned approaches for information-seeking tasks such as question answering (QA). By simply prepending retrieved documents to the input along with an instruction, these models can be adapted to various information domains and tasks without additional fine-tuning. While the model responses tend to be natural and fluent, the additional verbosity makes traditional QA evaluation metrics such as exact match (EM) and F1 unreliable for accurately quantifying model performance. In this work, we investigate the performance of instruction-following models across three information-seeking QA tasks. We use both automatic and human evaluation to evaluate these models along two dimensions: 1) how well they satisfy the user's information need (correctness), and 2) whether they produce a response based on the provided knowledge (faithfulness). Guided by human evaluation and analysis, we highlight the shortcomings of traditional metrics for both correctness and faithfulness. We then propose simple token-overlap-based and model-based metrics that reflect the true performance of these models. Our analysis reveals that instruction-following models are competitive, and sometimes even outperform fine-tuned models, on correctness. However, these models struggle to stick to the provided knowledge and often hallucinate in their responses. We hope our work encourages a more holistic evaluation of instruction-following models for QA. Our code and data are available at https://github.com/McGill-NLP/instruct-qa
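A token-overlap metric in the spirit the abstract describes can be sketched briefly: unlike exact match, it does not penalize a verbose but correct answer. The exact formulation below (recall over reference-answer tokens) is an illustrative assumption, not necessarily the paper's final metric.

```python
# Minimal token-overlap correctness sketch, under the assumptions above.
import re

def token_recall(reference: str, response: str) -> float:
    # Fraction of reference-answer tokens that also appear in the model's
    # (possibly verbose) response; extra verbosity is not punished.
    tokenize = lambda s: re.findall(r"\w+", s.lower())
    ref, res = tokenize(reference), set(tokenize(response))
    if not ref:
        return 0.0
    return sum(t in res for t in ref) / len(ref)

# Example: exact match scores 0 here despite a correct answer.
print(token_recall("Ottawa", "The capital of Canada is Ottawa."))  # 1.0
```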