Maxime Darrin

When is an Embedding Model More Promising than Another?

Ismail Ben Ayed

2024-09-25

NeurIPS.cc/2024/Conference (poster)

doi.org

openreview.net

GLIMPSE: Pragmatically Informative Multi-Document Summarization for Scholarly Reviews

Scientific peer review is essential for the quality of academic publications. However, the increasing number of paper submissions to conferences has strained the reviewing process. This surge poses a burden on area chairs who have to carefully read an ever-growing volume of reviews and discern each reviewer's main arguments as part of their decision process. In this paper, we introduce GLIMPSE, a summarization method designed to offer a concise yet comprehensive overview of scholarly reviews. Unlike traditional consensus-based methods, GLIMPSE extracts both common and unique opinions from the reviews. We introduce novel uniqueness scores based on the Rational Speech Act framework to identify relevant sentences in the reviews. Our method aims to provide a pragmatic glimpse into all reviews, offering a balanced perspective on their opinions. Our experimental results with both automatic metrics and human evaluation show that GLIMPSE generates more discriminative summaries than baseline methods in terms of human evaluation while achieving comparable performance with these methods in terms of automatic metrics.

2024-06-11

ArXiv (preprint)

doi.org

arxiv.org

When is an Embedding Model More Promising than Another?

Ismail Ben Ayed

Embedders play a central role in machine learning, projecting any object into numerical representations that can, in turn, be leveraged to perform various downstream tasks. The evaluation of embedding models typically depends on domain-specific empirical approaches utilizing downstream tasks, primarily because of the lack of a standardized framework for comparison. However, acquiring adequately large and representative datasets for conducting these assessments is not always viable and can prove to be prohibitively expensive and time-consuming. In this paper, we present a unified approach to evaluate embedders. First, we establish theoretical foundations for comparing embedding models, drawing upon the concepts of sufficiency and informativeness. We then leverage these concepts to devise a tractable comparison criterion (information sufficiency), leading to a task-agnostic and self-supervised ranking procedure. We demonstrate experimentally that our approach aligns closely with the capability of embedding models to facilitate various downstream tasks in both natural language processing and molecular biology. This effectively offers practitioners a valuable tool for prioritizing model trials.

2024-06-11

ArXiv (preprint)

doi.org

arxiv.org

COSMIC: Mutual Information for Task-Agnostic Summarization Evaluation

Maxime Darrin

Philippe Formont

Jackie Chi Kit Cheung

Pablo Piantanida

Assessing the quality of summarizers poses significant challenges. In response, we propose a novel task-oriented evaluation approach that assesses summarizers based on their capacity to produce summaries that are useful for downstream tasks, while preserving task outcomes. We theoretically establish a direct relationship between the resulting error probability of these tasks and the mutual information between source texts and generated summaries. We introduce

2024-02-29

ArXiv (preprint)

doi.org

arxiv.org

Rainproof: An Umbrella To Shield Text Generators From Out-Of-Distribution Data

Maxime Darrin

Pablo Piantanida

Pierre Colombo

Implementing effective control mechanisms to ensure the proper functioning and security of deployed NLP models, from translation to chatbots, is essential. A key ingredient to ensure safe system behaviour is Out-Of-Distribution (OOD) detection, which aims to detect whether an input sample is statistically far from the training distribution. Although OOD detection is a widely covered topic in classification tasks, most methods rely on hidden features output by the encoder. In this work, we focus on leveraging soft-probabilities in a black-box framework, i.e. we can access the soft-predictions but not the internal states of the model. Our contributions include: (i) RAINPROOF a Relative informAItioN Projection OOD detection framework; and (ii) a more operational evaluation setting for OOD detection. Surprisingly, we find that OOD detection is not necessarily aligned with task-specific measures. The OOD detector may filter out samples well processed by the model and keep samples that are not, leading to weaker performance. Our results show that RAINPROOF provides OOD detection methods more aligned with task-specific performance metrics than traditional OOD detectors.

2023-12-01

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (published)

doi.org

arxiv.org

RainProof: An Umbrella to Shield Text Generator from Out-Of-Distribution Data

Maxime Darrin

Pablo Piantanida

Pierre Colombo

Implementing effective control mechanisms to ensure the proper functioning and security of deployed NLP models, from translation to chatbots, is essential. A key ingredient to ensure safe system behaviour is Out-Of-Distribution (OOD) detection, which aims to detect whether an input sample is statistically far from the training distribution. Although OOD detection is a widely covered topic in classification tasks, most methods rely on hidden features output by the encoder. In this work, we focus on leveraging soft-probabilities in a black-box framework, i.e. we can access the soft-predictions but not the internal states of the model. Our contributions include: (i) RAINPROOF a Relative informAItioN Projection OOD detection framework; and (ii) a more operational evaluation setting for OOD detection. Surprisingly, we find that OOD detection is not necessarily aligned with task-specific measures. The OOD detector may filter out samples well processed by the model and keep samples that are not, leading to weaker performance. Our results show that RAINPROOF provides OOD detection methods more aligned with task-specific performance metrics than traditional OOD detectors.

2023-10-07

EMNLP/2023/Conference (accepted)

openreview.net