Publications

The In-Situ Effect of Offensive Ads on Search Engine Users

Elad Yom-Tov

Liat Levontin

Alexandra Olteanu

2025-02-25

ACM Transactions on Information Systems (published)

On the Dichotomy Between Privacy and Traceability in ℓp Stochastic Convex Optimization

Sasha Voitovych

Mahdi Haghifam

Idan Attias

Roi Livni

Daniel M. Roy

2025-02-24

arXiv (preprint)

On the Dichotomy Between Privacy and Traceability in $\ell_p$ Stochastic Convex Optimization

Sasha Voitovych

Mahdi Haghifam

Idan Attias

Roi Livni

Daniel M. Roy

In this paper, we investigate the necessity of memorization in stochastic convex optimization (SCO) under …

2025-02-24

arXiv (preprint)

On Traceability in $\ell_p$ Stochastic Convex Optimization

Sasha Voitovych

Mahdi Haghifam

Idan Attias

Roi Livni

Daniel M. Roy

In this paper, we investigate the necessity of traceability for accurate learning in stochastic convex optimization (SCO) under …

2025-02-24

arXiv (preprint)

A generative approach to LLM harmfulness detection with special red flag tokens

Sophie Xhonneux

David Dobre

Mehrnaz Mofakhami

Leo Schwinn

Gauthier Gidel

Most safety training methods for large language models (LLMs) based on fine-tuning rely on dramatically changing the output distribution of the model when faced with a harmful request, shifting it from an unsafe answer to a refusal to respond. These methods inherently compromise model capabilities and might make auto-regressive models vulnerable to attacks that make likely an initial token of affirmative response. To avoid that, we propose to expand the model's vocabulary with a special token we call red flag token and propose to fine-tune the model to generate this token at any time harmful content is generated or about to be generated. This novel safety training method effectively augments LLMs into generative classifiers of harmfulness at all times during the conversation. This method offers several advantages: it enables the model to explicitly learn the concept of harmfulness while marginally affecting the generated distribution, thus maintaining the model's utility. It also evaluates each generated answer rather than just the input prompt and provides a stronger defence against sampling-based attacks. In addition, it simplifies the evaluation of the model's robustness and reduces correlated failures when combined with a classifier. We further show an increased robustness to long contexts, and supervised fine-tuning attacks.

2025-02-22

arXiv (preprint)

Improving the Scaling Laws of Synthetic Data with Deliberate Practice

Reyhane Askari Hemmat

Mohammad Pezeshki

Elvis Dohmatob

Florian Bordes

Pietro Astolfi

Melissa Hall

Jakob Verbeek

Michal Drozdzal

Adriana Romero Soriano

Inspired by the principle of deliberate practice in human learning, we propose Deliberate Practice for Synthetic Data Generation (DP), a novel framework that improves sample efficiency through dynamic synthetic data generation. Prior work has shown that scaling synthetic data is inherently challenging, as naively adding new data leads to diminishing returns. To address this, pruning has been identified as a key mechanism for improving scaling, enabling models to focus on the most informative synthetic samples. Rather than generating a large dataset and pruning it afterward, DP efficiently approximates the direct generation of informative samples. We theoretically show how training on challenging, informative examples improves scaling laws and empirically validate that DP achieves better scaling performance with significantly fewer training samples and iterations. On ImageNet-100, DP generates 3.4x fewer samples and requires six times fewer iterations, while on ImageNet-1k, it generates 8x fewer samples with a 30 percent reduction in iterations, all while achieving superior performance compared to prior work.

2025-02-21

arXiv (preprint)

PairBench: Are Vision-Language Models Reliable at Comparing What They See?

Aarash Feizi

Sai Rajeswar

Adriana Romero Soriano

Reihaneh Rabbany

Valentina Zantedeschi

Spandana Gella

Joao Monteiro

Understanding how effectively large vision language models (VLMs) compare visual inputs is crucial across numerous applications, yet this fundamental capability remains insufficiently assessed. While VLMs are increasingly deployed for tasks requiring comparative judgment, including automated evaluation, re-ranking, and retrieval-augmented generation, no systematic framework exists to measure their performance in these scenarios. We present PairBench, a simple framework that evaluates VLMs as customizable similarity tools using widely available image datasets. Our approach introduces four key metrics for reliable comparison: alignment with human annotations, consistency across pair ordering, distribution smoothness, and controllability through prompting. Our analysis reveals that no model consistently excels across all metrics, with each demonstrating distinct strengths and weaknesses. Most concerning is the widespread inability of VLMs to maintain symmetric similarity scores. Interestingly, we demonstrate that performance on our benchmark strongly correlates with popular benchmarks used for more complex tasks, while providing additional metrics into controllability, smoothness and ordering. This makes PairBench a unique and comprehensive framework to evaluate the performance of VLMs for automatic evaluation depending on the task.

2025-02-21

arXiv (preprint)

PairBench: A Systematic Framework for Selecting Reliable Judge VLMs

Aarash Feizi

Sai Rajeswar

Adriana Romero Soriano

Reihaneh Rabbany

Spandana Gella

Valentina Zantedeschi

Joao Monteiro

As large vision language models (VLMs) are increasingly used as automated evaluators, understanding their ability to effectively compare data pairs as instructed in the prompt becomes essential. To address this, we present PairBench, a low-cost framework that systematically evaluates VLMs as customizable similarity tools across various modalities and scenarios. Through PairBench, we introduce four metrics that represent key desiderata of similarity scores: alignment with human annotations, consistency for data pairs irrespective of their order, smoothness of similarity distributions, and controllability through prompting. Our analysis demonstrates that no model, whether closed- or open-source, is superior on all metrics; the optimal choice depends on an auto evaluator's desired behavior (e.g., a smooth vs. a sharp judge), highlighting risks of widespread adoption of VLMs as evaluators without thorough assessment. For instance, the majority of VLMs struggle with maintaining symmetric similarity scores regardless of order. Additionally, our results show that the performance of VLMs on the metrics in PairBench closely correlates with popular benchmarks, showcasing its predictive power in ranking models.

2025-02-21

arXiv (preprint)