Pablo Piantanida

2025-04-01

Signal Processing (published)

Multiple-model coding scheme for electrical signal compression

Corentin Presvôts

Michel Kieffer

Thibault Prevost

Patrick Panciatici

Zuxing Li

2025-04-01

Signal Processing (published)

A Strong Baseline for Molecular Few-Shot Learning

Philippe Formont

Hugo Jeannin

Ismail Ben Ayed

Few-shot learning has recently attracted significant interest in drug discovery, with a recent, fast-growing literature mostly involving convoluted meta-learning strategies. We revisit the more straightforward fine-tuning approach for molecular data, and propose a regularized quadratic-probe loss based on the Mahalanobis distance. We design a dedicated block-coordinate descent optimizer, which avoids the degenerate solutions of our loss. Interestingly, our simple fine-tuning approach achieves highly competitive performance in comparison to state-of-the-art methods, while being applicable to black-box settings and removing the need for specific episodic pre-training strategies. Furthermore, we introduce a new benchmark to assess the robustness of the competing methods to domain shifts. In this setting, our fine-tuning baseline obtains consistently better results than meta-learning methods.
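As a rough illustration of the Mahalanobis distance mentioned in this abstract, the sketch below classifies an embedding by its nearest class mean under that distance. It is not the authors' regularized loss or their block-coordinate descent optimizer: the function names, the toy class means, and the identity precision matrices are all assumptions made for the example.

```python
def mahalanobis_sq(x, mean, precision):
    """Squared Mahalanobis distance (x - mean)^T P (x - mean)."""
    d = [xi - mi for xi, mi in zip(x, mean)]
    # P @ d, computed row by row.
    pd = [sum(p_ij * d_j for p_ij, d_j in zip(row, d)) for row in precision]
    return sum(d_i * pd_i for d_i, pd_i in zip(d, pd))


def predict(x, class_means, class_precisions):
    """Return the label whose class mean is closest in Mahalanobis distance."""
    return min(
        class_means,
        key=lambda label: mahalanobis_sq(
            x, class_means[label], class_precisions[label]
        ),
    )


# Hypothetical two-class toy problem in a 2-D embedding space.
means = {"active": [1.0, 1.0], "inactive": [-1.0, -1.0]}
identity = [[1.0, 0.0], [0.0, 1.0]]
precisions = {"active": identity, "inactive": identity}

label = predict([0.8, 1.2], means, precisions)
```

With identity precision matrices the rule reduces to nearest-mean classification in Euclidean distance; a learned precision matrix per class reweights and correlates the embedding dimensions.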

2025-02-15

TMLR (accepted)

openreview.net

Membership Inference Risks in Quantized Models: A Theoretical and Empirical Study

Eric Aubinais

Philippe Formont

Elisabeth Gassiat

Quantizing machine learning models has demonstrated its effectiveness in lowering memory and inference costs while maintaining performance levels comparable to the original models. In this work, we investigate the impact of quantization procedures on the privacy of data-driven models, specifically focusing on their vulnerability to membership inference attacks. We derive an asymptotic theoretical analysis of Membership Inference Security (MIS), characterizing the privacy implications of quantized algorithm weights against the most powerful (and possibly unknown) attacks. Building on these theoretical insights, we propose a novel methodology to empirically assess and rank the privacy levels of various quantization procedures. Using synthetic datasets, we demonstrate the effectiveness of our approach in assessing the MIS of different quantizers. Furthermore, we explore the trade-off between privacy and performance using real-world data and models in the context of molecular modeling.

2025-02-10

ArXiv (preprint)

When is an Embedding Model More Promising than Another?

Maxime DARRIN

Philippe Formont

Ismail Ben Ayed

Jackie Cheung

2024-09-25

NeurIPS.cc/2024/Conference (poster)

openreview.net

Perfectly Accurate Membership Inference by a Dishonest Central Server in Federated Learning

Georg Pichler

Marco Romanelli

Leonardo Rey Vega

2024-08-01

IEEE Transactions on Dependable and Secure Computing (published)

GLIMPSE: Pragmatically Informative Multi-Document Summarization for Scholarly Reviews

Maxime DARRIN

Ines Arous

Jackie Cheung

Scientific peer review is essential for the quality of academic publications. However, the increasing number of paper submissions to conferences has strained the reviewing process. This surge poses a burden on area chairs who have to carefully read an ever-growing volume of reviews and discern each reviewer's main arguments as part of their decision process. In this paper, we introduce GLIMPSE, a summarization method designed to offer a concise yet comprehensive overview of scholarly reviews. Unlike traditional consensus-based methods, GLIMPSE extracts both common and unique opinions from the reviews. We introduce novel uniqueness scores based on the Rational Speech Act framework to identify relevant sentences in the reviews. Our method aims to provide a pragmatic glimpse into all reviews, offering a balanced perspective on their opinions. Our experimental results with both automatic metrics and human evaluation show that GLIMPSE generates more discriminative summaries than baseline methods in terms of human evaluation while achieving comparable performance with these methods in terms of automatic metrics.

2024-06-11

ArXiv (preprint)

When is an Embedding Model More Promising than Another?

Maxime DARRIN

Philippe Formont

Ismail Ben Ayed

Jackie Cheung

Embedders play a central role in machine learning, projecting any object into numerical representations that can, in turn, be leveraged to perform various downstream tasks. The evaluation of embedding models typically depends on domain-specific empirical approaches utilizing downstream tasks, primarily because of the lack of a standardized framework for comparison. However, acquiring adequately large and representative datasets for conducting these assessments is not always viable and can prove to be prohibitively expensive and time-consuming. In this paper, we present a unified approach to evaluate embedders. First, we establish theoretical foundations for comparing embedding models, drawing upon the concepts of sufficiency and informativeness. We then leverage these concepts to devise a tractable comparison criterion (information sufficiency), leading to a task-agnostic and self-supervised ranking procedure. We demonstrate experimentally that our approach aligns closely with the capability of embedding models to facilitate various downstream tasks in both natural language processing and molecular biology. This effectively offers practitioners a valuable tool for prioritizing model trials.

2024-06-11

ArXiv (preprint)

Beyond the Norms: Detecting Prediction Errors in Regression Models

Andres Altieri

Marco Romanelli

Georg Pichler

Florence Alberge

This paper tackles the challenge of detecting unreliable behavior in regression algorithms, which may arise from intrinsic variability (e.g., aleatoric uncertainty) or modeling errors (e.g., model uncertainty). First, we formally introduce the notion of unreliability in regression, i.e., when the output of the regressor exceeds a specified discrepancy (or error). Then, using powerful tools for probabilistic modeling, we estimate the discrepancy density, and we measure its statistical diversity using our proposed metric for statistical dissimilarity. In turn, this allows us to derive a data-driven score that expresses the uncertainty of the regression outcome. We show empirical improvements in error detection for multiple regression tasks, consistently outperforming popular baseline approaches, and contributing to the broader field of uncertainty quantification and safe machine learning systems.

2024-05-01

ICML.cc/2024/Conference (spotlight)

openreview.net

Is Meta-training Really Necessary for Molecular Few-Shot Learning ?

Philippe Formont

Hugo Jeannin

Ismail Ben Ayed

2024-04-02

ArXiv (preprint)

Two-stage Multiple-Model Compression Approach for Sampled Electrical Signals

Corentin Presvôts

Michel Kieffer

Thibault Prevost

Patrick Panciatici

Zuxing Li

This paper presents a two-stage Multiple-Model Compression (MMC) approach for sampled electrical waveforms. To limit latency, the processing is window-based, with a window length commensurate to the electrical period. For each window, the first stage compares several parametric models to get a coarse representation of the samples. The second stage then compares different residual compression techniques to minimize the norm of the reconstruction error. The allocation of the rate budget between the two stages is optimized. The proposed MMC approach provides better signal-to-noise ratios than state-of-the-art solutions on periodic and transient waveforms.
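The two-stage, window-based structure described in this abstract can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the paper's method: the two candidate models (a constant and a single sinusoid), the uniform quantizer standing in for the residual coders, the step size, and all function names are hypothetical, and rate allocation between the stages is left out entirely.

```python
import math


def fit_constant(window):
    """Candidate model 1: a single mean value per window."""
    mean = sum(window) / len(window)
    return [mean] * len(window)


def fit_sinusoid(window, cycles=1):
    """Candidate model 2: one sinusoid spanning `cycles` full periods.

    With an integer number of periods, sine and cosine are orthogonal over
    the window, so the least-squares coefficients are simple projections.
    """
    n = len(window)
    phase = [2 * math.pi * cycles * k / n for k in range(n)]
    a = 2 / n * sum(x * math.sin(p) for x, p in zip(window, phase))
    b = 2 / n * sum(x * math.cos(p) for x, p in zip(window, phase))
    return [a * math.sin(p) + b * math.cos(p) for p in phase]


def quantize(values, step):
    """Stage 2 stand-in: uniform scalar quantization of the residual."""
    return [step * round(v / step) for v in values]


def sq_norm(values):
    return sum(v * v for v in values)


def compress_window(window, step=0.05):
    # Stage 1: keep the parametric model with the smallest residual energy.
    candidates = {
        "constant": fit_constant(window),
        "sinusoid": fit_sinusoid(window),
    }
    name = min(
        candidates,
        key=lambda m: sq_norm([x - c for x, c in zip(window, candidates[m])]),
    )
    coarse = candidates[name]
    # Stage 2: code the residual left by the selected model.
    residual = quantize([x - c for x, c in zip(window, coarse)], step)
    return name, [c + r for c, r in zip(coarse, residual)]


# Toy window: one period of a sinusoid plus a small third harmonic.
N = 64
signal = [
    math.sin(2 * math.pi * k / N) + 0.1 * math.sin(6 * math.pi * k / N)
    for k in range(N)
]
model, reconstruction = compress_window(signal)
max_error = max(abs(x - y) for x, y in zip(signal, reconstruction))
```

On this toy window the sinusoid model absorbs the fundamental, so the quantizer only has to represent the small harmonic residual, and the per-sample reconstruction error stays within half the quantization step.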

2024-03-19

Data Compression Conference (published)