
Pablo Piantanida

Associate Academic Member
Full Professor, Université Paris-Saclay
Director, International Laboratory on Learning Systems (ILLS), McGill University
Adjunct Professor, École de technologie supérieure (ÉTS), Department of Systems Engineering
Research Topics
AI Safety
Machine Learning Theory
Information Theory
Natural Language Processing

Biography

I am a Professor at CentraleSupélec, Université Paris-Saclay, with the French National Centre for Scientific Research (CNRS), and Director of the International Laboratory on Learning Systems (ILLS), which brings together McGill University, the École de technologie supérieure (ÉTS), Mila - Quebec Artificial Intelligence Institute, the French National Centre for Scientific Research (CNRS), Université Paris-Saclay and CentraleSupélec.

My research focuses on applying advanced statistical and information-theoretic techniques to machine learning. I am interested in developing rigorous methods based on information measures and concepts to build safe and reliable AI systems, establishing trust in their behaviour and robustness and thereby securing their use in society. My main areas of expertise are information theory, information geometry, learning theory, privacy and fairness, with applications to computer vision and natural language processing.

I completed my undergraduate studies at the University of Buenos Aires and pursued graduate studies in applied mathematics at Université Paris-Saclay in France. Throughout my career, I have also held visiting positions at INRIA, Université de Montréal and the École de technologie supérieure (ÉTS), among others.

My earlier research spanned several areas of information theory, including distributed compression, statistical decision-making, universal source coding, cooperation, feedback, index coding, key generation, security and privacy.

I teach courses on machine learning, information theory and deep learning, covering topics such as statistical learning theory, information measures and the statistical principles of neural networks.

Current Students

PhD - McGill
Principal supervisor:

Publications

A Strong Baseline for Molecular Few-Shot Learning
Philippe Formont
Hugo Jeannin
Ismail Ben Ayed
Few-shot learning has recently attracted significant interest in drug discovery, with a recent, fast-growing literature mostly involving convoluted meta-learning strategies. We revisit the more straightforward fine-tuning approach for molecular data, and propose a regularized quadratic-probe loss based on the Mahalanobis distance. We design a dedicated block-coordinate descent optimizer, which avoids the degenerate solutions of our loss. Interestingly, our simple fine-tuning approach achieves highly competitive performance in comparison to state-of-the-art methods, while being applicable to black-box settings and removing the need for specific episodic pre-training strategies. Furthermore, we introduce a new benchmark to assess the robustness of the competing methods to domain shifts. In this setting, our fine-tuning baseline obtains consistently better results than meta-learning methods.
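To illustrate the Mahalanobis-distance probe idea described in the abstract above, here is a minimal, self-contained sketch. It is not the authors' implementation: the function name, the shared-covariance estimate and the identity shrinkage are illustrative assumptions.

```python
import numpy as np

def mahalanobis_probe(support_x, support_y, query_x, reg=1.0):
    """Classify queries by regularized Mahalanobis distance to class means.

    support_x: (n, d) support embeddings; support_y: (n,) integer labels;
    query_x: (m, d) query embeddings. Returns (m,) predicted labels.
    """
    classes = np.unique(support_y)
    means = np.stack([support_x[support_y == c].mean(axis=0) for c in classes])
    # Shared covariance of the support set, shrunk toward the identity
    # for numerical stability in the low-shot regime.
    centered = support_x - means[np.searchsorted(classes, support_y)]
    cov = centered.T @ centered / len(support_x) + reg * np.eye(support_x.shape[1])
    prec = np.linalg.inv(cov)
    diffs = query_x[:, None, :] - means[None, :, :]          # (m, k, d)
    dists = np.einsum('mkd,de,mke->mk', diffs, prec, diffs)  # squared distances
    return classes[np.argmin(dists, axis=1)]
```

In a few-shot setting, `support_x` would hold the embeddings of the labeled molecules and `query_x` those of the test molecules.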
Membership Inference Risks in Quantized Models: A Theoretical and Empirical Study
Eric Aubinais
Philippe Formont
Elisabeth Gassiat
Quantizing machine learning models has demonstrated its effectiveness in lowering memory and inference costs while maintaining performance levels comparable to the original models. In this work, we investigate the impact of quantization procedures on the privacy of data-driven models, specifically focusing on their vulnerability to membership inference attacks. We derive an asymptotic theoretical analysis of Membership Inference Security (MIS), characterizing the privacy implications of quantized algorithm weights against the most powerful (and possibly unknown) attacks. Building on these theoretical insights, we propose a novel methodology to empirically assess and rank the privacy levels of various quantization procedures. Using synthetic datasets, we demonstrate the effectiveness of our approach in assessing the MIS of different quantizers. Furthermore, we explore the trade-off between privacy and performance using real-world data and models in the context of molecular modeling.
Perfectly Accurate Membership Inference by a Dishonest Central Server in Federated Learning
Georg Pichler
Marco Romanelli
Leonardo Rey Vega
GLIMPSE: Pragmatically Informative Multi-Document Summarization for Scholarly Reviews
Maxime Darrin
Ines Arous
Scientific peer review is essential for the quality of academic publications. However, the increasing number of paper submissions to conferences has strained the reviewing process. This surge poses a burden on area chairs who have to carefully read an ever-growing volume of reviews and discern each reviewer's main arguments as part of their decision process. In this paper, we introduce GLIMPSE, a summarization method designed to offer a concise yet comprehensive overview of scholarly reviews. Unlike traditional consensus-based methods, GLIMPSE extracts both common and unique opinions from the reviews. We introduce novel uniqueness scores based on the Rational Speech Act framework to identify relevant sentences in the reviews. Our method aims to provide a pragmatic glimpse into all reviews, offering a balanced perspective on their opinions. Our experimental results with both automatic metrics and human evaluation show that GLIMPSE generates more discriminative summaries than baseline methods in terms of human evaluation while achieving comparable performance with these methods in terms of automatic metrics.
When is an Embedding Model More Promising than Another?
Maxime Darrin
Philippe Formont
Ismail Ben Ayed
Embedders play a central role in machine learning, projecting any object into numerical representations that can, in turn, be leveraged to perform various downstream tasks. The evaluation of embedding models typically depends on domain-specific empirical approaches utilizing downstream tasks, primarily because of the lack of a standardized framework for comparison. However, acquiring adequately large and representative datasets for conducting these assessments is not always viable and can prove to be prohibitively expensive and time-consuming. In this paper, we present a unified approach to evaluate embedders. First, we establish theoretical foundations for comparing embedding models, drawing upon the concepts of sufficiency and informativeness. We then leverage these concepts to devise a tractable comparison criterion (information sufficiency), leading to a task-agnostic and self-supervised ranking procedure. We demonstrate experimentally that our approach aligns closely with the capability of embedding models to facilitate various downstream tasks in both natural language processing and molecular biology. This effectively offers practitioners a valuable tool for prioritizing model trials.
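The sufficiency intuition in the abstract above (an embedding is promising if other representations can be predicted from it) can be illustrated with a crude, hypothetical proxy: the R² of a ridge regression mapping one embedding matrix onto another. This is not the paper's information-sufficiency criterion, only a sketch of the underlying idea; the function name and regularization are assumptions.

```python
import numpy as np

def sufficiency_score(emb_a, emb_b, reg=1e-3):
    """Toy proxy for 'embedding A is informative about embedding B':
    R^2 of a ridge regression predicting emb_b (n, p) from emb_a (n, d).
    Scores near 1 mean B is almost a linear function of A."""
    A = emb_a - emb_a.mean(axis=0)
    B = emb_b - emb_b.mean(axis=0)
    # Closed-form ridge solution: (A^T A + reg I)^{-1} A^T B.
    W = np.linalg.solve(A.T @ A + reg * np.eye(A.shape[1]), A.T @ B)
    resid = B - A @ W
    return 1.0 - resid.var() / B.var()
```

Ranking several embedders then amounts to comparing how well each one's representations predict the others' on the same inputs.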
Beyond the Norms: Detecting Prediction Errors in Regression Models
Andres Altieri
Marco Romanelli
Georg Pichler
Florence Alberge
This paper tackles the challenge of detecting unreliable behavior in regression algorithms, which may arise from intrinsic variability (e.g., aleatoric uncertainty) or modeling errors (e.g., model uncertainty). First, we formally introduce the notion of unreliability in regression, i.e., when the output of the regressor exceeds a specified discrepancy (or error). Then, using powerful tools for probabilistic modeling, we estimate the discrepancy density, and we measure its statistical diversity using our proposed metric for statistical dissimilarity. In turn, this allows us to derive a data-driven score that expresses the uncertainty of the regression outcome. We show empirical improvements in error detection for multiple regression tasks, consistently outperforming popular baseline approaches, and contributing to the broader field of uncertainty quantification and safe machine learning systems.
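As a toy illustration of density-based error detection in the spirit of the abstract above (not the paper's estimator; the function name, the Gaussian kernel and the fixed bandwidth are illustrative assumptions):

```python
import numpy as np

def residual_error_score(calib_residuals, test_residuals, bandwidth=0.5):
    """Score test residuals by negative log kernel density under a
    calibration set of residuals. Higher scores flag residuals that are
    atypical relative to calibration data, i.e., potentially unreliable
    predictions."""
    calib = np.asarray(calib_residuals, dtype=float)
    test = np.asarray(test_residuals, dtype=float)
    # Gaussian kernel density estimate of the calibration residual law.
    diffs = (test[:, None] - calib[None, :]) / bandwidth
    dens = np.exp(-0.5 * diffs**2).mean(axis=1) / (bandwidth * np.sqrt(2 * np.pi))
    return -np.log(dens + 1e-12)
```

Thresholding the score then turns the regressor into a detector: predictions whose residual score exceeds the threshold are flagged as unreliable.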
Is Meta-training Really Necessary for Molecular Few-Shot Learning?
Philippe Formont
Hugo Jeannin
Ismail Ben Ayed
Few-shot learning has recently attracted significant interest in drug discovery, with a recent, fast-growing literature mostly involving convoluted meta-learning strategies. We revisit the more straightforward fine-tuning approach for molecular data, and propose a regularized quadratic-probe loss based on the Mahalanobis distance. We design a dedicated block-coordinate descent optimizer, which avoids the degenerate solutions of our loss. Interestingly, our simple fine-tuning approach achieves highly competitive performance in comparison to state-of-the-art methods, while being applicable to black-box settings and removing the need for specific episodic pre-training strategies. Furthermore, we introduce a new benchmark to assess the robustness of the competing methods to domain shifts. In this setting, our fine-tuning baseline obtains consistently better results than meta-learning methods.
Two-stage Multiple-Model Compression Approach for Sampled Electrical Signals
Corentin Presvôts
Michel Kieffer
Thibault Prevost
Patrick Panciatici
Zuxing Li
This paper presents a two-stage Multiple-Model Compression (MMC) approach for sampled electrical waveforms. To limit latency, the processing is window-based, with a window length commensurate to the electrical period. For each window, the first stage compares several parametric models to get a coarse representation of the samples. The second stage then compares different residual compression techniques to minimize the norm of the reconstruction error. The allocation of the rate budget among the two stages is optimized. The proposed MMC approach provides better signal-to-noise ratios than state-of-the-art solutions on periodic and transient waveforms.
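A heavily simplified sketch of the two-stage idea described above. It is not the paper's codec: the function name, the three candidate models, the assumed 50 Hz fundamental and the toy bit budget are all illustrative choices.

```python
import numpy as np

def compress_window(x, t, bits=4):
    """Two-stage toy compressor for one window of samples x at times t:
    stage 1 picks the parametric model with the smallest residual norm;
    stage 2 uniformly quantizes the residual with the remaining budget."""
    candidates = {
        'mean': np.full_like(x, x.mean()),
        'linear': np.polyval(np.polyfit(t, x, 1), t),
    }
    # Least-squares fit of a sinusoid at an assumed 50 Hz fundamental.
    A = np.column_stack([np.sin(2 * np.pi * 50 * t),
                         np.cos(2 * np.pi * 50 * t),
                         np.ones_like(t)])
    coef, *_ = np.linalg.lstsq(A, x, rcond=None)
    candidates['sinusoid'] = A @ coef
    best = min(candidates, key=lambda k: np.linalg.norm(x - candidates[k]))
    # Stage 2: uniform scalar quantization of the residual.
    resid = x - candidates[best]
    scale = np.abs(resid).max() or 1.0
    levels = 2 ** (bits - 1) - 1
    recon = candidates[best] + np.round(resid / scale * levels) / levels * scale
    return best, recon
```

In the actual MMC scheme the rate budget is split between the two stages by optimization; here the split is fixed for simplicity.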
COSMIC: Mutual Information for Task-Agnostic Summarization Evaluation
Maxime Darrin
Philippe Formont
Jackie Chi Kit Cheung
Assessing the quality of summarizers poses significant challenges. In response, we propose a novel task-oriented evaluation approach that assesses summarizers based on their capacity to produce summaries that are useful for downstream tasks, while preserving task outcomes. We theoretically establish a direct relationship between the resulting error probability of these tasks and the mutual information between source texts and generated summaries. We introduce …
A Data-Driven Measure of Relative Uncertainty for Misclassification Detection
Eduardo Dadalto Câmara Gomes
Marco Romanelli
Georg Pichler
On the Stability of a non-hyperbolic nonlinear map with non-bounded set of non-isolated fixed points with applications to Machine Learning
Roberta Hansen
Matias Vera
Lautaro Estienne
Luciana Ferrer