Portrait of Pablo Piantanida

Pablo Piantanida

Associate Academic Member
Full Professor, Université Paris-Saclay
Director, International Laboratory on Learning Systems (ILLS), McGill University
Associate professor, École de technologie supérieure (ETS), Department of Systems Engineering
Research Topics
AI Safety
Information Theory
Machine Learning Theory
Natural Language Processing

Biography

I am a professor at CentraleSupélec (Université Paris-Saclay) with the French National Centre for Scientific Research (CNRS), and Director of the International Laboratory on Learning Systems (ILLS) which gathers McGill University, École de technologie supérieure (ÉTS), Mila – Quebec AI Institute, France’s Centre Nationale de la Recherche Scientifique (CNRS), Université Paris-Saclay, and the École CentraleSupélec.

My research revolves around the application of advanced statistical and information-theoretic techniques to the field of machine learning. I am interested in developing rigorous techniques based on information measures and concepts for building safe and trustworthy AI systems and establishing confidence in their behavior and robustness, thereby securing their use in society. My primary areas of expertise include information theory, information geometry, learning theory, privacy, fairness, with applications to computer vision and natural language processing.

I obtained my undergraduate education at the University of Buenos Aires and pursued graduate studies in applied mathematics at Paris-Saclay University in France. Throughout my career, I have also held visiting positions at INRIA, Université de Montréal and Ecole de Technologie Supérieure (ÉTS), among others.

My earlier research encompassed the fields of information theory beyond distributed compression, statistical decision, universal source coding, cooperation, feedback, index coding, key generation, security, and privacy, among others.

I teach courses on machine learning, information theory and deep learning, covering topics such as statistical learning theory, information measures, statistical principles of neural networks.

Current Students

PhD - McGill University
Principal supervisor :
PhD - École de technologie suprérieure

Publications

A Strong Baseline for Molecular Few-Shot Learning
Philippe Formont
Hugo Jeannin
Ismail Ben Ayed
Few-shot learning has recently attracted significant interest in drug discovery, with a recent, fast-growing literature mostly involving con… (see more)voluted meta-learning strategies. We revisit the more straightforward fine-tuning approach for molecular data, and propose a regularized quadratic-probe loss based on the the Mahalanobis distance. We design a dedicated block-coordinate descent optimizer, which avoid the degenerate solutions of our loss. Interestingly, our simple fine-tuning approach achieves highly competitive performances in comparison to state-of-the-art methods, while being applicable to black-box settings and removing the need for specific episodic pre-training strategies. Furthermore, we introduce a new benchmark to assess the robustness of the competing methods to domain shifts. In this setting, our fine-tuning baseline obtains consistently better results than meta-learning methods.
Membership Inference Risks in Quantized Models: A Theoretical and Empirical Study
Eric Aubinais
Philippe Formont
Elisabeth Gassiat
Quantizing machine learning models has demonstrated its effectiveness in lowering memory and inference costs while maintaining performance l… (see more)evels comparable to the original models. In this work, we investigate the impact of quantization procedures on the privacy of data-driven models, specifically focusing on their vulnerability to membership inference attacks. We derive an asymptotic theoretical analysis of Membership Inference Security (MIS), characterizing the privacy implications of quantized algorithm weights against the most powerful (and possibly unknown) attacks. Building on these theoretical insights, we propose a novel methodology to empirically assess and rank the privacy levels of various quantization procedures. Using synthetic datasets, we demonstrate the effectiveness of our approach in assessing the MIS of different quantizers. Furthermore, we explore the trade-off between privacy and performance using real-world data and models in the context of molecular modeling.
Membership Inference Risks in Quantized Models: A Theoretical and Empirical Study
Eric Aubinais
Philippe Formont
Elisabeth Gassiat
Quantizing machine learning models has demonstrated its effectiveness in lowering memory and inference costs while maintaining performance l… (see more)evels comparable to the original models. In this work, we investigate the impact of quantization procedures on the privacy of data-driven models, specifically focusing on their vulnerability to membership inference attacks. We derive an asymptotic theoretical analysis of Membership Inference Security (MIS), characterizing the privacy implications of quantized algorithm weights against the most powerful (and possibly unknown) attacks. Building on these theoretical insights, we propose a novel methodology to empirically assess and rank the privacy levels of various quantization procedures. Using synthetic datasets, we demonstrate the effectiveness of our approach in assessing the MIS of different quantizers. Furthermore, we explore the trade-off between privacy and performance using real-world data and models in the context of molecular modeling.
When is an Embedding Model More Promising than Another?
Maxime DARRIN
Philippe Formont
Ismail Ben Ayed
Perfectly Accurate Membership Inference by a Dishonest Central Server in Federated Learning
Georg Pichler
Marco Romanelli
Leonardo Rey Vega
GLIMPSE: Pragmatically Informative Multi-Document Summarization for Scholarly Reviews
Maxime DARRIN
Ines Arous
Scientific peer review is essential for the quality of academic publications. However, the increasing number of paper submissions to confere… (see more)nces has strained the reviewing process. This surge poses a burden on area chairs who have to carefully read an ever-growing volume of reviews and discern each reviewer's main arguments as part of their decision process. In this paper, we introduce \sys, a summarization method designed to offer a concise yet comprehensive overview of scholarly reviews. Unlike traditional consensus-based methods, \sys extracts both common and unique opinions from the reviews. We introduce novel uniqueness scores based on the Rational Speech Act framework to identify relevant sentences in the reviews. Our method aims to provide a pragmatic glimpse into all reviews, offering a balanced perspective on their opinions. Our experimental results with both automatic metrics and human evaluation show that \sys generates more discriminative summaries than baseline methods in terms of human evaluation while achieving comparable performance with these methods in terms of automatic metrics.
When is an Embedding Model More Promising than Another?
Maxime DARRIN
Philippe Formont
Ismail Ben Ayed
Embedders play a central role in machine learning, projecting any object into numerical representations that can, in turn, be leveraged to p… (see more)erform various downstream tasks. The evaluation of embedding models typically depends on domain-specific empirical approaches utilizing downstream tasks, primarily because of the lack of a standardized framework for comparison. However, acquiring adequately large and representative datasets for conducting these assessments is not always viable and can prove to be prohibitively expensive and time-consuming. In this paper, we present a unified approach to evaluate embedders. First, we establish theoretical foundations for comparing embedding models, drawing upon the concepts of sufficiency and informativeness. We then leverage these concepts to devise a tractable comparison criterion (information sufficiency), leading to a task-agnostic and self-supervised ranking procedure. We demonstrate experimentally that our approach aligns closely with the capability of embedding models to facilitate various downstream tasks in both natural language processing and molecular biology. This effectively offers practitioners a valuable tool for prioritizing model trials.
Beyond the Norms: Detecting Prediction Errors in Regression Models
Andres Altieri
Marco Romanelli
Georg Pichler
Florence Alberge
This paper tackles the challenge of detecting unreliable behavior in regression algorithms, which may arise from intrinsic variability (e.g.… (see more), aleatoric uncertainty) or modeling errors (e.g., model uncertainty). First, we formally introduce the notion of unreliability in regression, i.e., when the output of the regressor exceeds a specified discrepancy (or error). Then, using powerful tools for probabilistic modeling, we estimate the discrepancy density, and we measure its statistical diversity using our proposed metric for statistical dissimilarity. In turn, this allows us to derive a data-driven score that expresses the uncertainty of the regression outcome. We show empirical improvements in error detection for multiple regression tasks, consistently outperforming popular baseline approaches, and contributing to the broader field of uncertainty quantification and safe machine learning systems.
Is Meta-training Really Necessary for Molecular Few-Shot Learning ?
Philippe Formont
Hugo Jeannin
Ismail Ben Ayed
Few-shot learning has recently attracted significant interest in drug discovery, with a recent, fast-growing literature mostly involving con… (see more)voluted meta-learning strategies. We revisit the more straightforward fine-tuning approach for molecular data, and propose a regularized quadratic-probe loss based on the the Mahalanobis distance. We design a dedicated block-coordinate descent optimizer, which avoid the degenerate solutions of our loss. Interestingly, our simple fine-tuning approach achieves highly competitive performances in comparison to state-of-the-art methods, while being applicable to black-box settings and removing the need for specific episodic pre-training strategies. Furthermore, we introduce a new benchmark to assess the robustness of the competing methods to domain shifts. In this setting, our fine-tuning baseline obtains consistently better results than meta-learning methods.
Two-stage Multiple-Model Compression Approach for Sampled Electrical Signals
Corentin Presvôts
Michel Kieffer
Thibault Prevost
Patrick Panciatici
Zuxing Li
This paper presents a two-stage Multiple-Model Compression (MMC) approach for sampled electrical waveforms. To limit latency, the processing… (see more) is window-based, with a window length commensurate to the electrical period. For each window, the first stage compares several parametric models to get a coarse representation of the samples. The second stage then compares different residual compression techniques to minimize the norm of the reconstruction error. The allocation of the rate budget among the two stages is optimized. The proposed MMC approach provides better signal-to-noise ratios than state-of-the-art solutions on periodic and transient waveforms.
COSMIC: Mutual Information for Task-Agnostic Summarization Evaluation
Maxime DARRIN
Philippe Formont
Jackie Chi Kit Cheung
Assessing the quality of summarizers poses significant challenges. In response, we propose a novel task-oriented evaluation approach that as… (see more)sesses summarizers based on their capacity to produce summaries that are useful for downstream tasks, while preserving task outcomes. We theoretically establish a direct relationship between the resulting error probability of these tasks and the mutual information between source texts and generated summaries. We introduce
A Data-Driven Measure of Relative Uncertainty for Misclassification Detection
Eduardo Dadalto Câmara Gomes
Marco Romanelli
Georg Pichler