
Pablo Piantanida

Associate Academic Member
Full Professor, Université Paris-Saclay
Director, International Laboratory on Learning Systems (ILLS), McGill University
Associate Professor, École de technologie supérieure (ÉTS), Department of Systems Engineering
Research Topics
AI Safety
Information Theory
Machine Learning Theory
Natural Language Processing

Biography

I am a professor at CentraleSupélec (Université Paris-Saclay) with the French National Centre for Scientific Research (CNRS), and Director of the International Laboratory on Learning Systems (ILLS), which brings together McGill University, École de technologie supérieure (ÉTS), Mila – Quebec AI Institute, the CNRS, Université Paris-Saclay, and CentraleSupélec.

My research revolves around the application of advanced statistical and information-theoretic techniques to machine learning. I am interested in developing rigorous techniques based on information measures and concepts for building safe and trustworthy AI systems and establishing confidence in their behavior and robustness, thereby securing their use in society. My primary areas of expertise include information theory, information geometry, learning theory, privacy, and fairness, with applications to computer vision and natural language processing.

I obtained my undergraduate education at the University of Buenos Aires and pursued graduate studies in applied mathematics at Université Paris-Saclay in France. Throughout my career, I have also held visiting positions at INRIA, Université de Montréal, and École de technologie supérieure (ÉTS), among others.

My earlier research encompassed information theory and beyond: distributed compression, statistical decision, universal source coding, cooperation, feedback, index coding, key generation, security, and privacy, among others.

I teach courses on machine learning, information theory, and deep learning, covering topics such as statistical learning theory, information measures, and the statistical principles of neural networks.

Current Students

PhD - McGill University
Principal supervisor :
PhD - McGill University
Principal supervisor :
PhD - École de technologie supérieure
Collaborating researcher - Sorbonne Université
PhD - École de technologie supérieure
Postdoctorate - École de technologie supérieure
Co-supervisor :
PhD - École de technologie supérieure
Collaborating researcher - University of Toulon
Co-supervisor :
PhD - McGill University
Principal supervisor :
PhD - Université Paris Dauphine-PSL
Université Paris-Saclay
Master's Research - École de technologie supérieure
Co-supervisor :
Collaborating researcher - Sorbonne Université

Publications

Position: Auditing Is Not Evaluating; LLM Audit Requires Dynamic, Contextual, Budget-Aware and Reliable Evidence
Auditing large language models (LLMs) is increasingly urgent as these systems are deployed in high-stakes settings, yet existing evaluation practices are ill-suited to meet auditing requirements. Directly repurposing standard evaluation tools can yield incomplete or misleading conclusions, e.g., overstating robustness when evidence comes from static prompts rather than adaptive, real-world interactions. This position paper argues that effective LLM audits must instead generate dynamic, context-sensitive, budget-aware, and reliable evidence. To support this position, we analyze how each of these principles can be operationalized through a four-component framework: Auditing Scope, Interactor, Evaluator, and Output. We highlight design requirements, assumptions, limitations, and research directions, demonstrating how high-level principles can be translated into concrete, actionable, evidence-based procedures.
Happiness as a Measure of Fairness
Georg Pichler
Marco Romanelli
In this paper, we propose a novel fairness framework grounded in the concept of "happiness", a measure of the utility each group gains from decision outcomes. By capturing fairness through this intuitive lens, we not only offer a more human-centered approach, but also one that is mathematically rigorous: in order to compute the optimal, fair post-processing strategy, only a linear program needs to be solved. This makes our method both efficient and scalable with existing optimization tools. Furthermore, it unifies and extends several well-known fairness definitions, and our empirical results highlight its practical strengths across diverse scenarios.
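To make the linear-programming claim concrete, here is a minimal toy sketch of fair post-processing as an LP, solved with SciPy. The two-group setup, utility numbers, and constraints below are illustrative assumptions, not the paper's actual formulation.

```python
# Toy sketch: fair post-processing as a linear program
# (hypothetical instance for illustration; the paper's LP may differ).
import numpy as np
from scipy.optimize import linprog

# Decision variable p[g] = probability of a positive decision for group g.
u = np.array([1.0, 0.8])   # per-group utility ("happiness") of a positive decision
w = np.array([0.6, 0.4])   # population shares

# Maximize total happiness  sum_g w[g] * u[g] * p[g]
# (linprog minimizes, so negate the objective).
c = -(w * u)

# Fairness constraint (demographic parity): p[0] - p[1] = 0.
A_eq = np.array([[1.0, -1.0]])
b_eq = np.array([0.0])

# Capacity constraint: overall positive rate at most 50%.
A_ub = np.array([w])
b_ub = np.array([0.5])

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=[(0.0, 1.0)] * 2)
print(res.x)  # optimal per-group decision rates
```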
An Indicator of Membership Inference Security in Post-Training Quantized Models
Quantizing machine learning models has demonstrated its effectiveness in lowering memory and inference costs while maintaining performance levels comparable to those of the original models. In this work, we investigate the impact of quantization procedures on privacy in data-driven models, focusing on their vulnerability to membership inference attacks. Membership Inference Security (MIS) has recently been proposed to characterize the privacy of machine learning models against the most powerful (and possibly unknown) attacks. However, quantifying MIS appears to be computationally very difficult. In this paper, we propose a new MIS indicator for post-training quantization procedures of machine learning models that minimize an empirical loss. This new indicator is a byproduct of a theoretical asymptotic analysis of the MIS in this context. We also present a methodology for empirically estimating our MIS indicator. Using synthetic datasets and real-world data (in the context of drug discovery), we demonstrate the effectiveness of our approach in assessing and ranking the MIS of different quantizers.
Adapting Language Models to Produce Good Class Probabilities for Classification Tasks
Lautaro Estienne
Matias Vera
Elizabeth Fons
Elena Kochkina
Luciana Ferrer
Large generative language models (GLMs) provide a versatile tool for solving a wide variety of natural language processing tasks. GLM responses, though, are provided in the form of text, without an indication of the model's confidence in the answer. This limits the usability of these models in high-risk applications where decisions made based on an incorrect answer can have severe consequences. In this work, we focus on the problem of generating class posterior distributions for text classification tasks like sentiment, news category, and intent classification. These posteriors can be used for decision making and as interpretable scores for the user. We show that the naive approach for computing posteriors based on the token posteriors produced by the GLM results in extremely poor posteriors. We then explore different adaptation approaches for improving the quality of posteriors, focusing on low-resource scenarios where a small amount of data is available for adaptation. We show that parameter-efficient supervised fine-tuning (SFT), while providing large gains in terms of decision quality, produces suboptimal posteriors due to overfitting. To address this problem, we propose an approach that combines SFT and post-hoc calibration (PHC) using a three-stage training strategy, improving the quality of both posteriors and categorical decisions.
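As a rough illustration of the two ingredients discussed above, the sketch below contrasts the naive token-posterior readout with post-hoc temperature scaling. Function names, shapes, and the verbalizer setup are assumptions for illustration, not the authors' implementation.

```python
# Minimal sketch: naive class posteriors from token logits, plus
# post-hoc temperature scaling (illustrative, not the paper's code).
import numpy as np

def naive_class_posteriors(token_logits, class_token_ids):
    """Read posteriors off the logits of the verbalizer tokens
    (e.g., the first answer token for 'positive' / 'negative')."""
    logits = token_logits[class_token_ids]  # one logit per class
    z = logits - logits.max()               # numerical stability
    p = np.exp(z)
    return p / p.sum()

def temperature_scale(logits, T):
    """Post-hoc calibration: rescale logits by a temperature T > 0,
    fitted on held-out data to minimize negative log-likelihood."""
    z = logits / T
    z = z - z.max()
    p = np.exp(z)
    return p / p.sum()
```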
BayesAdapter: enhanced uncertainty estimation in CLIP few-shot adaptation
Pablo Morales-Álvarez
Stergios Christodoulidis
Maria Vakalopoulou
Jose Dolz
The emergence of large pre-trained vision-language models (VLMs) represents a paradigm shift in machine learning, with unprecedented results in a broad span of visual recognition tasks. CLIP, one of the most popular VLMs, has exhibited remarkable zero-shot and transfer learning capabilities in classification. To transfer CLIP to downstream tasks, adapters constitute a parameter-efficient approach that avoids backpropagation through the large model (unlike related prompt learning methods). However, CLIP adapters have been developed to target discriminative performance, and the quality of their uncertainty estimates has been overlooked. In this work we show that the discriminative performance of state-of-the-art CLIP adapters does not always correlate with their uncertainty estimation capabilities, which are essential for safe deployment in real-world scenarios. We also demonstrate that one such adapter is obtained through MAP inference from a more general probabilistic framework. Based on this observation we introduce BayesAdapter, which leverages Bayesian inference to estimate a full probability distribution instead of a single point, better capturing the variability inherent in the parameter space. In a comprehensive empirical evaluation we show that our approach obtains high-quality uncertainty estimates in the predictions, standing out in calibration and selective classification. Our code will be publicly available upon acceptance of the paper.
Learning Task-Agnostic Representations through Multi-Teacher Distillation
Eric Granger
Jackie CK Cheung
Ismail Ben Ayed
Mohammadhadi Shateri
Casting complex inputs into tractable representations is a critical step across various fields. Diverse embedding models emerge from differences in architectures, loss functions, input modalities, and datasets, each capturing unique aspects of the input. Multi-teacher distillation leverages this diversity to enrich representations but often remains tailored to specific tasks. In this paper, we introduce a task-agnostic framework based on a "majority vote" objective function. We demonstrate that this function is bounded by the mutual information between student and teachers' embeddings, leading to a task-agnostic distillation loss that eliminates dependence on task-specific labels or prior knowledge. Our evaluations across text, vision models, and molecular modeling show that our method effectively leverages teacher diversity, resulting in representations enabling better performance for a wide range of downstream tasks such as classification, clustering, or regression. Additionally, we train and release state-of-the-art embedding models, enhancing downstream performance in various modalities.
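For intuition on how a mutual-information bound can become a distillation objective, here is a hedged sketch using the standard InfoNCE lower bound between student and teacher embeddings; this is a common construction in representation learning, not necessarily the paper's exact loss.

```python
# Sketch: InfoNCE-style lower bound on the mutual information between
# student and teacher embeddings, used as a distillation loss
# (a generic construction, not the paper's "majority vote" objective).
import torch
import torch.nn.functional as F

def infonce_distill_loss(student, teacher, tau=0.1):
    """student, teacher: (batch, dim) embeddings of the same inputs."""
    s = F.normalize(student, dim=-1)
    t = F.normalize(teacher, dim=-1)
    logits = s @ t.T / tau  # pairwise similarities
    labels = torch.arange(len(s), device=s.device)
    # Matching pairs are positives; other batch items act as negatives.
    return F.cross_entropy(logits, labels)
```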
THUNDER: Tile-level Histopathology image UNDERstanding benchmark
Pierre Marza
Leo Fillioux
Sofiène Boutaj
KUNAL MAHATHA
Christian Desrosiers
Jose Dolz
Stergios Christodoulidis
Maria Vakalopoulou
Progress in a research field can be hard to assess, in particular when many concurrent methods are proposed in a short period of time. This is the case in digital pathology, where many foundation models have been released recently to serve as feature extractors for tile-level images, being used in a variety of downstream tasks, both for tile- and slide-level problems. Benchmarking available methods then becomes paramount to get a clearer view of the research landscape. In particular, in critical domains such as healthcare, a benchmark should not only focus on evaluating downstream performance, but also provide insights about the main differences between methods, and importantly, further consider uncertainty and robustness to ensure a reliable usage of proposed models. For these reasons, we introduce THUNDER, a tile-level benchmark for digital pathology foundation models, allowing for efficient comparison of many models on diverse datasets with a series of downstream tasks, studying their feature spaces and assessing the robustness and uncertainty of predictions informed by their embeddings. THUNDER is a fast, easy-to-use, dynamic benchmark that can already support a large variety of state-of-the-art foundation models, as well as local user-defined models, for direct tile-based comparison. In this paper, we provide a comprehensive comparison of 23 foundation models on 16 different datasets covering diverse tasks, feature analysis, and robustness. The code for THUNDER is publicly available at https://github.com/MICS-Lab/thunder.
Collaborative Rational Speech Act: Pragmatic Reasoning for Multi-Turn Dialog
Lautaro Estienne
Gabriel Ben Zenou
Nona Naderi
Jackie Chi Kit Cheung
As AI systems take on collaborative roles, they must reason about shared goals and beliefs, not just generate fluent language. The Rational Speech Act (RSA) framework offers a principled approach to pragmatic reasoning, but existing extensions face challenges in scaling to multi-turn, collaborative scenarios. In this paper, we introduce Collaborative Rational Speech Act (CRSA), an information-theoretic (IT) extension of RSA that models multi-turn dialog by optimizing a gain function adapted from rate-distortion theory. This gain extends the one maximized in the original RSA model, accounting for the scenario in which both agents in a conversation have private information and produce utterances conditioned on the dialog. We demonstrate the effectiveness of CRSA on referential games and template-based doctor-patient dialogs in the medical domain. Empirical results show that CRSA yields more consistent, interpretable, and collaborative behavior than existing baselines, paving the way for more pragmatic and socially aware language agents.
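For readers unfamiliar with the base framework, the sketch below implements the standard single-turn RSA recursion (literal listener, pragmatic speaker, pragmatic listener) that CRSA generalizes; the multi-turn CRSA gain itself is not shown, and the lexicon and prior are a textbook toy example.

```python
# Standard single-turn RSA recursion (the classic framework that CRSA
# extends; illustrative toy example, not the CRSA objective).
import numpy as np

def rsa(lexicon, prior, alpha=1.0):
    """lexicon[u, m] = 1 if utterance u is literally true of meaning m."""
    # Literal listener: L0(m | u) proportional to lexicon(u, m) * P(m)
    L0 = lexicon * prior
    L0 = L0 / L0.sum(axis=1, keepdims=True)
    # Pragmatic speaker: S1(u | m) proportional to L0(m | u) ** alpha
    S1 = np.exp(alpha * np.log(L0 + 1e-12)).T
    S1 = S1 / S1.sum(axis=1, keepdims=True)
    # Pragmatic listener: L1(m | u) proportional to S1(u | m) * P(m)
    L1 = S1.T * prior
    return L1 / L1.sum(axis=1, keepdims=True)

# Classic reference game: "glasses" fits both referents, "hat" only one;
# the pragmatic listener resolves "glasses" toward the glasses-only referent.
lexicon = np.array([[1.0, 1.0],
                    [0.0, 1.0]])
prior = np.array([0.5, 0.5])
print(rsa(lexicon, prior))
```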
Rational Retrieval Acts: Leveraging Pragmatic Reasoning to Improve Sparse Retrieval
Gabriel Ben-Zenou
Benjamin Piwowarski
Habiboulaye Amadou-Boubacar
Current sparse neural information retrieval (IR) methods, and to a lesser extent more traditional models such as BM25, do not take into account the document collection and the complex interplay between different term weights when representing a single document. In this paper, we show how Rational Speech Acts (RSA), a linguistics framework used to minimize the number of features to be communicated when identifying an object in a set, can be adapted to the IR case, and in particular to the high number of potential features (here, tokens). RSA dynamically modulates token-document interactions by considering the influence of other documents in the dataset, better contrasting document representations. Experiments show that incorporating RSA consistently improves multiple sparse retrieval models and achieves state-of-the-art performance on out-of-domain datasets from the BEIR benchmark. https://github.com/arthur-75/Rational-Retrieval-Acts
Multiple-model coding scheme for electrical signal compression
Corentin Presvôts
Michel Kieffer
Thibault Prevost
Patrick Panciatici
Zuxing Li
A Strong Baseline for Molecular Few-Shot Learning
Hugo Jeannin
Ismail Ben Ayed
Few-shot learning has recently attracted significant interest in drug discovery, with a recent, fast-growing literature mostly involving convoluted meta-learning strategies. We revisit the more straightforward fine-tuning approach for molecular data and propose a regularized quadratic-probe loss based on the Mahalanobis distance. We design a dedicated block-coordinate descent optimizer, which avoids the degenerate solutions of our loss. Interestingly, our simple fine-tuning approach achieves highly competitive performance in comparison to state-of-the-art methods, while being applicable to black-box settings and removing the need for specific episodic pre-training strategies. Furthermore, we introduce a new benchmark to assess the robustness of the competing methods to domain shifts. In this setting, our fine-tuning baseline obtains consistently better results than meta-learning methods.
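As a rough illustration of a Mahalanobis-distance probe, the following sketch fits per-class means with a shared, regularized covariance and classifies by nearest Mahalanobis distance; the paper's regularized quadratic-probe loss and block-coordinate optimizer are not reproduced here.

```python
# Illustrative Mahalanobis-distance probe for few-shot classification
# (a plain class-conditional Gaussian probe with shared covariance;
# not the paper's exact loss or optimizer).
import numpy as np

def fit_mahalanobis_probe(X, y, reg=1e-1):
    """Fit class means and a shared, regularized precision matrix."""
    classes = np.unique(y)
    mus = np.stack([X[y == c].mean(axis=0) for c in classes])
    centered = X - mus[np.searchsorted(classes, y)]
    cov = centered.T @ centered / len(X) + reg * np.eye(X.shape[1])
    return classes, mus, np.linalg.inv(cov)

def predict(X, classes, mus, prec):
    """Assign each point to the class with smallest Mahalanobis distance."""
    d = X[:, None, :] - mus[None, :, :]  # (n, C, D) differences
    scores = -np.einsum('ncd,de,nce->nc', d, prec, d)
    return classes[scores.argmax(axis=1)]
```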
Membership Inference Risks in Quantized Models: A Theoretical and Empirical Study
Quantizing machine learning models has demonstrated its effectiveness in lowering memory and inference costs while maintaining performance levels comparable to the original models. In this work, we investigate the impact of quantization procedures on the privacy of data-driven models, specifically focusing on their vulnerability to membership inference attacks. We derive an asymptotic theoretical analysis of Membership Inference Security (MIS), characterizing the privacy implications of quantized algorithm weights against the most powerful (and possibly unknown) attacks. Building on these theoretical insights, we propose a novel methodology to empirically assess and rank the privacy levels of various quantization procedures. Using synthetic datasets, we demonstrate the effectiveness of our approach in assessing the MIS of different quantizers. Furthermore, we explore the trade-off between privacy and performance using real-world data and models in the context of molecular modeling.