
Pablo Piantanida

Associate Academic Member
Full Professor, Université Paris-Saclay
Director, International Laboratory on Learning Systems (ILLS), McGill University
Associate Professor, École de technologie supérieure (ÉTS), Department of Systems Engineering
Research Topics
AI Safety
Information Theory
Machine Learning Theory
Natural Language Processing

Biography

I am a professor at CentraleSupélec (Université Paris-Saclay) with France's Centre national de la recherche scientifique (CNRS), and Director of the International Laboratory on Learning Systems (ILLS), which brings together McGill University, École de technologie supérieure (ÉTS), Mila – Quebec AI Institute, the CNRS, Université Paris-Saclay, and CentraleSupélec.

My research centres on applying advanced statistical and information-theoretic techniques to machine learning. I am interested in developing rigorous techniques, grounded in information measures and related concepts, for building safe and trustworthy AI systems, establishing confidence in their behavior and robustness, and thereby securing their use in society. My primary areas of expertise include information theory, information geometry, learning theory, privacy, and fairness, with applications to computer vision and natural language processing.

I completed my undergraduate education at the University of Buenos Aires and pursued graduate studies in applied mathematics at Université Paris-Saclay in France. Throughout my career, I have also held visiting positions at INRIA, Université de Montréal, and École de technologie supérieure (ÉTS), among others.

My earlier research encompassed information theory, including distributed compression, statistical decision, universal source coding, cooperation, feedback, index coding, key generation, security, and privacy, among other topics.

I teach courses on machine learning, information theory, and deep learning, covering topics such as statistical learning theory, information measures, and the statistical principles of neural networks.

Current Students

Independent visiting researcher - Université Paris-Saclay
PhD - McGill University
Principal supervisor :

Publications

Optimal Transport for Unsupervised Hallucination Detection in Neural Machine Translation
Nuno M. Guerreiro
Pierre Colombo
André Martins
Neural machine translation (NMT) has become the de facto standard in real-world machine translation applications. However, NMT models can unpredictably produce severely pathological translations, known as hallucinations, that seriously undermine user trust. It is thus crucial to implement effective preventive strategies to guarantee their proper functioning. In this paper, we address the problem of hallucination detection in NMT by following a simple intuition: as hallucinations are detached from the source content, they exhibit encoder-decoder attention patterns that are statistically different from those of good-quality translations. We frame this problem with an optimal transport formulation and propose a fully unsupervised, plug-in detector that can be used with any attention-based NMT model. Experimental results show that our detector not only outperforms all previous model-based detectors, but is also competitive with detectors that employ external models trained on millions of samples for related tasks such as quality estimation and cross-lingual sentence similarity.
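The attention-based intuition above can be illustrated with a short sketch. This is a minimal example under stated assumptions, not the detector from the paper: it assumes access to one translation's cross-attention matrix as a list of rows (one per generated token, each row summing to 1), pools the rows into a single distribution over source positions, and scores that distribution by its Wasserstein-1 distance to a uniform distribution, with source positions rescaled to [0, 1]. All function names are hypothetical.

```python
from typing import List, Sequence


def source_attention_mass(attention: Sequence[Sequence[float]]) -> List[float]:
    """Average the cross-attention rows (one per target token) into a single
    distribution over source positions."""
    n_src = len(attention[0])
    mass = [sum(row[j] for row in attention) / len(attention) for j in range(n_src)]
    total = sum(mass)
    return [m / total for m in mass]


def wasserstein_1d(p: Sequence[float], q: Sequence[float]) -> float:
    """Wasserstein-1 distance between two distributions supported on the evenly
    spaced positions 0, 1/(n-1), ..., 1, computed from the gap between their CDFs."""
    n = len(p)
    if n < 2:
        return 0.0
    step = 1.0 / (n - 1)
    cdf_p = cdf_q = 0.0
    distance = 0.0
    for i in range(n - 1):  # both CDFs equal 1 at the last position, adding nothing
        cdf_p += p[i]
        cdf_q += q[i]
        distance += abs(cdf_p - cdf_q) * step
    return distance


def hallucination_score(attention: Sequence[Sequence[float]]) -> float:
    """Distance from the pooled attention mass to a uniform distribution; a higher
    score means attention is concentrated on only a few source tokens."""
    mass = source_attention_mass(attention)
    uniform = [1.0 / len(mass)] * len(mass)
    return wasserstein_1d(mass, uniform)
```

Because the score needs nothing beyond the model's own attention, it can be computed for every translation and used to rank them; where to set a flagging threshold is left open here and would have to be chosen on held-out data.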
On the (Im)Possibility of Estimating Various Notions of Differential Privacy (short paper)
Daniele Gorla
Louis Jalouzot
Federica Granese
Catuscia Palamidessi
We analyze to what extent final users can infer information about the level of protection of their data when the data obfuscation mechanism is a priori unknown to them (the so-called “black-box” scenario). In particular, we investigate two notions of local differential privacy (LDP), namely ε-LDP and Rényi LDP. On the one hand, we prove that, without any assumption on the underlying distributions, no algorithm can infer the level of data protection with provable guarantees. On the other hand, we demonstrate that, under reasonable assumptions (namely, Lipschitzness of the involved densities on a closed interval), such guarantees exist and can be achieved by a simple histogram-based estimator.
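A rough illustration of what a histogram-based estimate of ε can look like, as a sketch under assumptions: the analyst is assumed to be able to run the black-box mechanism repeatedly on two inputs x and y, the outputs are assumed to lie on a known closed interval [lo, hi], and ε is estimated as the largest absolute log-ratio between the two empirical histograms. The helper names and the equal-width binning are hypothetical choices, and this plug-in version carries none of the provable guarantees discussed in the abstract.

```python
import math
from typing import List, Sequence


def histogram(samples: Sequence[float], lo: float, hi: float, bins: int) -> List[float]:
    """Empirical probability of each of `bins` equal-width bins on [lo, hi];
    samples falling outside the interval are ignored."""
    counts = [0] * bins
    width = (hi - lo) / bins
    kept = 0
    for s in samples:
        if lo <= s <= hi:
            idx = min(int((s - lo) / width), bins - 1)  # s == hi goes in the last bin
            counts[idx] += 1
            kept += 1
    if kept == 0:
        raise ValueError("no samples fall inside [lo, hi]")
    return [c / kept for c in counts]


def estimate_epsilon(out_x: Sequence[float], out_y: Sequence[float],
                     lo: float, hi: float, bins: int) -> float:
    """Plug-in estimate of epsilon: the largest absolute log-ratio between the
    output histograms produced by two different inputs x and y."""
    p = histogram(out_x, lo, hi, bins)
    q = histogram(out_y, lo, hi, bins)
    worst = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0 and qi == 0.0:
            continue  # bin never observed under either input: no evidence
        if pi == 0.0 or qi == 0.0:
            return math.inf  # one-sided bin: no finite epsilon fits the data
        worst = max(worst, abs(math.log(pi / qi)))
    return worst
```

The number of bins trades bias against variance: wider bins average the densities more coarsely, while narrower bins leave fewer samples per bin. A Lipschitz assumption like the one in the abstract is what keeps bin averages close to the pointwise densities.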
Beyond Mahalanobis-Based Scores for Textual OOD Detection
Pierre Colombo
Eduardo Dadalto Câmara Gomes
Guillaume Staerman
Nathan Noiry
Beyond Mahalanobis Distance for Textual OOD Detection
Pierre Colombo
Eduardo Dadalto Câmara Gomes
Guillaume Staerman
Nathan Noiry
KNIFE: Kernelized-Neural Differential Entropy Estimation
Georg Pichler
Pierre Colombo
Malik Boudiaf
Gunther Koliander
Mutual Information (MI) has been widely used as a loss regularizer for training neural networks. This has been particularly effective when learning disentangled or compressed representations of high-dimensional data. However, differential entropy (DE), another fundamental measure of information, has not found widespread use in neural network training. Although DE offers a potentially wider range of applications than MI, off-the-shelf DE estimators are either non-differentiable, computationally intractable, or fail to adapt to changes in the underlying distribution. These drawbacks prevent them from being used as regularizers in neural network training. To address shortcomings in previously proposed estimators for DE, here we introduce KNIFE, a fully parameterized, differentiable kernel-based estimator of DE. The flexibility of our approach also allows us to construct KNIFE-based estimators for conditional (on either discrete or continuous variables) DE, as well as MI. We empirically validate our method on high-dimensional synthetic data and further apply it to guide the training of neural networks for real-world tasks. Our experiments on a large variety of tasks, including visual domain adaptation, textual fair classification, and textual fine-tuning, demonstrate the effectiveness of KNIFE-based estimation. Code can be found at https://github.com/g-pichler/knife.
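The general idea behind a kernel-based DE estimate, averaging the negative log-density of the samples under a kernel density estimate, can be illustrated with a fixed, non-learned estimator. The sketch below is a one-dimensional leave-one-out Gaussian KDE plug-in estimator with Silverman's rule-of-thumb bandwidth. It is not KNIFE: its kernel parameters are not learned, it is not written in a differentiable framework, and it costs O(N²) kernel evaluations. The function names are hypothetical.

```python
import math
import random
from typing import Sequence


def loo_kde_entropy(samples: Sequence[float], bandwidth: float) -> float:
    """Leave-one-out Gaussian-kernel estimate of differential entropy in nats:
    h(X) ~= -(1/N) * sum_n log p_hat_{-n}(x_n)."""
    n = len(samples)
    norm = 1.0 / ((n - 1) * bandwidth * math.sqrt(2.0 * math.pi))
    total = 0.0
    for i, xi in enumerate(samples):
        # Density at xi built from every other sample; leaving xi out avoids
        # the self-term that would bias the entropy estimate downward.
        dens = sum(
            math.exp(-0.5 * ((xi - xj) / bandwidth) ** 2)
            for j, xj in enumerate(samples)
            if j != i
        )
        total += math.log(norm * dens)
    return -total / n


def silverman_bandwidth(samples: Sequence[float]) -> float:
    """Silverman's rule of thumb for a Gaussian kernel: 1.06 * std * N^(-1/5)."""
    n = len(samples)
    mean = sum(samples) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in samples) / (n - 1))
    return 1.06 * std * n ** (-0.2)


# Usage: compare against the closed-form entropy of a standard normal.
rng = random.Random(0)
xs = [rng.gauss(0.0, 1.0) for _ in range(1000)]
estimate = loo_kde_entropy(xs, silverman_bandwidth(xs))
exact = 0.5 * math.log(2.0 * math.pi * math.e)  # entropy of N(0, 1), about 1.419 nats
```

Replacing the fixed bandwidth and equal weights with trainable parameters, and the Python loops with tensor operations, is the kind of step that turns such an estimator into something usable as a training regularizer.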
Realistic Evaluation of Transductive Few-Shot Learning - Supplementary Material
Olivier Veilleux
Malik Boudiaf
Ismail Ben Ayed
In the main tables of the paper, we did not include the performance of α-TIM in the standard balanced setting. Here, we emphasize that α-TIM is a generalization of TIM [1]: as α → 1 (i.e., as the α-entropies tend to the Shannon entropies), α-TIM tends to TIM. Therefore, in the standard setting, where the optimal hyper-parameter α is obtained over balanced validation tasks (as in the standard validation tasks of the original TIM and the other existing methods), the performance of α-TIM is the same as that of TIM. When α is tuned on balanced validation tasks, we obtain an optimal value of α very close to 1, and our α-mutual information approaches the standard mutual information. When the validation tasks are uniformly random, as in our new setting and in the validation plots provided in the main figure, the performance of α-TIM remains competitive as testing tasks become more balanced (i.e., as a increases), but is significantly better than TIM as testing tasks become uniformly random (a = 1). These results illustrate the flexibility of α-divergences and are in line with the technical analysis provided in the main paper.
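The limit α → 1 invoked above can be checked numerically. For illustration only, the sketch assumes the Tsallis form of the α-entropy, H_α(p) = (1 − Σ_k p_k^α)/(α − 1), and defines an α-mutual information over a batch of query predictions as the α-entropy of the mean prediction minus the mean α-entropy of the individual predictions, mirroring the mutual-information term in TIM. These definitions and the function names are assumptions, not quoted from the supplementary material.

```python
import math
from typing import Sequence


def shannon_entropy(p: Sequence[float]) -> float:
    """Shannon entropy in nats; zero-probability entries contribute nothing."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def tsallis_entropy(p: Sequence[float], alpha: float) -> float:
    """Tsallis alpha-entropy (1 - sum_k p_k^alpha) / (alpha - 1).
    At alpha == 1 the formula is 0/0, so fall back to its Shannon limit."""
    if abs(alpha - 1.0) < 1e-12:
        return shannon_entropy(p)
    return (1.0 - sum(pi ** alpha for pi in p)) / (alpha - 1.0)


def alpha_mutual_information(preds: Sequence[Sequence[float]], alpha: float) -> float:
    """H_alpha of the batch-averaged prediction minus the average per-sample
    H_alpha, i.e. an alpha-analogue of H(Y) - H(Y | X) over query samples."""
    n, k = len(preds), len(preds[0])
    marginal = [sum(p[c] for p in preds) / n for c in range(k)]
    conditional = sum(tsallis_entropy(p, alpha) for p in preds) / n
    return tsallis_entropy(marginal, alpha) - conditional
```

Evaluating `alpha_mutual_information` at α slightly above 1 gives nearly the same value as at α = 1, which is the sense in which α-TIM reduces to TIM when validation selects α close to 1.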
Learning Anonymized Representations with Adversarial Neural Networks
Clément Feutry
P. Duhamel
Statistical methods that protect sensitive information or the identity of the data owner have become critical to ensuring the privacy of individuals as well as of organizations. This paper investigates anonymization methods based on representation learning and deep neural networks, motivated by novel information-theoretic bounds. We introduce a novel training objective for simultaneously training a predictor over target variables of interest (the regular labels) while preventing an intermediate representation from being predictive of the private labels. The architecture is based on three sub-networks: one going from input to representation, one from representation to predicted regular labels, and one from representation to predicted private labels. The training procedure aims at learning representations that preserve the relevant part of the information (about regular labels) while dismissing information about the private labels, which correspond to the identity of a person. We demonstrate the success of this approach on two distinct classification versus anonymization tasks (handwritten digits and sentiment analysis).
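The three-sub-network objective described above can be summarized as a trade-off between two losses. The sketch assumes a common adversarial formulation: the encoder and regular-label predictor minimize their cross-entropy minus λ times the private-label adversary's cross-entropy, while the adversary minimizes its own cross-entropy. The weight λ (`lam`), the function names, and the per-example plain-Python form are assumptions for illustration, not the paper's exact loss.

```python
import math
from typing import Sequence


def cross_entropy(probs: Sequence[float], label: int) -> float:
    """Negative log-probability assigned to the true label."""
    return -math.log(probs[label])


def adversary_loss(private_probs: Sequence[float], private_label: int) -> float:
    """Minimized by the representation -> private-label sub-network."""
    return cross_entropy(private_probs, private_label)


def encoder_predictor_loss(regular_probs: Sequence[float], regular_label: int,
                           private_probs: Sequence[float], private_label: int,
                           lam: float) -> float:
    """Minimized by the input -> representation and representation -> regular-label
    sub-networks: fit the regular label while raising the adversary's loss."""
    return (cross_entropy(regular_probs, regular_label)
            - lam * cross_entropy(private_probs, private_label))
```

In an actual training loop these per-example terms would be averaged over a mini-batch and the two losses minimized in alternation, each with respect to its own sub-networks' parameters; a larger λ pushes the representation harder toward hiding the private label, at some cost in regular-label accuracy.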