Aristide Baratin

CrossSplit: Mitigating Label Noise Memorization through Data Splitting

Jihye Kim

We approach the problem of improving robustness of deep learning algorithms in the presence of label noise. Building upon existing label cor… (see more)rection and co-teaching methods, we propose a novel training procedure to mitigate the memorization of noisy labels, called CrossSplit, which uses a pair of neural networks trained on two disjoint parts of the labelled dataset. CrossSplit combines two main ingredients: (i) Cross-split label correction. The idea is that, since the model trained on one part of the data cannot memorize example-label pairs from the other part, the training labels presented to each network can be smoothly adjusted by using the predictions of its peer network; (ii) Cross-split semi-supervised training. A network trained on one part of the data also uses the unlabeled inputs of the other part. Extensive experiments on CIFAR-10, CIFAR-100, Tiny-ImageNet and mini-WebVision datasets demonstrate that our method can outperform the current state-of-the-art in a wide range of noise ratios.

2022-12-03

ArXiv (preprint)

doi.org

arxiv.org

Using Representation Expressiveness and Learnability to Evaluate Self-Supervised Learning Methods

Yuchen Lu

Romain Laroche

2022-06-02

ArXiv (preprint)

arxiv.org

Using Representation Expressiveness and Learnability to Evaluate Self-Supervised Learning Methods

Yuchen Lu

Romain Laroche

2022-06-02

ArXiv (preprint)

arxiv.org

Lazy vs hasty: linearization in deep networks impacts learning schedule based on example difficulty

Thomas George

Guillaume Lajoie

Aristide Baratin

Among attempts at giving a theoretical account of the success of deep neural networks, a recent line of work has identified a so-called `laz… (see more)y' training regime in which the network can be well approximated by its linearization around initialization. Here we investigate the comparative effect of the lazy (linear) and feature learning (non-linear) regimes on subgroups of examples based on their difficulty. Specifically, we show that easier examples are given more weight in feature learning mode, resulting in faster training compared to more difficult ones. In other words, the non-linear dynamics tends to sequentialize the learning of examples of increasing difficulty. We illustrate this phenomenon across different ways to quantify example difficulty, including c-score, label noise, and in the presence of easy-to-learn spurious correlations. Our results reveal a new understanding of how deep networks prioritize resources across example difficulty.

2022-01-01

Trans. Mach. Learn. Res. (published)

doi.org

openreview.net

Implicit Regularization in Deep Learning: A View from Function Space

We approach the problem of implicit regularization in deep learning from a geometrical viewpoint. We highlight a possible regularization eff… (see more)ect induced by a dynamical alignment of the neural tangent features introduced by Jacot et al, along a small number of task-relevant directions. By extrapolating a new analysis of Rademacher complexity bounds in linear models, we propose and study a new heuristic complexity measure for neural networks which captures this phenomenon, in terms of sequences of tangent kernel classes along in the learning trajectories.

2020-08-03

ArXiv (preprint)

arxiv.org

Implicit Regularization in Deep Learning: A View from Function Space

2020-08-03

ArXiv (preprint)

arxiv.org

Mutual Information Neural Estimation

Ishmael Belghazi

Sai Rajeswar

We argue that the estimation of mutual information between high dimensional continuous random variables can be achieved by gradient descent … (see more)over neural networks. We present a Mutual Information Neural Estimator (MINE) that is linearly scalable in dimensionality as well as in sample size, trainable through back-prop, and strongly consistent. We present a handful of applications on which MINE can be used to minimize or maximize mutual information. We apply MINE to improve adversarially trained generative models. We also use MINE to implement Information Bottleneck, applying it to supervised classification; our results demonstrate substantial improvement in flexibility and performance in these settings.

2018-07-03

Proceedings of the 35th International Conference on Machine Learning (published)

proceedings.mlr.press

arxiv.org

On the Spectral Bias of Deep Neural Networks

Nasim Rahaman

Felix Draxler

Fred Hamprecht

It is well known that over-parametrized deep neural networks (DNNs) are an overly expressive class of functions that can memorize even rando… (see more)m data with

2018-06-22

arXiv.org (preprint)

dblp.uni-trier.de

On the Spectral Bias of Neural Networks

Nasim Rahaman

Felix Draxler

Fred Hamprecht

Neural networks are known to be a class of highly expressive functions able to fit even random input-output mappings with …

2018-06-22

ArXiv (preprint)

arxiv.org

MINE: Mutual Information Neural Estimation

Ishmael Belghazi

Sai Rajeswar

Aristide Baratin

(Rex) Devon Hjelm

Aaron Courville

This paper presents a Mutual Information Neural Estimator (MINE) that is linearly scalable in dimensionality as well as in sample size. MINE… (see more) is back-propable and we prove that it is strongly consistent. We illustrate a handful of applications in which MINE is succesfully applied to enhance the property of generative models in both unsupervised and supervised settings. We apply our framework to estimate the information bottleneck, and apply it in tasks related to supervised classification problems. Our results demonstrate substantial added flexibility and improvement in these settings.

2018-05-04

ICLR.cc/2018/Conference (unknown)

openreview.net

A3T: Adversarially Augmented Adversarial Training

Recent research showed that deep neural networks are highly sensitive to so-called adversarial perturbations, which are tiny perturbations o… (see more)f the input data purposely designed to fool a machine learning classifier. Most classification models, including deep learning models, are highly vulnerable to adversarial attacks. In this work, we investigate a procedure to improve adversarial robustness of deep neural networks through enforcing representation invariance. The idea is to train the classifier jointly with a discriminator attached to one of its hidden layer and trained to filter the adversarial noise. We perform preliminary experiments to test the viability of the approach and to compare it to other standard adversarial training methods.

2018-01-12

ArXiv (preprint)

arxiv.org

Custom AI Learning Programs

Mil'Haq Fest 2025

Mila Community of Practice

Supervision Requests

Aristide Baratin

Publications

Custom AI Learning Programs

Mil'Haq Fest 2025

Mila Community of Practice

Supervision Requests

Popular keywords:

Aristide Baratin

Publications