Vikas Verma

MixupE: Understanding and improving Mixup from directional derivative perspective

Vikas Verma

Yingtian Zou

Sarthak Mittal

Wai Hoh Tang

Hieu Pham

Juho Kannala

Yoshua Bengio

Arno Solin

Kenji Kawaguchi

2022-12-31

UAI (published)

doi.org

proceedings.mlr.press

Supplementary Material for MixupE

Yingtian Zou

Vikas Verma

Sarthak Mittal

Wai Hoh Tang

Hieu Pham

Juho Kannala

Yoshua Bengio

Arno Solin

Kenji Kawaguchi

We denote by z = (x, y) the input and output pair where x ∈ X ⊆ R and y ∈ Y ⊆ R. Let fθ(x) ∈ R be the output of the logits (i.e., the last layer before the softmax or sigmoid) of the model parameterized by θ. We use l(θ, z) = h(fθ(x)) − yfθ(x) to denote the loss function. Let g(·) be the activation function. We use x(i) to index the i-th element of the vector x and xj to represent the j-th variable in a set. The notation list is:
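As a rough illustration of the loss form l(θ, z) = h(fθ(x)) − yfθ(x) used in the entry above, the sketch below assumes a single scalar logit, a label y in {0, 1}, and takes h to be the softplus function h(f) = log(1 + exp(f)). That choice of h is an assumption made for this example (the excerpt does not specify h); under it, the loss coincides with binary cross-entropy on the sigmoid output. The function names are hypothetical.

```python
import math

def softplus(f):
    # Numerically stable log(1 + exp(f)); assumed choice of h for this sketch.
    return max(f, 0.0) + math.log1p(math.exp(-abs(f)))

def loss(logit, y):
    # l = h(f) - y * f, with f the logit and y in {0, 1}.
    return softplus(logit) - y * logit

def binary_cross_entropy(logit, y):
    # Reference form: -[y log p + (1 - y) log(1 - p)] with p = sigmoid(logit).
    p = 1.0 / (1.0 + math.exp(-logit))
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

# The two forms agree on a few toy (logit, label) pairs.
for f, y in [(0.3, 1), (-1.2, 0), (2.0, 1)]:
    assert abs(loss(f, y) - binary_cross_entropy(f, y)) < 1e-9
```

Writing the loss directly in terms of the logit avoids evaluating log(p) for p close to 0 or 1, which is one reason this form is convenient in practice.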

2022-12-31

(published)

www.semanticscholar.org

Interpolated Adversarial Training: Achieving Robust Neural Networks Without Sacrificing Too Much Accuracy

Alex Lamb

Vikas Verma

Kenji Kawaguchi

Juho Kannala

Alexander Matyasko

Yoshua Bengio

Savya Khosla

Adversarial robustness has become a central goal in deep learning, both in theory and in practice. However, successful methods to improve the adversarial robustness (such as adversarial training) greatly hurt generalization performance on the unperturbed data. This could have a major impact on how achieving adversarial robustness affects real world systems (i.e. many may opt to forego robustness if it can improve accuracy on the unperturbed data). We propose Interpolated Adversarial Training, which employs recently proposed interpolation based training methods in the framework of adversarial training. On CIFAR-10, adversarial training increases the standard test error (when there is no adversary) from 4.43% to 12.32%, whereas with our Interpolated adversarial training we retain adversarial robustness while achieving a standard test error of only 6.45%. With our technique, the relative increase in the standard error for the robust model is reduced from 178.1% to just 45.5%.

2022-09-30

Neural Networks (published)

doi.org

arxiv.org

PatchUp: A Feature-Space Block-Level Regularization Technique for Convolutional Neural Networks

Mojtaba Faramarzi

M. Amini

Akilesh Badrinaaraayanan

Vikas Verma

A. Chandar

Large capacity deep learning models are often prone to a high generalization gap when trained with a limited amount of labeled training data. A recent class of methods to address this problem uses various ways to construct a new training sample by mixing a pair (or more) of training samples. We propose PatchUp, a hidden state block-level regularization technique for Convolutional Neural Networks (CNNs), that is applied on selected contiguous blocks of feature maps from a random pair of samples. Our approach improves the robustness of CNN models against the manifold intrusion problem that may occur in other state-of-the-art mixing approaches. Moreover, since we are mixing the contiguous block of features in the hidden space, which has more dimensions than the input space, we obtain more diverse samples for training towards different dimensions. Our experiments on CIFAR10/100, SVHN, Tiny-ImageNet, and ImageNet using ResNet architectures including PreActResnet18/34, WRN-28-10, ResNet101/152 models show that PatchUp improves upon, or equals, the performance of current state-of-the-art regularizers for CNNs. We also show that PatchUp can provide a better generalization to deformed samples and is more robust against adversarial attacks.

2022-06-27

Proceedings of the AAAI Conference on Artificial Intelligence (published)

doi.org

arxiv.org

GraphMix: Improved Training of GNNs for Semi-Supervised Learning

Juho Kannala

We present GraphMix, a regularization method for Graph Neural Network based semi-supervised object classification, whereby we propose to train a fully-connected network jointly with the graph neural network via parameter sharing and interpolation-based regularization. Further, we provide a theoretical analysis of how GraphMix improves the generalization bounds of the underlying graph neural network, without making any assumptions about the "aggregation" layer or the depth of the graph neural networks. We experimentally validate this analysis by applying GraphMix to various architectures such as Graph Convolutional Networks, Graph Attention Networks and Graph-U-Net. Despite its simplicity, we demonstrate that GraphMix can consistently improve or closely match state-of-the-art performance using even simpler architectures such as Graph Convolutional Networks, across three established graph benchmarks: Cora, Citeseer and Pubmed citation network datasets, as well as three newly proposed datasets: Cora-Full, Co-author-CS and Co-author-Physics.

2020-10-10

AAAI Conference on Artificial Intelligence (published)

doi.org

arxiv.org

Towards an Unsupervised Method for Model Selection in Few-Shot Learning

Simon Guiroy

Vikas Verma

Christopher Pal

The study of generalization of neural networks in gradient-based meta-learning has recently attracted great research interest. Previous work on the study of the objective landscapes within the scope of few-shot classification empirically demonstrated that generalization to new tasks might be linked to the average inner product between their respective gradient vectors (Guiroy et al., 2019). Following that work, we study the effect that meta-training has on the learned space of representation of the network. Notably, we demonstrate that the global similarity in the space of representation, measured by the average inner product between the embeddings of meta-test examples, also correlates with generalization. Based on these observations, we propose a novel model-selection criterion for gradient-based meta-learning and experimentally validate its effectiveness.

2020-07-12

ICML.cc/2020/Workshop/LifelongML (unknown)

openreview.net

GraphMix: Improved Training of Graph Neural Networks for Semi-Supervised Learning

Juho Kannala

We present GraphMix, a regularized training scheme for Graph Neural Network based semi-supervised object classification, leveraging the recent advances in the regularization of classical deep neural networks. Specifically, we propose a unified approach in which we train a fully-connected network jointly with the graph neural network via parameter sharing, interpolation-based regularization and self-predicted-targets. Our proposed method is architecture agnostic in the sense that it can be applied to any variant of graph neural networks which applies a parametric transformation to the features of the graph nodes. Despite its simplicity, with GraphMix we can consistently improve results and achieve or closely match state-of-the-art performance using even simpler architectures such as Graph Convolutional Networks, across three established graph benchmarks: Cora, Citeseer and Pubmed citation network datasets, as well as three newly proposed datasets: Cora-Full, Co-author-CS and Co-author-Physics.

2019-12-31

(published)

www.semanticscholar.org

Interpolation Consistency Training for Semi-Supervised Learning

Juho Kannala

David Lopez-Paz

Arno Solin

2019-08-09

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence (published)

doi.org

arxiv.org

Towards Understanding Generalization in Gradient-Based Meta-Learning

Simon Guiroy

Vikas Verma

Christopher Pal

In this work we study generalization of neural networks in gradient-based meta-learning by analyzing various properties of the objective landscapes. We experimentally demonstrate that as meta-training progresses, the meta-test solutions, obtained by adapting the meta-train solution of the model to new tasks via a few steps of gradient-based fine-tuning, become flatter, lower in loss, and further away from the meta-train solution. We also show that those meta-test solutions become flatter even as generalization starts to degrade, thus providing experimental evidence against the correlation between generalization and flat minima in the paradigm of gradient-based meta-learning. Furthermore, we provide empirical evidence that generalization to new tasks is correlated with the coherence between their adaptation trajectories in parameter space, measured by the average cosine similarity between task-specific trajectory directions, starting from the same meta-train solution. We also show that coherence of meta-test gradients, measured by the average inner product between the task-specific gradient vectors evaluated at the meta-train solution, is also correlated with generalization. Based on these observations, we propose a novel regularizer for MAML and provide experimental evidence for its effectiveness.
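The two coherence measures named in the abstract above (average inner product between task-specific gradient vectors, and average cosine similarity between trajectory directions) can be sketched as follows. This is a minimal illustration, assuming the average is taken over all unordered pairs of distinct tasks, which the abstract does not state; the helper names and the toy vectors are hypothetical and not taken from the paper.

```python
import math
from itertools import combinations

def dot(u, v):
    # Inner product of two equal-length vectors.
    return sum(a * b for a, b in zip(u, v))

def cosine(u, v):
    # Cosine similarity: inner product of the normalized vectors.
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

def average_pairwise(vectors, similarity):
    # Mean of similarity(u, v) over all unordered pairs of distinct vectors
    # (assumed averaging scheme for this sketch).
    pairs = list(combinations(vectors, 2))
    return sum(similarity(u, v) for u, v in pairs) / len(pairs)

# Hypothetical per-task vectors (toy values, one vector per task).
task_vectors = [[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]]

avg_inner = average_pairwise(task_vectors, dot)
avg_cos = average_pairwise(task_vectors, cosine)
```

The inner product is sensitive to vector norms while the cosine similarity only compares directions, so the two measures can rank the same set of tasks differently.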

2019-07-15

ArXiv (preprint)

openreview.net

On Adversarial Mixup Resynthesis

R Devon Hjelm

Christopher Pal

In this paper, we explore new approaches to combining information encoded within the learned representations of auto-encoders. We explore models that are capable of combining the attributes of multiple inputs such that a resynthesised output is trained to fool an adversarial discriminator for real versus synthesised data. Furthermore, we explore the use of such an architecture in the context of semi-supervised learning, where we learn a mixing function whose objective is to produce interpolations of hidden states, or masked combinations of latent representations that are consistent with a conditioned class label. We show quantitative and qualitative evidence that such a formulation is an interesting avenue of research.

2018-12-31

NeurIPS (published)

dblp.uni-trier.de

Adversarial Mixup Resynthesizers

R Devon Hjelm

Christopher Pal

In this paper, we explore new approaches to combining information encoded within the learned representations of autoencoders. We explore models that are capable of combining the attributes of multiple inputs such that a resynthesised output is trained to fool an adversarial discriminator for real versus synthesised data. Furthermore, we explore the use of such an architecture in the context of semi-supervised learning, where we learn a mixing function whose objective is to produce interpolations of hidden states, or masked combinations of latent representations that are consistent with a conditioned class label. We show quantitative and qualitative evidence that such a formulation is an interesting avenue of research.

2018-12-31

DGS@ICLR (published)

openreview.net

Interpolated Adversarial Training: Achieving Robust Neural Networks without Sacrificing Accuracy

Alex Lamb

Vikas Verma

Juho Kannala

Yoshua Bengio

Adversarial robustness has become a central goal in deep learning, both in theory and practice. However, successful methods to improve adversarial robustness (such as adversarial training) greatly hurt generalization performance on the clean data. This could have a major impact on how adversarial robustness affects real world systems (i.e. many may opt to forego robustness if it can improve performance on the clean data). We propose Interpolated Adversarial Training, which employs recently proposed interpolation based training methods in the framework of adversarial training. On CIFAR-10, adversarial training increases clean test error from 5.8% to 16.7%, whereas with our Interpolated adversarial training we retain adversarial robustness while achieving a clean test error of only 6.5%. With our technique, the relative error increase for the robust model is reduced from 187.9% to just 12.1%.
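The relative error increases quoted in the abstract above can be reproduced from its rounded error rates. The short check below assumes that 5.8% is the clean test error of the standard (non-robust) model and that the relative increase is computed against that baseline; the function and variable names are hypothetical.

```python
def relative_increase(baseline, value):
    # Percentage increase of `value` over `baseline`.
    return 100.0 * (value - baseline) / baseline

clean_error = 5.8           # clean test error (%) without adversarial training
adv_training_error = 16.7   # clean test error (%) with adversarial training
iat_error = 6.5             # clean test error (%) with Interpolated Adversarial Training

adv_increase = round(relative_increase(clean_error, adv_training_error), 1)
iat_increase = round(relative_increase(clean_error, iat_error), 1)

print(adv_increase)  # 187.9
print(iat_increase)  # 12.1
```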

2018-12-31

arXiv.org (preprint)

dblp.uni-trier.de
