Kenji Kawaguchi

Discrete-Valued Neural Communication in Structured Architectures Enhances Generalization

Dianbo Liu

Chen Sun

Michael C. Mozer

Deep learning has advanced from fully connected architectures to structured models organized into components, e.g., the transformer composed of positional elements, modular architectures divided into slots, and graph neural nets made up of nodes. In structured models, an interesting question is how to conduct dynamic and possibly sparse communication among the separate components. Here, we explore the hypothesis that restricting the transmitted information among components to discrete representations is a beneficial bottleneck. The motivating intuition is human language in which communication occurs through discrete symbols. Even though individuals have different understandings of what a "cat" is based on their specific experiences, the shared discrete token makes it possible for communication among individuals to be unimpeded by individual differences in internal representation. To discretize the values of concepts dynamically communicated among specialist components, we extend the quantization mechanism from the Vector-Quantized Variational Autoencoder to multi-headed discretization with shared codebooks and use it for discrete-valued neural communication (DVNC). Our experiments show that DVNC substantially improves systematic generalization in a variety of architectures -- transformers, modular architectures, and graph neural networks. We also show that the DVNC is robust to the choice of hyperparameters, making the method very useful in practice. Moreover, we establish a theoretical justification of our discretization process, proving that it has the ability to increase noise robustness and reduce the underlying dimensionality of the model.

2019-12-31

International Conference on Machine Learning (published)

doi.org

openreview.net

Depth with Nonlinearity Creates No Bad Local Minima in ResNets

Kenji Kawaguchi

Yoshua Bengio

2019-09-30

Neural Networks (published)

doi.org

arxiv.org

Interpolation Consistency Training for Semi-Supervised Learning

Juho Kannala

David Lopez-Paz

Arno Solin

2019-08-09

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence (published)

doi.org

arxiv.org

Generalization in Machine Learning via Analytical Learning Theory

Kenji Kawaguchi

Yoshua Bengio

This paper introduces a novel measure-theoretic theory for machine learning that does not require statistical assumptions. Based on this theory, a new regularization method in deep learning is derived and shown to outperform previous methods in CIFAR-10, CIFAR-100, and SVHN. Moreover, the proposed theory provides a theoretical basis for a family of practically successful regularization methods in deep learning. We discuss several consequences of our results on one-shot learning, representation learning, deep learning, and curriculum learning. Unlike statistical learning theory, the proposed learning theory analyzes each problem instance individually via measure theory, rather than a set of problem instances via statistics. As a result, it provides different types of results and insights when compared to statistical learning theory.

2018-02-20

arXiv.org (preprint)

dblp.uni-trier.de

Towards Understanding Generalization via Analytical Learning Theory

Kenji Kawaguchi

Yoshua Bengio

This paper introduces a novel measure-theoretic theory for machine learning that does not require statistical assumptions. Based on this theory, a new regularization method in deep learning is derived and shown to outperform previous methods in CIFAR-10, CIFAR-100, and SVHN. Moreover, the proposed theory provides a theoretical basis for a family of practically successful regularization methods in deep learning. We discuss several consequences of our results on one-shot learning, representation learning, deep learning, and curriculum learning. Unlike statistical learning theory, the proposed learning theory analyzes each problem instance individually via measure theory, rather than a set of problem instances via statistics. As a result, it provides different types of results and insights when compared to statistical learning theory.

2018-02-20

ArXiv (preprint)

arxiv.org
