Alex Lamb

Alumni

Publications

Neural Function Modules with Sparse Arguments: A Dynamic Approach to Integrating Information across Layers
A. Slowik
Michael Curtis Mozer
Philippe Beaudoin
Feed-forward neural networks consist of a sequence of layers, in which each layer performs some processing on the information from the previous layer. A downside to this approach is that each layer (or module, as multiple modules can operate in parallel) is tasked with processing the entire hidden state, rather than a particular part of the state which is most relevant for that module. Methods which only operate on a small number of input variables are an essential part of most programming languages, and they allow for improved modularity and code re-usability. Our proposed method, Neural Function Modules (NFM), aims to introduce the same structural capability into deep learning. Most of the work in the context of feed-forward networks combining top-down and bottom-up feedback is limited to classification problems. The key contribution of our work is to combine attention, sparsity, top-down and bottom-up feedback, in a flexible algorithm which, as we show, improves the results in standard classification, out-of-domain generalization, generative modeling, and learning representations in the context of reinforcement learning.
Factorizing Declarative and Procedural Knowledge in Structured, Dynamical Environments
Philippe Beaudoin
Charles Blundell
Sergey Levine
Michael Curtis Mozer
GraphMix: Improved Training of GNNs for Semi-Supervised Learning
We present GraphMix, a regularization method for Graph Neural Network based semi-supervised object classification, whereby we propose to train a fully-connected network jointly with the graph neural network via parameter sharing and interpolation-based regularization. Further, we provide a theoretical analysis of how GraphMix improves the generalization bounds of the underlying graph neural network, without making any assumptions about the "aggregation" layer or the depth of the graph neural networks. We experimentally validate this analysis by applying GraphMix to various architectures such as Graph Convolutional Networks, Graph Attention Networks and Graph-U-Net. Despite its simplicity, we demonstrate that GraphMix can consistently improve or closely match state-of-the-art performance using even simpler architectures such as Graph Convolutional Networks, across three established graph benchmarks: Cora, Citeseer and Pubmed citation network datasets, as well as three newly proposed datasets: Cora-Full, Co-author-CS and Co-author-Physics.
Object Files and Schemata: Factorizing Declarative and Procedural Knowledge in Dynamical Systems
Philippe Beaudoin
Sergey Levine
Charles Blundell
Michael Curtis Mozer
GraphMix: Improved Training of Graph Neural Networks for Semi-Supervised Learning
We present GraphMix, a regularized training scheme for Graph Neural Network based semi-supervised object classification, leveraging the recent advances in the regularization of classical deep neural networks. Specifically, we propose a unified approach in which we train a fully-connected network jointly with the graph neural network via parameter sharing, interpolation-based regularization and self-predicted-targets. Our proposed method is architecture agnostic in the sense that it can be applied to any variant of graph neural networks which applies a parametric transformation to the features of the graph nodes. Despite its simplicity, with GraphMix we can consistently improve results and achieve or closely match state-of-the-art performance using even simpler architectures such as Graph Convolutional Networks, across three established graph benchmarks: Cora, Citeseer and Pubmed citation network datasets, as well as three newly proposed datasets: Cora-Full, Co-author-CS and Co-author-Physics.
Learning to Combine Top-Down and Bottom-Up Signals in Recurrent Neural Networks with Attention over Modules
Interpolation Consistency Training for Semi-Supervised Learning
Juho Kannala
David Lopez-Paz
Arno Solin
Manifold Mixup: Better Representations by Interpolating Hidden States
Deep neural networks excel at learning the training data, but often provide incorrect and confident predictions when evaluated on slightly different test examples. This includes distribution shifts, outliers, and adversarial examples. To address these issues, we propose Manifold Mixup, a simple regularizer that encourages neural networks to predict less confidently on interpolations of hidden representations. Manifold Mixup leverages semantic interpolations as additional training signal, obtaining neural networks with smoother decision boundaries at multiple levels of representation. As a result, neural networks trained with Manifold Mixup learn class-representations with fewer directions of variance. We prove theory on why this flattening happens under ideal conditions, validate it on practical situations, and connect it to previous works on information theory and generalization. In spite of incurring no significant computation and being implemented in a few lines of code, Manifold Mixup improves strong baselines in supervised learning, robustness to single-step adversarial attacks, and test log-likelihood.
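The interpolation of hidden representations described in this abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function names, the Beta-distributed mixing coefficient, the value of alpha, and the choice to mix one-hot labels with the same coefficient are assumptions not stated in the abstract, and the hidden vectors are hand-written placeholders rather than activations from a real network.

```python
import random


def mix(a, b, lam):
    """Convex combination lam * a + (1 - lam) * b of two equal-length vectors."""
    return [lam * x + (1.0 - lam) * y for x, y in zip(a, b)]


def manifold_mixup_pair(h_a, h_b, y_a, y_b, alpha=2.0, rng=random):
    """Mix two hidden representations and their labels with one shared coefficient.

    h_a, h_b: hidden-state vectors taken at the same layer (placeholder inputs).
    y_a, y_b: one-hot label vectors for the two examples.
    alpha:    Beta(alpha, alpha) parameter; the default is an arbitrary assumption.
    rng:      any object with a betavariate method, e.g. random or random.Random(seed).
    """
    lam = rng.betavariate(alpha, alpha)  # mixing coefficient in [0, 1]
    h_mix = mix(h_a, h_b, lam)           # interpolated hidden state
    y_mix = mix(y_a, y_b, lam)           # correspondingly softened target
    return h_mix, y_mix, lam


# Example: two hidden states from different classes.
h_mix, y_mix, lam = manifold_mixup_pair(
    [1.0, 0.0], [0.0, 1.0],  # hidden states
    [1.0, 0.0], [0.0, 1.0],  # one-hot labels
)
```

In a training loop, the mixed hidden state would be passed through the remaining layers and the loss computed against the mixed target, so the network is pushed toward less confident predictions between examples.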
State-Reification Networks: Improving Generalization by Modeling the Distribution of Hidden Representations
Machine learning promises methods that generalize well from finite labeled data. However, the brittleness of existing neural net approaches is revealed by notable failures, such as the existence of adversarial examples that are misclassified despite being nearly identical to a training example, or the inability of recurrent sequence-processing nets to stay on track without teacher forcing. We introduce a method, which we refer to as state reification, that involves modeling the distribution of hidden states over the training data and then projecting hidden states observed during testing toward this distribution. Our intuition is that if the network can remain in a familiar manifold of hidden space, subsequent layers of the net should be well trained to respond appropriately. We show that this state-reification method helps neural nets to generalize better, especially when labeled data are sparse, and also helps overcome the challenge of achieving robust generalization with adversarial training.
Interpolation Consistency Training for Semi-Supervised Learning
Juho Kannala
David Lopez-Paz
On Adversarial Mixup Resynthesis
In this paper, we explore new approaches to combining information encoded within the learned representations of auto-encoders. We explore models that are capable of combining the attributes of multiple inputs such that a resynthesised output is trained to fool an adversarial discriminator for real versus synthesised data. Furthermore, we explore the use of such an architecture in the context of semi-supervised learning, where we learn a mixing function whose objective is to produce interpolations of hidden states, or masked combinations of latent representations that are consistent with a conditioned class label. We show quantitative and qualitative evidence that such a formulation is an interesting avenue of research.