Publications

Towards Understanding Generalization via Analytical Learning Theory
This paper introduces a novel measure-theoretic theory for machine learning that does not require statistical assumptions. Based on this theory, a new regularization method in deep learning is derived and shown to outperform previous methods in CIFAR-10, CIFAR-100, and SVHN. Moreover, the proposed theory provides a theoretical basis for a family of practically successful regularization methods in deep learning. We discuss several consequences of our results on one-shot learning, representation learning, deep learning, and curriculum learning. Unlike statistical learning theory, the proposed learning theory analyzes each problem instance individually via measure theory, rather than a set of problem instances via statistics. As a result, it provides different types of results and insights when compared to statistical learning theory.
Brain Tumor Segmentation Using a 3D FCN with Multi-scale Loss
Andrew Jesson
Lesion Detection, Segmentation and Prediction in Multiple Sclerosis Clinical Trials
Andrew Doyle
Colm Elliott
Zahra Karimaghaloo
Nagesh K. Subbanna
Douglas Arnold
Boundary Seeking GANs
R Devon Hjelm
Athul Jacob
Adam Trischler
Gerry Che
Combining Model-based and Model-free RL via Multi-step Control Variates
Tong Che
Yuchen Lu
George Tucker
Surya Bhupatiraju
Shane Gu
Sergey Levine
Existence of Nash Equilibria on Integer Programming Games
Andrea Lodi
João Pedro Pedroso
Learning Generative Models with Locally Disentangled Latent Factors
Online Hyper-Parameter Optimization
Damien Vincent
Sylvain Gelly
Nicolas Roux
Olivier Bousquet
Sequential Coordination of Deep Models for Learning Visual Arithmetic
Achieving machine intelligence requires a smooth integration of perception and reasoning, yet models developed to date tend to specialize in one or the other; sophisticated manipulation of symbols acquired from rich perceptual spaces has so far proved elusive. Consider a visual arithmetic task, where the goal is to carry out simple arithmetical algorithms on digits presented under natural conditions (e.g. hand-written, placed randomly). We propose a two-tiered architecture for tackling this problem. The lower tier consists of a heterogeneous collection of information processing modules, which can include pre-trained deep neural networks for locating and extracting characters from the image, as well as modules performing symbolic transformations on the representations extracted by perception. The higher tier consists of a controller, trained using reinforcement learning, which coordinates the modules in order to solve the high-level task. For instance, the controller may learn in what contexts to execute the perceptual networks and what symbolic transformations to apply to their outputs. The resulting model is able to solve a variety of tasks in the visual arithmetic domain, and has several advantages over standard, architecturally homogeneous feedforward networks including improved sample efficiency.
Finding Flatter Minima with SGD
Stanisław Jastrzębski
Amos Storkey
Graph Priors for Deep Neural Networks
In this work we explore how gene-gene interaction graphs can be used as a prior for the representation of a model to construct features based on known interactions between genes. Most existing machine learning work on graphs focuses on building models when data is confined to a graph structure. In this work we focus on using the information from a graph to build better representations in our models. We use the percolate task, determining if a path exists across a grid for a set of node values, as a proxy for gene pathways. We create variants of the percolate task to explore where existing methods fail. We test the limits of existing methods in order to determine what can be improved when applying these methods to a real task. This leads us to propose new methods based on Graph Convolutional Networks (GCN) that use pooling and dropout to deal with noise in the graph prior.
Inferring Identity Factors for Grouped Examples
Christopher Pal
We propose a method for modelling groups of face images from the same identity. The model is trained to infer a distribution over the latent space for identity given a small set of “training data”. One can then sample images using that latent representation to produce images of the same identity. We demonstrate that the model extracts disentangled factors for identity and image-specific vectors. We also perform generative classification over identities to assess its feasibility for few-shot face recognition.