On Catastrophic Interference in Atari 2600 Games
William Fedus
Dibya Ghosh
John D. Martin
Model-free deep reinforcement learning is sample inefficient. One hypothesis -- speculated, but not confirmed -- is that catastrophic interf… (voir plus)erence within an environment inhibits learning. We test this hypothesis through a large-scale empirical study in the Arcade Learning Environment (ALE) and, indeed, find supporting evidence. We show that interference causes performance to plateau; the network cannot train on segments beyond the plateau without degrading the policy used to reach there. By synthetically controlling for interference, we demonstrate performance boosts across architectures, learning algorithms and environments. A more refined analysis shows that learning one segment of a game often increases prediction errors elsewhere. Our study provides a clear empirical link between catastrophic interference and sample efficiency in reinforcement learning.
Machine learning analysis of exome trios to contrast the genomic architecture of autism and schizophrenia
Sameer Sardaar
Bill Qi
Alexandre Dionne-Laporte
Guy. A. Rouleau
Solving ODE with Universal Flows: Approximation Theory for Flow-Based Models
Chin-Wei Huang
Laurent Dinh
Normalizing flows are powerful invertible probabilistic models that can be used to translate two probability distributions, in a way that al… (voir plus)lows us to efficiently track the change of probability density. However, to trade for computational efficiency in sampling and in evaluating the log-density, special parameterization designs have been proposed at the cost of representational expressiveness. In this work, we propose to use ODEs as a framework to establish universal approximation theory for certain families of flow-based models.
Analysing brain networks in population neuroscience: a case for the Bayesian philosophy
Dorothea L. Floris
Andre Marquand
Network connectivity fingerprints are among today's best choices to obtain a faithful sampling of an individual's brain and cognition. Widel… (voir plus)y available MRI scanners can provide rich information tapping into network recruitment and reconfiguration that now scales to hundreds and thousands of humans. Here, we contemplate the advantages of analysing such connectome profiles using Bayesian strategies. These analysis techniques afford full probability estimates of the studied network coupling phenomena, provide analytical machinery to separate epistemological uncertainty and biological variability in a coherent manner, usher us towards avenues to go beyond binary statements on existence versus non-existence of an effect, and afford credibility estimates around all model parameters at play which thus enable single-subject predictions with rigorous uncertainty intervals. We illustrate the brittle boundary between healthy and diseased brain circuits by autism spectrum disorder as a recurring theme where, we argue, network-based approaches in neuroscience will require careful probabilistic answers. This article is part of the theme issue ‘Unifying the essential concepts of biological networks: biological insights and philosophical foundations’.
Stochastic Polyak Step-size for SGD: An Adaptive Learning Rate for Fast Convergence
Nicolas Loizou
Sharan Vaswani
Issam Hadj Laradji
We propose a stochastic variant of the classical Polyak step-size (Polyak, 1987) commonly used in the subgradient method. Although computing… (voir plus) the Polyak step-size requires knowledge of the optimal function values, this information is readily available for typical modern machine learning applications. Consequently, the proposed stochastic Polyak step-size (SPS) is an attractive choice for setting the learning rate for stochastic gradient descent (SGD). We provide theoretical convergence guarantees for SGD equipped with SPS in different settings, including strongly convex, convex and non-convex functions. Furthermore, our analysis results in novel convergence guarantees for SGD with a constant step-size. We show that SPS is particularly effective when training over-parameterized models capable of interpolating the training data. In this setting, we prove that SPS enables SGD to converge to the true solution at a fast rate without requiring the knowledge of any problem-dependent constants or additional computational overhead. We experimentally validate our theoretical results via extensive experiments on synthetic and real datasets. We demonstrate the strong performance of SGD with SPS compared to state-of-the-art optimization methods when training over-parameterized models.
Neural Bayes: A Generic Parameterization Method for Unsupervised Representation Learning
Devansh Arpit
Huan Wang
Caiming Xiong
Richard Socher
The Geometry of Sign Gradient Descent
Lukas Balles
Fabian Pedregosa
Augmented Normalizing Flows: Bridging the Gap Between Generative Flows and Latent Variable Models
Chin-Wei Huang
Laurent Dinh
In this work, we propose a new family of generative flows on an augmented data space, with an aim to improve expressivity without drasticall… (voir plus)y increasing the computational cost of sampling and evaluation of a lower bound on the likelihood. Theoretically, we prove the proposed flow can approximate a Hamiltonian ODE as a universal transport map. Empirically, we demonstrate state-of-the-art performance on standard benchmarks of flow-based generative modeling.
HighRes-net: Recursive Fusion for Multi-Frame Super-Resolution of Satellite Imagery
Michel Deudon
Alfredo Kalaitzis
Israel Goytom
Md Rifat Arefin
Zhichao Lin
Kris Sankaran
Vincent Michalski
Julien Cornebise
Generative deep learning has sparked a new wave of Super-Resolution (SR) algorithms that enhance single images with impressive aesthetic res… (voir plus)ults, albeit with imaginary details. Multi-frame Super-Resolution (MFSR) offers a more grounded approach to the ill-posed problem, by conditioning on multiple low-resolution views. This is important for satellite monitoring of human impact on the planet -- from deforestation, to human rights violations -- that depend on reliable imagery. To this end, we present HighRes-net, the first deep learning approach to MFSR that learns its sub-tasks in an end-to-end fashion: (i) co-registration, (ii) fusion, (iii) up-sampling, and (iv) registration-at-the-loss. Co-registration of low-resolution views is learned implicitly through a reference-frame channel, with no explicit registration mechanism. We learn a global fusion operator that is applied recursively on an arbitrary number of low-resolution pairs. We introduce a registered loss, by learning to align the SR output to a ground-truth through ShiftNet. We show that by learning deep representations of multiple views, we can super-resolve low-resolution signals and enhance Earth Observation data at scale. Our approach recently topped the European Space Agency's MFSR competition on real-world satellite imagery.
Minimax Theorem for Latent Games or: How I Learned to Stop Worrying about Mixed-Nash and Love Neural Nets
D. Balduzzi
Wojciech M. Czarnecki
M. Garnelo
Yoram Bachrach
Adversarial training, a special case of multi-objective optimization, is an increasingly useful tool in machine learning. For example, two-p… (voir plus)layer zero-sum games are important for generative modeling (GANs) and for mastering games like Go or Poker via self-play. A classic result in Game Theory states that one must mix strategies, as pure equilibria may not exist. Surprisingly, machine learning practitioners typically train a \emph{single} pair of agents -- instead of a pair of mixtures -- going against Nash's principle. Our main contribution is a notion of limited-capacity-equilibrium for which, as capacity grows, optimal agents -- not mixtures -- can learn increasingly expressive and realistic behaviors. We define \emph{latent games}, a new class of game where agents are mappings that transform latent distributions. Examples include generators in GANs, which transform Gaussian noise into distributions on images, and StarCraft II agents, which transform sampled build orders into policies. We show that minimax equilibria in latent games can be approximated by a \emph{single} pair of dense neural networks. Finally, we apply our latent game approach to solve differentiable Blotto, a game with an infinite strategy space.
Minimax Theorem for Latent Games or: How I Learned to Stop Worrying about Mixed-Nash and Love Neural Nets
D. Balduzzi
Wojciech M. Czarnecki
M. Garnelo
Yoram Bachrach
Adversarial training, a special case of multi-objective optimization, is an increasingly useful tool in machine learning. For example, two-p… (voir plus)layer zero-sum games are important for generative modeling (GANs) and for mastering games like Go or Poker via self-play. A classic result in Game Theory states that one must mix strategies, as pure equilibria may not exist. Surprisingly, machine learning practitioners typically train a \emph{single} pair of agents -- instead of a pair of mixtures -- going against Nash's principle. Our main contribution is a notion of limited-capacity-equilibrium for which, as capacity grows, optimal agents -- not mixtures -- can learn increasingly expressive and realistic behaviors. We define \emph{latent games}, a new class of game where agents are mappings that transform latent distributions. Examples include generators in GANs, which transform Gaussian noise into distributions on images, and StarCraft II agents, which transform sampled build orders into policies. We show that minimax equilibria in latent games can be approximated by a \emph{single} pair of dense neural networks. Finally, we apply our latent game approach to solve differentiable Blotto, a game with an infinite strategy space.
Saliency Enhancement using Gradient Domain Edges Merging
Sofiane Wozniak Achiche
Alexandre Duperre
Maxime Raison
In recent years, there has been a rapid progress in solving the binary problems in computer vision, such as edge detection which finds the b… (voir plus)oundaries of an image and salient object detection which finds the important object in an image. This progress happened thanks to the rise of deep-learning and convolutional neural networks (CNN) which allow to extract complex and abstract features. However, edge detection and saliency are still two different fields and do not interact together, although it is intuitive for a human to detect salient objects based on its boundaries. Those features are not well merged in a CNN because edges and surfaces do not intersect since one feature represents a region while the other represents boundaries between different regions. In the current work, the main objective is to develop a method to merge the edges with the saliency maps to improve the performance of the saliency. Hence, we developed the gradient-domain merging (GDM) which can be used to quickly combine the image-domain information of salient object detection with the gradient-domain information of the edge detection. This leads to our proposed saliency enhancement using edges (SEE) with an average improvement of the F-measure of at least 3.4 times higher on the DUT-OMRON dataset and 6.6 times higher on the ECSSD dataset, when compared to competing algorithm such as denseCRF and BGOF. The SEE algorithm is split into 2 parts, SEE-Pre for preprocessing and SEE-Post pour postprocessing.