Publications

G RADIENT -B ASED N EURAL DAG L EARNING WITH I NTERVENTIONS
Philippe Brouillard
Sébastien Lachapelle
Alexandre Lacoste
Decision making based on statistical association alone can be a dangerous endeavor due to non-causal associations. Ideally, one would rely o… (see more)n causal relationships that enable reasoning about the effect of interventions. Several methods have been proposed to discover such relationships from observational and inter-ventional data. Among them, GraN-DAG, a method that relies on the constrained optimization of neural networks, was shown to produce state-of-the-art results among algorithms relying purely on observational data. However, it is limited to observational data and cannot make use of interventions. In this work, we extend GraN-DAG to support interventional data and show that this improves its ability to infer causal structures
In Search of Robust Measures of Generalization
Brady Neal
Nitarshan Rajkumar
Ethan Caballero
Linbo Wang
Daniel M. Roy
One of the principal scientific challenges in deep learning is explaining generalization, i.e., why the particular way the community now tra… (see more)ins networks to achieve small training error also leads to small error on held-out data from the same population. It is widely appreciated that some worst-case theories -- such as those based on the VC dimension of the class of predictors induced by modern neural network architectures -- are unable to explain empirical performance. A large volume of work aims to close this gap, primarily by developing bounds on generalization error, optimization error, and excess risk. When evaluated empirically, however, most of these bounds are numerically vacuous. Focusing on generalization bounds, this work addresses the question of how to evaluate such bounds empirically. Jiang et al. (2020) recently described a large-scale empirical study aimed at uncovering potential causal relationships between bounds/measures and generalization. Building on their study, we highlight where their proposed methods can obscure failures and successes of generalization measures in explaining generalization. We argue that generalization measures should instead be evaluated within the framework of distributional robustness.
SENET: A Semantic Web for Supporting Automation of Software Engineering Tasks
Yalin Liu
Jinfeng Lin
Jane Cleland-Huang
Michael Vierhauser
Sugandha Lohar
The use of Natural Language (NL) interfaces to allow devices and applications to respond to verbal commands or free-form textual queries is … (see more)becoming increasingly prevalent in our society. To a large extent, their success in interpreting and responding to a request is dependent upon rich underlying ontologies and conceptual models that understand the technical or domain specific vocabulary of diverse users. The effective use of NL interfaces in the Software Engineering (SE) domains requires its own ontology models focusing upon software related terms and concepts. While many SE glossaries exist, they are often incomplete and tend to define the vocabulary for specific sub-fields without capturing associations between terms and phrases. This limits their usefulness for supporting NL-related tasks. In this paper we propose an approach for constructing and evolving a semantic network of software engineering concepts and phrases. Our approach starts with a set of existing SE glossaries, uses the existing glossary terms and explicitly defined associations as a starting point, uses machine learning-based techniques to dynamically identify and document additional associations between terms, leverages the network to interpret NL queries in the SE domain, and finally augments the resulting semantic network with feedback provided by users. We evaluate the viability of our approach within the sub-domain of Agile Software Development, focusing on requirements related queries, and show that the semantic network enhances the ability of an NL interface to correctly interpret and execute user queries.
Spike-based causal inference for weight alignment
Jordan Guerguiev
Konrad Paul Kording
In artificial neural networks trained with gradient descent, the weights used for processing stimuli are also used during backward passes to… (see more) calculate gradients. For the real brain to approximate gradients, gradient information would have to be propagated separately, such that one set of synaptic weights is used for processing and another set is used for backward passes. This produces the so-called "weight transport problem" for biological models of learning, where the backward weights used to calculate gradients need to mirror the forward weights used to process stimuli. This weight transport problem has been considered so hard that popular proposals for biological learning assume that the backward weights are simply random, as in the feedback alignment algorithm. However, such random weights do not appear to work well for large networks. Here we show how the discontinuity introduced in a spiking system can lead to a solution to this problem. The resulting algorithm is a special case of an estimator used for causal inference in econometrics, regression discontinuity design. We show empirically that this algorithm rapidly makes the backward weights approximate the forward weights. As the backward weights become correct, this improves learning performance over feedback alignment on tasks such as Fashion-MNIST, SVHN, CIFAR-10 and VOC. Our results demonstrate that a simple learning rule in a spiking network can allow neurons to produce the right backward connections and thus solve the weight transport problem.
Stochastic Hamiltonian Gradient Methods for Smooth Games
Nicolas Loizou
Hugo Berard
Alexia Jolicoeur-Martineau
The success of adversarial formulations in machine learning has brought renewed motivation for smooth games. In this work, we focus on the c… (see more)lass of stochastic Hamiltonian methods and provide the first convergence guarantees for certain classes of stochastic smooth games. We propose a novel unbiased estimator for the stochastic Hamiltonian gradient descent (SHGD) and highlight its benefits. Using tools from the optimization literature we show that SHGD converges linearly to the neighbourhood of a stationary point. To guarantee convergence to the exact solution, we analyze SHGD with a decreasing step-size and we also present the first stochastic variance reduced Hamiltonian method. Our results provide the first global non-asymptotic last-iterate convergence guarantees for the class of stochastic unconstrained bilinear games and for the more general class of stochastic games that satisfy a "sufficiently bilinear" condition, notably including some non-convex non-concave problems. We supplement our analysis with experiments on stochastic bilinear and sufficiently bilinear games, where our theory is shown to be tight, and on simple adversarial machine learning formulations.
A Story of Two Streams: Reinforcement Learning Models from Human Behavior and Neuropsychiatry
Baihan Lin
Guillermo Cecchi
Djallel Bouneffouf
Jenna Reinen
Drawing an inspiration from behavioral studies of human decision making, we propose here a more general and flexible parametric framework fo… (see more)r reinforcement learning that extends standard Q-learning to a two-stream model for processing positive and negative rewards, and allows to incorporate a wide range of reward-processing biases -- an important component of human decision making which can help us better understand a wide spectrum of multi-agent interactions in complex real-world socioeconomic systems, as well as various neuropsychiatric conditions associated with disruptions in normal reward processing. From the computational perspective, we observe that the proposed Split-QL model and its clinically inspired variants consistently outperform standard Q-Learning and SARSA methods, as well as recently proposed Double Q-Learning approaches, on simulated tasks with particular reward distributions, a real-world dataset capturing human decision-making in gambling tasks, and the Pac-Man game in a lifelong learning setting across different reward stationarities.
Structured Conditional Continuous Normalizing Flows for Efficient Amortized Inference in Graphical Models
Christian Dietrich Weilbach
Boyan Beronov
Frank Wood
William Harvey
We exploit minimally faithful inversion of graphical model structures to specify sparse continuous normalizing flows (CNFs) for amortized i… (see more)nference. We find that the sparsity of this factorization can be exploited to reduce the numbers of parameters in the neural network, adaptive integration steps of the flow, and consequently FLOPs at both training and inference time without decreasing performance in comparison to unconstrained flows. By expressing the structure inversion as a compilation pass in a probabilistic programming language, we are able to apply it in a novel way to models as complex as convolutional neural networks. Furthermore, we extend the training objective for CNFs in the context of inference amortization to the symmetric Kullback-Leibler divergence, and demonstrate its theoretical and practical advantages.
Structured Conditional Continuous Normalizing Flows for Efficient Amortized Inference in Graphical Models
Christian Dietrich Weilbach
Boyan Beronov
Frank Wood
William Harvey
We exploit minimally faithful inversion of graphical model structures to specify sparse continuous normalizing flows (CNFs) for amortized i… (see more)nference. We find that the sparsity of this factorization can be exploited to reduce the numbers of parameters in the neural network, adaptive integration steps of the flow, and consequently FLOPs at both training and inference time without decreasing performance in comparison to unconstrained flows. By expressing the structure inversion as a compilation pass in a probabilistic programming language, we are able to apply it in a novel way to models as complex as convolutional neural networks. Furthermore, we extend the training objective for CNFs in the context of inference amortization to the symmetric Kullback-Leibler divergence, and demonstrate its theoretical and practical advantages.
Synbols: Probing Learning Algorithms with Synthetic Datasets
Alexandre Lacoste
Pau Rodr'iguez
Frédéric Branchaud-charron
Parmida Atighehchian
Massimo Caccia
Issam Hadj Laradji
Matt P. Craddock
David Vazquez
Tensorized Random Projections
Beheshteh T. Rakhshan
On the Effectiveness of Two-Step Learning for Latent-Variable Models
Latent-variable generative models offer a principled solution for modeling and sampling from complex probability distributions. Implementing… (see more) a joint training objective with a complex prior, however, can be a tedious task, as one is typically required to derive and code a specific cost function for each new type of prior distribution. In this work, we propose a general framework for learning latent variable generative models in a two-step fashion. In the first step of the framework, we train an autoencoder, and in the second step we fit a prior model on the resulting latent distribution. This two-step approach offers a convenient alternative to joint training, as it allows for a straightforward combination of existing models without the hustle of deriving new cost functions, and the need for coding the joint training objectives. Through a set of experiments, we demonstrate that two-step learning results in performances similar to joint training, and in some cases even results in more accurate modeling.
On the interplay between noise and curvature and its effect on optimization and generalization
Valentin Thomas
Fabian Pedregosa
Bart van Merriënboer
Pierre-Antoine Manzagol
The speed at which one can minimize an expected loss using stochastic methods depends on two properties: the curvature of the loss and the v… (see more)ariance of the gradients. While most previous works focus on one or the other of these properties, we explore how their interaction affects optimization speed. Further, as the ultimate goal is good generalization performance, we clarify how both curvature and noise are relevant to properly estimate the generalization gap. Realizing that the limitations of some existing works stems from a confusion between these matrices, we also clarify the distinction between the Fisher matrix, the Hessian, and the covariance matrix of the gradients.