Publications

Gradient-Based Neural DAG Learning with Interventions

Philippe Brouillard

Sébastien Lachapelle

Alexandre Lacoste

Decision making based on statistical association alone can be a dangerous endeavor due to non-causal associations. Ideally, one would rely on causal relationships that enable reasoning about the effect of interventions. Several methods have been proposed to discover such relationships from observational and interventional data. Among them, GraN-DAG, a method that relies on the constrained optimization of neural networks, was shown to produce state-of-the-art results among algorithms relying purely on observational data. However, it is limited to observational data and cannot make use of interventions. In this work, we extend GraN-DAG to support interventional data and show that this improves its ability to infer causal structures.

2020-01-01

(published)

www.semanticscholar.org

In Search of Robust Measures of Generalization

Gintare Karolina Dziugaite

Alexandre Drouin

Brady Neal

Nitarshan Rajkumar

Ethan Caballero

Linbo Wang

Ioannis Mitliagkas

Daniel M. Roy

One of the principal scientific challenges in deep learning is explaining generalization, i.e., why the particular way the community now trains networks to achieve small training error also leads to small error on held-out data from the same population. It is widely appreciated that some worst-case theories -- such as those based on the VC dimension of the class of predictors induced by modern neural network architectures -- are unable to explain empirical performance. A large volume of work aims to close this gap, primarily by developing bounds on generalization error, optimization error, and excess risk. When evaluated empirically, however, most of these bounds are numerically vacuous. Focusing on generalization bounds, this work addresses the question of how to evaluate such bounds empirically. Jiang et al. (2020) recently described a large-scale empirical study aimed at uncovering potential causal relationships between bounds/measures and generalization. Building on their study, we highlight where their proposed methods can obscure failures and successes of generalization measures in explaining generalization. We argue that generalization measures should instead be evaluated within the framework of distributional robustness.

arxiv.org

SENET: A Semantic Web for Supporting Automation of Software Engineering Tasks

Yalin Liu

Jinfeng Lin

Jane Cleland-Huang

Michael Vierhauser

Jin Guo

Sugandha Lohar

The use of Natural Language (NL) interfaces to allow devices and applications to respond to verbal commands or free-form textual queries is becoming increasingly prevalent in our society. To a large extent, their success in interpreting and responding to a request is dependent upon rich underlying ontologies and conceptual models that understand the technical or domain-specific vocabulary of diverse users. The effective use of NL interfaces in the Software Engineering (SE) domains requires its own ontology models focusing upon software-related terms and concepts. While many SE glossaries exist, they are often incomplete and tend to define the vocabulary for specific sub-fields without capturing associations between terms and phrases. This limits their usefulness for supporting NL-related tasks. In this paper we propose an approach for constructing and evolving a semantic network of software engineering concepts and phrases. Our approach starts with a set of existing SE glossaries, uses the existing glossary terms and explicitly defined associations as a starting point, uses machine learning-based techniques to dynamically identify and document additional associations between terms, leverages the network to interpret NL queries in the SE domain, and finally augments the resulting semantic network with feedback provided by users. We evaluate the viability of our approach within the sub-domain of Agile Software Development, focusing on requirements-related queries, and show that the semantic network enhances the ability of an NL interface to correctly interpret and execute user queries.

2020-01-01

2020 IEEE Seventh International Workshop on Artificial Intelligence for Requirements Engineering (AIRE) (published)

doi.org

Spike-based causal inference for weight alignment

Jordan Guerguiev

Konrad Paul Kording

Blake Richards

In artificial neural networks trained with gradient descent, the weights used for processing stimuli are also used during backward passes to calculate gradients. For the real brain to approximate gradients, gradient information would have to be propagated separately, such that one set of synaptic weights is used for processing and another set is used for backward passes. This produces the so-called "weight transport problem" for biological models of learning, where the backward weights used to calculate gradients need to mirror the forward weights used to process stimuli. This weight transport problem has been considered so hard that popular proposals for biological learning assume that the backward weights are simply random, as in the feedback alignment algorithm. However, such random weights do not appear to work well for large networks. Here we show how the discontinuity introduced in a spiking system can lead to a solution to this problem. The resulting algorithm is a special case of an estimator used for causal inference in econometrics, regression discontinuity design. We show empirically that this algorithm rapidly makes the backward weights approximate the forward weights. As the backward weights become correct, this improves learning performance over feedback alignment on tasks such as Fashion-MNIST, SVHN, CIFAR-10 and VOC. Our results demonstrate that a simple learning rule in a spiking network can allow neurons to produce the right backward connections and thus solve the weight transport problem.

2020-01-01

ICLR 2020 (poster)

openreview.net

Stochastic Hamiltonian Gradient Methods for Smooth Games

Nicolas Loizou

Hugo Berard

Alexia Jolicoeur-Martineau

Pascal Vincent

Simon Lacoste-Julien

Ioannis Mitliagkas

The success of adversarial formulations in machine learning has brought renewed motivation for smooth games. In this work, we focus on the class of stochastic Hamiltonian methods and provide the first convergence guarantees for certain classes of stochastic smooth games. We propose a novel unbiased estimator for the stochastic Hamiltonian gradient descent (SHGD) and highlight its benefits. Using tools from the optimization literature we show that SHGD converges linearly to the neighbourhood of a stationary point. To guarantee convergence to the exact solution, we analyze SHGD with a decreasing step-size and we also present the first stochastic variance reduced Hamiltonian method. Our results provide the first global non-asymptotic last-iterate convergence guarantees for the class of stochastic unconstrained bilinear games and for the more general class of stochastic games that satisfy a "sufficiently bilinear" condition, notably including some non-convex non-concave problems. We supplement our analysis with experiments on stochastic bilinear and sufficiently bilinear games, where our theory is shown to be tight, and on simple adversarial machine learning formulations.

2020-01-01

ICML (published)

proceedings.mlr.press

arxiv.org
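As a rough illustration of the Hamiltonian idea mentioned in the abstract above, the sketch below runs plain (deterministic) Hamiltonian gradient descent on an unconstrained bilinear game min_x max_y x^T A y. It is not the stochastic estimator or the variance-reduced method proposed in the paper; the function names, the matrix `A`, and the step size are all assumptions chosen for the example. The Hamiltonian is taken to be half the squared norm of the game's gradient field, which for this game is 0.5 * (||A y||^2 + ||A^T x||^2).

```python
import numpy as np


def hamiltonian(A, x, y):
    """H(x, y) = 0.5 * ||gradient field||^2 for the game min_x max_y x^T A y."""
    grad_x = A @ y      # partial derivative of x^T A y with respect to x
    grad_y = A.T @ x    # partial derivative of x^T A y with respect to y
    return 0.5 * (grad_x @ grad_x + grad_y @ grad_y)


def hgd_step(A, x, y, step_size):
    """One gradient-descent step on H (hypothetical helper, deterministic)."""
    # For the bilinear game: grad_x H = A A^T x and grad_y H = A^T A y.
    new_x = x - step_size * (A @ (A.T @ x))
    new_y = y - step_size * (A.T @ (A @ y))
    return new_x, new_y


def run_hgd(A, x0, y0, step_size=0.2, num_steps=200):
    """Iterate hgd_step and record the Hamiltonian after every step."""
    x, y = x0.astype(float), y0.astype(float)
    values = [hamiltonian(A, x, y)]
    for _ in range(num_steps):
        x, y = hgd_step(A, x, y, step_size)
        values.append(hamiltonian(A, x, y))
    return x, y, values
```

With an invertible `A` and a step size below 2 divided by the largest eigenvalue of A A^T, the iterates contract toward the equilibrium (0, 0), whereas simultaneous gradient descent-ascent on the same game is known to cycle or diverge.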

A Story of Two Streams: Reinforcement Learning Models from Human Behavior and Neuropsychiatry

Baihan Lin

Guillermo Cecchi

Djallel Bouneffouf

Jenna Reinen

Irina Rish

Drawing inspiration from behavioral studies of human decision making, we propose here a more general and flexible parametric framework for reinforcement learning that extends standard Q-learning to a two-stream model for processing positive and negative rewards, and allows incorporating a wide range of reward-processing biases -- an important component of human decision making which can help us better understand a wide spectrum of multi-agent interactions in complex real-world socioeconomic systems, as well as various neuropsychiatric conditions associated with disruptions in normal reward processing. From the computational perspective, we observe that the proposed Split-QL model and its clinically inspired variants consistently outperform standard Q-Learning and SARSA methods, as well as recently proposed Double Q-Learning approaches, on simulated tasks with particular reward distributions, a real-world dataset capturing human decision-making in gambling tasks, and the Pac-Man game in a lifelong learning setting across different reward stationarities.

2020-01-01

AAMAS (published)

arxiv.org
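The two-stream idea in the abstract above can be sketched as a tabular learner that keeps one Q-table for positive rewards and one for negative rewards, and acts on their sum. The abstract does not give the update rule, so everything below is an assumption made for illustration: the class name `TwoStreamQLearner`, the `positive_weight` and `negative_weight` bias parameters, and the choice to let each stream bootstrap from its own maximum are hypothetical and may differ from the Split-QL model in the paper.

```python
import numpy as np


class TwoStreamQLearner:
    """Tabular Q-learning with separate positive and negative reward streams."""

    def __init__(self, num_states, num_actions, learning_rate=0.1,
                 discount=0.9, positive_weight=1.0, negative_weight=1.0):
        self.q_pos = np.zeros((num_states, num_actions))  # reward stream
        self.q_neg = np.zeros((num_states, num_actions))  # punishment stream
        self.learning_rate = learning_rate
        self.discount = discount
        self.positive_weight = positive_weight  # assumed bias on gains
        self.negative_weight = negative_weight  # assumed bias on losses

    def greedy_action(self, state):
        # Act on the combined value of both streams.
        return int(np.argmax(self.q_pos[state] + self.q_neg[state]))

    def update(self, state, action, reward, next_state):
        # Split the scalar reward into its positive and negative parts.
        pos_reward = self.positive_weight * max(reward, 0.0)
        neg_reward = self.negative_weight * min(reward, 0.0)
        for table, part in ((self.q_pos, pos_reward), (self.q_neg, neg_reward)):
            target = part + self.discount * table[next_state].max()
            table[state, action] += self.learning_rate * (
                target - table[state, action])
```

Setting `negative_weight` to 0 makes the learner insensitive to punishment, which is the kind of reward-processing bias the abstract refers to; with both weights at 1 and no bootstrapping, the summed tables behave like a single Q-table.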

Structured Conditional Continuous Normalizing Flows for Efficient Amortized Inference in Graphical Models

Christian Dietrich Weilbach

Boyan Beronov

Frank Wood

William Harvey

We exploit minimally faithful inversion of graphical model structures to specify sparse continuous normalizing flows (CNFs) for amortized inference. We find that the sparsity of this factorization can be exploited to reduce the numbers of parameters in the neural network, adaptive integration steps of the flow, and consequently FLOPs at both training and inference time without decreasing performance in comparison to unconstrained flows. By expressing the structure inversion as a compilation pass in a probabilistic programming language, we are able to apply it in a novel way to models as complex as convolutional neural networks. Furthermore, we extend the training objective for CNFs in the context of inference amortization to the symmetric Kullback-Leibler divergence, and demonstrate its theoretical and practical advantages.

2020-01-01

International Conference on Artificial Intelligence and Statistics (published)

dblp.uni-trier.de

proceedings.mlr.press

Synbols: Probing Learning Algorithms with Synthetic Datasets

Alexandre Lacoste

Pau Rodríguez

Frédéric Branchaud-Charron

Parmida Atighehchian

Massimo Caccia

Issam Hadj Laradji

Alexandre Drouin

Matt P. Craddock

Laurent Charlin

David Vazquez

arxiv.org

Tensorized Random Projections

Beheshteh T. Rakhshan

Guillaume Rabusseau

2020-01-01

AISTATS (published)

proceedings.mlr.press

arxiv.org

On the Effectiveness of Two-Step Learning for Latent-Variable Models

Cem Subakan

Maxime Gasse

Laurent Charlin

Latent-variable generative models offer a principled solution for modeling and sampling from complex probability distributions. Implementing a joint training objective with a complex prior, however, can be a tedious task, as one is typically required to derive and code a specific cost function for each new type of prior distribution. In this work, we propose a general framework for learning latent variable generative models in a two-step fashion. In the first step of the framework, we train an autoencoder, and in the second step we fit a prior model on the resulting latent distribution. This two-step approach offers a convenient alternative to joint training, as it allows for a straightforward combination of existing models without the hassle of deriving new cost functions, and the need for coding the joint training objectives. Through a set of experiments, we demonstrate that two-step learning results in performances similar to joint training, and in some cases even results in more accurate modeling.

2020-01-01

2020 IEEE 30th International Workshop on Machine Learning for Signal Processing (MLSP) (published)

doi.org
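The two-step recipe described in the abstract above (first train an autoencoder, then fit a prior on its latent codes) can be sketched with deliberately simple stand-ins. Here the "autoencoder" is a linear one obtained from an SVD (equivalent to PCA) and the "prior" is a single Gaussian; both are assumptions chosen to keep the example short and are not the models used in the paper. All function names are hypothetical.

```python
import numpy as np


def fit_linear_autoencoder(data, latent_dim):
    """Step 1: fit a linear autoencoder via SVD of the centered data."""
    mean = data.mean(axis=0)
    _, _, vt = np.linalg.svd(data - mean, full_matrices=False)
    components = vt[:latent_dim]  # rows are orthonormal, shape (latent_dim, d)
    return mean, components


def encode(data, mean, components):
    return (data - mean) @ components.T


def decode(latents, mean, components):
    return latents @ components + mean


def fit_gaussian_prior(latents):
    """Step 2: fit a Gaussian to the latent codes of the training data."""
    prior_mean = latents.mean(axis=0)
    prior_cov = np.atleast_2d(np.cov(latents, rowvar=False))
    return prior_mean, prior_cov


def sample(rng, num_samples, prior_mean, prior_cov, mean, components):
    """Generate data by sampling the fitted prior and decoding."""
    latents = rng.multivariate_normal(prior_mean, prior_cov, size=num_samples)
    return decode(latents, mean, components)
```

Neither step needs to know how the other was trained: the prior only sees encoded data, so the Gaussian could be swapped for a mixture model or a flow without touching the autoencoder code, which is the convenience the abstract argues for.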

On the interplay between noise and curvature and its effect on optimization and generalization

Valentin Thomas

Fabian Pedregosa

Bart van Merriënboer

Pierre-Antoine Manzagol

Yoshua Bengio

Nicolas Le Roux

The speed at which one can minimize an expected loss using stochastic methods depends on two properties: the curvature of the loss and the variance of the gradients. While most previous works focus on one or the other of these properties, we explore how their interaction affects optimization speed. Further, as the ultimate goal is good generalization performance, we clarify how both curvature and noise are relevant to properly estimate the generalization gap. Realizing that the limitations of some existing works stem from a confusion between these matrices, we also clarify the distinction between the Fisher matrix, the Hessian, and the covariance matrix of the gradients.

2020-01-01

AISTATS (published)

proceedings.mlr.press

arxiv.org
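To make the distinction drawn in the abstract above concrete, the sketch below computes the three matrices for a toy case: linear regression with the loss 0.5 * (w^T x - y)^2, read as the negative log-likelihood of a unit-variance Gaussian model. The setting and the function name `curvature_and_noise_matrices` are assumptions made for this example and are not taken from the paper. In this model the Hessian and the Fisher matrix coincide, because the Fisher averages over labels drawn from the model itself, while the covariance of the per-example gradients depends on the actual residuals.

```python
import numpy as np


def curvature_and_noise_matrices(inputs, targets, weights):
    """Return (Hessian, Fisher, gradient covariance) for linear regression."""
    num_examples = len(inputs)
    residuals = inputs @ weights - targets
    per_example_grads = residuals[:, None] * inputs  # shape (n, d)

    # Hessian of the average loss: mean of x x^T, independent of the labels.
    hessian = inputs.T @ inputs / num_examples

    # Fisher: expectation of g g^T with labels sampled from the model.
    # Model residuals have unit variance, so this is also the mean of x x^T.
    fisher = inputs.T @ inputs / num_examples

    # Empirical covariance of the gradients, using the observed labels.
    centered = per_example_grads - per_example_grads.mean(axis=0)
    grad_cov = centered.T @ centered / num_examples
    return hessian, fisher, grad_cov
```

If the observed label noise has standard deviation 3 rather than the 1 assumed by the model, the gradient covariance at the least-squares solution is roughly 9 times the Hessian, so treating the three matrices as interchangeable would misjudge the gradient noise by about that factor.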
