Publications

Doubly Robust Off-Policy Actor-Critic Algorithms for Reinforcement Learning

We study the problem of off-policy critic evaluation in several variants of value-based off-policy actor-critic algorithms. Off-policy actor-critic algorithms require an off-policy critic evaluation step to estimate the value of the new policy after every policy gradient update. Despite the enormous success of off-policy policy gradients on control tasks, existing general methods suffer from high variance and instability, partly because the policy improvement depends on the gradient of the estimated value function. In this work, we present a new approach to off-policy policy evaluation in actor-critic, based on doubly robust estimators. We extend the doubly robust estimator from off-policy policy evaluation (OPE) to actor-critic algorithms that consist of a reward estimator performance model. We find that doubly robust estimation of the critic can significantly improve performance in continuous control tasks. Furthermore, in cases where a stochastic reward function can lead to high variance, doubly robust critic estimation can improve performance under corrupted, stochastic reward signals, indicating its usefulness for robust and safe reinforcement learning.
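For readers unfamiliar with doubly robust estimation, the following is a minimal sketch of the classic one-step (bandit-style) doubly robust estimator from off-policy evaluation. It is not the paper's actor-critic method: the function name, the sample layout, and the toy policies are all assumptions made for illustration.

```python
def doubly_robust_value(samples, target_policy, reward_model, actions):
    """One-step doubly robust estimate of a target policy's value.

    samples: list of (state, action, reward, behaviour_prob) tuples logged
             under a behaviour policy (hypothetical layout).
    target_policy(state, action) -> probability under the target policy.
    reward_model(state, action)  -> estimated reward (the direct-method part).
    """
    total = 0.0
    for state, action, reward, behaviour_prob in samples:
        # Direct-method term: modelled reward averaged over the target policy.
        direct = sum(target_policy(state, a) * reward_model(state, a) for a in actions)
        # Importance-weighted correction of the model's error on the logged action.
        rho = target_policy(state, action) / behaviour_prob
        total += direct + rho * (reward - reward_model(state, action))
    return total / len(samples)


# Toy check: one state, two actions, uniform behaviour policy.
actions = [0, 1]
target = lambda s, a: 1.0 if a == 1 else 0.0   # target policy always picks action 1
crude_model = lambda s, a: 0.5                  # deliberately wrong reward model
logged = [(0, 0, 0.0, 0.5), (0, 1, 1.0, 0.5)]   # true reward equals the action id
estimate = doubly_robust_value(logged, target, crude_model, actions)
```

In this toy case the importance-weighted correction compensates for the wrong reward model, so the estimate matches the target policy's true value of 1.0. Conversely, an accurate reward model would reduce the size of the correction term, and with it the variance it contributes.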

2019-12-09

arXiv (preprint)

doi.org

arxiv.org

Interactive Psychometrics for Autism with the Human Dynamic Clamp: Interpersonal Synchrony from Sensory-motor to Socio-cognitive Domains

Florence Baillin

Aline Lefebvre

Amandine Pedoux

Yann Beauxis

Denis-Alexander Engemann

Anna Maruani

Frederique Amsellem

Thomas Bourgeron

Richard Delorme

Guillaume Dumas

2019-12-05

(published)

doi.org

Mutations associated with neuropsychiatric conditions delineate functional brain connectivity dimensions contributing to autism and schizophrenia

Clara A. Moreau

Sebastian G. W. Urchs

Pierre Orban

Catherine Schramm

Guillaume Dumas

Aurélie Labbe

Guillaume Huguet

Elise Douard

Pierre-Olivier Quirion

Amy Lin

Leila Kushan

Stephanie Grot

David Luck

Adrianna Mendrek

Stephane Potvin

Emmanuel Stip

Thomas Bourgeron

Alan C. Evans

Carrie E. Bearden

Pierre Bellec

Sébastien Jacquemont

16p11.2 and 22q11.2 Copy Number Variants (CNVs) confer high risk for Autism Spectrum Disorder (ASD), schizophrenia (SZ), and Attention-Deficit-Hyperactivity-Disorder (ADHD), but their impact on functional connectivity (FC) remains unclear. Here we report an analysis of resting-state FC using magnetic resonance imaging data from 101 CNV carriers, 755 individuals with idiopathic ASD, SZ, or ADHD and 1,072 controls. We characterize CNV FC-signatures and use them to identify dimensions contributing to complex idiopathic conditions. CNVs have large mirror effects on FC at the global and regional level. Thalamus, somatomotor, and posterior insula regions play a critical role in dysconnectivity shared across deletions, duplications, idiopathic ASD, SZ but not ADHD. Individuals with higher similarity to deletion FC-signatures exhibit worse cognitive and behavioral symptoms. Deletion similarities identified at the connectivity level could be related to the redundant associations observed genome-wide between gene expression spatial patterns and FC-signatures. Results may explain why many CNVs affect a similar range of neuropsychiatric symptoms.

2019-12-05

bioRxiv (preprint)

doi.org

Applying Knowledge Transfer for Water Body Segmentation in Peru

Jessenia Gonzalez

Debjani Bhowmick

César Beltrán

Kris Sankaran

Yoshua Bengio

2019-12-01

arXiv (preprint)

arxiv.org

Detecting GAN generated errors

Xiru Zhu

Fengdi Che

Tianzi Yang

Tzuyang Yu

David Meger

Gregory Dudek

Despite the impressive performance of the latest GANs in generating hyper-realistic images, GAN discriminators have difficulty evaluating the quality of an individual generated sample. This is because the task of evaluating the quality of a generated image differs from deciding if an image is real or fake. A generated image could be perfect except in a single area but still be detected as fake. Instead, we propose a novel approach for detecting where errors occur within a generated image. By collaging real images with generated images, we compute, for each pixel, whether it belongs to the real distribution or the generated distribution. Furthermore, we leverage attention to model long-range dependency; this allows detection of errors which are reasonable locally but not holistically. For evaluation, we show that our error detection can act as a quality metric for an individual image, unlike FID and IS. We leverage Improved Wasserstein, BigGAN, and StyleGAN to show that a ranking based on our metric correlates impressively with FID scores. Our work opens the door to a better understanding of GANs and the ability to select the best samples from a GAN model.

2019-12-01

arXiv (preprint)

arxiv.org

Approximate information state for partially observed systems

Jayakumar Subramanian

Aditya Mahajan

The standard approach for modeling partially observed systems is to model them as partially observable Markov decision processes (POMDPs) and obtain a dynamic program in terms of a belief state. The belief state formulation works well for planning but is not ideal for online reinforcement learning because the belief state depends on the model and, as such, is not observable when the model is unknown. In this paper, we present an alternative notion of an information state for obtaining a dynamic program in partially observed models. In particular, an information state is a sufficient statistic for the current reward which evolves in a controlled Markov manner. We show that such an information state leads to a dynamic programming decomposition. Then we present a notion of an approximate information state and present an approximate dynamic program based on the approximate information state. An approximate information state is defined in terms of properties that can be estimated using sampled trajectories. Therefore, it provides a constructive method for reinforcement learning in partially observed systems. We present one such construction and show that it performs better than the state of the art for three benchmark models.

2019-11-30

IEEE Conference on Decision and Control (published)

doi.org

Artificial Intelligence Based Cloud Distributor (AI-CD): Probing Low Cloud Distribution with Generative Adversarial Neural Networks

T. Yuan

H. Song

David Hall

Victor Schmidt

Kris Sankaran

Yoshua Bengio

2019-11-30

(published)

www.semanticscholar.org

Automated curriculum generation for Policy Gradients from Demonstrations

Anirudh Srinivasan

Dzmitry Bahdanau

Maxime Chevalier-Boisvert

Yoshua Bengio

2019-11-30

arXiv (preprint)

arxiv.org

Forgetting at biologically realistic levels of neurogenesis in a large-scale hippocampal model

Lina M. Tran

Sheena A. Josselyn

Blake A. Richards

Paul W. Frankland

2019-11-30

Behavioural Brain Research (published)

doi.org

Networked control of coupled subsystems: Spectral decomposition and low-dimensional solutions

Shuang Gao

Aditya Mahajan

In this paper, we investigate optimal networked control of coupled subsystems where the dynamics and the cost couplings depend on an underlying weighted graph. We use the spectral decomposition of the graph adjacency matrix to decompose the overall system into (L+1) systems with decoupled dynamics and cost, where L is the rank of the adjacency matrix. Consequently, the optimal control input at each subsystem can be computed by solving (L+1) decoupled Riccati equations. A salient feature of the result is that the solution complexity depends on the rank of the adjacency matrix rather than the size of the network (i.e., the number of nodes). Therefore, the proposed solution framework provides a scalable method for synthesizing and implementing optimal control laws for large-scale systems.

2019-11-30

IEEE Conference on Decision and Control (published)

doi.org

Restless bandits with controlled restarts: Indexability and computation of Whittle index

Nima Akbarzadeh

Aditya Mahajan

Motivated by applications in machine repair, queueing, surveillance, and clinic care, we consider a scheduling problem where a decision maker can reset m out of n Markov processes at each time. Processes that are reset restart according to a known probability distribution, and processes that are not reset evolve in a Markovian manner. Due to the high complexity of finding an optimal policy, such scheduling problems are often modeled as restless bandits. We show that the model satisfies a technical condition known as indexability. For indexable restless bandits, the Whittle index policy, which computes a function known as the Whittle index for each process and resets the m processes with the lowest index, is known to be a good heuristic. The Whittle index is computed by solving an auxiliary Markov decision problem for each arm. When the optimal policy for this auxiliary problem is threshold based, we use ideas from renewal theory to derive a closed-form expression for the Whittle index. We present detailed numerical experiments which suggest that the Whittle index policy performs close to the optimal policy and performs significantly better than the myopic policy, which is a commonly used heuristic.
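The selection step of the index policy described above can be sketched as follows. This is only an illustration: the Whittle indices are assumed to be already computed for the current states (the paper's closed-form expression is not reproduced here), and the function name and process ids are hypothetical.

```python
import heapq


def processes_to_reset(whittle_indices, m):
    """Return the ids of the m processes with the lowest Whittle index.

    whittle_indices: dict mapping a process id to the (precomputed) Whittle
                     index of that process in its current state.
    """
    # Iterating the dict yields ids; each id is ranked by its index value.
    return heapq.nsmallest(m, whittle_indices, key=whittle_indices.get)


# Hypothetical index values for four processes; reset two of them.
current_indices = {"p1": 0.7, "p2": -0.2, "p3": 0.1, "p4": 1.5}
chosen = processes_to_reset(current_indices, m=2)
```

The appeal of an index policy is visible in this sketch: each process is scored independently, so the per-step decision reduces to a partial sort over n numbers.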

2019-11-30

IEEE Conference on Decision and Control (published)

doi.org

Deconstructing and reconstructing word embedding algorithms

Edward Daniel Newell

Kian Kenyon-Dean

Jackie CK Cheung

Uncontextualized word embeddings are reliable feature representations of words used to obtain high quality results for various NLP applications. Given the historical success of word embeddings in NLP, we propose a retrospective on some of the most well-known word embedding algorithms. In this work, we deconstruct Word2vec, GloVe, and others, into a common form, unveiling some of the necessary and sufficient conditions required for making performant word embeddings. We find that each algorithm: (1) fits vector-covector dot products to approximate pointwise mutual information (PMI); and, (2) modulates the loss gradient to balance weak and strong signals. We demonstrate that these two algorithmic features are sufficient conditions to construct a novel word embedding algorithm, Hilbert-MLE. We find that its embeddings obtain equivalent or better performance against other algorithms across 17 intrinsic and extrinsic datasets.
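As a reminder of the quantity these algorithms are said to approximate, here is a minimal sketch that computes PMI(w, c) = log(p(w, c) / (p(w) p(c))) from raw co-occurrence pairs. The function name and the toy corpus are assumptions for illustration; the sketch does not implement Hilbert-MLE or any embedding training.

```python
import math
from collections import Counter


def pmi_table(pairs):
    """PMI for every observed (word, context) pair in a list of co-occurrences."""
    pair_counts = Counter(pairs)
    word_counts = Counter(w for w, _ in pairs)
    context_counts = Counter(c for _, c in pairs)
    total = len(pairs)
    # p(w, c) / (p(w) p(c)) simplifies to n_wc * total / (n_w * n_c).
    return {
        (w, c): math.log(n * total / (word_counts[w] * context_counts[c]))
        for (w, c), n in pair_counts.items()
    }


# Toy corpus: "a" only ever co-occurs with "x", and "b" only with "y".
associated = pmi_table([("a", "x"), ("a", "x"), ("b", "y"), ("b", "y")])
# Toy corpus where words and contexts are statistically independent.
independent = pmi_table([("a", "x"), ("a", "y"), ("b", "x"), ("b", "y")])
```

In the first corpus every observed pair has PMI log 2, and in the second every pair has PMI 0. An embedding method of the kind the abstract describes would then fit dot products of word and context vectors to values like these.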

2019-11-28

arXiv (preprint)

arxiv.org
