Publications

Reinforcement Learning in Stationary Mean-field Games

Jayakumar Subramanian

Multi-agent reinforcement learning has made significant progress in recent years, but it remains a hard problem. Hence, one often resorts to developing learning algorithms for specific classes of multi-agent systems. In this paper we study reinforcement learning in a specific class of multi-agent systems called mean-field games. In particular, we consider learning in stationary mean-field games. We identify two different solution concepts (stationary mean-field equilibrium and stationary mean-field social-welfare optimal policy) for such games based on whether the agents are non-cooperative or cooperative, respectively. We then generalize these solution concepts to their local variants using bounded-rationality-based arguments. For these two local solution concepts, we present two reinforcement learning algorithms. We show that the algorithms converge to the right solution under mild technical conditions and demonstrate this using two numerical examples.

2019-02-28

Adaptive Agents and Multi-Agent Systems (published)

doi.org

Stochastic Bit-Wise Iterative Decoding of Polar Codes

Kaining Han

Junchao Wang

Warren J. Gross

Jianhao Hu

Polar codes have received recent attention due to their potential to be applied in advanced wireless communication protocols such as the fifth generation mobile communication system (5G). Among the existing decoding algorithms, Belief Propagation (BP) offers high throughput, low latency, and soft output, but at a high hardware cost. Stochastic computing, as a form of approximate computing, provides a potential low-cost implementation solution for the BP algorithm. However, existing stochastic BP decoders suffer from a relatively long decoding latency, resulting in low hardware efficiency. In this paper, a novel bit-wise iterative stochastic decoding architecture for the BP algorithm is proposed to improve the throughput and hardware efficiency. By utilizing the frozen bits of polar codes and stochastic computing, multiple novel optimization methods are presented to further speed up convergence and increase the hardware efficiency.

2019-02-28

IEEE Transactions on Signal Processing (published)

doi.org
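
The abstract above relies on stochastic computing without saying what it is. As background only, and not the decoder architecture proposed in the paper, the following minimal sketch shows the textbook idea in the unipolar format: a probability is encoded as a random bitstream, and multiplying two probabilities reduces to a bitwise AND of two independent streams. All function names, the stream length, and the seed are illustrative assumptions.

```python
import random


def to_bitstream(p, length, rng):
    """Encode probability p as a random bitstream whose fraction of 1s is about p."""
    return [1 if rng.random() < p else 0 for _ in range(length)]


def from_bitstream(bits):
    """Decode a bitstream back to a probability estimate (fraction of 1s)."""
    return sum(bits) / len(bits)


def stochastic_multiply(p, q, length=10_000, seed=0):
    """Estimate p * q by ANDing two independently generated bitstreams."""
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    a = to_bitstream(p, length, rng)
    b = to_bitstream(q, length, rng)
    return from_bitstream([x & y for x, y in zip(a, b)])


if __name__ == "__main__":
    # The estimate should be close to 0.5 * 0.4 = 0.2, with some sampling noise.
    print(stochastic_multiply(0.5, 0.4))
```

The estimate gets more accurate as the stream gets longer, which gives a feel for why latency is a concern for stochastic decoders, as the abstract notes.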

Prediction of Progression in Multiple Sclerosis Patients

Adrian Tousignant

Paul Lemaitre

Doina Precup

Douglas Arnold

Tal Arbel

We present the first automatic end-to-end deep learning framework for the prediction of future patient disability progression (one year from baseline) based on multi-modal brain Magnetic Resonance Images (MRI) of patients with Multiple Sclerosis (MS). The model uses parallel convolutional pathways, an idea introduced by the popular Inception net, and is trained and tested on two large proprietary, multi-scanner, multi-center, clinical trial datasets of patients with Relapsing-Remitting Multiple Sclerosis (RRMS). Experiments on 465 patients on the placebo arms of the trials indicate that the model can accurately predict future disease progression, measured by a sustained increase in the extended disability status scale (EDSS) score over time. Using only the multi-modal MRI provided at baseline, the model achieves an AUC of 0.66 ± 0.055. However, when supplemental lesion label masks are provided as inputs as well, the AUC increases to 0.701 ± 0.027. Furthermore, we demonstrate that uncertainty estimates based on Monte Carlo dropout sample variance correlate with errors made by the model. Clinicians provided with the predictions computed by the model can therefore use the associated uncertainty estimates to assess which scans require further examination.

2019-02-27

MIDL.io/2019/Conference (accepted)

openreview.net
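
The abstract above mentions uncertainty estimates based on Monte Carlo dropout sample variance. As a rough illustration of that general idea, and not the authors' network, the sketch below keeps dropout active at prediction time on a toy logistic model, runs several stochastic forward passes, and reports the sample mean and variance of the outputs. The toy model, the function name `mc_dropout_predict`, and every parameter value are assumptions made for this example.

```python
import math
import random
import statistics


def mc_dropout_predict(weights, features, drop_prob=0.5, n_samples=100, seed=0):
    """Return (mean, variance) of a toy logistic model's output under random dropout.

    Each forward pass drops every input independently with probability drop_prob
    and rescales the kept inputs by 1 / (1 - drop_prob) (inverted dropout).
    """
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    keep_prob = 1.0 - drop_prob
    predictions = []
    for _ in range(n_samples):
        z = 0.0
        for w, x in zip(weights, features):
            if rng.random() < keep_prob:
                z += w * x / keep_prob
        predictions.append(1.0 / (1.0 + math.exp(-z)))  # sigmoid output
    # The sample variance across passes serves as the uncertainty estimate.
    return statistics.fmean(predictions), statistics.pvariance(predictions)


if __name__ == "__main__":
    mean, variance = mc_dropout_predict([0.8, -1.2, 0.5], [1.0, 0.3, 2.0])
    print(f"prediction={mean:.3f} uncertainty={variance:.4f}")
```

A larger variance across passes flags inputs on which the stochastic predictions disagree, which is the kind of signal the abstract suggests could help decide which scans deserve a closer look.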

The Termination Critic

Anna Harutyunyan

Will Dabney

Diana Borsa

Nicolas Heess

Remi Munos

Doina Precup

In this work, we consider the problem of autonomously discovering behavioral abstractions, or options, for reinforcement learning agents. We propose an algorithm that focuses on the termination function rather than, as is common, on the policy. The termination function is usually trained to optimize a control objective: an option ought to terminate if another has better value. We offer a different, information-theoretic perspective, and propose that terminations should focus instead on the compressibility of the option’s encoding, arguably a key reason for using abstractions. To achieve this algorithmically, we leverage the classical options framework, and learn the option transition model as a "critic" for the termination function. Using this model, we derive gradients that optimize the desired criteria. We show that the resulting options are non-trivial, intuitively meaningful, and useful for learning.

2019-02-25

ArXiv (preprint)

proceedings.mlr.press

Hyperbolic Discounting and Learning over Multiple Horizons

William Fedus

Carles Gelada

Yoshua Bengio

Marc-Emmanuel Bellemare

Hugo Larochelle

Reinforcement learning (RL) typically defines a discount factor as part of the Markov Decision Process. The discount factor values future rewards by an exponential scheme that leads to theoretical convergence guarantees of the Bellman equation. However, evidence from psychology, economics and neuroscience suggests that humans and animals instead have hyperbolic time-preferences. In this work we revisit the fundamentals of discounting in RL and bridge this disconnect by implementing an RL agent that acts via hyperbolic discounting. We demonstrate that a simple approach approximates hyperbolic discount functions while still using familiar temporal-difference learning techniques in RL. Additionally, and independent of hyperbolic discounting, we make a surprising discovery that simultaneously learning value functions over multiple time-horizons is an effective auxiliary task which often improves over a strong value-based RL agent, Rainbow.

2019-02-18

ArXiv (preprint)

openreview.net
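
The abstract above says that hyperbolic discount functions can be approximated with familiar exponential-discount machinery. One standard identity that makes this plausible is that the hyperbolic discount 1 / (1 + k t) equals the integral of gamma^(k t) over gamma in [0, 1], so a finite average of exponential discounts approximates it. The sketch below checks that identity numerically with a midpoint Riemann sum. It is an illustration under that assumption, not necessarily the exact scheme used in the paper, and the function names and the number of discount factors are hypothetical.

```python
def hyperbolic_discount(t, k):
    """Hyperbolic discount applied to a reward t steps in the future."""
    return 1.0 / (1.0 + k * t)


def mixture_of_exponentials(t, k, n_gammas=1000):
    """Approximate the hyperbolic discount by averaging exponential discounts.

    Midpoint Riemann sum of the integral of gamma ** (k * t) over gamma in [0, 1].
    Each term is an ordinary exponential discount with per-step factor gamma ** k.
    """
    width = 1.0 / n_gammas
    total = 0.0
    for i in range(n_gammas):
        gamma = (i + 0.5) * width  # midpoint of the i-th sub-interval
        total += gamma ** (k * t)
    return total * width


if __name__ == "__main__":
    for t in (0, 1, 5, 10, 50):
        exact = hyperbolic_discount(t, k=0.1)
        approx = mixture_of_exponentials(t, k=0.1)
        print(f"t={t:2d} exact={exact:.4f} approx={approx:.4f}")
```

In an agent, each discount factor in the sum would correspond to a separate value estimate learned with ordinary temporal-difference updates, which is one way to read the abstract's remark about learning over multiple horizons.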

Predicting conversion to psychosis in clinical high risk patients using resting-state functional MRI features

Jolie Mcdonnell

W. Hord

Jenna Reinen

Pablo Polosecki

Irina Rish

Guillermo Cecchi

Recent progress in artificial intelligence provides researchers with a powerful set of machine learning tools for analyzing brain imaging data. In this work, we explore a variety of classification algorithms and functional network features derived from resting-state fMRI data collected from clinical high-risk (CHR, prodromal schizophrenia) patients and controls, trying to identify features predictive of conversion to psychosis among a subset of CHR patients. While many existing studies suggest that functional network features can be highly discriminative of schizophrenia when comparing fMRI of patients suffering from the disease with that of controls, few studies attempt a similar approach to the actual prediction of future psychosis development ahead of time, in the prodromal stage. Our preliminary results demonstrate the potential of fMRI functional network features to predict the conversion to psychosis in CHR patients. However, given the high variance of our results across different classifiers and subsets of data, a more extensive empirical investigation is required to reach more robust conclusions.

2019-02-15

Medical Imaging 2019: Biomedical Applications in Molecular, Structural, and Functional Imaging (published)

doi.org

Anytime Tail Averaging

Nicolas Roux

Tail averaging consists of averaging the last examples in a stream. Common techniques either have a memory requirement which grows with the number of samples to average, are not available at every time step, or do not accommodate growing windows. We propose two techniques with a low constant memory cost that perform tail averaging with access to the average at every time step. We also show how one can improve the accuracy of that average at the cost of increased memory consumption.

2019-02-12

ArXiv (preprint)

arxiv.org
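
To make the problem in the abstract above concrete, the sketch below implements a naive baseline: an exact tail average over the most recent fraction of the stream, available at every time step and with a window that grows with the stream. It is the kind of common technique the abstract contrasts against, not one of the two constant-memory methods the paper proposes, since it stores one running sum per sample. The class name `NaiveTailAverager` and its `fraction` parameter are assumptions made for this example.

```python
import math


class NaiveTailAverager:
    """Exact average of the last ceil(fraction * t) samples after t updates.

    Memory grows linearly with the stream length because every prefix sum is kept.
    """

    def __init__(self, fraction):
        if not 0.0 < fraction <= 1.0:
            raise ValueError("fraction must be in (0, 1]")
        self.fraction = fraction
        self.prefix_sums = [0.0]  # prefix_sums[i] = sum of the first i samples

    def update(self, value):
        """Consume one sample from the stream."""
        self.prefix_sums.append(self.prefix_sums[-1] + value)

    def average(self):
        """Return the tail average at the current time step."""
        t = len(self.prefix_sums) - 1
        if t == 0:
            raise ValueError("no samples seen yet")
        window = max(1, math.ceil(self.fraction * t))
        return (self.prefix_sums[t] - self.prefix_sums[t - window]) / window


if __name__ == "__main__":
    averager = NaiveTailAverager(fraction=0.5)
    for sample in range(1, 11):
        averager.update(float(sample))
    print(averager.average())  # average of the last 5 of the samples 1..10
```

After ten updates this baseline already holds eleven stored sums, which is the growing memory cost that motivates constant-memory alternatives.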

Dendritic solutions to the credit assignment problem

Blake Aaron Richards

Timothy P Lillicrap

2019-01-31

Current Opinion in Neurobiology (published)

doi.org

Equivalence of Equilibrium Propagation and Recurrent Backpropagation

Benjamin Scellier

Yoshua Bengio

Recurrent backpropagation and equilibrium propagation are supervised learning algorithms for fixed-point recurrent neural networks, which differ in their second phase. In the first phase, both algorithms converge to a fixed point that corresponds to the configuration where the prediction is made. In the second phase, equilibrium propagation relaxes to another nearby fixed point corresponding to smaller prediction error, whereas recurrent backpropagation uses a side network to compute error derivatives iteratively. In this work, we establish a close connection between these two algorithms. We show that at every moment in the second phase, the temporal derivatives of the neural activities in equilibrium propagation are equal to the error derivatives computed iteratively by recurrent backpropagation in the side network. This work shows that a side network is not required for the computation of error derivatives, and supports the hypothesis that in biological neural networks, temporal derivatives of neural activities may code for error signals.

2019-01-31

Neural Computation (published)

doi.org

openreview.net

The Impact of Time Interval between Extubation and Reintubation on Death or Bronchopulmonary Dysplasia in Extremely Preterm Infants

Wissam Shalish

Lara Kanbar

Lajos Kovacs

Sanjay Chawla

Martin Keszler

Smita Rao

Bogdan Panaitescu

Alyse Laliberte

Doina Precup

Karen Brown

Robert E. Kearney

Guilherme M. Sant'Anna

2019-01-31

The Journal of Pediatrics (published)

doi.org

Author Correction: Why rankings of biomedical image analysis competitions should be interpreted with care

Lena Maier-Hein

Matthias Eisenmann

Annika Reinke

Sinan Onogur

Marko Stankovic

Patrick Scholz

Tal Arbel

Hrvoje Bogunovic

Andrew P. Bradley

Aaron Carass

Carolin Feldmann

Alejandro F. Frangi

Peter M. Full

Bram van Ginneken

Allan Hanbury

Katrin Honauer

Michal Kozubek

Bennett Landman

Keno März

Oskar Maier

Klaus Maier-Hein

Bjoern Menze

Henning Müller

Peter F. Neher

Wiro Niessen

Nasir Rajpoot

Gregory C. Sharp

Korsuk Sirinukunwattana

Stefanie Speidel

Christian Stock

Danail Stoyanov

Abdel Aziz Taha

Fons van der Sommen

Ching-Wei Wang

Marc-André Weber

Guoyan Zheng

Pierre Jannin

Annette Kopp-Schneider

2019-01-29

Nature Communications (published)

doi.org
