Publications

A Reduction from Reinforcement Learning to No-Regret Online Learning

Ching-An Cheng

Remi Tachet des Combes

Byron Boots

Geoff Gordon

We present a reduction from reinforcement learning (RL) to no-regret online learning based on the saddle-point formulation of RL, by which "… (voir plus)any" online algorithm with sublinear regret can generate policies with provable performance guarantees. This new perspective decouples the RL problem into two parts: regret minimization and function approximation. The first part admits a standard online-learning analysis, and the second part can be quantified independently of the learning algorithm. Therefore, the proposed reduction can be used as a tool to systematically design new RL algorithms. We demonstrate this idea by devising a simple RL algorithm based on mirror descent and the generative-model oracle. For any

2020-06-03

Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics (publié)

proceedings.mlr.press

arxiv.org

Stochastic Neural Network with Kronecker Flow

Chin-wei Huang

Ahmed Touati

Pascal Vincent

Gintare Karolina Dziugaite

Alexandre Lacoste

Aaron Courville

Recent advances in variational inference enable the modelling of highly structured joint distributions, but are limited in their capacity to… (voir plus) scale to the high-dimensional setting of stochastic neural networks. This limitation motivates a need for scalable parameterizations of the noise generation process, in a manner that adequately captures the dependencies among the various parameters. In this work, we address this need and present the Kronecker Flow, a generalization of the Kronecker product to invertible mappings designed for stochastic neural networks. We apply our method to variational Bayesian neural networks on predictive tasks, PAC-Bayes generalization bound estimation, and approximate Thompson sampling in contextual bandits. In all setups, our methods prove to be competitive with existing methods and better than the baselines.

2020-06-03

Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics (publié)

proceedings.mlr.press

arxiv.org

The Neurobiology of Social Distance

Danilo Bzdok

Robin I. M. Dunbar

2020-06-03

Trends in Cognitive Sciences (publié)

doi.org

Value Preserving State-Action Abstractions

David Abel

Nathan Umbanhowar

Khimya Khetarpal

Dilip Arumugam

Doina Precup

Michael L. Littman

Abstraction can improve the sample efficiency of reinforcement learning. However, the process of abstraction inherently discards information… (voir plus), potentially compromising an agent’s ability to represent high-value policies. To mitigate this, we here introduce combinations of state abstractions and options that are guaranteed to preserve the representation of near-optimal policies. We first define φ-relative options, a general formalism for analyzing the value loss of options paired with a state abstraction, and present necessary and sufficient conditions for φ-relative options to preserve near-optimal behavior in any finite Markov Decision Process. We further show that, under appropriate assumptions, φ-relative options can be composed to induce hierarchical abstractions that are also guaranteed to represent high-value policies.ion can improve the sample efficiency of reinforcement learning. However, the process of abstraction inherently discards information, potentially compromising an agent’s ability to represent high-value policies. To mitigate this, we here introduce combinations of state abstractions and options that are guaranteed to preserve the representation of near-optimal policies. We first define φ-relative options, a general formalism for analyzing the value loss of options paired with a state abstraction, and present necessary and sufficient conditions for φ-relative options to preserve near-optimal behavior in any finite Markov Decision Process. We further show that, under appropriate assumptions, φ-relative options can be composed to induce hierarchical abstractions that are also guaranteed to represent high-value policies.

2020-06-03

International Conference on Artificial Intelligence and Statistics (publié)

proceedings.mlr.press

Value Preserving State-Action Abstractions

David Abel

Nathan Umbanhowar

Khimya Khetarpal

Dilip Arumugam

Doina Precup

Michael L. Littman

Abstraction can improve the sample efficiency of reinforcement learning. However, the process of abstraction inherently discards information… (voir plus), potentially compromising an agent’s ability to represent high-value policies. To mitigate this, we here introduce combinations of state abstractions and options that are guaranteed to preserve the representation of near-optimal policies. We first define φ-relative options, a general formalism for analyzing the value loss of options paired with a state abstraction, and present necessary and sufficient conditions for φ-relative options to preserve near-optimal behavior in any finite Markov Decision Process. We further show that, under appropriate assumptions, φ-relative options can be composed to induce hierarchical abstractions that are also guaranteed to represent high-value policies.ion can improve the sample efficiency of reinforcement learning. However, the process of abstraction inherently discards information, potentially compromising an agent’s ability to represent high-value policies. To mitigate this, we here introduce combinations of state abstractions and options that are guaranteed to preserve the representation of near-optimal policies. We first define φ-relative options, a general formalism for analyzing the value loss of options paired with a state abstraction, and present necessary and sufficient conditions for φ-relative options to preserve near-optimal behavior in any finite Markov Decision Process. We further show that, under appropriate assumptions, φ-relative options can be composed to induce hierarchical abstractions that are also guaranteed to represent high-value policies.

2020-06-03

International Conference on Artificial Intelligence and Statistics (published)

dblp.uni-trier.de

The need for privacy with public digital contact tracing during the COVID-19 pandemic

Yoshua Bengio

Richard Janda

Yun William Yu

Daphne Ippolito

Max Jarvie

Dan Pilat

Brooke Struck

Sekoul Krastev

Abhinav Sharma

2020-06-02

The Lancet Digital Health (publié)

doi.org

Training End-to-End Analog Neural Networks with Equilibrium Propagation

Jack D. Kendall

Ross D. Pantone

Kalpana Manickavasagam

Yoshua Bengio

Benjamin Scellier

We introduce a principled method to train end-to-end analog neural networks by stochastic gradient descent. In these analog neural networks,… (voir plus) the weights to be adjusted are implemented by the conductances of programmable resistive devices such as memristors [Chua, 1971], and the nonlinear transfer functions (or `activation functions') are implemented by nonlinear components such as diodes. We show mathematically that a class of analog neural networks (called nonlinear resistive networks) are energy-based models: they possess an energy function as a consequence of Kirchhoff's laws governing electrical circuits. This property enables us to train them using the Equilibrium Propagation framework [Scellier and Bengio, 2017]. Our update rule for each conductance, which is local and relies solely on the voltage drop across the corresponding resistor, is shown to compute the gradient of the loss function. Our numerical simulations, which use the SPICE-based Spectre simulation framework to simulate the dynamics of electrical circuits, demonstrate training on the MNIST classification task, performing comparably or better than equivalent-size software-based neural networks. Our work can guide the development of a new generation of ultra-fast, compact and low-power neural networks supporting on-chip learning.

2020-06-02

ArXiv (prépublication)

arxiv.org

Multi-Image Super-Resolution for Remote Sensing using Deep Recurrent Networks

Md Rifat Arefin

Vincent Michalski

Pierre-Luc St-Charles

Alfredo Kalaitzis

Sookyung Kim

Samira Ebrahimi Kahou

Yoshua Bengio

High-resolution satellite imagery is critical for various earth observation applications related to environment monitoring, geoscience, fore… (voir plus)casting, and land use analysis. However, the acquisition cost of such high-quality imagery due to the scarcity of providers and needs for high-frequency revisits restricts its accessibility in many fields. In this work, we present a data-driven, multi-image super resolution approach to alleviate these problems. Our approach is based on an end-to-end deep neural network that consists of an encoder, a fusion module, and a decoder. The encoder extracts co-registered highly efficient feature representations from low-resolution images of a scene. A Gated Re-current Unit (GRU)-based module acts as the fusion module, aggregating features into a combined representation. Finally, a decoder reconstructs the super-resolved image. The proposed model is evaluated on the PROBA-V dataset released in a recent competition held by the European Space Agency. Our results show that it performs among the top contenders and offers a new practical solution for real-world applications.

2020-06-01

2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) (publié)

doi.org

Population variability in social brain morphology for social support, household size and friendship satisfaction

Arezoo Taebi

Hannah Kiesow

Kai Vogeley

Leonhard Schilbach

Boris C Bernhardt

Danilo Bzdok

2020-06-01

Social Cognitive and Affective Neuroscience (publié)

doi.org

Restless bandits: indexability and computation of Whittle index

Nima Akbarzadeh

Aditya Mahajan

Restless bandits are a class of sequential resource allocation problems concerned with allocating one or more resources among several altern… (voir plus)ative processes where the evolution of the process depends on the resource allocated to them. Such models capture the fundamental trade-offs between exploration and exploitation. In 1988, Whittle developed an index heuristic for restless bandit problems which has emerged as a popular solution approach due to its simplicity and strong empirical performance. The Whittle index heuristic is applicable if the model satisfies a technical condition known as indexability. In this paper, we present two general sufficient conditions for indexability and identify simpler to verify refinements of these conditions. We then present a general algorithm to compute Whittle index for indexable restless bandits. Finally, we present a detailed numerical study which affirms the strong performance of the Whittle index heuristic.

2020-06-01

arXiv.org (prépublication)

dblp.uni-trier.de

GIANT: Scalable Creation of a Web-scale Ontology

Bang Liu

Weidong Guo

Di Niu

Jinwen Luo

Chaoyue Wang

Zhen Wen

Yu Xu

2020-05-31

Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data (publié)

doi.org

arxiv.org

Current works and future directions on application of machine learning in primary care

Samira Abbasgholizadeh-Rahimi

Vera Granikov

Pierre Pluye

In this short paper, we explained current machine learning works in primary care based on a scoping review that we performed. The performed … (voir plus)review was in line with the methodological framework proposed by Colquhoun and colleagues. Lastly, we discussed our observations and gave important directions to the future studies in this fast-growing area.

2020-05-27

International Conference on Adaptive Hypermedia and Adaptive Web-Based Systems (publié)

doi.org

Hackathon | Créer une IA plus sécuritaire pour la santé mentale des jeunes

Éclaireurs autochtones en IA

Avantage IA

Publications

Hackathon | Créer une IA plus sécuritaire pour la santé mentale des jeunes

Éclaireurs autochtones en IA

Avantage IA

Mots-clés populaires:

Publications