Publications

A Reduction from Reinforcement Learning to No-Regret Online Learning

Ching-An Cheng

Remi Tachet des Combes

Byron Boots

We present a reduction from reinforcement learning (RL) to no-regret online learning based on the saddle-point formulation of RL, by which "… (see more)any" online algorithm with sublinear regret can generate policies with provable performance guarantees. This new perspective decouples the RL problem into two parts: regret minimization and function approximation. The first part admits a standard online-learning analysis, and the second part can be quantified independently of the learning algorithm. Therefore, the proposed reduction can be used as a tool to systematically design new RL algorithms. We demonstrate this idea by devising a simple RL algorithm based on mirror descent and the generative-model oracle. For any

2020-06-03

Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics (published)

proceedings.mlr.press

arxiv.org

Stochastic Neural Network with Kronecker Flow

Chin-Wei Huang

Ahmed Touati

Pascal Vincent

Gintare Karolina Dziugaite

Alexandre Lacoste

Aaron Courville

Recent advances in variational inference enable the modelling of highly structured joint distributions, but are limited in their capacity to… (see more) scale to the high-dimensional setting of stochastic neural networks. This limitation motivates a need for scalable parameterizations of the noise generation process, in a manner that adequately captures the dependencies among the various parameters. In this work, we address this need and present the Kronecker Flow, a generalization of the Kronecker product to invertible mappings designed for stochastic neural networks. We apply our method to variational Bayesian neural networks on predictive tasks, PAC-Bayes generalization bound estimation, and approximate Thompson sampling in contextual bandits. In all setups, our methods prove to be competitive with existing methods and better than the baselines.

2020-06-03

Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics (published)

proceedings.mlr.press

arxiv.org

Value Preserving State-Action Abstractions

David Abel

Nathan Umbanhowar

Khimya Khetarpal

Dilip Arumugam

Doina Precup

Michael L. Littman

Abstraction can improve the sample efficiency of reinforcement learning. However, the process of abstraction inherently discards information… (see more), potentially compromising an agent’s ability to represent high-value policies. To mitigate this, we here introduce combinations of state abstractions and options that are guaranteed to preserve the representation of near-optimal policies. We first define φ-relative options, a general formalism for analyzing the value loss of options paired with a state abstraction, and present necessary and sufficient conditions for φ-relative options to preserve near-optimal behavior in any finite Markov Decision Process. We further show that, under appropriate assumptions, φ-relative options can be composed to induce hierarchical abstractions that are also guaranteed to represent high-value policies.ion can improve the sample efficiency of reinforcement learning. However, the process of abstraction inherently discards information, potentially compromising an agent’s ability to represent high-value policies. To mitigate this, we here introduce combinations of state abstractions and options that are guaranteed to preserve the representation of near-optimal policies. We first define φ-relative options, a general formalism for analyzing the value loss of options paired with a state abstraction, and present necessary and sufficient conditions for φ-relative options to preserve near-optimal behavior in any finite Markov Decision Process. We further show that, under appropriate assumptions, φ-relative options can be composed to induce hierarchical abstractions that are also guaranteed to represent high-value policies.

2020-06-03

International Conference on Artificial Intelligence and Statistics (published)

proceedings.mlr.press

Value Preserving State-Action Abstractions

David Abel

Nathan Umbanhowar

Khimya Khetarpal

Dilip Arumugam

Doina Precup

Michael L. Littman

Abstraction can improve the sample efficiency of reinforcement learning. However, the process of abstraction inherently discards information… (see more), potentially compromising an agent’s ability to represent high-value policies. To mitigate this, we here introduce combinations of state abstractions and options that are guaranteed to preserve the representation of near-optimal policies. We first define φ-relative options, a general formalism for analyzing the value loss of options paired with a state abstraction, and present necessary and sufficient conditions for φ-relative options to preserve near-optimal behavior in any finite Markov Decision Process. We further show that, under appropriate assumptions, φ-relative options can be composed to induce hierarchical abstractions that are also guaranteed to represent high-value policies.ion can improve the sample efficiency of reinforcement learning. However, the process of abstraction inherently discards information, potentially compromising an agent’s ability to represent high-value policies. To mitigate this, we here introduce combinations of state abstractions and options that are guaranteed to preserve the representation of near-optimal policies. We first define φ-relative options, a general formalism for analyzing the value loss of options paired with a state abstraction, and present necessary and sufficient conditions for φ-relative options to preserve near-optimal behavior in any finite Markov Decision Process. We further show that, under appropriate assumptions, φ-relative options can be composed to induce hierarchical abstractions that are also guaranteed to represent high-value policies.

2020-06-03

International Conference on Artificial Intelligence and Statistics (published)

dblp.uni-trier.de

Restless bandits: indexability and computation of Whittle index

Nima Akbarzadeh

Aditya Mahajan

Restless bandits are a class of sequential resource allocation problems concerned with allocating one or more resources among several altern… (see more)ative processes where the evolution of the process depends on the resource allocated to them. Such models capture the fundamental trade-offs between exploration and exploitation. In 1988, Whittle developed an index heuristic for restless bandit problems which has emerged as a popular solution approach due to its simplicity and strong empirical performance. The Whittle index heuristic is applicable if the model satisfies a technical condition known as indexability. In this paper, we present two general sufficient conditions for indexability and identify simpler to verify refinements of these conditions. We then present a general algorithm to compute Whittle index for indexable restless bandits. Finally, we present a detailed numerical study which affirms the strong performance of the Whittle index heuristic.

2020-06-01

arXiv.org (preprint)

dblp.uni-trier.de

GIANT: Scalable Creation of a Web-scale Ontology

Bang Liu

Weidong Guo

Di Niu

Jinwen Luo

Chaoyue Wang

Zhen Wen

Yu Xu

2020-05-31

Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data (published)

doi.org

arxiv.org

Current works and future directions on application of machine learning in primary care

Samira Abbasgholizadeh-Rahimi

Vera Granikov

Pierre Pluye

In this short paper, we explained current machine learning works in primary care based on a scoping review that we performed. The performed … (see more)review was in line with the methodological framework proposed by Colquhoun and colleagues. Lastly, we discussed our observations and gave important directions to the future studies in this fast-growing area.

2020-05-27

International Conference on Adaptive Hypermedia and Adaptive Web-Based Systems (published)

doi.org

Failure to follow medication changes made at hospital discharge is associated with adverse events in 30 days

Daniala L Weir

Aude Motulsky

Michal Abrahamowicz

Todd C. Lee

Steven Morgan

David Buckeridge

Robyn Tamblyn

2020-05-20

Health Services Research (published)

doi.org

Evaluating White Matter Lesion Segmentations with Refined Sørensen-Dice Analysis

Aaron Carass

Snehashis Roy

Adrian Gherman

Jacob C. Reinhold

Andrew Jesson

Tal Arbel

Oskar Maier

Heinz Handels

Mohsen Ghafoorian

Bram Platel

Ariel Birenbaum

Hayit Greenspan

Dzung L. Pham

Ciprian M. Crainiceanu

Peter A. Calabresi

Jerry L. Prince

William R. Gray Roncal

Russell T. Shinohara

Ipek Oguz

2020-05-19

Scientific Reports (published)

doi.org

An Analysis of the Adaptation Speed of Causal Models

Rémi LE PRIOL

Reza Babanezhad Harikandeh

Yoshua Bengio

Simon Lacoste-Julien

We consider the problem of discovering the causal process that generated a collection of datasets. We assume that all these datasets were ge… (see more)nerated by unknown sparse interventions on a structural causal model (SCM)

2020-05-18

ArXiv (preprint)

arxiv.org

COVI White Paper

Hannah Alsdurf

Yoshua Bengio

Tristan Deleu

Prateek Gupta

Daphne Ippolito

Richard Janda

Max Jarvie

Tyler J. Kolody

Sekoul Krastev

Tegan Maharaj

Robert Obryk

Dan Pilat

Valerie Pisano

Benjamin Prud'homme

Meng Qu

Nasim Rahaman

Irina Rish

Jean-franois Rousseau

abhinav sharma

Brooke Struck … (see 3 more)

Jian Tang

Martin Weiss

Yun William Yu

2020-05-18

ArXiv (preprint)

arxiv.org

Story Forest

Bang Liu

Fred X. Han

Di Niu

Linglong Kong

Kunfeng Lai

Yu Xu

Extracting events accurately from vast news corpora and organize events logically is critical for news apps and search engines, which aim to… (see more) organize news information collected from the Internet and present it to users in the most sensible forms. Intuitively speaking, an event is a group of news documents that report the same news incident possibly in different ways. In this article, we describe our experience of implementing a news content organization system at Tencent to discover events from vast streams of breaking news and to evolve news story structures in an online fashion. Our real-world system faces unique challenges in contrast to previous studies on topic detection and tracking (TDT) and event timeline or graph generation, in that we (1) need to accurately and quickly extract distinguishable events from massive streams of long text documents, and (2) must develop the structures of event stories in an online manner, in order to guarantee a consistent user viewing experience. In solving these challenges, we propose Story Forest, a set of online schemes that automatically clusters streaming documents into events, while connecting related events in growing trees to tell evolving stories. A core novelty of our Story Forest system is EventX, a semi-supervised scheme to extract events from massive Internet news corpora. EventX relies on a two-layered, graph-based clustering procedure to group documents into fine-grained events. We conducted extensive evaluations based on (1) 60 GB of real-world Chinese news data, (2) a large Chinese Internet news dataset that contains 11,748 news articles with truth event labels, and (3) the 20 News Groups English dataset, through detailed pilot user experience studies. The results demonstrate the superior capabilities of Story Forest to accurately identify events and organize news text into a logical structure that is appealing to human readers.

2020-05-13

ACM Transactions on Knowledge Discovery from Data (published)

doi.org

AI Research Driven by Real-World Problems

AI Policy Compass

Student Life and Resources

Publications

AI Research Driven by Real-World Problems

AI Policy Compass

Student Life and Resources

Popular keywords:

Publications