Publications

Survey on Applications of Multi-Armed and Contextual Bandits

Djallel Bouneffouf

Charu Aggarwal

In recent years, the multi-armed bandit (MAB) framework has attracted a lot of attention in various applications, from recommender systems a… (see more)nd information retrieval to healthcare and finance. This success is due to its stellar performance combined with attractive properties, such as learning from less feedback. The multiarmed bandit field is currently experiencing a renaissance, as novel problem settings and algorithms motivated by various practical applications are being introduced, building on top of the classical bandit problem. This article aims to provide a comprehensive review of top recent developments in multiple real-life applications of the multi-armed bandit. Specifically, we introduce a taxonomy of common MAB-based applications and summarize the state-of-the-art for each of those domains. Furthermore, we identify important current trends and provide new perspectives pertaining to the future of this burgeoning field.

2020-07-19

2020 IEEE Congress on Evolutionary Computation (CEC) (published)

doi.org

Molecular signatures of cognition and affect

Justine Y. Hansen

Ross D Markello

Jacob W. Vogel

Jakob Seidlitz

Danilo Bzdok

Bratislav Mišić

2020-07-16

bioRxiv (preprint)

doi.org

Extendable and invertible manifold learning with geometry regularized autoencoders

Andres F. Duque Correa

Sacha Morin

Guy Wolf

Kevin R. Moon

A fundamental task in data exploration is to extract simplified low dimensional representations that capture intrinsic geometry in data, esp… (see more)ecially for faithfully visualizing data in two or three dimensions. Common approaches to this task use kernel methods for manifold learning. However, these methods typically only provide an embedding of fixed input data and cannot extend to new data points. Autoencoders have also recently become popular for representation learning. But while they naturally compute feature extractors that are both extendable to new data and invertible (i.e., reconstructing original features from latent representation), they have limited capabilities to follow global intrinsic geometry compared to kernel-based manifold learning. We present a new method for integrating both approaches by incorporating a geometric regularization term in the bottleneck of the autoencoder. Our regularization, based on the diffusion potential distances from the recently-proposed PHATE visualization method, encourages the learned latent representation to follow intrinsic data geometry, similar to manifold learning algorithms, while still enabling faithful extension to new data and reconstruction of data in the original feature space from latent coordinates. We compare our approach with leading kernel methods and autoencoder models for manifold learning to provide qualitative and quantitative evidence of our advantages in preserving intrinsic structure, out of sample extension, and reconstruction. Our method is easily implemented for big-data applications, whereas other methods are limited in this regard.

2020-07-14

ArXiv (preprint)

doi.org

arxiv.org

Multi-Task Reinforcement Learning as a Hidden-Parameter Block MDP

Amy Zhang

Shagun Sodhani

Khimya Khetarpal

Joelle Pineau

Multi-task reinforcement learning is a rich paradigm where information from previously seen environments can be leveraged for better perform… (see more)ance and improved sample-efficiency in new environments. In this work, we leverage ideas of common structure underlying a family of Markov decision processes (MDPs) to improve performance in the few-shot regime. We use assumptions of structure from Hidden-Parameter MDPs and Block MDPs to propose a new framework, HiP-BMDP, and approach for learning a common representation and universal dynamics model. To this end, we provide transfer and generalization bounds based on task and state similarity, along with sample complexity bounds that depend on the aggregate number of samples across tasks, rather than the number of tasks, a significant improvement over prior work. To demonstrate the efficacy of the proposed method, we empirically compare and show improvements against other multi-task and meta-reinforcement learning baselines.

2020-07-14

ArXiv (preprint)

arxiv.org

A Brief Look at Generalization in Visual Meta-Reinforcement Learning

Safa Alver

Doina Precup

Due to the realization that deep reinforcement learning algorithms trained on high-dimensional tasks can strongly overfit to their training … (see more)environments, there have been several studies that investigated the generalization performance of these algorithms. However, there has been no similar study that evaluated the generalization performance of algorithms that were specifically designed for generalization, i.e. meta-reinforcement learning algorithms. In this paper, we assess the generalization performance of these algorithms by leveraging high-dimensional, procedurally generated environments. We find that these algorithms can display strong overfitting when they are evaluated on challenging tasks. We also observe that scalability to high-dimensional tasks with sparse rewards remains a significant problem among many of the current meta-reinforcement learning algorithms. With these results, we highlight the need for developing meta-reinforcement learning algorithms that can both generalize and scale.

2020-07-13

ICML.cc/2020/Workshop/LifelongML (unknown)

openreview.net

Chaotic Continual Learning

Touraj Laleh

Mojtaba Faramarzi

Irina Rish

Sarath Chandar

Training a deep neural network requires the model to go over training data for several epochs and update network parameters. In continual le… (see more)arning, this process results in catastrophic forgetting which is one of the core issues of this domain. Most proposed approaches for this issue try to compensate for the effects of parameter updates in the batch incremental setup in which the training model visits a lot of samples for several epochs. However, it is not realistic to expect training data will always be fed to model in a batch incremental setup. This paper proposes a chaotic stream learner that mimics the chaotic behavior of biological neurons and does not updates network parameters. In addition, it can work with fewer samples compared to deep learning models on stream learning setup. Our experiments on MNIST, CIFAR10, and Omniglot show that the chaotic stream learner has less catastrophic forgetting by its nature in comparison to a CNN model in continual learning.

2020-07-13

ICML.cc/2020/Workshop/LifelongML (unknown)

openreview.net

Historical Issue Data of Projects on Jira

A. Nicholson

Deeksha M. Arya

Jin Guo

2020-07-13

(published)

doi.org

S2RMs: Spatially Structured Recurrent Modules

Nasim Rahaman

Anirudh Goyal

Muhammad Waleed Gondal

Manuel Wüthrich

Stefan Bauer

Y. Sharma

Yoshua Bengio

Bernhard Schölkopf

2020-07-13

ArXiv (preprint)

arxiv.org

Data-Efficient Reinforcement Learning with Momentum Predictive Representations

Max Schwarzer

Ankesh Anand

Rishab Goel

(Rex) Devon Hjelm

Aaron Courville

Philip Bachman

While deep reinforcement learning excels at solving tasks where large amounts of data can be collected through virtually unlimited interacti… (see more)on with the environment, learning from limited interaction remains a key challenge. We posit that an agent can learn more efficiently if we augment reward maximization with self-supervised objectives based on structure in its visual input and sequential interaction with the environment. Our method, Momentum Predictive Representations (MPR), trains an agent to predict its own latent state representations multiple steps into the future. We compute target representations for future states using an encoder which is an exponential moving average of the agent's parameters, and we make predictions using a learned transition model. On its own, this future prediction objective outperforms prior methods for sample-efficient deep RL from pixels. We further improve performance by adding data augmentation to the future prediction loss, which forces the agent's representations to be consistent across multiple views of an observation. Our full self-supervised objective, which combines future prediction and data augmentation, achieves a median human-normalized score of 0.444 on Atari in a setting limited to 100K steps of environment interaction, which is a 66% relative improvement over the previous state-of-the-art. Moreover, even in this limited data regime, MPR exceeds expert human scores on 6 out of 26 games.

2020-07-12

arXiv.org (preprint)

dblp.uni-trier.de

Material for IEEE Software paper "How Do Open Source Software Contributors Perceive and Address Usability?"

Wenting Wang

Jinghui Cheng

Jin Guo

2020-07-10

(published)

doi.org

Attenuated Anticipation of Social and Monetary Rewards in Autism Spectrum Disorders

Sarah Baumeister

Carolin Moessnang

Nico Bast

Sarah Hohmann

Julian Tillmann

David Goyard

Tony Charman

Sara Ambrosino

Simon Baron-Cohen

Christian Beckmann

Sven Bölte

Thomas Bourgeron

Annika Rausch

Daisy Crawley

Flavio Dell’Acqua

Guillaume Dumas

Sarah Durston

Christine Ecker

Dorothea L. Floris

Vincent Frouin … (see 19 more)

Hannah Hayward

Rosemary Holt

Mark Johnson

Emily J. H. Jones

Meng-Chuan Lai

Michael V. Lombardo

Luke Mason

Marianne Oldehinkel

Tony Persico

Antonia San José Cáceres

Thomas Wolfers

Will Spooren

Eva Loth

Declan Murphy

Jan K. Buitelaar

Heike Tost

Andreas Meyer-Lindenberg

Tobias Banaschewski

Daniel Brandeis

Background Reward processing has been proposed to underpin atypical social behavior, a core feature of autism spectrum disorder (ASD). Howev… (see more)er, previous neuroimaging studies have yielded inconsistent results regarding the specificity of atypicalities for social rewards in ASD. Utilizing a large sample, we aimed to assess altered reward processing in response to reward type (social, monetary) and reward phase (anticipation, delivery) in ASD. Methods Functional magnetic resonance imaging during social and monetary reward anticipation and delivery was performed in 212 individuals with ASD (7.6-30.5 years) and 181 typically developing (TD) participants (7.6-30.8 years). Results Across social and monetary reward anticipation, whole-brain analyses (p0.05, family-wise error-corrected) showed hypoactivation of the right ventral striatum (VS) in ASD. Further, region of interest (ROI) analy

2020-07-06

bioRxiv (preprint)

doi.org

Deep interpretability for GWAS

Deepak Sharma

Audrey Durand

Marc-André Legault

Louis-philippe Lemieux Perreault

Audrey Lemaccon

Marie-Pierre Dub'e

Joelle Pineau

Genome-Wide Association Studies are typically conducted using linear models to find genetic variants associated with common diseases. In the… (see more)se studies, association testing is done on a variant-by-variant basis, possibly missing out on non-linear interaction effects between variants. Deep networks can be used to model these interactions, but they are difficult to train and interpret on large genetic datasets. We propose a method that uses the gradient based deep interpretability technique named DeepLIFT to show that known diabetes genetic risk factors can be identified using deep models along with possibly novel associations.

2020-07-03

ArXiv (preprint)

arxiv.org

Rising to the Occasion

AI Insights for Policymakers

Mila Techaide 2025

The Development of the UN Scientific Panel on AI

Transition in Mila's Scientific Direction

Rising to the Occasion

AI Insights for Policymakers

Publications

Rising to the Occasion

AI Insights for Policymakers

Mila Techaide 2025

The Development of the UN Scientific Panel on AI

Transition in Mila's Scientific Direction

Rising to the Occasion

AI Insights for Policymakers

Popular keywords:

Publications