Renewal Monte Carlo: Renewal Theory-Based Reinforcement Learning
Jayakumar Subramanian
An online reinforcement learning algorithm called renewal Monte Carlo (RMC) is presented. RMC works for infinite horizon Markov decision pro… (see more)cesses with a designated start state. RMC is a Monte Carlo algorithm that retains the key advantages of Monte Carlo—viz., simplicity, ease of implementation, and low bias—while circumventing the main drawbacks of Monte Carlo—viz., high variance and delayed updates. Given a parameterized policy
Inferring disease subtypes from clusters in explanation space
Marc-Andre Schulz
Matt Chapman-Rounds
Manisha Verma
Konstantinos Georgatzis
Deriving Differential Target Propagation from Iterating Approximate Inverses
Predicting COVID-19 Pneumonia Severity on Chest X-ray With Deep Learning
Joseph Paul Cohen
Lan Dao
Paul Morrison
Karsten Roth
Beiyi Shen
Almas F Abbasi
Hoshmand Kochi Mahsa
Marzyeh Ghassemi
Haifang Li
Tim Q Duong
Introduction The need to streamline patient management for coronavirus disease-19 (COVID-19) has become more pressing than ever. Chest X-ray… (see more)s (CXRs) provide a non-invasive (potentially bedside) tool to monitor the progression of the disease. In this study, we present a severity score prediction model for COVID-19 pneumonia for frontal chest X-ray images. Such a tool can gauge the severity of COVID-19 lung infections (and pneumonia in general) that can be used for escalation or de-escalation of care as well as monitoring treatment efficacy, especially in the ICU. Methods Images from a public COVID-19 database were scored retrospectively by three blinded experts in terms of the extent of lung involvement as well as the degree of opacity. A neural network model that was pre-trained on large (non-COVID-19) chest X-ray datasets is used to construct features for COVID-19 images which are predictive for our task. Results This study finds that training a regression model on a subset of the outputs from this pre-trained chest X-ray model predicts our geographic extent score (range 0-8) with 1.14 mean absolute error (MAE) and our lung opacity score (range 0-6) with 0.78 MAE. Conclusions These results indicate that our model’s ability to gauge the severity of COVID-19 lung infections could be used for escalation or de-escalation of care as well as monitoring treatment efficacy, especially in the ICU. To enable follow up work, we make our code, labels, and data available online.
BabyAI 1.1
David Y. T. Hui
Maxime Chevalier-Boisvert
The BabyAI platform is designed to measure the sample efficiency of training an agent to follow grounded-language instructions. BabyAI 1.0 p… (see more)resents baseline results of an agent trained by deep imitation or reinforcement learning. BabyAI 1.1 improves the agent's architecture in three minor ways. This increases reinforcement learning sample efficiency by up to 3 times and improves imitation learning performance on the hardest level from 77 % to 90.4 %. We hope that these improvements increase the computational efficiency of BabyAI experiments and help users design better agents.
BabyAI 1.1
David Y. T. Hui
Maxime Chevalier-Boisvert
Neuronal activity remodels the F-actin based submembrane lattice in dendrites but not axons of hippocampal neurons
Flavie Lavoie-Cardinal
Anthony Bilodeau
Mado Lemieux
Marc-André Gardner
Theresa Wiesner
Gabrielle Laramée
Paul De Koninck
Survey on Applications of Multi-Armed and Contextual Bandits
Djallel Bouneffouf
Charu Aggarwal
In recent years, the multi-armed bandit (MAB) framework has attracted a lot of attention in various applications, from recommender systems a… (see more)nd information retrieval to healthcare and finance. This success is due to its stellar performance combined with attractive properties, such as learning from less feedback. The multiarmed bandit field is currently experiencing a renaissance, as novel problem settings and algorithms motivated by various practical applications are being introduced, building on top of the classical bandit problem. This article aims to provide a comprehensive review of top recent developments in multiple real-life applications of the multi-armed bandit. Specifically, we introduce a taxonomy of common MAB-based applications and summarize the state-of-the-art for each of those domains. Furthermore, we identify important current trends and provide new perspectives pertaining to the future of this burgeoning field.
Molecular signatures of cognition and affect
Justine Y. Hansen
Ross D Markello
Jacob W. Vogel
Jakob Seidlitz
Bratislav Mišić
Extendable and invertible manifold learning with geometry regularized autoencoders
Andres F. Duque Correa
Sacha Morin
Kevin R. Moon
A fundamental task in data exploration is to extract simplified low dimensional representations that capture intrinsic geometry in data, esp… (see more)ecially for faithfully visualizing data in two or three dimensions. Common approaches to this task use kernel methods for manifold learning. However, these methods typically only provide an embedding of fixed input data and cannot extend to new data points. Autoencoders have also recently become popular for representation learning. But while they naturally compute feature extractors that are both extendable to new data and invertible (i.e., reconstructing original features from latent representation), they have limited capabilities to follow global intrinsic geometry compared to kernel-based manifold learning. We present a new method for integrating both approaches by incorporating a geometric regularization term in the bottleneck of the autoencoder. Our regularization, based on the diffusion potential distances from the recently-proposed PHATE visualization method, encourages the learned latent representation to follow intrinsic data geometry, similar to manifold learning algorithms, while still enabling faithful extension to new data and reconstruction of data in the original feature space from latent coordinates. We compare our approach with leading kernel methods and autoencoder models for manifold learning to provide qualitative and quantitative evidence of our advantages in preserving intrinsic structure, out of sample extension, and reconstruction. Our method is easily implemented for big-data applications, whereas other methods are limited in this regard.
Multi-Task Reinforcement Learning as a Hidden-Parameter Block MDP
Amy Zhang
Shagun Sodhani
Multi-task reinforcement learning is a rich paradigm where information from previously seen environments can be leveraged for better perform… (see more)ance and improved sample-efficiency in new environments. In this work, we leverage ideas of common structure underlying a family of Markov decision processes (MDPs) to improve performance in the few-shot regime. We use assumptions of structure from Hidden-Parameter MDPs and Block MDPs to propose a new framework, HiP-BMDP, and approach for learning a common representation and universal dynamics model. To this end, we provide transfer and generalization bounds based on task and state similarity, along with sample complexity bounds that depend on the aggregate number of samples across tasks, rather than the number of tasks, a significant improvement over prior work. To demonstrate the efficacy of the proposed method, we empirically compare and show improvements against other multi-task and meta-reinforcement learning baselines.
A Brief Look at Generalization in Visual Meta-Reinforcement Learning
Safa Alver
Due to the realization that deep reinforcement learning algorithms trained on high-dimensional tasks can strongly overfit to their training … (see more)environments, there have been several studies that investigated the generalization performance of these algorithms. However, there has been no similar study that evaluated the generalization performance of algorithms that were specifically designed for generalization, i.e. meta-reinforcement learning algorithms. In this paper, we assess the generalization performance of these algorithms by leveraging high-dimensional, procedurally generated environments. We find that these algorithms can display strong overfitting when they are evaluated on challenging tasks. We also observe that scalability to high-dimensional tasks with sparse rewards remains a significant problem among many of the current meta-reinforcement learning algorithms. With these results, we highlight the need for developing meta-reinforcement learning algorithms that can both generalize and scale.