Publications

Modeling the Long Term Future in Model-Based Reinforcement Learning
Nan Rosemary Ke
Amanpreet Singh
Devi Parikh
Dhruv Batra
Pix2Scene: Learning Implicit 3D Representations from Images
Probabilistic Planning with Sequential Monte Carlo Methods
Valentin Thomas
Cyril Ibrahim
Reinforced Imitation Learning from Observations
Shaping representations through communication
Olivier Tieleman
Angeliki Lazaridou
Shibl Mourad
Charles Blundell
Towards the Latent Transcriptome
In this work we propose a method to compute continuous embeddings for kmers from raw RNA-seq data, in a reference-free fashion. We report th… (see more)at our model captures information of both DNA sequence similarity as well as DNA sequence abundance in the embedding latent space. We confirm the quality of these vectors by comparing them to known gene sub-structures and report that the latent space recovers exon information from raw RNA-Seq data from acute myeloid leukemia patients. Furthermore we show that this latent space allows the detection of genomic abnormalities such as translocations as well as patient-specific mutations, making this representation space both useful for visualization as well as analysis.
Universal Successor Features for Transfer Reinforcement Learning
Dylan R. Ashley
Junfeng Wen
Transfer in Reinforcement Learning (RL) refers to the idea of applying knowledge gained from previous tasks to solving related tasks. Learni… (see more)ng a universal value function (Schaul et al., 2015), which generalizes over goals and states, has previously been shown to be useful for transfer. However, successor features are believed to be more suitable than values for transfer (Dayan, 1993; Barreto et al.,2017), even though they cannot directly generalize to new goals. In this paper, we propose (1) Universal Successor Features (USFs) to capture the underlying dynamics of the environment while allowing generalization to unseen goals and (2) a flexible end-to-end model of USFs that can be trained by interacting with the environment. We show that learning USFs is compatible with any RL algorithm that learns state values using a temporal difference method. Our experiments in a simple gridworld and with two MuJoCo environments show that USFs can greatly accelerate training when learning multiple tasks and can effectively transfer knowledge to new tasks.
Unsupervised one-to-many image translation
Samuel Lavoie-Marchildon
R Devon Hjelm
W2GAN: RECOVERING AN OPTIMAL TRANSPORT MAP WITH A GAN
Leygonie Jacob*
Jennifer She*
Amjad Almahairi
Sai Rajeswar
Where Off-Policy Deep Reinforcement Learning Fails
This work examines batch reinforcement learning–the task of maximally exploiting a given batch of off-policy data, without further data co… (see more)llection. We demonstrate that due to errors introduced by extrapolation, standard off-policy deep reinforcement learning algorithms, such as DQN and DDPG, are only capable of learning with data correlated to their current policy, making them ineffective for most off-policy applications. We introduce a novel class of off-policy algorithms, batch-constrained reinforcement learning, which restricts the action space to force the agent towards behaving on-policy with respect to a subset of the given data. We extend this notion to deep reinforcement learning, and to the best of our knowledge, present the first continuous control deep reinforcement learning algorithm which can learn effectively from uncorrelated off-policy data.
Width of Minima Reached by Stochastic Gradient Descent is Influenced by Learning Rate to Batch Size Ratio
Stanisław Jastrzębski
Amos Storkey
Exploring Uncertainty Measures in Deep Networks for Multiple Sclerosis Lesion Detection and Segmentation
Tanya Nair
Douglas L. Arnold
Deep learning (DL) networks have recently been shown to outperform other segmentation methods on various public, medical-image challenge dat… (see more)asets [3,11,16], especially for large pathologies. However, in the context of diseases such as Multiple Sclerosis (MS), monitoring all the focal lesions visible on MRI sequences, even very small ones, is essential for disease staging, prognosis, and evaluating treatment efficacy. Moreover, producing deterministic outputs hinders DL adoption into clinical routines. Uncertainty estimates for the predictions would permit subsequent revision by clinicians. We present the first exploration of multiple uncertainty estimates based on Monte Carlo (MC) dropout [4] in the context of deep networks for lesion detection and segmentation in medical images. Specifically, we develop a 3D MS lesion segmentation CNN, augmented to provide four different voxel-based uncertainty measures based on MC dropout. We train the network on a proprietary, large-scale, multi-site, multi-scanner, clinical MS dataset, and compute lesion-wise uncertainties by accumulating evidence from voxel-wise uncertainties within detected lesions. We analyze the performance of voxel-based segmentation and lesion-level detection by choosing operating points based on the uncertainty. Empirical evidence suggests that uncertainty measures consistently allow us to choose superior operating points compared only using the network's sigmoid output as a probability.