Publications

Université de Montréal Balancing Signals for Semi-Supervised Sequence Learning
Training recurrent neural networks (RNNs) on long sequences using backpropagation through time (BPTT) remains a fundamental challenge. It has been shown that adding a local unsupervised loss term into the optimization objective makes the training of RNNs on long sequences more effective. While the importance of an unsupervised task can in principle be controlled by a coefficient in the objective function, the gradients with respect to the unsupervised loss term still influence all the hidden state dimensions, which might cause important information about the supervised task to be degraded or erased. Compared to existing semi-supervised sequence learning methods, this thesis focuses on a traditionally overlooked mechanism: an architecture with explicitly designed private and shared hidden units that mitigates the detrimental influence of the auxiliary unsupervised loss over the main supervised task. We achieve this by dividing the RNN hidden space into a private space for the supervised task and a shared space for both the supervised and unsupervised tasks. We present extensive experiments with the proposed framework on several long sequence modeling benchmark datasets. Results indicate that the proposed framework can yield performance gains in RNN models where long term dependencies are notoriously challenging to deal with.
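As a rough illustration of the private/shared split described in the abstract above, the sketch below combines per-dimension hidden-state gradients so that the unsupervised gradient reaches only the shared dimensions. The function name, the convention that the first `n_private` dimensions are private, and the idea of implementing the split as a gradient mask are all assumptions made for this example; the abstract does not say how the thesis implements the division.

```python
def combine_hidden_gradients(grad_sup, grad_unsup, n_private, coef=1.0):
    """Combine supervised and unsupervised gradients over hidden dimensions.

    Hypothetical convention: dimensions [0, n_private) are private to the
    supervised task; the remaining dimensions are shared by both tasks.
    """
    combined = []
    for i, (g_sup, g_unsup) in enumerate(zip(grad_sup, grad_unsup)):
        if i < n_private:
            # Private unit: the unsupervised loss contributes nothing here.
            combined.append(g_sup)
        else:
            # Shared unit: both losses contribute, weighted by coef.
            combined.append(g_sup + coef * g_unsup)
    return combined


# Four hidden units, the first two private.
grads = combine_hidden_gradients([1, 1, 1, 1], [10, 10, 10, 10], n_private=2, coef=0.5)
```

With this masking, lowering `coef` weakens the auxiliary signal on shared units, while the private units are untouched by it regardless of the coefficient.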
Unsupervised Learning of Dense Visual Representations
Pedro O. Pinheiro
Amjad Almahairi
Ryan Y. Benmalek
Untangling tradeoffs between recurrence and self-attention in artificial neural networks
Giancarlo Kerg
Bhargav Kanuparthi
Anirudh Goyal
Kyle Goyette
Supplementary Material - Learning to Navigate the Synthetically Accessible Chemical Space Using Reinforcement Learning
Sai Krishna Gottipati
B. Sattarov
Sufeng Niu
Yashaswi Pathak
Haoran Wei
Shengchao Liu
Karam M. J. Thomas
Simon R. Blackburn
Connor Wilson Coley
While updating the critic network, we multiply the normal random noise vector with policy noise of 0.2 and then clip it in the range -0.2 to 0.2. This clipped policy noise is added to the action at the next time step a′ computed by the target actor networks f and π. The actor networks (f and π networks), target critic and target actor networks are updated once every two updates to the critic network.
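The noise-and-delay schedule described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, the action is treated as a plain list of floats, and only the constants (noise scale 0.2, clip range -0.2 to 0.2, one actor/target update per two critic updates) come from the text.

```python
import random

POLICY_NOISE = 0.2  # scale applied to standard normal noise (from the text)
NOISE_CLIP = 0.2    # clipping bound for the scaled noise (from the text)
POLICY_DELAY = 2    # actor/target updates once every two critic updates (from the text)


def clipped_policy_noise(dim, rng):
    """Sample N(0, 1) noise, scale it, and clip it to [-NOISE_CLIP, NOISE_CLIP]."""
    noise = []
    for _ in range(dim):
        scaled = rng.gauss(0.0, 1.0) * POLICY_NOISE
        noise.append(max(-NOISE_CLIP, min(NOISE_CLIP, scaled)))
    return noise


def smoothed_target_action(target_action, rng):
    """Add clipped noise to the next-step action a' from the target actor."""
    noise = clipped_policy_noise(len(target_action), rng)
    return [a + n for a, n in zip(target_action, noise)]


def should_update_actor(critic_step):
    """True on every POLICY_DELAY-th critic update (critic steps counted from 1)."""
    return critic_step % POLICY_DELAY == 0


rng = random.Random(0)
next_action = smoothed_target_action([0.5, -0.5], rng)
```

In a training loop, the critic would be updated at every step using `smoothed_target_action`, while the actor networks and the target networks would be updated only when `should_update_actor` returns true.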
Value-driven Hindsight Modelling
Arthur Guez
Theophane Weber
Lars Buesing
Steven Kapturowski
David Silver
Nicolas Heess
Value estimation is a critical component of the reinforcement learning (RL) paradigm. The question of how to effectively learn predictors for value from data is one of the major problems studied by the RL community, and different approaches exploit structure in the problem domain in different ways. Model learning can make use of the rich transition structure present in sequences of observations, but this approach is usually not sensitive to the reward function. In contrast, model-free methods directly leverage the quantity of interest from the future but have to compose with a potentially weak scalar signal (an estimate of the return). In this paper we develop an approach for representation learning in RL that sits in between these two extremes: we propose to learn what to model in a way that can directly help value prediction. To this end we determine which features of the future trajectory provide useful information to predict the associated return. This provides us with tractable prediction targets that are directly relevant for a task, and can thus accelerate learning of the value function. The idea can be understood as reasoning, in hindsight, about which aspects of the future observations could help past value prediction. We show how this can help dramatically even in simple policy evaluation settings. We then test our approach at scale in challenging domains, including on 57 Atari 2600 games.
Your GAN is Secretly an Energy-based Model and You Should use Discriminator Driven Latent Sampling
Tong Che
Ruixiang Zhang
Jascha Sohl-Dickstein
Yuan Cao
We show that the sum of the implicit generator log-density …
Learning from Learning Machines: Optimisation, Rules, and Social Norms
Travis LaCroix
There is an analogy between machine learning systems and economic entities in that they are both adaptive, and their behaviour is specified in a more-or-less explicit way. It appears that the area of AI that is most analogous to the behaviour of economic entities is that of morally good decision-making, but it is an open question as to how precisely moral behaviour can be achieved in an AI system. This paper explores the analogy between these two complex systems, and we suggest that a clearer understanding of this apparent analogy may help us forward in both the socio-economic domain and the AI domain: known results in economics may help inform feasible solutions in AI safety, but also known results in AI may inform economic policy. If this claim is correct, then the recent successes of deep learning for AI suggest that more implicit specifications work better than explicit ones for solving such problems.
CLOSURE: Assessing Systematic Generalization of CLEVR Models
Harm de Vries
Shikhar Murty
Philippe Beaudoin
Applying Knowledge Transfer for Water Body Segmentation in Peru
Jessenia Gonzalez
Debjani Bhowmick
César Beltrán
Kris Sankaran
Detecting GAN generated errors
Xiru Zhu
Fengdi Che
Tianzi Yang
Tzuyang Yu
Despite an impressive performance from the latest GAN for generating hyper-realistic images, GAN discriminators have difficulty evaluating the quality of an individual generated sample. This is because the task of evaluating the quality of a generated image differs from deciding if an image is real or fake. A generated image could be perfect except in a single area but still be detected as fake. Instead, we propose a novel approach for detecting where errors occur within a generated image. By collaging real images with generated images, we compute for each pixel, whether it belongs to the real distribution or generated distribution. Furthermore, we leverage attention to model long-range dependency; this allows detection of errors which are reasonable locally but not holistically. For evaluation, we show that our error detection can act as a quality metric for an individual image, unlike FID and IS. We leverage Improved Wasserstein, BigGAN, and StyleGAN to show a ranking based on our metric correlates impressively with FID scores. Our work opens the door for better understanding of GAN and the ability to select the best samples from a GAN model.
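One way to picture the collaging step in the abstract above is to paste a region of a generated image into a real image and keep a per-pixel label mask as the training target. The sketch below does this for a rectangular patch on images stored as nested lists. The rectangular shape, the function name, and the 0/1 label convention are assumptions made for illustration; the abstract does not specify how the collages are built.

```python
def collage(real, generated, top, left, height, width):
    """Paste a rectangular patch of `generated` into a copy of `real`.

    Returns (image, mask), where mask[r][c] is 1 for pixels taken from the
    generated image and 0 for pixels kept from the real image.
    Both inputs are assumed to be same-sized lists of rows.
    """
    image = [row[:] for row in real]
    mask = [[0] * len(row) for row in real]
    for r in range(top, top + height):
        for c in range(left, left + width):
            image[r][c] = generated[r][c]
            mask[r][c] = 1
    return image, mask


real = [[0, 0, 0] for _ in range(3)]
generated = [[9, 9, 9] for _ in range(3)]
image, mask = collage(real, generated, top=1, left=1, height=2, width=2)
```

A per-pixel classifier trained to predict `mask` from `image` would then output, for each pixel, whether it looks real or generated.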
Artificial Intelligence Based Cloud Distributor (AI-CD): Probing Low Cloud Distribution with Generative Adversarial Neural Networks
T. Yuan
H. Song
David Hall
Victor Schmidt
Kris Sankaran
Automated curriculum generation for Policy Gradients from Demonstrations
Anirudh Srinivasan
Maxime Chevalier-Boisvert