(Rex) Devon Hjelm

Implicit Regularization in Deep Learning: A View from Function Space

Aristide Baratin

Thomas George

César Laurent

2020-08-03

arXiv (preprint)

arxiv.org

Leveraging exploration in off-policy algorithms via normalizing flows

Bogdan Mazoure

Thang Doan

Audrey Durand

Joelle Pineau

(Rex) Devon Hjelm

Exploration is a crucial component for discovering approximately optimal policies in most high-dimensional reinforcement learning (RL) settings with sparse rewards. Approaches such as neural density models and continuous exploration (e.g., Go-Explore) have been instrumental in recent advances. Soft actor-critic (SAC) is a method for improving exploration that aims to combine off-policy updates while maximizing the policy entropy. We extend SAC to a richer class of probability distributions through normalizing flows, which we show improves performance in exploration, sample complexity, and convergence. Finally, we show that not only does the normalizing flow policy outperform SAC on MuJoCo domains, it is also significantly lighter, using as little as 5.6% of the original network's parameters for similar performance.

2020-05-12

Proceedings of the Conference on Robot Learning (published)

proceedings.mlr.press

arxiv.org
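
The abstract above describes replacing the SAC policy distribution with one built from normalizing flows. The following is a minimal sketch of that general idea, not the authors' implementation: a Gaussian base sample is pushed through 1-D planar flow layers, and its log-density is corrected with the change-of-variables formula. The function names, the choice of planar flows, and all parameter values are illustrative assumptions.

```python
import math
import random


def gaussian_log_prob(z, mean=0.0, std=1.0):
    """Log-density of a 1-D Gaussian at z."""
    return -0.5 * ((z - mean) / std) ** 2 - math.log(std) - 0.5 * math.log(2 * math.pi)


def planar_flow(z, u, w, b):
    """1-D planar flow x = z + u * tanh(w * z + b).

    Returns x and log|dx/dz|. Invertibility requires u * w > -1.
    """
    h = math.tanh(w * z + b)
    x = z + u * h
    log_det = math.log(abs(1.0 + u * w * (1.0 - h * h)))
    return x, log_det


def sample_action(mean, std, flow_params, rng):
    """Sample an action from the flow policy and return (action, log_prob)."""
    z = rng.gauss(mean, std)
    log_prob = gaussian_log_prob(z, mean, std)
    action = z
    for u, w, b in flow_params:
        action, log_det = planar_flow(action, u, w, b)
        # Change of variables: each layer subtracts its log-determinant.
        log_prob -= log_det
    return action, log_prob


# Hypothetical parameters, chosen only so that u * w > -1 for each layer.
rng = random.Random(0)
action, log_prob = sample_action(0.0, 1.0, [(0.5, 1.0, 0.0), (0.3, 2.0, -0.1)], rng)
print(action, log_prob)
```

In an entropy-regularized objective such as SAC's, the returned log-probability is the quantity that enters the entropy term, which is why each flow layer needs a tractable log-determinant.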

Attraction-Repulsion Actor-Critic for Continuous Control Reinforcement Learning

Thang Doan

Bogdan Mazoure

Audrey Durand

Joelle Pineau

(Rex) Devon Hjelm

Continuous control tasks in reinforcement learning are important because they provide a framework for learning in high-dimensional state spaces with deceptive rewards, where the agent can easily become trapped in suboptimal solutions. One way to avoid local optima is to use a population of agents to ensure coverage of the policy space, yet learning a population with the "best" coverage is still an open problem. In this work, we present a novel approach to population-based RL in continuous control that leverages properties of normalizing flows to perform attractive and repulsive operations between current members of the population and previously observed policies. Empirical results on the MuJoCo suite demonstrate a high performance gain for our algorithm compared to prior work, including Soft Actor-Critic (SAC).

2019-09-17

arXiv (preprint)

openreview.net

Online Adaptative Curriculum Learning for GANs

Thang Doan

Joao Monteiro

Isabela Albuquerque

Bogdan Mazoure

Audrey Durand

Joelle Pineau

(Rex) Devon Hjelm

Generative Adversarial Networks (GANs) can successfully approximate a probability distribution and produce realistic samples. However, open questions such as sufficient convergence conditions and mode collapse still persist. In this paper, we build on existing work in the area by proposing a novel framework for training the generator against an ensemble of discriminator networks, which can be seen as a one-student/multiple-teachers setting. We formalize this problem within the full-information adversarial bandit framework, where we evaluate the capability of an algorithm to select mixtures of discriminators for providing the generator with feedback during learning. To this end, we propose a reward function that reflects the progress made by the generator, and we dynamically update the mixture weights allocated to each discriminator. We also draw connections between our algorithm and stochastic optimization methods and then show that existing approaches using multiple discriminators in the literature can be recovered from our framework. We argue that less expressive discriminators are smoother and have a general coarse-grained view of the mode map, which forces the generator to cover a wide portion of the data distribution support. On the other hand, highly expressive discriminators ensure sample quality. Finally, experimental results show that our approach improves sample quality and diversity over existing baselines by effectively learning a curriculum. These results also support the claim that weaker discriminators have higher entropy, improving mode coverage.

2019-07-17

Proceedings of the AAAI Conference on Artificial Intelligence (published)

doi.org

arxiv.org
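
The abstract above frames the choice of discriminator mixture as a full-information adversarial bandit problem. As a rough illustration of that setting, the sketch below uses the standard exponential-weights (Hedge) update, which is one well-known algorithm for it; it is not claimed to be the exact algorithm or reward function from the paper. The function name hedge_update, the learning rate eta, and the reward values are all hypothetical.

```python
import math


def hedge_update(weights, rewards, eta=0.5):
    """One full-information exponential-weights (Hedge) step.

    weights: current mixture weights over discriminators (sums to 1).
    rewards: one observed reward per discriminator for this round.
    eta:     learning rate (assumed value, not taken from the paper).
    """
    scaled = [w * math.exp(eta * r) for w, r in zip(weights, rewards)]
    total = sum(scaled)
    return [s / total for s in scaled]


# Three hypothetical discriminators, starting from a uniform mixture.
weights = [1 / 3, 1 / 3, 1 / 3]

# Made-up per-round rewards standing in for "progress made by the generator".
for rewards in ([0.1, 0.5, 0.2], [0.0, 0.6, 0.1], [0.2, 0.4, 0.1]):
    weights = hedge_update(weights, rewards)

print(weights)
```

Because every discriminator's reward is observed each round (the full-information case), all weights are updated at once, and the mixture shifts toward discriminators whose feedback has been most rewarding so far.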
