Audrey Durand

Plants are dynamic systems that are integral to our existence and survival. Plants face environment changes and adapt over time to their sur… (see more)rounding conditions. We argue that plant responses to an environmental stimulus are a good example of a real-world problem that can be approached within a reinforcement learning (RL)framework. With the objective of controlling a plant by moving the light source, we propose GrowSpace, as a new RL benchmark. The back-end of the simulator is implemented using the Space Colonisation Algorithm, a plant growing model based on competition for space. Compared to video game RL environments, this simulator addresses a real-world problem and serves as a test bed to visualize plant growth and movement in a faster way than physical experiments. GrowSpace is composed of a suite of challenges that tackle several problems such as control, multi-stage learning,fairness and multi-objective learning. We provide agent baselines alongside case studies to demonstrate the difficulty of the proposed benchmark.

2022-01-01

AAAI.org/2022/Workshop/AIAFS (published)

openreview.net

Pharmacists' perceptions of a machine learning model for the identification of atypical medication orders

Sophie-Camille Hogue

Flora Chen

Geneviève Brassard

Denis Lebel

Jean-François Bussières

Maxime Thibault

2021-05-06

J. Am. Medical Informatics Assoc. (published)

Routine Bandits: Minimizing Regret on Recurring Problems

Hassan Saber

L'eo Saci

Odalric-Ambrym Maillard

2021-01-01

ECML/PKDD (published)

Deep interpretability for GWAS

Deepak Sharma

Marc-andr'e Legault

Louis-philippe Lemieux Perreault

Audrey Lemaccon

Marie-Pierre Dub'e

Genome-Wide Association Studies are typically conducted using linear models to find genetic variants associated with common diseases. In the… (see more)se studies, association testing is done on a variant-by-variant basis, possibly missing out on non-linear interaction effects between variants. Deep networks can be used to model these interactions, but they are difficult to train and interpret on large genetic datasets. We propose a method that uses the gradient based deep interpretability technique named DeepLIFT to show that known diabetes genetic risk factors can be identified using deep models along with possibly novel associations.

2020-07-03

ArXiv (preprint)

Handling Black Swan Events in Deep Learning with Diversely Extrapolated Neural Networks

Maxime Wabartha

Vincent Francois-Lavet

By virtue of their expressive power, neural networks (NNs) are well suited to fitting large, complex datasets, yet they are also known to … (see more)produce similar predictions for points outside the training distribution. As such, they are, like humans, under the influence of the Black Swan theory: models tend to be extremely "surprised" by rare events, leading to potentially disastrous consequences, while justifying these same events in hindsight. To avoid this pitfall, we introduce DENN, an ensemble approach building a set of Diversely Extrapolated Neural Networks that fits the training data and is able to generalize more diversely when extrapolating to novel data points. This leads DENN to output highly uncertain predictions for unexpected inputs. We achieve this by adding a diversity term in the loss function used to train the model, computed at specific inputs. We first illustrate the usefulness of the method on a low-dimensional regression problem. Then, we show how the loss can be adapted to tackle anomaly detection during classification, as well as safe imitation learning problems.

2020-07-01

International Joint Conference on Artificial Intelligence (published)

Old Dog Learns New Tricks: Randomized UCB for Bandit Problems

Sharan Vaswani

Abbas Mehrabian

Branislav Kveton

2020-06-03

Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics (published)

proceedings.mlr.press

Leveraging exploration in off-policy algorithms via normalizing flows

Bogdan Mazoure

Thang Doan

(Rex) Devon Hjelm

Exploration is a crucial component for discovering approximately optimal policies in most high-dimensional reinforcement learning (RL) setti… (see more)ngs with sparse rewards. Approaches such as neural density models and continuous exploration (e.g., Go-Explore) have been instrumental in recent advances. Soft actor-critic (SAC) is a method for improving exploration that aims to combine off-policy updates while maximizing the policy entropy. We extend SAC to a richer class of probability distributions through normalizing flows, which we show improves performance in exploration, sample complexity, and convergence. Finally, we show that not only the normalizing flow policy outperforms SAC on MuJoCo domains, it is also significantly lighter, using as low as 5.6% of the original network's parameters for similar performance.

2020-05-12

Proceedings of the Conference on Robot Learning (published)

proceedings.mlr.press

Literature Mining for Incorporating Inductive Bias in Biomedical Prediction Tasks (Student Abstract)

Qizhen Zhang

2020-04-03

AAAI Conference on Artificial Intelligence (published)

Old Dog Learns New Tricks: Randomized UCB for Bandit Problems

Sharan Vaswani

Abbas Mehrabian

Branislav Kveton

We propose …

2019-10-11

ArXiv (preprint)

Attraction-Repulsion Actor-Critic for Continuous Control Reinforcement Learning

Thang Doan

Bogdan Mazoure

(Rex) Devon Hjelm

Continuous control tasks in reinforcement learning are important because they provide an important framework for learning in high-dimensiona… (see more)l state spaces with deceptive rewards, where the agent can easily become trapped into suboptimal solutions. One way to avoid local optima is to use a population of agents to ensure coverage of the policy space, yet learning a population with the "best" coverage is still an open problem. In this work, we present a novel approach to population-based RL in continuous control that leverages properties of normalizing flows to perform attractive and repulsive operations between current members of the population and previously observed policies. Empirical results on the MuJoCo suite demonstrate a high performance gain for our algorithm compared to prior work, including Soft-Actor Critic (SAC).

2019-09-17

ArXiv (preprint)

openreview.net

Leveraging Observations in Bandits: Between Risks and Benefits

Andrei-stefan Lupu

Doina Precup

Imitation learning has been widely used to speed up learning in novice agents, by allowing them to leverage existing data from experts. Allo… (see more)wing an agent to be influenced by external observations can benefit to the learning process, but it also puts the agent at risk of following sub-optimal behaviours. In this paper, we study this problem in the context of bandits. More specifically, we consider that an agent (learner) is interacting with a bandit-style decision task, but can also observe a target policy interacting with the same environment. The learner observes only the target’s actions, not the rewards obtained. We introduce a new bandit optimism modifier that uses conditional optimism contingent on the actions of the target in order to guide the agent’s exploration. We analyze the effect of this modification on the well-known Upper Confidence Bound algorithm by proving that it preserves a regret upper-bound of order O(lnT), even in the presence of a very poor target, and we derive the dependency of the expected regret on the general target policy. We provide empirical results showing both great benefits as well as certain limitations inherent to observational learning in the multi-armed bandit setting. Experiments are conducted using targets satisfying theoretical assumptions with high probability, thus narrowing the gap between theory and application.

2019-07-17

Proceedings of the AAAI Conference on Artificial Intelligence (published)

Online Adaptative Curriculum Learning for GANs

Thang Doan

Joao Monteiro

Isabela Albuquerque

Bogdan Mazoure

(Rex) Devon Hjelm

Generative Adversarial Networks (GANs) can successfully approximate a probability distribution and produce realistic samples. However, open … (see more)questions such as sufficient convergence conditions and mode collapse still persist. In this paper, we build on existing work in the area by proposing a novel framework for training the generator against an ensemble of discriminator networks, which can be seen as a one-student/multiple-teachers setting. We formalize this problem within the full-information adversarial bandit framework, where we evaluate the capability of an algorithm to select mixtures of discriminators for providing the generator with feedback during learning. To this end, we propose a reward function which reflects the progress made by the generator and dynamically update the mixture weights allocated to each discriminator. We also draw connections between our algorithm and stochastic optimization methods and then show that existing approaches using multiple discriminators in literature can be recovered from our framework. We argue that less expressive discriminators are smoother and have a general coarse grained view of the modes map, which enforces the generator to cover a wide portion of the data distribution support. On the other hand, highly expressive discriminators ensure samples quality. Finally, experimental results show that our approach improves samples quality and diversity over existing baselines by effectively learning a curriculum. These results also support the claim that weaker discriminators have higher entropy improving modes coverage.

2019-07-17

Proceedings of the AAAI Conference on Artificial Intelligence (published)