Portrait of Audrey Durand

Audrey Durand

Associate Academic Member
Canada CIFAR AI Chair
Assistant Professor, Université Laval, Department of Computer Science and Software Engineering
Research Topics
Online Learning
Reinforcement Learning

Biography

Audrey Durand is an assistant professor in the Department of Computer Science and Software Engineering and in the Department of Electrical and Computer Engineering at Université Laval.

She specializes in algorithms that learn through interaction with their environment using reinforcement learning, and is particularly interested in leveraging these approaches in health-related applications.

Current Students

Master's Research - Université Laval
Master's Research - Université de Montréal
Principal supervisor :
PhD - Université Laval
Master's Research - Université Laval
PhD - Université Laval
Master's Research - Université Laval
Master's Research - Université Laval
PhD - Université Laval

Publications

Routine Bandits: Minimizing Regret on Recurring Problems
Hassan Saber
L'eo Saci
Odalric-Ambrym Maillard
Deep interpretability for GWAS
Deepak Sharma
Louis-philippe Lemieux Perreault
Audrey Lemaccon
Marie-Pierre Dub'e
Genome-Wide Association Studies are typically conducted using linear models to find genetic variants associated with common diseases. In the… (see more)se studies, association testing is done on a variant-by-variant basis, possibly missing out on non-linear interaction effects between variants. Deep networks can be used to model these interactions, but they are difficult to train and interpret on large genetic datasets. We propose a method that uses the gradient based deep interpretability technique named DeepLIFT to show that known diabetes genetic risk factors can be identified using deep models along with possibly novel associations.
Handling Black Swan Events in Deep Learning with Diversely Extrapolated Neural Networks
By virtue of their expressive power, neural networks (NNs) are well suited to fitting large, complex datasets, yet they are also known to … (see more)produce similar predictions for points outside the training distribution. As such, they are, like humans, under the influence of the Black Swan theory: models tend to be extremely "surprised" by rare events, leading to potentially disastrous consequences, while justifying these same events in hindsight. To avoid this pitfall, we introduce DENN, an ensemble approach building a set of Diversely Extrapolated Neural Networks that fits the training data and is able to generalize more diversely when extrapolating to novel data points. This leads DENN to output highly uncertain predictions for unexpected inputs. We achieve this by adding a diversity term in the loss function used to train the model, computed at specific inputs. We first illustrate the usefulness of the method on a low-dimensional regression problem. Then, we show how the loss can be adapted to tackle anomaly detection during classification, as well as safe imitation learning problems.
Old Dog Learns New Tricks: Randomized UCB for Bandit Problems
Sharan Vaswani
Abbas Mehrabian
Branislav Kveton
Leveraging exploration in off-policy algorithms via normalizing flows
Exploration is a crucial component for discovering approximately optimal policies in most high-dimensional reinforcement learning (RL) setti… (see more)ngs with sparse rewards. Approaches such as neural density models and continuous exploration (e.g., Go-Explore) have been instrumental in recent advances. Soft actor-critic (SAC) is a method for improving exploration that aims to combine off-policy updates while maximizing the policy entropy. We extend SAC to a richer class of probability distributions through normalizing flows, which we show improves performance in exploration, sample complexity, and convergence. Finally, we show that not only the normalizing flow policy outperforms SAC on MuJoCo domains, it is also significantly lighter, using as low as 5.6% of the original network's parameters for similar performance.
Literature Mining for Incorporating Inductive Bias in Biomedical Prediction Tasks (Student Abstract)
Old Dog Learns New Tricks: Randomized UCB for Bandit Problems
Sharan Vaswani
Abbas Mehrabian
Branislav Kveton
We propose …
Attraction-Repulsion Actor-Critic for Continuous Control Reinforcement Learning
Continuous control tasks in reinforcement learning are important because they provide an important framework for learning in high-dimensiona… (see more)l state spaces with deceptive rewards, where the agent can easily become trapped into suboptimal solutions. One way to avoid local optima is to use a population of agents to ensure coverage of the policy space, yet learning a population with the "best" coverage is still an open problem. In this work, we present a novel approach to population-based RL in continuous control that leverages properties of normalizing flows to perform attractive and repulsive operations between current members of the population and previously observed policies. Empirical results on the MuJoCo suite demonstrate a high performance gain for our algorithm compared to prior work, including Soft-Actor Critic (SAC).
Leveraging Observations in Bandits: Between Risks and Benefits
Andrei-Stefan Lupu
Imitation learning has been widely used to speed up learning in novice agents, by allowing them to leverage existing data from experts. Allo… (see more)wing an agent to be influenced by external observations can benefit to the learning process, but it also puts the agent at risk of following sub-optimal behaviours. In this paper, we study this problem in the context of bandits. More specifically, we consider that an agent (learner) is interacting with a bandit-style decision task, but can also observe a target policy interacting with the same environment. The learner observes only the target’s actions, not the rewards obtained. We introduce a new bandit optimism modifier that uses conditional optimism contingent on the actions of the target in order to guide the agent’s exploration. We analyze the effect of this modification on the well-known Upper Confidence Bound algorithm by proving that it preserves a regret upper-bound of order O(lnT), even in the presence of a very poor target, and we derive the dependency of the expected regret on the general target policy. We provide empirical results showing both great benefits as well as certain limitations inherent to observational learning in the multi-armed bandit setting. Experiments are conducted using targets satisfying theoretical assumptions with high probability, thus narrowing the gap between theory and application.
Online Adaptative Curriculum Learning for GANs
Thang Doan
Joao Monteiro
Isabela Albuquerque
Generative Adversarial Networks (GANs) can successfully approximate a probability distribution and produce realistic samples. However, open … (see more)questions such as sufficient convergence conditions and mode collapse still persist. In this paper, we build on existing work in the area by proposing a novel framework for training the generator against an ensemble of discriminator networks, which can be seen as a one-student/multiple-teachers setting. We formalize this problem within the full-information adversarial bandit framework, where we evaluate the capability of an algorithm to select mixtures of discriminators for providing the generator with feedback during learning. To this end, we propose a reward function which reflects the progress made by the generator and dynamically update the mixture weights allocated to each discriminator. We also draw connections between our algorithm and stochastic optimization methods and then show that existing approaches using multiple discriminators in literature can be recovered from our framework. We argue that less expressive discriminators are smoother and have a general coarse grained view of the modes map, which enforces the generator to cover a wide portion of the data distribution support. On the other hand, highly expressive discriminators ensure samples quality. Finally, experimental results show that our approach improves samples quality and diversity over existing baselines by effectively learning a curriculum. These results also support the claim that weaker discriminators have higher entropy improving modes coverage.
Contextual Bandits for Adapting Treatment in a Mouse Model of de Novo Carcinogenesis
Charis Achilleos
Demetris C Iacovides
Katerina Strati
Georgios D. Mitsis
In this work, we present a specific case study where we aim to design effective treatment allocation strategies and validate these using a m… (see more)ouse model of skin cancer. Collecting data for modelling treatments effectiveness on animal models is an expensive and time consuming process. Moreover, acquiring this information during the full range of disease stages is hard to achieve with a conventional random treatment allocation procedure, as poor treatments cause deterioration of subject health. We therefore aim to design an adaptive allocation strategy to improve the efficiency of data collection by allocating more samples for exploring promising treatments. We cast this application as a contextual bandit problem and introduce a simple and practical algorithm for exploration-exploitation in this framework. The work builds on a recent class of approaches for non-contextual bandits that relies on subsampling to compare treatment options using an equivalent amount of information. On the technical side, we extend the subsampling strategy to the case of bandits with context, by applying subsampling within Gaussian Process regression. On the experimental side, preliminary results using 10 mice with skin tumours suggest that the proposed approach extends by more than 50% the subjects life duration compared with baseline strategies: no treatment, random treatment allocation, and constant chemotherapeutic agent. By slowing the tumour growth rate, the adaptive procedure gathers information about treatment effectiveness on a broader range of tumour volumes, which is crucial for eventually deriving sequential pharmacological treatment strategies for cancer.
Streaming kernel regression with provably adaptive mean, variance, and regularization
Odalric-Ambrym Maillard
We consider the problem of streaming kernel regression, when the observations arrive sequentially and the goal is to recover the underlying … (see more)mean function, assumed to belong to an RKHS. The variance of the noise is not assumed to be known. In this context, we tackle the problem of tuning the regularization parameter adaptively at each time step, while maintaining tight confidence bounds estimates on the value of the mean function at each point. To this end, we first generalize existing results for finite-dimensional linear regression with fixed regularization and known variance to the kernel setup with a regularization parameter allowed to be a measurable function of past observations. Then, using appropriate self-normalized inequalities we build upper and lower bound estimates for the variance, leading to Bersntein-like concentration bounds. The later is used in order to define the adaptive regularization. The bounds resulting from our technique are valid uniformly over all observation points and all time steps, and are compared against the literature with numerical experiments. Finally, the potential of these tools is illustrated by an application to kernelized bandits, where we revisit the Kernel UCB and Kernel Thompson Sampling procedures, and show the benefits of the novel adaptive kernel tuning strategy.