Publications

Acceleration in Policy Optimization

Tom Zahavy

Arthur Guez

Sebastian Flennerhag

We work towards a unifying paradigm for accelerating policy optimization methods in reinforcement learning (RL) through predictive and adaptive directions of (functional) policy ascent. Leveraging the connection between policy iteration and policy gradient methods, we view policy optimization algorithms as iteratively solving a sequence of surrogate objectives, local lower bounds on the original objective. We define optimism as predictive modelling of the future behavior of a policy, and hindsight adaptation as taking immediate and anticipatory corrective actions to mitigate accumulating errors from overshooting predictions or delayed responses to change. We use this shared lens to jointly express other well-known algorithms, including model-based policy improvement based on forward search, and optimistic meta-learning algorithms. We show connections with Anderson acceleration, Nesterov's accelerated gradient, extra-gradient methods, and linear extrapolation in the update rule. We analyze properties of the formulation, design an optimistic policy gradient algorithm, adaptive via meta-gradient learning, and empirically highlight several design choices pertaining to acceleration, in an illustrative task.

2023-07-19

EWRL/2023/Workshop (accepted)

openreview.net

Approximate information state based convergence analysis of recurrent Q-learning

In spite of the large literature on reinforcement learning (RL) algorithms for partially observable Markov decision processes (POMDPs), a complete theoretical understanding is still lacking. In a partially observable setting, the history of data available to the agent increases over time so most practical algorithms either truncate the history to a finite window or compress it using a recurrent neural network leading to an agent state that is non-Markovian. In this paper, it is shown that in spite of the lack of the Markov property, recurrent Q-learning (RQL) converges in the tabular setting. Moreover, it is shown that the quality of the converged limit depends on the quality of the representation which is quantified in terms of what is known as an approximate information state (AIS). Based on this characterization of the approximation error, a variant of RQL with AIS losses is presented. This variant performs better than a strong baseline for RQL that does not use AIS losses. It is demonstrated that there is a strong correlation between the performance of RQL over time and the loss associated with the AIS representation.

2023-07-19

EWRL/2023/Workshop (accepted)

doi.org

openreview.net

Intelligent Software Maintenance

Foutse Khomh

Mohammad Masudur Rahman

Antoine Barbez

2023-07-19

Optimising the Software Development Process with Artificial Intelligence (published)

doi.org

Pluvio: Assembly Clone Search for Out-of-domain Architectures and Libraries through Transfer Learning and Conditional Variational Information Bottleneck

Zhiwei Fu

Steven H. H. Ding

Furkan Alaca

Benjamin C. M. Fung

Philippe Charland

The practice of code reuse is crucial in software development for a faster and more efficient development lifecycle. In reality, however, code reuse practices lack proper control, resulting in issues such as vulnerability propagation and intellectual property infringements. Assembly clone search, a critical shift-right defence mechanism, has been effective in identifying vulnerable code resulting from reuse in released executables. Recent studies on assembly clone search demonstrate a trend towards using machine learning-based methods to match assembly code variants produced by different toolchains. However, these methods are limited to what they learn from a small number of toolchain variants used in training, rendering them inapplicable to unseen architectures and their corresponding compilation toolchain variants. This paper presents the first study on the problem of assembly clone search with unseen architectures and libraries. We propose incorporating human common knowledge through large-scale pre-trained natural language models, in the form of transfer learning, into current learning-based approaches for assembly clone search. Transfer learning can aid in addressing the limitations of the existing approaches, as it can bring in broader knowledge from human experts in assembly code. We further address the sequence limit issue by proposing a reinforcement learning agent to remove unnecessary and redundant tokens. Coupled with a new Variational Information Bottleneck learning strategy, the proposed system minimizes the reliance on potential indicators of architectures and optimization settings, for a better generalization of unseen architectures. We simulate the unseen architecture clone search scenarios and the experimental results show the effectiveness of the proposed approach against the state-of-the-art solutions.

2023-07-19

ArXiv (preprint)

doi.org

arxiv.org

On the Convergence of Bounded Agents

David Abel

Andre Barreto

Hado Philip van Hasselt

Benjamin Van Roy

Doina Precup

Satinder Singh

When has an agent converged? Standard models of the reinforcement learning problem give rise to a straightforward definition of convergence: An agent converges when its behavior or performance in each environment state stops changing. However, as we shift the focus of our learning problem from the environment's state to the agent's state, the concept of an agent's convergence becomes significantly less clear. In this paper, we propose two complementary accounts of agent convergence in a framing of the reinforcement learning problem that centers around bounded agents. The first view says that a bounded agent has converged when the minimal number of states needed to describe the agent's future behavior cannot decrease. The second view says that a bounded agent has converged just when the agent's performance only changes if the agent's internal state changes. We establish basic properties of these two definitions, show that they accommodate typical views of convergence in standard settings, and prove several facts about their nature and relationship. We take these perspectives, definitions, and analysis to bring clarity to a central idea of the field.

2023-07-19

ArXiv (preprint)

doi.org

arxiv.org

Efficient 1D Grouped Convolution for PyTorch a Case Study: Fast On-Device Fine-Tuning for SqueezeBERT

Seyyed Hasan Mozafari

James J. Clark

Warren J. Gross

Brett Meyer

Grouped convolution has been observed to be an effective approximation for convolution in many DNN applications. For example, SqueezeBERT, which is a light and fast BERT language processing model, utilizes 1D grouped convolutions. Though SqueezeBERT is well-optimized for inference on edge devices, it suffers from poor memory management during fine-tuning (training). This results in longer fine-tuning time on resource-limited GPUs compared to the original BERT model, BERT-base, despite being specifically designed for edge devices. We study this behavior and show that this poor memory management originates from the use of 1D grouped convolutions in SqueezeBERT. We re-implement 1D grouped convolutions using fully-connected layers, addressing the poor memory allocation and data locality of 1D grouped convolutions. We show that our method is well-suited for edge devices with limited memory; further, it has a negligible effect on inference speed. When utilizing our method, we observe a 42% reduction in fine-tuning time for SqueezeBERT on edge devices.

2023-07-18

2023 IEEE 34th International Conference on Application-specific Systems, Architectures and Processors (ASAP) (published)

doi.org

Nonlinear manifolds underlie neural population activity during behaviour

Cátia Fortunato

Jorge Bennasar-Vázquez

Junchol Park

Joanna C. Chang

Lee E. Miller

Joshua T. Dudman

Matthew G. Perich

Juan A. Gallego

There is rich variety in the activity of single neurons recorded during behaviour. Yet, these diverse single neuron responses can be well described by relatively few patterns of neural co-modulation. The study of such low-dimensional structure of neural population activity has provided important insights into how the brain generates behaviour. Virtually all of these studies have used linear dimensionality reduction techniques to estimate these population-wide co-modulation patterns, constraining them to a flat “neural manifold”. Here, we hypothesised that since neurons have nonlinear responses and make thousands of distributed and recurrent connections that likely amplify such nonlinearities, neural manifolds should be intrinsically nonlinear. Combining neural population recordings from monkey, mouse, and human motor cortex, and mouse striatum, we show that: 1) neural manifolds are intrinsically nonlinear; 2) their nonlinearity becomes more evident during complex tasks that require more varied activity patterns; and 3) manifold nonlinearity varies across architecturally distinct brain regions. Simulations using recurrent neural network models confirmed the proposed relationship between circuit connectivity and manifold nonlinearity, including the differences across architecturally distinct regions. Thus, neural manifolds underlying the generation of behaviour are inherently nonlinear, and properly accounting for such nonlinearities will be critical as neuroscientists move towards studying numerous brain regions involved in increasingly complex and naturalistic behaviours.

2023-07-18

bioRxiv (preprint)

doi.org

POMRL: No-Regret Learning-to-Plan with Increasing Horizons

Khimya Khetarpal

Claire Vernade

Brendan O'Donoghue

Satinder Singh

Tom Zahavy

We study the problem of planning under model uncertainty in an online meta-reinforcement learning (RL) setting where an agent is presented with a sequence of related tasks with limited interactions per task. The agent can use its experience in each task and across tasks to estimate both the transition model and the distribution over tasks. We propose an algorithm to meta-learn the underlying structure across tasks, utilize it to plan in each task, and upper-bound the regret of the planning loss. Our bound suggests that the average regret over tasks decreases as the number of tasks increases and as the tasks are more similar. In the classical single-task setting, it is known that the planning horizon should depend on the estimated model's accuracy, that is, on the number of samples within task. We generalize this finding to meta-RL and study this dependence of planning horizons on the number of tasks. Based on our theoretical findings, we derive heuristics for selecting slowly increasing discount factors, and we validate its significance empirically.

2023-07-17

TMLR (accepted)

doi.org

openreview.net

Meta-Value Learning: a General Framework for Learning with Learning Awareness

Tim Cooijmans

Milad Aghajohari

Aaron Courville

2023-07-16

ArXiv (preprint)

doi.org

openreview.net

Mitigating Equipment Overloads due to Electric Vehicle Charging Using Customer Incentives

Feng Li

Ilhan Kocar

Antoine Lesage-Landry

This paper first presents a time-series impact analysis of charging electric vehicles (EVs) to loading levels of power network equipment considering stochasticity in charging habits of EV owners. A novel incentive-based mitigation strategy is then designed to shift the EV charging from the peak hours when the equipment is overloaded to the off-peak hours and maintain equipment service lifetime. The incentive level and corresponding contributions from customers to alter their EV charging habits are determined by a search algorithm and a constrained optimization problem. The strategy is illustrated on a modified version of the IEEE 8500 feeder with a high EV penetration to mitigate overloads on the substation transformer.

2023-07-15

2023 IEEE Power & Energy Society General Meeting (PESGM) (published)

doi.org

Study Beekeeping potential data and development of a decision support system involving a web mapping platform

Philippe Doyon

Mickaël Germain

Guy Armel Fotso Kamga

Étienne Laliberté

Yacine Bouroubi

Madeleine Chagnon

The role of a decision support system is to gather, synthesize and present information in order to make informed decisions. In this project, a mapping platform and a decision support system are proposed to present beekeeping data in Quebec. A complete review of the data and factors influencing honey production must first be carried out. The decision support system will be designed according to the nature of the data and access to available technologies. Continuous and real-time data management must be configured to make data interoperable. Multi-dimensional data loading tools will need to be configured to display data and analyses in a dashboard. Beekeepers will be able to optimize or move their hives according to their interpretation of the results displayed in the decision support system.

2023-07-15

IEEE International Geoscience and Remote Sensing Symposium (published)

doi.org

The default network dominates neural responses to evolving movie stories

Enning Yang

Filip Milisav

Jakub Kopal

Avram J. Holmes

Georgios D. Mitsis

Bratislav Misic

Emily S. Finn

Danilo Bzdok

Neuroscientific studies exploring real-world dynamic perception often overlook the influence of continuous changes in narrative content. In our research, we utilize machine learning tools for natural language processing to examine the relationship between movie narratives and neural responses. By analyzing over 50,000 brain images of participants watching Forrest Gump from the studyforrest dataset, we find distinct brain states that capture unique semantic aspects of the unfolding story. The default network, associated with semantic information integration, is the most engaged during movie watching. Furthermore, we identify two mechanisms that underlie how the default network liaises with the amygdala and hippocampus. Our findings demonstrate effective approaches to understanding neural processes in everyday situations and their relation to conscious awareness.

2023-07-13

Nature Communications (published)

doi.org

