Yann Bouteiller

Research Collaborator - Polytechnique Montreal

Principal supervisor

Giovanni Beltrame

Co-supervisor

Jana Pavlasek

Research topics

Reinforcement Learning

Deep Learning

Computational Neuroscience

Dynamical Systems

Machine Learning Theory

Computer Vision

Website

Google Scholar

GitHub

Publications

Sociodynamics of Reinforcement Learning

Yann Bouteiller

Karthik Soma

Giovanni Beltrame

Reinforcement Learning (RL) has emerged as a core algorithmic paradigm explicitly driving innovation in a growing number of industrial applications, including large language models and quantitative finance. Furthermore, computational neuroscience has long found evidence of natural forms of RL in biological brains. Therefore, it is crucial for the study of social dynamics to develop a scientific understanding of how RL shapes population behaviors. We leverage the framework of Evolutionary Game Theory (EGT) to provide building blocks and insights toward this objective. We propose a methodology that enables simulating large populations of RL agents in simple game theoretic interaction models. More specifically, we derive fast and parallelizable implementations of two fundamental revision protocols from multi-agent RL - Policy Gradient (PG) and Opponent-Learning Awareness (LOLA) - tailored for population simulations of random pairwise interactions in stateless normal-form games. Our methodology enables us to simulate large populations of 200,000 independent co-learning agents, yielding compelling insights into how non-stationarity-aware learners affect social dynamics. In particular, we find that LOLA learners promote cooperation in the Stag Hunt model, delay cooperative outcomes in the Hawk-Dove model, and reduce strategy diversity in the Rock-Paper-Scissors model.

2026-02-20

Transactions on Machine Learning Research (accepted)

openreview.net
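
The abstract of "Sociodynamics of Reinforcement Learning" describes populations of policy-gradient learners matched at random in stateless normal-form games. The following is a minimal serial sketch of that idea for the Stag Hunt, not the authors' parallel implementation. The payoff values, the one-logit sigmoid policy, the learning rate, and the names `PAYOFF`, `sigmoid`, and `simulate` are all assumptions made for illustration, and LOLA is not covered.

```python
import math
import random

# Stag Hunt payoffs for the first player (illustrative values, not from the paper).
# Action 1 = Stag (cooperate), action 0 = Hare (defect).
PAYOFF = {(1, 1): 4.0, (1, 0): 0.0, (0, 1): 3.0, (0, 0): 3.0}


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def simulate(n_agents=1000, n_rounds=200, lr=0.1, seed=0):
    """Return each agent's final probability of playing Stag."""
    rng = random.Random(seed)
    # One logit per agent; the policy is Bernoulli(sigmoid(logit)).
    logits = [rng.uniform(-1.0, 1.0) for _ in range(n_agents)]
    order = list(range(n_agents))
    for _ in range(n_rounds):
        rng.shuffle(order)  # random pairwise matching
        # With an odd population, the last agent in the shuffle sits out.
        for k in range(0, n_agents - 1, 2):
            i, j = order[k], order[k + 1]
            p_i, p_j = sigmoid(logits[i]), sigmoid(logits[j])
            a_i = 1 if rng.random() < p_i else 0
            a_j = 1 if rng.random() < p_j else 0
            # REINFORCE step: d/d(logit) log pi(a) = a - p for a sigmoid policy.
            logits[i] += lr * PAYOFF[(a_i, a_j)] * (a_i - p_i)
            logits[j] += lr * PAYOFF[(a_j, a_i)] * (a_j - p_j)
    return [sigmoid(x) for x in logits]
```

Calling `probs = simulate()` and then computing `sum(probs) / len(probs)` gives the mean propensity to play Stag in the final population. Each agent updates only from its own sampled payoff, so the learners are independent and the update loop could in principle be vectorized over pairs.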

From the Lab to the Theater: An Unconventional Field Robotics Journey

Ali Imran

Vivek Shankar Vardharajan

Rafael Gomes Braga

Yann Bouteiller

Abdalwhab Abdalwhab

Matthis Di-Giacomo

Alexandra Mercader

Giovanni Beltrame

David St-Onge

2024-04-10

arXiv (preprint)

doi.org

arxiv.org

Reinforcement Learning with Random Delays

Christopher Pal

Action and observation delays commonly occur in many Reinforcement Learning applications, such as remote control scenarios. We study the anatomy of randomly delayed environments, and show that partially resampling trajectory fragments in hindsight allows for off-policy multi-step value estimation. We apply this principle to derive Delay-Correcting Actor-Critic (DCAC), an algorithm based on Soft Actor-Critic with significantly better performance in environments with delays. This is shown theoretically and also demonstrated practically on a delay-augmented version of the MuJoCo continuous control benchmark.

2021-05-02

International Conference on Learning Representations (Poster)

doi.org

openreview.net
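
The abstract of "Reinforcement Learning with Random Delays" refers to randomly delayed, delay-augmented environments. The sketch below illustrates one possible reading of that setting for action delays only: each action arrives after a random number of steps, the environment applies the most recently sent action that has arrived, and the observation is augmented with a buffer of recently sent actions. The toy environment, the uniform delay distribution, the buffer length, and the names `ToyIntegrator` and `RandomActionDelay` are hypothetical. Observation delays and the DCAC algorithm itself are not covered.

```python
import random
from collections import deque


class ToyIntegrator:
    """Hypothetical 1-D environment: the state accumulates the applied action."""

    def __init__(self):
        self.state = 0.0

    def step(self, action):
        self.state += action
        return self.state


class RandomActionDelay:
    """Wraps an environment so that each action arrives after a random delay."""

    def __init__(self, env, max_delay=3, seed=0, default_action=0.0):
        self.env = env
        self.max_delay = max_delay
        self.rng = random.Random(seed)
        self.t = 0
        self.in_flight = []  # entries are (arrival_time, send_time, action)
        self.current = (-1, default_action)  # (send_time, action) being applied
        # Buffer of the most recently sent actions, exposed to the agent.
        self.sent = deque([default_action] * max_delay, maxlen=max_delay)

    def step(self, action):
        # Assumed delay model: uniform over 0..max_delay steps.
        delay = self.rng.randint(0, self.max_delay)
        self.in_flight.append((self.t + delay, self.t, action))
        self.sent.append(action)
        # Apply the most recently sent action among those that have arrived.
        arrived = [m for m in self.in_flight if m[0] <= self.t]
        self.in_flight = [m for m in self.in_flight if m[0] > self.t]
        for _, send_time, act in arrived:
            if send_time > self.current[0]:
                self.current = (send_time, act)
        state = self.env.step(self.current[1])
        self.t += 1
        # Augmented observation: environment state plus the action buffer.
        return state, tuple(self.sent)
```

For example, `RandomActionDelay(ToyIntegrator(), max_delay=3).step(1.0)` returns the toy state together with the three most recently sent actions. Exposing the action buffer is what lets an agent reason about actions that are still in flight.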

