Glen Berseth

Biography

Glen Berseth is an associate professor in the Department of Computer Science and Operations Research (DIRO) at Université de Montréal, a core academic member of Mila – Quebec Artificial Intelligence Institute, a Canada CIFAR AI Chair holder, and co-director of the Montreal Robotics and Embodied AI Lab (REAL). He was a postdoctoral researcher at Berkeley Artificial Intelligence Research (BAIR), where he worked with Sergey Levine. His research focuses on solving sequential decision-making problems (planning) for real-world autonomous learning systems (robots). It has spanned human-robot collaboration, reinforcement learning, continual, multi-agent, and hierarchical learning, and meta-learning. Glen Berseth has published in top venues in robotics, machine learning, and computer animation. He also teaches a course on robot learning at Université de Montréal and Mila, covering the most recent research on machine learning techniques for building generalist robots.

Current Students

PhD - UdeM

PhD - UdeM

Research Master's - UdeM

Website

Roger Creus-Castanyer

PhD - UdeM

Co-supervisor:

PhD - McGill

Principal supervisor:

Hsiu-Chin Lin

Léa Demeule

PhD - UdeM

Co-supervisor:

PhD - UdeM

Principal supervisor:

Liam Paull

Kumaraditya Gupta

PhD

Principal supervisor:

Research collaborator - UdeM

Adriana Knatchbull-Hugessen

PhD - UdeM

Artur Kuramshin

Research Master's - UdeM

Website

Daniel Lawson

PhD - UdeM

Co-supervisor:

Postdoctorate - UdeM

Co-supervisor:

Research Master's - UdeM

Postdoctorate - UdeM

Co-supervisor:

PhD - UdeM

Co-supervisor:

PhD - UdeM

Co-supervisor:

Marc Gendron-Bellemare

Research collaborator

Real-time reinforcement learning

Siddarth Venkatraman

PhD - UdeM

Co-supervisor:

Albert Zhan

PhD - UdeM

Blog Posts

Two robots in a kitchen, preparing dinner. One is chopping vegetables and the other is making an omelette.

June 20, 2025

by

Ivan Anokhin

Matthew Riemer

Rishav Rishav

Gopeshh Subbaraj

Glen Berseth

Read the article

Fully Autonomous Real-World Reinforcement Learning with Applications to Mobile Manipulation

February 15, 2023

by

Jędrzej Orbik

Charles Sun

Coline Devin

Glen Berseth

Read the article

Publications

Scalable Tree Search over Graphs with Learned Action Pruning for Power Grid Control

Florence Cloutier

Cyrus Neary

Adriana Hugessen

Viktor Todosijević

Zina Kamel

As real-world infrastructure systems become increasingly complex and large-scale, there is a growing need for learning-based control strategies that can make informed decisions in complex and dynamic environments. However, large-scale problems, such as power grid control, introduce high-dimensional action spaces and necessitate transferability across varying grid topologies. We introduce Hierarchical Expert-Guided Reconfiguration Optimization for Graph Topologies (HERO-GT), a model-based planning approach that combines a pretrained graph neural network (GNN) for topology-aware action pruning with a Monte Carlo Tree Search (MCTS) planner for targeted, structured exploration. More specifically, the high-level GNN predicts a promising subset of actions, which the low-level MCTS agent uses to focus its search and reduce computational overhead while remaining adaptable to unseen graph structures. Furthermore, the MCTS planner leverages a given default policy (which may be defined, for example, by heuristics, problem relaxations, or rule-based methods) to bias the search and prioritize actions that are expected to improve performance over the default. We deploy HERO-GT in power grid environments, demonstrating that it not only improves over a strong default policy, but also scales to a realistic operational setting where exhaustive search becomes computationally infeasible.

2025-06-17

rl-conference.cc/RLC/2025/Workshop/RL4RS (published)

Exploration by Exploitation: Curriculum Learning for Reinforcement Learning Agents through Competence-Based Curriculum Policy Search

Tabitha Edith Lee

Nan Rosemary Ke

Sarvesh Patil

Annya Dahmani

Eunice Yiu

Esra'a Saleh

Alison Gopnik

Oliver Kroemer

2025-06-12

ICML.cc/2025/Workshop/EXAIT (poster)

Self-Predictive Representations for Combinatorial Generalization in Behavioral Cloning

Daniel Lawson

Adriana Hugessen

Charlotte Cloutier

Khimya Khetarpal

Behavioral cloning (BC) methods trained with supervised learning (SL) are an effective way to learn policies from human demonstrations in domains like robotics. Goal-conditioning these policies enables a single generalist policy to capture diverse behaviors contained within an offline dataset. While goal-conditioned behavior cloning (GCBC) methods can perform well on in-distribution training tasks, they do not necessarily generalize zero-shot to tasks that require conditioning on novel state-goal pairs, i.e. combinatorial generalization. In part, this limitation can be attributed to a lack of temporal consistency in the state representation learned by BC; if temporally related states are encoded to similar latent representations, then the out-of-distribution gap for novel state-goal pairs would be reduced. Hence, encouraging this temporal consistency in the representation space should facilitate combinatorial generalization. Successor representations, which encode the distribution of future states visited from the current state, nicely encapsulate this property. However, previous methods for learning successor representations have relied on contrastive samples, temporal-difference (TD) learning, or both. In this work, we propose a simple yet effective representation learning objective,

2025-06-11

arXiv (preprint)

arxiv.org

Stable Gradients for Stable Learning at Scale in Deep Reinforcement Learning

Roger Creus Castanyer

Johan Samir Obando Ceron

Li Li

Scaling deep reinforcement learning networks is challenging and often results in degraded performance, yet the root causes of this failure mode remain poorly understood. Several recent works have proposed mechanisms to address this, but they are often complex and fail to highlight the causes underlying this difficulty. In this work, we conduct a series of empirical analyses which suggest that the combination of non-stationarity with gradient pathologies, due to suboptimal architectural choices, underlies the challenges of scale. We propose a series of direct interventions that stabilize gradient flow, enabling robust performance across a range of network depths and widths. Our interventions are simple to implement and compatible with well-established algorithms, and result in an effective mechanism that enables strong performance even at large scales. We validate our findings on a variety of agents and suites of environments.

2025-06-01

arXiv (published)

arxiv.org

Efficient Morphology-Aware Policy Transfer to New Embodiments

Michael Przystupa

Hongyao Tang

Mariano Phielipp

Santiago Miret

Martin Jägersand

Matthew E. Taylor

Morphology-aware policy learning is a means of enhancing policy sample efficiency by aggregating data from multiple agents. These types of policies have previously been shown to help generalize over dynamic, kinematic, and limb configuration variations between agent morphologies. Unfortunately, these policies still have sub-optimal zero-shot performance compared to end-to-end finetuning on morphologies at deployment. This limitation has ramifications in practical applications such as robotics because further data collection to perform end-to-end finetuning can be computationally expensive. In this work, we investigate combining morphology-aware pretraining with parameter-efficient finetuning (PEFT) techniques to help reduce the learnable parameters necessary to specialize a morphology-aware policy to a target embodiment. We compare directly tuning subsets of model weights, input learnable adapters, and prefix tuning techniques for online finetuning. Our analysis reveals that PEFT techniques in conjunction with policy pre-training generally help reduce the number of samples necessary to improve a policy compared to training models end-to-end from scratch. We further find that tuning fewer than 1% of total parameters will improve policy performance compared to the zero-shot performance of the base pretrained policy.

2025-05-09

rl-conference.cc/RLC/2025/Conference (accepted)

Efficient Morphology-Aware Policy Transfer to New Embodiments

Michael Przystupa

Hongyao Tang

Mariano Phielipp

Santiago Miret

Martin Jägersand

Matthew E. Taylor

2025-05-09

rl-conference.cc/RLC/2025/Conference (published)

RLeXplore: Accelerating Research in Intrinsically-Motivated Reinforcement Learning

Mingqi Yuan

Roger Creus Castanyer

Bin Li

Xin Jin

Wenjun Zeng

2025-04-24

TMLR (accepted)

Laurence Perreault-Levasseur

Solving Bayesian inverse problems with diffusion priors and off-policy RL

Luca Scimeca

Siddarth Venkatraman

Moksh J. Jain

Minsu Kim

Yoshua Bengio

Nikolay Malkin

This paper presents a practical application of Relative Trajectory Balance (RTB), a recently introduced off-policy reinforcement learning (RL) objective that can asymptotically solve Bayesian inverse problems optimally. We extend the original work by using RTB to train conditional diffusion model posteriors from pretrained unconditional priors for challenging linear and non-linear inverse problems in vision and science. We use the objective alongside techniques such as off-policy backtracking exploration to improve training. Importantly, our results show that existing training-free diffusion posterior methods struggle to perform effective posterior inference in latent space due to inherent biases.

2025-03-06

ICLR.cc/2025/Workshop/DeLTa (poster)