Portrait de Glen Berseth

Glen Berseth

Membre académique principal
Chaire en IA Canada-CIFAR
Professeur agrégé, Université de Montréal, Département d'informatique et de recherche opérationnelle
Sujets de recherche
Apprentissage par renforcement
Apprentissage profond
Robotique

Biographie

Glen Berseth est professeur agrégé au Département d'informatique et de recherche opérationnelle (DIRO) de l'Université de Montréal, membre académique principal de Mila – Institut québécois d'intelligence artificielle, détenteur d’une chaire en IA Canada-CIFAR et codirecteur du Laboratoire de robotique et d’IA intégrative de Montréal (REAL). Il a été chercheur postdoctoral à Berkeley Artificial Intelligence Research (BAIR), où il a travaillé avec Sergey Levine. Ses recherches portent sur la résolution de problèmes de prise de décision séquentielle (planification) pour les systèmes d'apprentissage autonomes du monde réel (robots). Elles ont couvert les domaines de la collaboration humain-robot, du renforcement, ainsi que de l'apprentissage continu, multiagent et hiérarchique et du méta-apprentissage. Glen Berseth a fait paraître des articles dans les meilleures publications des domaines de la robotique, de l'apprentissage automatique et de l'animation informatique. Il donne également un cours sur l'apprentissage des robots à l'Université de Montréal et à Mila, couvrant les recherches les plus récentes sur les techniques d'apprentissage automatique pour la création de robots généralistes.

Étudiants actuels

Maîtrise recherche - UdeM
Doctorat - McGill
Superviseur⋅e principal⋅e :
Doctorat - UdeM
Co-superviseur⋅e :
Doctorat - UdeM
Superviseur⋅e principal⋅e :
Doctorat
Superviseur⋅e principal⋅e :
Collaborateur·rice de recherche - UdeM
Maîtrise recherche - UdeM
Doctorat - UdeM
Co-superviseur⋅e :
Postdoctorat - UdeM
Co-superviseur⋅e :
Maîtrise recherche - UdeM
Postdoctorat - UdeM
Co-superviseur⋅e :
Stagiaire de recherche - UdeM
Doctorat - UdeM
Co-superviseur⋅e :
Doctorat - UdeM
Co-superviseur⋅e :

Publications

Scalable Tree Search over Graphs with Learned Action Pruning for Power Grid Control
As real-world infrastructure systems become increasingly complex and large-scale, there is a growing need for learning-based control strateg… (voir plus)ies that can make informed decisions in complex and dynamic environments. However, large-scale problems — such as power grid control — introduce high-dimensional action spaces and necessitate transferability across varying grid topologies. We introduce **H**ierarchical **E**xpert-Guided **R**econfiguration **O**ptimization for **G**raph **T**opologies, **HERO-GT**, a model-based planning approach that combines a pretrained graph neural network (GNN) for topology-aware action pruning with a Monte Carlo Tree Search (MCTS) planner for targeted, structured exploration. More specifically, the high-level GNN predicts a promising subset of actions, which the low-level MCTS agent uses to focus its search and reduce computational overhead while remaining adaptable to unseen graph structures. Furthermore, the MCTS planner leverages a given *default policy*---which may be defined, for example, by heuristics, problem relaxations, or rule-based methods---to bias the search and prioritize actions that are expected to improve performance over the default. We deploy HERO-GT in power grid environments, demonstrating that it not only improves over a strong default policy, but also scales to a realistic operational setting where exhaustive search becomes computationally infeasible.
Exploration by Exploitation: Curriculum Learning for Reinforcement Learning Agents through Competence-Based Curriculum Policy Search
Nan Rosemary Ke
Sarvesh Patil
Annya Dahmani
Eunice Yiu
Alison Gopnik
Oliver Kroemer
Self-Predictive Representations for Combinatorial Generalization in Behavioral Cloning
Behavioral cloning (BC) methods trained with supervised learning (SL) are an effective way to learn policies from human demonstrations in do… (voir plus)mains like robotics. Goal-conditioning these policies enables a single generalist policy to capture diverse behaviors contained within an offline dataset. While goal-conditioned behavior cloning (GCBC) methods can perform well on in-distribution training tasks, they do not necessarily generalize zero-shot to tasks that require conditioning on novel state-goal pairs, i.e. combinatorial generalization. In part, this limitation can be attributed to a lack of temporal consistency in the state representation learned by BC; if temporally related states are encoded to similar latent representations, then the out-of-distribution gap for novel state-goal pairs would be reduced. Hence, encouraging this temporal consistency in the representation space should facilitate combinatorial generalization. Successor representations, which encode the distribution of future states visited from the current state, nicely encapsulate this property. However, previous methods for learning successor representations have relied on contrastive samples, temporal-difference (TD) learning, or both. In this work, we propose a simple yet effective representation learning objective,
Self-Predictive Representations for Combinatorial Generalization in Behavioral Cloning
Behavioral cloning (BC) methods trained with supervised learning (SL) are an effective way to learn policies from human demonstrations in do… (voir plus)mains like robotics. Goal-conditioning these policies enables a single generalist policy to capture diverse behaviors contained within an offline dataset. While goal-conditioned behavior cloning (GCBC) methods can perform well on in-distribution training tasks, they do not necessarily generalize zero-shot to tasks that require conditioning on novel state-goal pairs, i.e. combinatorial generalization. In part, this limitation can be attributed to a lack of temporal consistency in the state representation learned by BC; if temporally related states are encoded to similar latent representations, then the out-of-distribution gap for novel state-goal pairs would be reduced. Hence, encouraging this temporal consistency in the representation space should facilitate combinatorial generalization. Successor representations, which encode the distribution of future states visited from the current state, nicely encapsulate this property. However, previous methods for learning successor representations have relied on contrastive samples, temporal-difference (TD) learning, or both. In this work, we propose a simple yet effective representation learning objective,
Stable Gradients for Stable Learning at Scale in Deep Reinforcement Learning
Scaling deep reinforcement learning networks is challenging and often results in degraded performance, yet the root causes of this failure m… (voir plus)ode remain poorly understood. Several recent works have proposed mechanisms to address this, but they are often complex and fail to highlight the causes underlying this difficulty. In this work, we conduct a series of empirical analyses which suggest that the combination of non-stationarity with gradient pathologies, due to suboptimal architectural choices, underlie the challenges of scale. We propose a series of direct interventions that stabilize gradient flow, enabling robust performance across a range of network depths and widths. Our interventions are simple to implement and compatible with well-established algorithms, and result in an effective mechanism that enables strong performance even at large scales. We validate our findings on a variety of agents and suites of environments.
Efficient Morphology-Aware Policy Transfer to New Embodiments
Hongyao Tang
Mariano Phielipp
Santiago Miret
Martin Jägersand
Matthew E. Taylor
Morphology-aware policy learning is a means of enhancing policy sample efficiency by aggregating data from multiple agents. These types of p… (voir plus)olicies have previously been shown to help generalize over dynamic, kinematic, and limb configuration variations between agent morphologies. Unfortunately, these policies still have sub-optimal zero-shot performance compared to end-to-end finetuning on morphologies at deployment. This limitation has ramifications in practical applications such as robotics because further data collection to perform end-to-end finetuning can be computationally expensive. In this work, we investigate combining morphology-aware pretraining with \textit{parameter efficient finetuning} (PEFT) techniques to help reduce the learnable parameters necessary to specialize a morphology-aware policy to a target embodiment. We compare directly tuning sub-sets of model weights, input learnable adapters, and prefix tuning techniques for online finetuning. Our analysis reveals that PEFT techniques in conjunction with policy pre-training generally help reduce the number of samples to necessary to improve a policy compared to training models end-to-end from scratch. We further find that tuning as few as less than 1\% of total parameters will improve policy performance compared the zero-shot performance of the base pretrained a policy.
Efficient Morphology-Aware Policy Transfer to New Embodiments
Hongyao Tang
Mariano Phielipp
Santiago Miret
Martin Jägersand
Matthew E. Taylor
Morphology-aware policy learning is a means of enhancing policy sample efficiency by aggregating data from multiple agents. These types of p… (voir plus)olicies have previously been shown to help generalize over dynamic, kinematic, and limb configuration variations between agent morphologies. Unfortunately, these policies still have sub-optimal zero-shot performance compared to end-to-end finetuning on morphologies at deployment. This limitation has ramifications in practical applications such as robotics because further data collection to perform end-to-end finetuning can be computationally expensive. In this work, we investigate combining morphology-aware pretraining with \textit{parameter efficient finetuning} (PEFT) techniques to help reduce the learnable parameters necessary to specialize a morphology-aware policy to a target embodiment. We compare directly tuning sub-sets of model weights, input learnable adapters, and prefix tuning techniques for online finetuning. Our analysis reveals that PEFT techniques in conjunction with policy pre-training generally help reduce the number of samples to necessary to improve a policy compared to training models end-to-end from scratch. We further find that tuning as few as less than 1\% of total parameters will improve policy performance compared the zero-shot performance of the base pretrained a policy.
RLeXplore: Accelerating Research in Intrinsically-Motivated Reinforcement Learning
Mingqi Yuan
Roger Creus Castanyer
Bin Li
Xin Jin
Wenjun Zeng
Solving Bayesian inverse problems with diffusion priors and off-policy RL
This paper presents a practical application of Relative Trajectory Balance (RTB), a recently introduced off-policy reinforcement learning (R… (voir plus)L) objective that can asymptotically solve Bayesian inverse problems optimally. We extend the original work by using RTB to train conditional diffusion model posteriors from pretrained unconditional priors for challenging linear and non-linear inverse problems in vision, and science. We use the objective alongside techniques such as off-policy backtracking exploration to improve training. Importantly, our results show that existing training-free diffusion posterior methods struggle to perform effective posterior inference in latent space due to inherent biases.
Outsourced diffusion sampling: Efficient posterior inference in latent spaces of generative models
Any well-behaved generative model over a variable …
Outsourced diffusion sampling: Efficient posterior inference in latent spaces of generative models
Any well-behaved generative model over a variable …
Outsourced diffusion sampling: Efficient posterior inference in latent spaces of generative models
Any well-behaved generative model over a variable …