Portrait de Glen Berseth

Glen Berseth

Membre académique principal
Chaire en IA Canada-CIFAR
Professeur agrégé, Université de Montréal, Département d'informatique et de recherche opérationnelle
Sujets de recherche
Apprentissage par renforcement
Apprentissage profond
Robotique

Biographie

Glen Berseth est professeur agrégé au Département d'informatique et de recherche opérationnelle (DIRO) de l'Université de Montréal, membre académique principal de Mila – Institut québécois d'intelligence artificielle, détenteur d’une chaire en IA Canada-CIFAR et codirecteur du Laboratoire de robotique et d’IA intégrative de Montréal (REAL). Il a été chercheur postdoctoral à Berkeley Artificial Intelligence Research (BAIR), où il a travaillé avec Sergey Levine. Ses recherches portent sur la résolution de problèmes de prise de décision séquentielle (planification) pour les systèmes d'apprentissage autonomes du monde réel (robots). Elles ont couvert les domaines de la collaboration humain-robot, du renforcement, ainsi que de l'apprentissage continu, multiagent et hiérarchique et du méta-apprentissage. Glen Berseth a fait paraître des articles dans les meilleures publications des domaines de la robotique, de l'apprentissage automatique et de l'animation informatique. Il donne également un cours sur l'apprentissage des robots à l'Université de Montréal et à Mila, couvrant les recherches les plus récentes sur les techniques d'apprentissage automatique pour la création de robots généralistes.

Étudiants actuels

Maîtrise recherche - UdeM
Doctorat - McGill
Superviseur⋅e principal⋅e :
Doctorat - UdeM
Co-superviseur⋅e :
Doctorat - UdeM
Superviseur⋅e principal⋅e :
Doctorat
Superviseur⋅e principal⋅e :
Collaborateur·rice de recherche - UdeM
Maîtrise recherche - UdeM
Doctorat - UdeM
Co-superviseur⋅e :
Postdoctorat - UdeM
Co-superviseur⋅e :
Maîtrise recherche - UdeM
Postdoctorat - UdeM
Co-superviseur⋅e :
Stagiaire de recherche - UdeM
Doctorat - UdeM
Co-superviseur⋅e :
Doctorat - UdeM
Co-superviseur⋅e :

Publications

RoboArena: Distributed Real-World Evaluation of Generalist Robot Policies
Pranav Atreya
Karl Pertsch
Tony Lee
Moo Jin Kim
Arhan Jain
Cyrus Neary
Edward S. Hu
Kanav Arora
Luca Macesanu
Matthew Leonard
Meedeum Cho
Shivin Dass
Tony Wang
Xingfang Yuan
Abhishek Gupta
Dinesh Jayaraman
Kostas Daniilidis
Roberto Martín-Martín
Youngwoon Lee
Percy Liang
Chelsea Finn
Sergey Levine
Comprehensive, unbiased, and comparable evaluation of modern generalist policies is uniquely challenging: existing approaches for robot benc… (voir plus)hmarking typically rely on heavy standardization, either by specifying fixed evaluation tasks and environments, or by hosting centralized "robot challenges", and do not readily scale to evaluating generalist policies across a broad range of tasks and environments. In this work, we propose RoboArena, a new approach for scalable evaluation of generalist robot policies in the real world. Instead of standardizing evaluations around fixed tasks, environments, or locations, we propose to crowd-source evaluations across a distributed network of evaluators. Importantly, evaluators can freely choose the tasks and environments they evaluate on, enabling easy scaling of diversity, but they are required to perform double-blind evaluations over pairs of policies. Then, by aggregating preference feedback from pairwise comparisons across diverse tasks and environments, we can derive a ranking of policies. We instantiate our approach across a network of evaluators at seven academic institutions using the DROID robot platform. Through more than 600 pairwise real-robot evaluation episodes across seven generalist policies, we demonstrate that our crowd-sourced approach can more accurately rank the performance of existing generalist policies than conventional, centralized evaluation approaches, while being more scalable, resilient, and trustworthy. We open our evaluation network to the community and hope that it can enable more accessible comparisons of generalist robot policies.
Mitigating Plasticity Loss in Continual Reinforcement Learning by Reducing Churn
Plasticity, or the ability of an agent to adapt to new tasks, environments, or distributions, is crucial for continual learning. In this pap… (voir plus)er, we study the loss of plasticity in deep continual RL from the lens of churn: network output variability induced by the data in each training batch. We demonstrate that (1) the loss of plasticity is accompanied by the exacerbation of churn due to the gradual rank decrease of the Neural Tangent Kernel (NTK) matrix; (2) reducing churn helps prevent rank collapse and adjusts the step size of regular RL gradients adaptively. Moreover, we introduce Continual Churn Approximated Reduction (C-CHAIN) and demonstrate it improves learning performance and outperforms baselines in a diverse range of continual learning environments on OpenAI Gym Control, ProcGen, DeepMind Control Suite, and MinAtar benchmarks.
Outsourced Diffusion Sampling: Efficient Posterior Inference in Latent Spaces of Generative Models
Any well-behaved generative model over a variable …
Recursive Self-Aggregation Unlocks Deep Thinking in Large Language Models
Test-time scaling methods improve the capabilities of large language models (LLMs) by increasing the amount of compute used during inference… (voir plus) to make a prediction. Inference-time compute can be scaled in parallel by choosing among multiple independent solutions or sequentially through self-refinement. We propose Recursive Self-Aggregation (RSA), a test-time scaling method inspired by evolutionary methods that combines the benefits of both parallel and sequential scaling. Each step of RSA refines a population of candidate reasoning chains through aggregation of subsets to yield a population of improved solutions, which are then used as the candidate pool for the next iteration. RSA exploits the rich information embedded in the reasoning chains -- not just the final answers -- and enables bootstrapping from partially correct intermediate steps within different chains of thought. Empirically, RSA delivers substantial performance gains with increasing compute budgets across diverse tasks, model families and sizes. Notably, RSA enables Qwen3-4B-Instruct-2507 to achieve competitive performance with larger reasoning models, including DeepSeek-R1 and o3-mini (high), while outperforming purely parallel and sequential scaling strategies across AIME-25, HMMT-25, Reasoning Gym, LiveCodeBench-v6, and SuperGPQA. We further demonstrate that training the model to combine solutions via a novel aggregation-aware reinforcement learning approach yields significant performance gains. Code available at https://github.com/HyperPotatoNeo/RSA.
Recursive Self-Aggregation Unlocks Deep Thinking in Large Language Models
Test-time scaling methods improve the capabilities of large language models (LLMs) by increasing the amount of compute used during inference… (voir plus) to make a prediction. Inference-time compute can be scaled in parallel by choosing among multiple independent solutions or sequentially through self-refinement. We propose Recursive Self-Aggregation (RSA), a test-time scaling method inspired by evolutionary methods that combines the benefits of both parallel and sequential scaling. Each step of RSA refines a population of candidate reasoning chains through aggregation of subsets to yield a population of improved solutions, which are then used as the candidate pool for the next iteration. RSA exploits the rich information embedded in the reasoning chains -- not just the final answers -- and enables bootstrapping from partially correct intermediate steps within different chains of thought. Empirically, RSA delivers substantial performance gains with increasing compute budgets across diverse tasks, model families and sizes. Notably, RSA enables Qwen3-4B-Instruct-2507 to achieve competitive performance with larger reasoning models, including DeepSeek-R1 and o3-mini (high), while outperforming purely parallel and sequential scaling strategies across AIME-25, HMMT-25, Reasoning Gym, LiveCodeBench-v6, and SuperGPQA. We further demonstrate that training the model to combine solutions via a novel aggregation-aware reinforcement learning approach yields significant performance gains. Code available at https://github.com/HyperPotatoNeo/RSA.
Recursive Self-Aggregation Unlocks Deep Thinking in Large Language Models
Test-time scaling methods improve the capabilities of large language models (LLMs) by increasing the amount of compute used during inference… (voir plus) to make a prediction. Inference-time compute can be scaled in parallel by choosing among multiple independent solutions or sequentially through self-refinement. We propose Recursive Self-Aggregation (RSA), a test-time scaling method inspired by evolutionary methods that combines the benefits of both parallel and sequential scaling. Each step of RSA refines a population of candidate reasoning chains through aggregation of subsets to yield a population of improved solutions, which are then used as the candidate pool for the next iteration. RSA exploits the rich information embedded in the reasoning chains -- not just the final answers -- and enables bootstrapping from partially correct intermediate steps within different chains of thought. Empirically, RSA delivers substantial performance gains with increasing compute budgets across diverse tasks, model families and sizes. Notably, RSA enables Qwen3-4B-Instruct-2507 to achieve competitive performance with larger reasoning models, including DeepSeek-R1 and o3-mini (high), while outperforming purely parallel and sequential scaling strategies across AIME-25, HMMT-25, Reasoning Gym, LiveCodeBench-v6, and SuperGPQA. We further demonstrate that training the model to combine solutions via a novel aggregation-aware reinforcement learning approach yields significant performance gains. Code available at https://github.com/HyperPotatoNeo/RSA.
Task Robustness via Re-Labelling Vision-Action Robot Data
The recent trend in scaling models for robot learning has resulted in impressive policies that can perform various manipulation tasks and ge… (voir plus)neralize to novel scenarios. However, these policies continue to struggle with following instructions, likely due to the limited linguistic and action sequence diversity in existing robotics datasets. This paper introduces
RoboArena: Distributed Real-World Evaluation of Generalist Robot Policies
Pranav Atreya
Karl Pertsch
Tony Lee
Moo Jin Kim
Arhan Jain
Cyrus Neary
Edward S. Hu
Kanav Arora
Luca Macesanu
Matthew Leonard
Meedeum Cho
Shivin Dass
Tony Wang
Xingfang Yuan
Abhishek Gupta
Dinesh Jayaraman
Kostas Daniilidis
Roberto Martín-Martín
Youngwoon Lee
Percy Liang
Chelsea Finn
Sergey Levine
Adaptive Resolution Residual Networks — Generalizing Across Resolutions Easily and Efficiently
The majority of signal data captured in the real world uses numerous sensors with different resolutions. In practice, most deep learning arc… (voir plus)hitectures are fixed-resolution; they consider a single resolution at training and inference time. This is convenient to implement but fails to fully take advantage of the diverse signal data that exists. In contrast, other deep learning architectures are adaptive-resolution; they directly allow various resolutions to be processed at training and inference time. This provides computational adaptivity but either sacrifices robustness or compatibility with mainstream layers, which hinders their use. In this work, we introduce Adaptive Resolution Residual Networks (ARRNs) to surpass this tradeoff. We construct ARRNs from Laplacian residuals, which serve as generic adaptive-resolution adapters for fixed-resolution layers. We use smoothing filters within Laplacian residuals to linearly separate input signals over a series of resolution steps. We can thereby skip Laplacian residuals to cast high-resolution ARRNs into low-resolution ARRNs that are computationally cheaper yet numerically identical over low-resolution signals. We guarantee this result when Laplacian residuals are implemented with perfect smoothing kernels. We complement this novel component with Laplacian dropout, which randomly omits Laplacian residuals during training. This regularizes for robustness to a distribution of lower resolutions. This also regularizes for numerical errors that may occur when Laplacian residuals are implemented with approximate smoothing kernels. We provide a solid grounding for the advantageous properties of ARRNs through a theoretical analysis based on neural operators, and empirically show that ARRNs embrace the challenge posed by diverse resolutions with computational adaptivity, robustness, and compatibility with mainstream layers.
Curiosity-Driven Exploration via Temporal Contrastive Learning
Catherine Ji
Benjamin Eysenbach
Effective exploration in reinforcement learning requires keeping track not just of where the agent has been, but also of how the agent think… (voir plus)s about and represents the world: an agent should explore states that enable it to learn powerful representations. Temporal representations can include the information required to solve any potential task while avoiding the computational cost of reconstruction. In this paper, we propose an exploration method that uses temporal contrastive representations to drive exploration, maximizing coverage as seen through the lens of these temporal representations. We demonstrate complex exploration behaviors in locomotion, manipulation, and embodied-AI tasks, revealing previously unknown capabilities and behaviors once achievable only via extrinsic rewards.
Curiosity-Driven Exploration via Temporal \\ Contrastive Learning
Catherine Ji
Benjamin Eysenbach
Exploration remains a key challenge in reinforcement learning (RL), especially in long-horizon tasks and environments with high-dimensional … (voir plus)observations. A common strategy for effective exploration is to promote state coverage or novelty, which often involves estimating the agent's state visitation distribution. In this paper, we propose \textbf{C}uriosity-Driven Exploration via \textbf{Te}mporal \textbf{C}ontrastive Learning (\methodName), an exploration method based on temporal contrastive learning that rewards agents for reaching states with unexpected futures. This incentivizes uncovering meaningful less-visited states. \methodName is simple and does not require explicit density or uncertainty estimation, while learning representations aligned with the RL objective. It consistently outperforms standard baselines in complex mazes using different embodiments (Ant and Humanoid) and robotic manipulation tasks, while also yielding more diverse behaviors in Craftax without requiring task-specific information.
Is Exploration or Optimization the Problem for Deep Reinforcement Learning?