Scott Fujimoto

Simplicial Embeddings Improve Sample Efficiency in Actor-Critic Agents

Johan Samir Obando Ceron

Samuel Lavoie

Recent works have proposed accelerating the wall-clock training time of actor-critic methods via the use of large-scale environment parallel… (voir plus)ization; unfortunately, these can sometimes still require large number of environment interactions to achieve a desired level of performance. Noting that well-structured representations can improve the generalization and sample efficiency of deep reinforcement learning (RL) agents, we propose the use of simplicial embeddings: lightweight representation layers that constrain embeddings to simplicial structures. This geometric inductive bias results in sparse and discrete features that stabilize critic bootstrapping and strengthen policy gradients. When applied to FastTD3, FastSAC, and PPO, simplicial embeddings consistently improve sample efficiency and final performance across a variety of continuous- and discrete-control environments, without any loss in runtime speed.

2025-10-15

ArXiv (prépublication)

Scalable Option Learning in High-Throughput Environments

Mikael Henaff

Michael Matthews

Michael Rabbat

Hierarchical reinforcement learning (RL) has the potential to enable effective decision-making over long timescales. Existing approaches, wh… (voir plus)ile promising, have yet to realize the benefits of large-scale training. In this work, we identify and solve several key challenges in scaling online hierarchical RL to high-throughput environments. We propose Scalable Option Learning (SOL), a highly scalable hierarchical RL algorithm which achieves a ~35x higher throughput compared to existing hierarchical methods. To demonstrate SOL's performance and scalability, we train hierarchical agents using 30 billion frames of experience on the complex game of NetHack, significantly surpassing flat agents and demonstrating positive scaling trends. We also validate SOL on MiniHack and Mujoco environments, showcasing its general applicability. Our code is open sourced at: github.com/facebookresearch/sol.

2025-08-30

ArXiv (prépublication)

Scalable Option Learning in High-Throughput Environments

Mikael Henaff

Michael Matthews

Michael Rabbat

Hierarchical reinforcement learning (RL) has the potential to enable effective decision-making over long timescales. Existing approaches, wh… (voir plus)ile promising, have yet to realize the benefits of large-scale training. In this work, we identify and solve several key challenges in scaling hierarchical RL to high-throughput environments. We propose Scalable Option Learning (SOL), a highly scalable hierarchical RL algorithm which achieves a 25x higher throughput compared to existing hierarchical methods. We train our hierarchical agents using 20 billion frames of experience on the complex game of NetHack, significantly surpassing flat agents and demonstrating positive scaling trends. We also validate our algorithm on MiniHack and Mujoco environments, showcasing its general applicability. Our code is open sourced at github.com/facebookresearch/sol.

2025-08-30

ArXiv (prépublication)

Sparse-Reg: Improving Sample Complexity in Offline Reinforcement Learning using Sparsity

Samin Yeasar Arnob

Doina Precup

In this paper, we investigate the use of small datasets in the context of offline reinforcement learning (RL). While many common offline RL … (voir plus)benchmarks employ datasets with over a million data points, many offline RL applications rely on considerably smaller datasets. We show that offline RL algorithms can overfit on small datasets, resulting in poor performance. To address this challenge, we introduce"Sparse-Reg": a regularization technique based on sparsity to mitigate overfitting in offline reinforcement learning, enabling effective learning in limited data settings and outperforming state-of-the-art baselines in continuous control.

2025-06-20

ArXiv (prépublication)

Sparse-Reg: Improving Sample Complexity in Offline Reinforcement Learning using Sparsity

Samin Yeasar Arnob

Doina Precup

In this paper, we investigate the use of small datasets in the context of offline reinforcement learning (RL). While many common offline RL … (voir plus)benchmarks employ datasets with over a million data points, many offline RL applications rely on considerably smaller datasets. We show that offline RL algorithms can overfit on small datasets, resulting in poor performance. To address this challenge, we introduce"Sparse-Reg": a regularization technique based on sparsity to mitigate overfitting in offline reinforcement learning, enabling effective learning in limited data settings and outperforming state-of-the-art baselines in continuous control.

2025-06-01

arXiv (publié)

Generalizable Imitation Learning Through Pre-Trained Representations

Wei-Di Chang

Francois Hogan

David Meger

Gregory Dudek

In this paper we leverage self-supervised vision transformer models and their emergent semantic abilities to improve the generalization abil… (voir plus)ities of imitation learning policies. We introduce BC-ViT, an imitation learning algorithm that leverages rich DINO pre-trained Visual Transformer (ViT) patch-level embeddings to obtain better generalization when learning through demonstrations. Our learner sees the world by clustering appearance features into semantic concepts, forming stable keypoints that generalize across a wide range of appearance variations and object types. We show that this representation enables generalized behaviour by evaluating imitation learning across a diverse dataset of object manipulation tasks. Our method, data and evaluation approach are made available to facilitate further study of generalization in Imitation Learners.

2025-05-19

2025 IEEE International Conference on Robotics and Automation (ICRA) (publié)

Towards General-Purpose Model-Free Reinforcement Learning

Yuandong Tian

Reinforcement learning (RL) promises a framework for near-universal problem-solving. In practice however, RL algorithms are often tailored t… (voir plus)o specific benchmarks, relying on carefully tuned hyperparameters and algorithmic choices. Recently, powerful model-based RL methods have shown impressive general results across benchmarks but come at the cost of increased complexity and slow run times, limiting their broader applicability. In this paper, we attempt to find a unifying model-free deep RL algorithm that can address a diverse class of domains and problem settings. To achieve this, we leverage model-based representations that approximately linearize the value function, taking advantage of the denser task objectives used by model-based RL while avoiding the costs associated with planning or simulated trajectories. We evaluate our algorithm, MR.Q, on a variety of common RL benchmarks with a single set of hyperparameters and show a competitive performance against domain-specific and general baselines, providing a concrete step towards building general-purpose model-free deep RL algorithms.

2025-01-22

ICLR.cc/2025/Conference (spotlight)

openreview.net

Fairness in Reinforcement Learning with Bisimulation Metrics

Sahand Rezaei-Shoshtari

Ensuring long-term fairness is crucial when developing automated decision making systems, specifically in dynamic and sequential environment… (voir plus)s. By maximizing their reward without consideration of fairness, AI agents can introduce disparities in their treatment of groups or individuals. In this paper, we establish the connection between bisimulation metrics and group fairness in reinforcement learning. We propose a novel approach that leverages bisimulation metrics to learn reward functions and observation dynamics, ensuring that learners treat groups fairly while reflecting the original problem. We demonstrate the effectiveness of our method in addressing disparities in sequential decision making problems through empirical evaluation on a standard fairness benchmark consisting of lending and college admission scenarios.

2024-12-22

ArXiv (prépublication)

Fairness in Reinforcement Learning with Bisimulation Metrics

Sahand Rezaei-Shoshtari

Ensuring long-term fairness is crucial when developing automated decision making systems, specifically in dynamic and sequential environment… (voir plus)s. By maximizing their reward without consideration of fairness, AI agents can introduce disparities in their treatment of groups or individuals. In this paper, we establish the connection between bisimulation metrics and group fairness in reinforcement learning. We propose a novel approach that leverages bisimulation metrics to learn reward functions and observation dynamics, ensuring that learners treat groups fairly while reflecting the original problem. We demonstrate the effectiveness of our method in addressing disparities in sequential decision making problems through empirical evaluation on a standard fairness benchmark consisting of lending and college admission scenarios.

2024-12-22

ArXiv (prépublication)

Imitation Learning from Observation through Optimal Transport

Wei-Di Chang

David Meger

Gregory Dudek

2024-05-14

rl-conference.cc/RLC/2024/Conference (publié)

openreview.net

For SALE: State-Action Representation Learning for Deep Reinforcement Learning