Mila's AI for Climate Studio aims to bridge the gap between technology and impact in order to unlock the potential of AI to tackle the climate crisis quickly and at scale.
Hugo Larochelle appointed Scientific Director of Mila
An adjunct professor at Université de Montréal and former lead of Google's AI research lab in Montreal, Hugo Larochelle is a pioneer of deep learning and one of the most respected researchers in Canada.
AI Insights for Policymakers
Co-led by Mila and CIFAR, this program connects policymakers with leading AI researchers through a combination of open consultations and policy feasibility testing exercises. The next session will take place on October 9 and 10.
We introduce Ollivier-Ricci Curvature (ORC) as an information-geometric tool for analyzing the local structure of reinforcement learning (RL) environments. We establish a novel connection between ORC and the Successor Representation (SR), enabling a geometric interpretation of environment dynamics decoupled from reward signals. Our analysis shows that states with positive and negative ORC values correspond to regions where random walks converge and diverge, respectively, which are often critical for effective exploration. ORC is highly correlated with established environment complexity metrics, yet integrates naturally with standard RL frameworks based on SR and provides both global and local complexity measures. Leveraging this property, we propose an ORC-based intrinsic reward that guides agents toward divergent regions and away from convergent traps. Empirical results demonstrate that our curvature-driven reward substantially improves exploration performance across diverse environments, outperforming both random and count-based intrinsic reward baselines.
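For intuition, the edge-level quantity the abstract builds on is κ(x, y) = 1 − W1(μ_x, μ_y) / d(x, y), where μ_x is a lazy random-walk distribution around state x and W1 is the Wasserstein-1 distance under the graph metric. The Python sketch below is our own illustration, not the authors' code: the toy state graph, the lazy-walk parameter alpha, and the rule that turns negative curvature into an exploration bonus are assumptions made only to show the mechanics.

import numpy as np
from collections import deque
from scipy.optimize import linprog

def bfs_distances(adj, source):
    # Unweighted shortest-path distances from `source` (graph metric used by W1).
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def wasserstein1(mu, nu, dist):
    # Solve the optimal-transport LP between two finite distributions on the graph.
    xs, ys = list(mu), list(nu)
    cost = np.array([dist[x][y] for x in xs for y in ys], dtype=float)
    A_eq, b_eq = [], []
    for i, x in enumerate(xs):                      # mass leaving each source state
        row = np.zeros(len(xs) * len(ys))
        row[i * len(ys):(i + 1) * len(ys)] = 1.0
        A_eq.append(row); b_eq.append(mu[x])
    for j, y in enumerate(ys):                      # mass arriving at each target state
        col = np.zeros(len(xs) * len(ys))
        col[j::len(ys)] = 1.0
        A_eq.append(col); b_eq.append(nu[y])
    res = linprog(cost, A_eq=np.array(A_eq), b_eq=np.array(b_eq), bounds=(0, None))
    return res.fun

def orc(adj, x, y, alpha=0.5):
    # Ollivier-Ricci curvature of edge (x, y) with lazy random-walk measures:
    # kappa(x, y) = 1 - W1(mu_x, mu_y) / d(x, y).
    def measure(u):
        m = {u: alpha}
        for v in adj[u]:
            m[v] = (1.0 - alpha) / len(adj[u])
        return m
    dist = {u: bfs_distances(adj, u) for u in adj}
    return 1.0 - wasserstein1(measure(x), measure(y), dist) / dist[x][y]

# Toy state graph: a triangle (0-1-2) attached to a branching junction at node 3.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3], 5: [3]}
for x, y in [(0, 1), (2, 3)]:
    kappa = orc(adj, x, y)
    bonus = max(0.0, -kappa)   # hypothetical intrinsic reward: favour divergent regions
    print(f"edge ({x},{y}): ORC = {kappa:+.3f}, exploration bonus = {bonus:.3f}")

In this toy graph the triangle edge comes out positively curved (random walks from its endpoints overlap and converge), while the edge into the branching junction comes out negatively curved (walks spread apart), matching the convergent/divergent reading in the abstract.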
Training and fine-tuning Large Language Models (LLMs) require significant memory due to the substantial growth in the size of weight parameters and optimizer states. While methods like low-rank adaptation (LoRA), which introduce low-rank trainable modules in parallel to frozen pre-trained weights, effectively reduce memory usage, they often fail to preserve the optimization trajectory and are generally less effective for pre-training models. On the other hand, approaches such as GaLore, which project gradients onto lower-dimensional spaces, maintain the training trajectory and perform well in pre-training but suffer from high computational complexity, as they require repeated singular value decomposition on large matrices. In this work, we propose Randomized Gradient Projection (RGP), which outperforms GaLore, the current state-of-the-art in efficient fine-tuning, on the GLUE task suite, while being 74% faster on average and requiring similar memory.
2024-12-10
Proceedings of The 4th NeurIPS Efficient Natural Language and Speech Processing Workshop (published)
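As a rough illustration of the gradient-projection idea described in the abstract above, the sketch below keeps Adam-style moments in a rank-r subspace obtained from a Gaussian random projection, resampled periodically, instead of the repeated SVD that GaLore performs. This is a minimal PyTorch sketch under assumed details (projection refresh schedule, scaling, no bias correction, moments reset on refresh); it is not the released RGP implementation.

import torch

class RandomProjectedAdam:
    # Adam-style moments kept in a rank-r subspace of each 2-D gradient.
    def __init__(self, params, rank=4, lr=1e-3, betas=(0.9, 0.999),
                 eps=1e-8, refresh_every=200):
        self.params = [p for p in params if p.requires_grad]
        self.rank, self.lr, self.betas, self.eps = rank, lr, betas, eps
        self.refresh_every = refresh_every
        self.state = {p: {"step": 0} for p in self.params}

    def _projection(self, p):
        # Gaussian projection P in R^{m x r}, resampled periodically instead of
        # recomputing an SVD of the gradient as GaLore does.
        return torch.randn(p.shape[0], self.rank,
                           device=p.device, dtype=p.dtype) / self.rank ** 0.5

    @torch.no_grad()
    def step(self):
        b1, b2 = self.betas
        for p in self.params:
            if p.grad is None:
                continue
            st = self.state[p]
            st["step"] += 1
            if p.ndim < 2:                          # biases etc.: plain full-rank step
                p.add_(p.grad, alpha=-self.lr)
                continue
            if "P" not in st or st["step"] % self.refresh_every == 1:
                st["P"] = self._projection(p)       # moments reset on refresh (simplification)
                st["m"] = torch.zeros(self.rank, p.shape[1], device=p.device, dtype=p.dtype)
                st["v"] = torch.zeros(self.rank, p.shape[1], device=p.device, dtype=p.dtype)
            P = st["P"]
            g_low = P.T @ p.grad                    # project gradient into the rank-r space
            st["m"].mul_(b1).add_(g_low, alpha=1 - b1)
            st["v"].mul_(b2).addcmul_(g_low, g_low, value=1 - b2)
            p.add_(P @ (st["m"] / (st["v"].sqrt() + self.eps)), alpha=-self.lr)

Usage would look like opt = RandomProjectedAdam(model.parameters(), rank=4); loss.backward(); opt.step(). The appeal of the random projection is that each step costs only a few rank-r matrix multiplications and never decomposes the full gradient, which, per the abstract, is where GaLore's computational overhead comes from.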