Nous utilisons des témoins pour analyser le trafic et l’utilisation de notre site web, afin de personnaliser votre expérience. Vous pouvez désactiver ces technologies à tout moment, mais cela peut restreindre certaines fonctionnalités du site. Consultez notre Politique de protection de la vie privée pour en savoir plus.
Paramètre des cookies
Vous pouvez activer et désactiver les types de cookies que vous souhaitez accepter. Cependant certains choix que vous ferez pourraient affecter les services proposés sur nos sites (ex : suggestions, annonces personnalisées, etc.).
Cookies essentiels
Ces cookies sont nécessaires au fonctionnement du site et ne peuvent être désactivés. (Toujours actif)
Cookies analyse
Acceptez-vous l'utilisation de cookies pour mesurer l'audience de nos sites ?
Multimedia Player
Acceptez-vous l'utilisation de cookies pour afficher et vous permettre de regarder les contenus vidéo hébergés par nos partenaires (YouTube, etc.) ?
Publications
Curiosity-Driven Exploration via Temporal Contrastive Learning
Effective exploration in reinforcement learning requires keeping track not just of where the agent has been, but also of how the agent think… (voir plus)s about and represents the world: an agent should explore states that enable it to learn powerful representations. Temporal representations can include the information required to solve any potential task while avoiding the computational cost of reconstruction. In this paper, we propose an exploration method that uses temporal contrastive representations to drive exploration, maximizing coverage as seen through the lens of these temporal representations. We demonstrate complex exploration behaviors in locomotion, manipulation, and embodied-AI tasks, revealing previously unknown capabilities and behaviors once achievable only via extrinsic rewards.
In the era of deep reinforcement learning, making progress is more complex, as the collected experience must be compressed into a deep model… (voir plus) for future exploitation and sampling. Many papers have shown that training a deep learning policy under the changing state and action distribution leads to sub-optimal performance even collapse. This naturally leads to the concern that even if the community creates improved exploration algorithms or reward objectives, will those improvements fall on the \textit{deaf ears} of optimization difficulties.
This work proposes a new \textit{pracitcal} sub-optimality estimator to determine optimization limitations of deep reinforcement learning algorithms. Through experiments acrossenvironments and RL algorithms, it is shown that the difference between the best data generated is
We introduce Ollivier-Ricci Curvature (ORC) as an information-geometric tool for analyzing the local structure of reinforcement learning (RL… (voir plus)) environments. We establish a novel connection between ORC and the Successor Representation (SR), enabling a geometric interpretation of environment dynamics decoupled from reward signals. Our analysis shows that states with positive and negative ORC values correspond to regions where random walks converge and diverge respectively, which are often critical for effective exploration. ORC is highly correlated with established environment complexity metrics, yet integrates naturally with standard RL frameworks based on SR and provides both global and local complexity measures. Leveraging this property, we propose an ORC-based intrinsic reward that guides agents toward divergent regions and away from convergent traps. Empirical results demonstrate that our curvature-driven reward substantially improves exploration performance across diverse environments, outperforming both random and count-based intrinsic reward baselines.
We introduce Ollivier-Ricci Curvature (ORC) as an information-geometric tool for analyzing the local structure of reinforcement learning (RL… (voir plus)) environments. We establish a novel connection between ORC and the Successor Representation (SR), enabling a geometric interpretation of environment dynamics decoupled from reward signals. Our analysis shows that states with positive and negative ORC values correspond to regions where random walks converge and diverge respectively, which are often critical for effective exploration. ORC is highly correlated with established environment complexity metrics, yet integrates naturally with standard RL frameworks based on SR and provides both global and local complexity measures. Leveraging this property, we propose an ORC-based intrinsic reward that guides agents toward divergent regions and away from convergent traps. Empirical results demonstrate that our curvature-driven reward substantially improves exploration performance across diverse environments, outperforming both random and count-based intrinsic baselines.
Buildings consume 40% of global energy, with HVAC systems responsible for up to half of that demand. As energy use grows, optimizing HVAC ef… (voir plus)ficiency is critical to meeting climate goals. While reinforcement learning (RL) offers a promising alternative to rule-based control, real-world adoption is limited by poor sample efficiency and generalisation.
We introduce HVAC-GRACE, a graph-based RL framework that models buildings as heterogeneous graphs and integrates spatial message passing directly into temporal GRU gates. This enables each zone to learn control actions informed by both its own history and its structural context.
Our architecture supports zero-shot transfer by learning topology-agnostic functions—but initial experiments reveal that this benefit depends on sufficient conditioned zone connectivity to maintain gradient flow. These findings highlight both the promise and the architectural requirements of scalable, transferable RL for building control
Behavioral cloning (BC) methods trained with supervised learning (SL) are an effective way to learn policies from human demonstrations in do… (voir plus)mains like robotics. Goal-conditioning these policies enables a single generalist policy to capture diverse behaviors contained within an offline dataset. While goal-conditioned behavior cloning (GCBC) methods can perform well on in-distribution training tasks, they do not necessarily generalize zero-shot to tasks that require conditioning on novel state-goal pairs, i.e. combinatorial generalization. In part, this limitation can be attributed to a lack of temporal consistency in the state representation learned by BC; if temporally related states are encoded to similar latent representations, then the out-of-distribution gap for novel state-goal pairs would be reduced. Hence, encouraging this temporal consistency in the representation space should facilitate combinatorial generalization. Successor representations, which encode the distribution of future states visited from the current state, nicely encapsulate this property. However, previous methods for learning successor representations have relied on contrastive samples, temporal-difference (TD) learning, or both. In this work, we propose a simple yet effective representation learning objective,