TRAIL : IA responsable pour les professionnels et les leaders
Apprenez à intégrer des pratique d'IA responsable dans votre organisation avec le programme TRAIL. Inscrivez-vous à la prochaine cohorte qui débutera le 15 avril.
Avantage IA : productivité dans la fonction publique
Apprenez à tirer parti de l’IA générative pour soutenir et améliorer votre productivité au travail. La prochaine cohorte se déroulera en ligne les 28 et 30 avril 2026.
Nous utilisons des témoins pour analyser le trafic et l’utilisation de notre site web, afin de personnaliser votre expérience. Vous pouvez désactiver ces technologies à tout moment, mais cela peut restreindre certaines fonctionnalités du site. Consultez notre Politique de protection de la vie privée pour en savoir plus.
Paramètre des cookies
Vous pouvez activer et désactiver les types de cookies que vous souhaitez accepter. Cependant certains choix que vous ferez pourraient affecter les services proposés sur nos sites (ex : suggestions, annonces personnalisées, etc.).
Cookies essentiels
Ces cookies sont nécessaires au fonctionnement du site et ne peuvent être désactivés. (Toujours actif)
Cookies analyse
Acceptez-vous l'utilisation de cookies pour mesurer l'audience de nos sites ?
Lecteur Multimédia
Acceptez-vous l'utilisation de cookies pour afficher et vous permettre de regarder les contenus vidéo hébergés par nos partenaires (YouTube, etc.) ?
In-context reinforcement learning (ICRL) promises fast adaptation to unseen environments without parameter updates, but current methods eith… (voir plus)er cannot improve beyond the training distribution or require near-optimal data, limiting practical adoption. We introduce SPICE, a Bayesian ICRL method that learns a prior over Q-values via deep ensemble and updates this prior at test-time using in-context information through Bayesian updates. To recover from poor priors resulting from training on sub-optimal data, our online inference follows an Upper-Confidence Bound rule that favours exploration and adaptation. We prove that SPICE achieves regret-optimal behaviour in both stochastic bandits and finite-horizon MDPs, even when pretrained only on suboptimal trajectories. We validate these findings empirically across bandit and control benchmarks. SPICE achieves near-optimal decisions on unseen tasks, substantially reduces regret compared to prior ICRL and meta-RL approaches while rapidly adapting to unseen tasks and remaining robust under distribution shift.
About 12% of global energy consumption is attributable to heating, ventilation, and air conditioning (HVAC) systems in buildings [11]. Machi… (voir plus)ne learning-based intelligent HVAC control offers significant energy efficiency potential, but progress is constrained by limited data for training and evaluating performance across different kinds of buildings. Existing datasets primarily target energy prediction rather than control applications, forcing studies to rely on limited building sets or single-variable perturbations that fail to capture real-world complexity. We present HOT (HVAC Operations Transfer), the first large-scale open-source dataset purpose-built for research into transfer learning in building control. HOT contains 159,744 unique building-weather combinations with systematic variations across envelope properties, occupancy patterns, and climate conditions spanning all 19 ASHRAE climate zones across 76 global locations. We formalise a comprehensive similarity-based framework with quantitative metrics for assessing transfer feasibility between source and target buildings across multiple context dimensions. Our key contributions: (1) a large-scale, open dataset and tooling enabling systematic, multi-variable transfer studies across 19 climate zones; (2) a quantitative similarity framework spanning geometry, thermal, climate, and function; and (3) zero-shot climate transfer experiments showing why realistic context variation matters for HVAC control.
2025-11-10
Proceedings of the 12th ACM International Conference on Systems for Energy-Efficient Buildings, Cities, and Transportation (publié)
Urban buildings consume 40\% of global energy, yet most rely on inefficient rule-based HVAC systems due to the impracticality of deploying a… (voir plus)dvanced controllers across diverse building stock. In-context reinforcement learning (ICRL) offers promise for rapid deployment without per-building training, but standard supervised learning objectives that maximise likelihood of training actions inherit behaviour-policy bias and provide weak exploration under the distribution shifts common when transferring across buildings and climates. We present SPICE (Sampling Policies In-Context with Ensemble uncertainty), a novel ICRL method specifically designed for zero-shot building control that addresses these fundamental limitations. SPICE introduces two key methodological innovations: (i) a propensity-corrected, return-aware training objective that prioritises high-advantage, high-uncertainty actions to enable improvement beyond suboptimal training demonstrations, and (ii) lightweight value ensembles with randomised priors that provide explicit uncertainty estimates for principled episode-level Thompson sampling. At deployment, SPICE samples one value head per episode and acts greedily, resulting in temporally coherent exploration without test-time gradients or building-specific models. We establish a comprehensive experimental protocol using the HOT dataset to evaluate SPICE across diverse building types and climate zones, focusing on the energy efficiency, occupant comfort, and zero-shot transfer capabilities that are critical for urban-scale deployment.
Buildings consume 40% of global energy, with HVAC systems responsible for up to half of that demand. As energy use grows, optimizing HVAC ef… (voir plus)ficiency is critical to meeting climate goals. While reinforcement learning (RL) offers a promising alternative to rule-based control, real-world adoption is limited by poor sample efficiency and generalisation.
We introduce HVAC-GRACE, a graph-based RL framework that models buildings as heterogeneous graphs and integrates spatial message passing directly into temporal GRU gates. This enables each zone to learn control actions informed by both its own history and its structural context.
Our architecture supports zero-shot transfer by learning topology-agnostic functions—but initial experiments reveal that this benefit depends on sufficient conditioned zone connectivity to maintain gradient flow. These findings highlight both the promise and the architectural requirements of scalable, transferable RL for building control