HVAC-GRACE: Transferable Building Control via Heterogeneous Graph Neural Network Policies
Anaïs Berkes
Donna Vakalis
Buildings consume 40% of global energy, with HVAC systems responsible for up to half of that demand. As energy use grows, optimizing HVAC ef… (voir plus)ficiency is critical to meeting climate goals. While reinforcement learning (RL) offers a promising alternative to rule-based control, real-world adoption is limited by poor sample efficiency and generalisation. We introduce HVAC-GRACE, a graph-based RL framework that models buildings as heterogeneous graphs and integrates spatial message passing directly into temporal GRU gates. This enables each zone to learn control actions informed by both its own history and its structural context. Our architecture supports zero-shot transfer by learning topology-agnostic functions—but initial experiments reveal that this benefit depends on sufficient conditioned zone connectivity to maintain gradient flow. These findings highlight both the promise and the architectural requirements of scalable, transferable RL for building control
LLMs and Stack Overflow discussions: Reliability, impact, and challenges
Leuson Da Silva
Jordan Samhi
Model approximation in MDPs with unbounded per-step cost
Berk Bozkurt
Ashutosh Nayyar
Yi Ouyang
We consider the problem of designing a control policy for an infinite-horizon discounted cost Markov decision process …
Modulation of leg trajectory by transcranial magnetic stimulation during walking
H. Bourgeois
Rose Guay-Hottin
El-Mehdi Meftah
Marina Martinez
D. Barthélemy
Self-Predictive Representations for Combinatorial Generalization in Behavioral Cloning
Daniel Lawson
Adriana Hugessen
Charlotte Cloutier
Behavioral cloning (BC) methods trained with supervised learning (SL) are an effective way to learn policies from human demonstrations in do… (voir plus)mains like robotics. Goal-conditioning these policies enables a single generalist policy to capture diverse behaviors contained within an offline dataset. While goal-conditioned behavior cloning (GCBC) methods can perform well on in-distribution training tasks, they do not necessarily generalize zero-shot to tasks that require conditioning on novel state-goal pairs, i.e. combinatorial generalization. In part, this limitation can be attributed to a lack of temporal consistency in the state representation learned by BC; if temporally related states are encoded to similar latent representations, then the out-of-distribution gap for novel state-goal pairs would be reduced. Hence, encouraging this temporal consistency in the representation space should facilitate combinatorial generalization. Successor representations, which encode the distribution of future states visited from the current state, nicely encapsulate this property. However, previous methods for learning successor representations have relied on contrastive samples, temporal-difference (TD) learning, or both. In this work, we propose a simple yet effective representation learning objective,
Speciation of coral-associated barnacles: generalists versus specialists in the Indo-West Pacific
Lorenzo C. Halasan
Yoko Nozawa
Benny Kwok Kan Chan
Using machine learning algorithms to predict students' general self-efficacy in PISA 2018
Bin Tan
Hao-Yue Jin
Zero-Shot Constraint Satisfaction with Forward- Backward Representations
Adriana Hugessen
Harley Wiltzer
Cyrus Neary
Amy Zhang
Traditionally, constrained policy optimization with Reinforcement Learning (RL) requires learning a new policy from scratch for any new envi… (voir plus)ronment, goal or cost function, with limited generalization to new tasks and constraints. Given the sample inefficiency of many common deep RL methods, this procedure can be impractical for many real-world scenarios, particularly when constraints or tasks are changing. As an alternative, in the unconstrained setting, various works have sought to pre-train representations from offline datasets to accelerate policy optimization upon specification of a reward. Such methods can permit faster adaptation to new tasks in a given environment, dramatically improving sample efficiency. Recently, zero-shot policy optimization has been explored by leveraging a particular
Circuit Discovery Helps To Detect LLM Jailbreaking
Paria Mehrbod
Boris Knyazev
geraldin nanfack
Despite extensive safety alignment, large language models (LLMs) remain vulnerable to jailbreak attacks that bypass safeguards to elicit har… (voir plus)mful content. While prior work attributes this vulnerability to safety training limitations, the internal mechanisms by which LLMs process adversarial prompts remain poorly understood. We present a mechanistic analysis of the jailbreaking behavior in a large-scale, safety-aligned LLM, focusing on LLaMA-2-7B-chat-hf. Leveraging edge attribution patching and subnetwork probing, we systematically identify computational circuits responsible for generating affirmative responses to jailbreak prompts. Ablating these circuits during the first token prediction can reduce attack success rates by up to 80\%, demonstrating its critical role in safety bypass. Our analysis uncovers key attention heads and MLP pathways that mediate adversarial prompt exploitation, revealing how important tokens propagate through these components to override safety constraints. These findings advance the understanding of adversarial vulnerabilities in aligned LLMs and pave the way for targeted, interpretable defenses mechanisms based on mechanistic interpretability.
Landscape of Thoughts: Visualizing the Reasoning Process of Large Language Models
Zhanke Zhou
Zhaocheng Zhu
Xuan Li
Mikhail Galkin
Xiao Feng
Sanmi Koyejo
Bo Han
Robust and Interpretable Relational Reasoning with Large Language Models and Symbolic Solvers
Ge Zhang
Mohammad Alomrani
Hongjian Gu
Jiaming Zhou
Yaochen Hu
Bin Wang
Qun Liu
Yingxue Zhang
Jianye Hao
Large language models (LLMs) possess vast semantic knowledge but often struggle with complex reasoning tasks, particularly in relational rea… (voir plus)soning problems such as kinship or spatial reasoning. In this paper, we present Path-of-Thoughts (PoT), a novel framework designed to tackle relation reasoning by decomposing the task into three key stages: graph extraction, path identification, and reasoning. Unlike previous approaches, PoT efficiently extracts a task-agnostic graph that identifies crucial entities, relations, and attributes within the problem context. Subsequently, PoT identifies relevant reasoning chains within the graph corresponding to the posed question, facilitating inference of potential answers. Experimental evaluations on four benchmark datasets, demanding long reasoning chains, demonstrate that PoT surpasses state-of-the-art baselines by a significant margin (maximum 21.3\%) without necessitating fine-tuning or extensive LLM calls. Furthermore, as opposed to prior neuro-symbolic methods, PoT exhibits improved resilience against LLM errors by leveraging the compositional nature of graphs.
Cervical Spinal Cord Magnetization Transfer Ratio and Its Relationship With Clinical Outcomes in Multiple Sclerosis
Lisa Eunyoung Lee
Irene M. Vavasour
Melanie Guenette
Katherine Sawicka
Neda Rashidi‐Ranjbar
Nathan Churchill
Akash Chopra
Adelia Adelia
Pierre-Louis Benveniste
Anthony Traboulsee
Nathalie Arbour
Fabrizio Giuliani
Larry D. Lynd
Scott B. Patten
Alexandre Prat
Alice Schabas
Penelope Smyth
Roger Tam
Yunyan Zhang … (voir 6 de plus)
Simon J. Graham
Mojgan Hodaie
Anthony Feinstein
Shannon Kolind
Tom A. Schweizer
Jiwon Oh