Zero-Shot Constraint Satisfaction with Forward-Backward Representations
Adriana Hugessen
Harley Wiltzer
Cyrus Neary
Amy Zhang
Traditionally, constrained policy optimization with Reinforcement Learning (RL) requires learning a new policy from scratch for any new environment, goal, or cost function, with limited generalization to new tasks and constraints. Given the sample inefficiency of many common deep RL methods, this procedure can be impractical for many real-world scenarios, particularly when constraints or tasks are changing. As an alternative, in the unconstrained setting, various works have sought to pre-train representations from offline datasets to accelerate policy optimization upon specification of a reward. Such methods can permit faster adaptation to new tasks in a given environment, dramatically improving sample efficiency. Recently, zero-shot policy optimization has been explored by leveraging a particular class of pre-trained representations, known as forward-backward (FB) representations.
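To make the zero-shot step concrete, the minimal sketch below embeds a newly specified reward with a pre-trained forward-backward pair and reads off a greedy policy without further training. The tabular arrays, their shapes, and the dropped z-conditioning of F are simplifying assumptions for exposition, not the paper's implementation.

```python
import numpy as np

def zero_shot_policy(F, B, reward, dataset_states):
    """Hypothetical tabular FB inference.

    F: (S, A, d) forward embeddings (z-conditioning omitted for simplicity).
    B: (S, d) backward embeddings; reward: (S,) new task reward.
    dataset_states: indices of states drawn from the offline dataset D.
    """
    # Embed the new reward: z = E_{s ~ D}[ B(s) * r(s) ]
    z = (B[dataset_states] * reward[dataset_states, None]).mean(axis=0)
    # Q_z(s, a) = F(s, a) . z, then act greedily -- no retraining needed.
    q = F @ z                      # shape (S, A)
    return q.argmax(axis=1)        # one greedy action per state
```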
Circuit Discovery Helps To Detect LLM Jailbreaking
Paria Mehrbod
Boris Knyazev
Geraldin Nanfack
Despite extensive safety alignment, large language models (LLMs) remain vulnerable to jailbreak attacks that bypass safeguards to elicit harmful content. While prior work attributes this vulnerability to limitations of safety training, the internal mechanisms by which LLMs process adversarial prompts remain poorly understood. We present a mechanistic analysis of jailbreaking behavior in a large-scale, safety-aligned LLM, focusing on LLaMA-2-7B-chat-hf. Leveraging edge attribution patching and subnetwork probing, we systematically identify the computational circuits responsible for generating affirmative responses to jailbreak prompts. Ablating these circuits during first-token prediction can reduce attack success rates by up to 80%, demonstrating their critical role in safety bypass. Our analysis uncovers key attention heads and MLP pathways that mediate adversarial prompt exploitation, revealing how important tokens propagate through these components to override safety constraints. These findings advance the understanding of adversarial vulnerabilities in aligned LLMs and pave the way for targeted defense mechanisms grounded in mechanistic interpretability.
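As a rough illustration of the kind of circuit-level intervention described above, the sketch below zero-ablates a chosen set of attention heads with PyTorch forward pre-hooks on a Hugging Face LLaMA-style model. The head indices, `head_dim`, and module paths are assumptions for exposition; they are not the circuit identified in the paper.

```python
import torch

def ablate_heads(model, heads_to_ablate, head_dim=128):
    """heads_to_ablate: {layer_idx: [head indices]} -- a hypothetical circuit.

    Zeroes each head's slice of the pre-o_proj activations so its
    contribution is removed from the residual stream.
    """
    handles = []
    for layer_idx, heads in heads_to_ablate.items():
        o_proj = model.model.layers[layer_idx].self_attn.o_proj

        def hook(module, inputs, heads=heads):
            x = inputs[0].clone()      # (batch, seq, num_heads * head_dim)
            for h in heads:
                x[..., h * head_dim:(h + 1) * head_dim] = 0.0
            return (x,)                # replaces o_proj's input

        handles.append(o_proj.register_forward_pre_hook(hook))
    return handles  # call handle.remove() on each to restore the model
```

Comparing attack success rates with and without such hooks during first-token prediction is the kind of measurement the ablation results above refer to.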
Landscape of Thoughts: Visualizing the Reasoning Process of Large Language Models
Zhanke Zhou
Zhaocheng Zhu
Xuan Li
Mikhail Galkin
Xiao Feng
Sanmi Koyejo
Bo Han
Robust and Interpretable Relational Reasoning with Large Language Models and Symbolic Solvers
Ge Zhang
Mohammad Alomrani
Hongjian Gu
Jiaming Zhou
Yaochen Hu
Bin Wang
Qun Liu
Yingxue Zhang
Jianye Hao
Large language models (LLMs) possess vast semantic knowledge but often struggle with complex reasoning tasks, particularly in relational reasoning problems such as kinship or spatial reasoning. In this paper, we present Path-of-Thoughts (PoT), a novel framework designed to tackle relational reasoning by decomposing the task into three key stages: graph extraction, path identification, and reasoning. Unlike previous approaches, PoT efficiently extracts a task-agnostic graph that identifies crucial entities, relations, and attributes within the problem context. Subsequently, PoT identifies relevant reasoning chains within the graph corresponding to the posed question, facilitating inference of potential answers. Experimental evaluations on four benchmark datasets that demand long reasoning chains demonstrate that PoT surpasses state-of-the-art baselines by a significant margin (up to 21.3%) without requiring fine-tuning or extensive LLM calls. Furthermore, in contrast to prior neuro-symbolic methods, PoT exhibits improved resilience to LLM errors by leveraging the compositional nature of graphs.
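A schematic of the three PoT stages might look as follows; the `llm` callable (assumed to return already-parsed outputs), the prompt texts, and the use of `networkx` are placeholders chosen to convey the pipeline, not the authors' implementation.

```python
import networkx as nx

def path_of_thoughts(problem, question, llm):
    # Stage 1 -- graph extraction: task-agnostic (head, relation, tail) triples.
    triples = llm(f"Extract (head, relation, tail) triples:\n{problem}")
    g = nx.DiGraph()
    for head, rel, tail in triples:
        g.add_edge(head, tail, relation=rel)

    # Stage 2 -- path identification: chains linking the queried entities.
    src, dst = llm(f"Which two entities does this question relate?\n{question}")
    paths = nx.all_simple_paths(g, src, dst, cutoff=8)

    # Stage 3 -- reasoning: compose relations along each path into an answer.
    chains = [[g[u][v]["relation"] for u, v in zip(p, p[1:])] for p in paths]
    return llm(f"Question: {question}\nRelation chains: {chains}\nAnswer:")
```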
Cervical Spinal Cord Magnetization Transfer Ratio and Its Relationship With Clinical Outcomes in Multiple Sclerosis
Lisa Eunyoung Lee
Irene M. Vavasour
Melanie Guenette
Katherine Sawicka
Neda Rashidi‐Ranjbar
Nathan Churchill
Akash Chopra
Adelia Adelia
Pierre-Louis Benveniste
Anthony Traboulsee
Nathalie Arbour
Fabrizio Giuliani
Larry D. Lynd
Scott B. Patten
Alexandre Prat
Alice Schabas
Penelope Smyth
Roger Tam
Yunyan Zhang
Simon J. Graham
Mojgan Hodaie
Anthony Feinstein
Shannon Kolind
Tom A. Schweizer
Jiwon Oh
Prompt learning with bounding box constraints for medical image segmentation
Mélanie Gaillochet
Mehrdad Noori
Sahar Dastani
Christian Desrosiers
Pixel-wise annotations are notoriously laborious and costly to obtain in the medical domain. To mitigate this burden, weakly supervised approaches based on bounding box annotations, which are much easier to acquire, offer a practical alternative. Vision foundation models have recently shown noteworthy segmentation performance when provided with prompts such as points or bounding boxes. Prompt learning exploits these models by adapting them to downstream tasks and automating segmentation, thereby reducing user intervention. However, existing prompt learning approaches depend on fully annotated segmentation masks. This paper proposes a novel framework that combines the representational power of foundation models with the annotation efficiency of weakly supervised segmentation. More specifically, our approach automates prompt generation for foundation models using only bounding box annotations. Our proposed optimization scheme integrates multiple constraints derived from box annotations with pseudo-labels generated by the prompted foundation model. Extensive experiments across multi-modal datasets show that our weakly supervised method achieves an average Dice score of 84.90% in a limited data setting, outperforming existing fully supervised and weakly supervised approaches. The code will be available upon acceptance.
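To give a flavor of how box-derived constraints and foundation-model pseudo-labels can be combined, here is a sketch of such an objective for binary segmentation; the particular constraint terms and their equal weighting are illustrative assumptions, not the paper's loss.

```python
import torch
import torch.nn.functional as F

def weak_box_loss(logits, box_mask, pseudo_labels):
    """logits, box_mask, pseudo_labels: float tensors of shape (B, 1, H, W).

    box_mask is 1 inside the annotated box; pseudo_labels come from the
    box-prompted foundation model (hypothetical names).
    """
    prob = torch.sigmoid(logits)
    # Constraint 1: pixels outside the box must be background.
    outside = 1 - box_mask
    loss_out = (prob * outside).sum() / outside.sum().clamp(min=1)
    # Constraint 2 (tightness prior): every box row should touch foreground,
    # encouraged by pushing each row's max probability toward 1.
    row_max = (prob * box_mask).amax(dim=-1)
    loss_tight = (1 - row_max)[box_mask.amax(dim=-1) > 0].mean()
    # Agreement with the foundation model's pseudo-labels.
    loss_pseudo = F.binary_cross_entropy(prob, pseudo_labels)
    return loss_out + loss_tight + loss_pseudo
```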
Spatially and non-spatially tuned hippocampal neurons are linear perceptual and nonlinear memory encoders
Maxime Daigle
Kaicheng Yan
Benjamin Corrigan
Roberto Gulli
Julio Martinez-Trujillo
Learning to combine top-down context and feed-forward representations under ambiguity with apical and basal dendrites
Nizar Islah
Guillaume Etter
Mashbayar Tugsbayar
Busra Tugce Gurbuz
Multi-Agent Matrix Games with Individual Learners: How Exploration-Exploitation Strategies Impact the Emergence of Coordination
Julien Armand
Tommy Chien-Hsuan Lin
Maxime Heuillet
Coordination between independent learning agents in a multi-agent environment is an important problem in which AI systems may impact each other's learning process. In this paper, we study how individual agents converge to an optimal equilibrium in multi-agent settings where coordination is necessary to achieve optimality. Specifically, we cover both coordination to maximize each agent's individual payoff and coordination to maximize the collective payoff (cooperation). We study the emergence of such coordination behaviours in two-player matrix games with unknown payoff matrices and noisy bandit feedback. We consider five different environments along with widely used deterministic and stochastic bandit strategies, and examine how different learning strategies and observation noise influence convergence to the optimal equilibrium. Our results indicate that coordination often emerges more easily from interactions between deterministic agents, especially when they follow the same learning behaviour. However, stochastic learning strategies appear to be more robust in the presence of many optimal joint actions. Overall, noisy observations often help stabilize learning behaviours.
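The setup can be reproduced in miniature as below: two independent epsilon-greedy learners face a 2x2 coordination game with noisy bandit feedback. The payoff matrix, noise scale, and strategy are illustrative stand-ins for the five environments and bandit strategies studied.

```python
import numpy as np

rng = np.random.default_rng(0)
payoff = np.array([[1.0, 0.0],
                   [0.0, 1.0]])       # both agents rewarded for matching
q = [np.zeros(2), np.zeros(2)]        # per-agent action-value estimates
n = [np.zeros(2), np.zeros(2)]        # per-agent action counts
eps, noise = 0.1, 0.1

for t in range(5000):
    # Each agent acts independently on its own noisy bandit estimates.
    acts = [int(qi.argmax()) if rng.random() > eps else int(rng.integers(2))
            for qi in q]
    r = payoff[acts[0], acts[1]] + rng.normal(0.0, noise)
    for i in (0, 1):                  # independent incremental mean updates
        n[i][acts[i]] += 1
        q[i][acts[i]] += (r - q[i][acts[i]]) / n[i][acts[i]]

print([int(qi.argmax()) for qi in q]) # a coordinated joint action, e.g. [0, 0]
```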
Opening the Scope of Openness in AI
Tamara Paris
A Survey of State Representation Learning for Deep Reinforcement Learning
Ayoub Echchahed
Representation learning methods are an important tool for addressing the challenges posed by complex observation spaces in sequential decision making problems. Recently, many methods have employed a wide variety of approaches to learn meaningful state representations in reinforcement learning, enabling better sample efficiency, generalization, and performance. This survey provides a broad categorization of these methods within a model-free online setting, exploring how they tackle the learning of state representations differently. We categorize the methods into six main classes, detailing their mechanisms, benefits, and limitations. Through this taxonomy, we aim to enhance the understanding of this field and provide a guide for new researchers. We also discuss techniques for assessing the quality of representations and detail relevant future directions.
The challenge of hidden gifts in multi-agent reinforcement learning
Dane Malenfant
Cooperation between people is not always obvious. Sometimes we benefit from actions that others have taken even when we are unaware that they took those actions. For example, if your neighbor chooses not to take a parking spot in front of your house when you are not there, you can benefit without ever being aware that they took this action. These “hidden gifts” represent an interesting challenge for multi-agent reinforcement learning (MARL), since correctly assigning credit to your own actions when the beneficial actions of others are hidden is non-trivial. Here, we study the impact of hidden gifts with a very simple MARL task. In this task, agents in a grid-world environment have individual doors to unlock in order to obtain individual rewards. In addition, if all the agents unlock their doors, the group receives a larger collective reward. However, there is only one key for all of the doors, so the collective reward can only be obtained if agents drop the key for others after using it. Notably, nothing indicates to an agent that the other agents have dropped the key; thus the act of dropping the key for others is a “hidden gift”. We show that several state-of-the-art RL algorithms, including MARL algorithms, fail to learn how to obtain the collective reward in this simple task. Interestingly, we find that independent model-free policy gradient agents can solve the task when provided with information about their own action history, but MARL agents still cannot solve it even with action history. Finally, we derive a correction term for these independent agents, inspired by learning-aware approaches, which reduces the variance in learning and helps them converge to collective success more reliably. These results show how credit assignment in multi-agent settings can be particularly challenging in the presence of “hidden gifts”, and demonstrate that learning awareness can benefit these settings.
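The task's reward structure can be summarized schematically as follows; the constants and function signature are hypothetical simplifications of the grid-world described above, kept only to make the credit-assignment difficulty explicit.

```python
def step_rewards(agents, doors_unlocked, r_ind=1.0, r_coll=10.0):
    """agents: agent ids; doors_unlocked: ids whose door is open.

    The collective bonus requires every door to open, which is only
    possible if key-holders drop the key for others -- an act that is
    never observable to the beneficiaries (the "hidden gift").
    """
    rewards = {a: r_ind if a in doors_unlocked else 0.0 for a in agents}
    if len(doors_unlocked) == len(agents):
        for a in agents:
            rewards[a] += r_coll
    return rewards
```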