Publications

Safety Representations for Safer Policy Learning

Vincent Mai

Annie S Chen

Samer B. Nashed

Reinforcement learning algorithms typically necessitate extensive exploration of the state space to find optimal policies. However, in safety-critical applications, the risks associated with such exploration can lead to catastrophic consequences. Existing safe exploration methods attempt to mitigate this by imposing constraints, which often result in overly conservative behaviours and inefficient learning. Heavy penalties for early constraint violations can trap agents in local optima, deterring exploration of risky yet high-reward regions of the state space. To address this, we introduce a method that explicitly learns state-conditioned safety representations. By augmenting the state features with these safety representations, our approach naturally encourages safer exploration without being excessively cautious, resulting in more efficient and safer policy learning in safety-critical scenarios. Empirical evaluations across diverse environments show that our method significantly improves task performance while reducing constraint violations during training, underscoring its effectiveness in balancing exploration with safety.

2025-01-22

ICLR.cc/2025/Conference (poster)

openreview.net

Sample compression unleashed : New generalization bounds for real valued losses

Mathieu Bazinet

Valentina Zantedeschi

Pascal Germain

2025-01-22

aistats.org/AISTATS/2025/Conference (poster)

proceedings.mlr.press

openreview.net

Scaling Stick-Breaking Attention: An Efficient Implementation and In-depth Study

Shawn Tan

Songlin Yang

Aaron Courville

Rameswar Panda

Yikang Shen

The self-attention mechanism traditionally relies on the softmax operator, necessitating positional embeddings like RoPE, or position biases to account for token order. But current methods still face length generalisation challenges. We investigate an alternative attention mechanism based on the stick-breaking process in larger scale settings. The method works as follows: For each token before the current one, we determine a break point, which represents the proportion of the stick, the weight of the attention, to allocate to the current token. We repeat this on the remaining stick, until all tokens are allocated a weight, resulting in a sequence of attention weights. This process naturally incorporates recency bias, which has linguistic motivations for grammar parsing (Shen et al., 2017). We study the implications of replacing the conventional softmax-based attention mechanism with stick-breaking attention. We then discuss implementation of numerically stable stick-breaking attention and adapt Flash Attention to accommodate this mechanism. When used as a drop-in replacement for current softmax+RoPE attention systems, we find that stick-breaking attention performs competitively with current methods on length generalisation and downstream tasks. Stick-breaking also performs well at length generalisation, allowing a model trained with

2025-01-22

ICLR.cc/2025/Conference (poster)

openreview.net
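
The stick-breaking allocation described in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each break point is a sigmoid of an attention logit and that the stick is broken starting from the most recent token and moving backwards, which is one way to obtain the recency bias the abstract mentions. The names `stable_sigmoid` and `stick_breaking_weights` are hypothetical.

```python
import math


def stable_sigmoid(x):
    """Sigmoid that avoids overflow in math.exp for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def stick_breaking_weights(logits):
    """Turn logits for the earlier tokens into stick-breaking weights.

    `logits` is ordered oldest to most recent (an assumption of this sketch).
    Returns (weights, leftover): weights in the same order as `logits`, and
    the part of the unit-length stick that was never allocated.
    """
    weights = [0.0] * len(logits)
    remaining = 1.0  # the whole stick is available at the start
    # Walk from the most recent token back to the oldest one.
    for i in range(len(logits) - 1, -1, -1):
        beta = stable_sigmoid(logits[i])  # break point in (0, 1)
        weights[i] = beta * remaining     # share taken from what is left
        remaining *= 1.0 - beta           # shrink the remaining stick
    return weights, remaining


weights, leftover = stick_breaking_weights([0.0, 0.0, 0.0])
print(weights, leftover)
```

With three equal logits of zero, every break point is 0.5, so the most recent token receives 0.5, the next 0.25, and the oldest 0.125, with 0.125 of the stick left over. Unlike softmax, the weights in this sketch do not have to sum to one.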

Selective Unlearning via Representation Erasure Using Domain Adversarial Training

Nazanin Mohammadi Sepahvand

Eleni Triantafillou

Hugo Larochelle

Doina Precup

James J. Clark

Daniel M. Roy

Gintare Karolina Dziugaite

When deploying machine learning models in the real world, we often face the challenge of “unlearning” specific data points or subsets after training. Inspired by Domain-Adversarial Training of Neural Networks (DANN), we propose a novel algorithm, SURE, for targeted unlearning. SURE treats the process as a domain adaptation problem, where the “forget set” (data to be removed) and a validation set from the same distribution form two distinct domains. We train a domain classifier to discriminate between representations from the forget and validation sets. Using a gradient reversal strategy similar to DANN, we perform gradient updates to the representations to “fool” the domain classifier and thus obfuscate representations belonging to the forget set. Simultaneously, gradient descent is applied to the retain set (original training data minus the forget set) to preserve its classification performance. Unlike other unlearning approaches whose training objectives are built based on model outputs, SURE directly manipulates the representations. This is key to ensure robustness against a set of more powerful attacks than currently considered in the literature, that aim to detect which examples were unlearned through access to learned embeddings. Our thorough experiments reveal that SURE has a better unlearning quality to utility trade-off compared to other standard unlearning techniques for deep neural networks.

2025-01-22

ICLR.cc/2025/Conference (poster)

openreview.net

Seq-VCR: Preventing Collapse in Intermediate Transformer Representations for Enhanced Reasoning

Md Rifat Arefin

Nicolas Gontier

Ravid Shwartz-Ziv

2025-01-22

ICLR.cc/2025/Conference (poster)

doi.org

openreview.net

Solving Hidden Monotone Variational Inequalities with Surrogate Losses

Junhyung Lyle Kim

Deep learning has proven to be effective in a wide variety of loss minimization problems. However, many applications of interest, like minimizing projected Bellman error and min-max optimization, cannot be modelled as minimizing a scalar loss function but instead correspond to solving a variational inequality (VI) problem. This difference in setting has caused many practical challenges as naive gradient-based approaches from supervised learning tend to diverge and cycle in the VI case. In this work, we propose a principled surrogate-based approach compatible with deep learning to solve VIs. We show that our surrogate-based approach has three main benefits: (1) under assumptions that are realistic in practice (when hidden monotone structure is present, interpolation, and sufficient optimization of the surrogates), it guarantees convergence, (2) it provides a unifying perspective of existing methods, and (3) is amenable to existing deep learning optimizers like ADAM. Experimentally, we demonstrate our surrogate-based approach is effective in min-max optimization and minimizing projected Bellman error. Furthermore, in the deep reinforcement learning case, we propose a novel variant of TD(0) which is more compute and sample efficient.

2025-01-22

ICLR.cc/2025/Conference (poster)

doi.org

openreview.net

Structure Language Models for Protein Conformation Generation

Stephen Zhewen Lu

Hongyu Guo

2025-01-22

ICLR.cc/2025/Conference (poster)

doi.org

openreview.net

Studying the Interplay Between the Actor and Critic Representations in Reinforcement Learning

Samuel Garcin

Trevor McInroe

Pablo Samuel Castro

Christopher G. Lucas

David Abel

Prakash Panangaden

Stefano V Albrecht

Extracting relevant information from a stream of high-dimensional observations is a central challenge for deep reinforcement learning agents. Actor-critic algorithms add further complexity to this challenge, as it is often unclear whether the same information will be relevant to both the actor and the critic. To this end, we here explore the principles that underlie effective representations for an actor and for a critic. We focus our study on understanding whether an actor and a critic will benefit from a decoupled, rather than shared, representation. Our primary finding is that when decoupled, the representations for the actor and critic systematically specialise in extracting different types of information from the environment---the actor's representation tends to focus on action-relevant information, while the critic's representation specialises in encoding value and dynamics information. Finally, we demonstrate how these insights help select representation learning objectives that play into the actor's and critic's respective knowledge specialisations, and improve performance in terms of agent returns.

2025-01-22

ICLR.cc/2025/Conference (poster)

openreview.net