Evgenii Nikishin

Maxwell's Demon at Work: Efficient Pruning by Leveraging Saturation of Neurons

When training neural networks, dying neurons -- units becoming inactive or saturated -- are traditionally seen as harmful. This paper sheds … (voir plus)new light on this phenomenon. By exploring the impact of various hyperparameter configurations on dying neurons during training, we gather insights on how to improve upon sparse training approaches to pruning. We introduce Demon Pruning (DemP), a method that controls the proliferation of dead neurons through a combination of noise injection on active units and a one-cycle schedule regularization strategy, dynamically leading to network sparsity. Experiments on CIFAR-10 and ImageNet datasets demonstrate that DemP outperforms existing dense-to-sparse structured pruning methods, achieving better accuracy-sparsity tradeoffs and accelerating training by up to 3.56

2025-02-12

TMLR (accepté)

doi.org

openreview.net

Forgetting Transformer: Softmax Attention with a Forget Gate

Zhixuan Lin

Evgenii Nikishin

Xu He

Aaron Courville

An essential component of modern recurrent sequence models is the forget gate. While Transformers do not have an explicit recurrent form, we… (voir plus) show that a forget gate can be naturally incorporated into Transformers by down-weighting the unnormalized attention scores in a data-dependent way. We name this attention mechanism Forgetting Attention and the resulting model the Forgetting Transformer (FoX). We show that FoX outperforms the Transformer on long-context language modeling, length extrapolation, and short-context downstream tasks, while performing on par with the Transformer on long-context downstream tasks. Moreover, it is compatible with the FlashAttention algorithm and does not require any positional embeddings. Several analyses, including the needle-in-the-haystack test, show that FoX also retains the Transformer's superior long-context capabilities over recurrent sequence models such as Mamba-2, HGRN2, and DeltaNet. We also introduce a "Pro" block design that incorporates some common architectural components in recurrent sequence models and find it significantly improves the performance of both FoX and the Transformer. Our code is available at [`https://github.com/zhixuan-lin/forgetting-transformer`](https://github.com/zhixuan-lin/forgetting-transformer).

2025-01-21

ICLR.cc/2025/Conference (poster)

openreview.net

The Curse of Diversity in Ensemble-Based Exploration

We uncover a surprising phenomenon in deep reinforcement learning: training a diverse ensemble of data-sharing agents -- a well-established … (voir plus)exploration strategy -- can significantly impair the performance of the individual ensemble members when compared to standard single-agent training. Through careful analysis, we attribute the degradation in performance to the low proportion of self-generated data in the shared training data for each ensemble member, as well as the inefficiency of the individual ensemble members to learn from such highly off-policy data. We thus name this phenomenon the curse of diversity. We find that several intuitive solutions -- such as a larger replay buffer or a smaller ensemble size -- either fail to consistently mitigate the performance loss or undermine the advantages of ensembling. Finally, we demonstrate the potential of representation learning to counteract the curse of diversity with a novel method named Cross-Ensemble Representation Learning (CERL) in both discrete and continuous control domains. Our work offers valuable insights into an unexpected pitfall in ensemble-based exploration and raises important caveats for future applications of similar approaches.

2024-01-15

ICLR.cc/2024/Conference (poster)

doi.org

openreview.net

Sample-Efficient Reinforcement Learning by Breaking the Replay Ratio Barrier

Bellemare Marc-Emmanuel

Aaron Courville

2023-01-31

ICLR.cc/2023/Conference (notable)

openreview.net

Unleashing The Potential of Data Sharing in Ensemble Deep Reinforcement Learning

This work studies a crucial but often overlooked element of ensemble methods in deep reinforcement learning: data sharing between ensemble m… (voir plus)embers. We show that data sharing enables peer learning, a powerful learning process in which individual agents learn from each other's experience to significantly improve their performance. When given access to the experience of other ensemble members, even the worst agent can match or outperform the previously best agent, triggering a virtuous circle. However, we show that peer learning can be unstable when the agents' ability to learn is impaired due to overtraining on early data. We thus employ the recently proposed solution of periodic resets and show that it ensures effective peer learning. We perform extensive experiments on continuous control tasks from both dense states and pixels to demonstrate the strong effect of peer learning and its interaction with resets.

2022-12-08

NeurIPS.cc/2022/Workshop/DeepRL (accepté)

openreview.net

The Primacy Bias in Deep Reinforcement Learning

This work identifies a common flaw of deep reinforcement learning (RL) algorithms: a tendency to rely on early interactions and ignore usefu… (voir plus)l evidence encountered later. Because of training on progressively growing datasets, deep RL agents incur a risk of overfitting to earlier experiences, negatively affecting the rest of the learning process. Inspired by cognitive science, we refer to this effect as the primacy bias. Through a series of experiments, we dissect the algorithmic aspects of deep RL that exacerbate this bias. We then propose a simple yet generally-applicable mechanism that tackles the primacy bias by periodically resetting a part of the agent. We apply this mechanism to algorithms in both discrete (Atari 100k) and continuous action (DeepMind Control Suite) domains, consistently improving their performance.

2022-06-27

Proceedings of the 39th International Conference on Machine Learning (publié)

doi.org

proceedings.mlr.press

Publications du Fellowship en politiques de l'IA

La plateforme Mila Ventures

Boussole des politiques en IA

Evgenii Nikishin

Billets de blogue

Publications

Publications du Fellowship en politiques de l'IA

La plateforme Mila Ventures

Boussole des politiques en IA

Mots-clés populaires:

Evgenii Nikishin

Billets de blogue

Publications