Johan Samir Obando Ceron

Adaptive Computation Pruning for the Forgetting Transformer

The recently proposed Forgetting Transformer (FoX) incorporates a forget gate into softmax attention and has shown consistently better or on… (voir plus)-par performance compared to the standard RoPE-based Transformer. Notably, many attention heads in FoX tend to forget quickly, causing their output at each timestep to rely primarily on local context. Based on this observation, we propose Adaptive Computation Pruning (ACP) for FoX, a method that dynamically prunes computations involving input-output dependencies that are strongly decayed by the forget gate. In particular, our method performs *provably safe* pruning via a dynamically set pruning threshold that guarantees the pruned attention weights are negligible. We apply ACP to language model pretraining with FoX and show it consistently reduces the number of FLOPs and memory accesses in softmax attention by around 70\% across different model sizes and context lengths, resulting in a roughly 50\% to 70\% reduction in attention runtime (or a 2--3

2025-07-07

colmweb.org/COLM/2025/Conference (accepté)

openreview.net

Stable Gradients for Stable Learning at Scale in Deep Reinforcement Learning

Roger Creus Castanyer

Johan Samir Obando Ceron

Li Li

Scaling deep reinforcement learning networks is challenging and often results in degraded performance, yet the root causes of this failure m… (voir plus)ode remain poorly understood. Several recent works have proposed mechanisms to address this, but they are often complex and fail to highlight the causes underlying this difficulty. In this work, we conduct a series of empirical analyses which suggest that the combination of non-stationarity with gradient pathologies, due to suboptimal architectural choices, underlie the challenges of scale. We propose a series of direct interventions that stabilize gradient flow, enabling robust performance across a range of network depths and widths. Our interventions are simple to implement and compatible with well-established algorithms, and result in an effective mechanism that enables strong performance even at large scales. We validate our findings on a variety of agents and suites of environments.

2025-06-18

ArXiv (prépublication)

doi.org

arxiv.org

Stable Gradients for Stable Learning at Scale in Deep Reinforcement Learning

Roger Creus Castanyer

Johan Samir Obando Ceron

Li Li

Scaling deep reinforcement learning networks is challenging and often results in degraded performance, yet the root causes of this failure m… (voir plus)ode remain poorly understood. Several recent works have proposed mechanisms to address this, but they are often complex and fail to highlight the causes underlying this difficulty. In this work, we conduct a series of empirical analyses which suggest that the combination of non-stationarity with gradient pathologies, due to suboptimal architectural choices, underlie the challenges of scale. We propose a series of direct interventions that stabilize gradient flow, enabling robust performance across a range of network depths and widths. Our interventions are simple to implement and compatible with well-established algorithms, and result in an effective mechanism that enables strong performance even at large scales. We validate our findings on a variety of agents and suites of environments.

2025-06-18

ArXiv (prépublication)

doi.org

arxiv.org

The Courage to Stop: Overcoming Sunk Cost Fallacy in Deep Reinforcement Learning

Jiashun Liu

Johan Samir Obando Ceron

Pablo Samuel Castro

Aaron Courville

Ling Pan

Off-policy deep reinforcement learning (RL) typically leverages replay buffers for reusing past experiences during learning. This can help i… (voir plus)mprove sample efficiency when the collected data is informative and aligned with the learning objectives; when that is not the case, it can have the effect of"polluting"the replay buffer with data which can exacerbate optimization challenges in addition to wasting environment interactions due to wasteful sampling. We argue that sampling these uninformative and wasteful transitions can be avoided by addressing the sunk cost fallacy, which, in the context of deep RL, is the tendency towards continuing an episode until termination. To address this, we propose learn to stop (LEAST), a lightweight mechanism that enables strategic early episode termination based on Q-value and gradient statistics, which helps agents recognize when to terminate unproductive episodes early. We demonstrate that our method improves learning efficiency on a variety of RL algorithms, evaluated on both the MuJoCo and DeepMind Control Suite benchmarks.

2025-06-16

ArXiv (prépublication)

doi.org

arxiv.org

The Courage to Stop: Overcoming Sunk Cost Fallacy in Deep Reinforcement Learning

Jiashun Liu

Johan Samir Obando Ceron

Pablo Samuel Castro

Aaron Courville

Ling Pan

Off-policy deep reinforcement learning (RL) typically leverages replay buffers for reusing past experiences during learning. This can help i… (voir plus)mprove sample efficiency when the collected data is informative and aligned with the learning objectives; when that is not the case, it can have the effect of"polluting"the replay buffer with data which can exacerbate optimization challenges in addition to wasting environment interactions due to wasteful sampling. We argue that sampling these uninformative and wasteful transitions can be avoided by addressing the sunk cost fallacy, which, in the context of deep RL, is the tendency towards continuing an episode until termination. To address this, we propose learn to stop (LEAST), a lightweight mechanism that enables strategic early episode termination based on Q-value and gradient statistics, which helps agents recognize when to terminate unproductive episodes early. We demonstrate that our method improves learning efficiency on a variety of RL algorithms, evaluated on both the MuJoCo and DeepMind Control Suite benchmarks.

2025-06-16

ArXiv (prépublication)

doi.org

arxiv.org

Stable Gradients for Stable Learning at Scale in Deep Reinforcement Learning

Roger Creus Castanyer

Johan Samir Obando Ceron

Li Li

Scaling deep reinforcement learning networks is challenging and often results in degraded performance, yet the root causes of this failure m… (voir plus)ode remain poorly understood. Several recent works have proposed mechanisms to address this, but they are often complex and fail to highlight the causes underlying this difficulty. In this work, we conduct a series of empirical analyses which suggest that the combination of non-stationarity with gradient pathologies, due to suboptimal architectural choices, underlie the challenges of scale. We propose a series of direct interventions that stabilize gradient flow, enabling robust performance across a range of network depths and widths. Our interventions are simple to implement and compatible with well-established algorithms, and result in an effective mechanism that enables strong performance even at large scales. We validate our findings on a variety of agents and suites of environments.

2025-06-01

arXiv (publié)

doi.org

arxiv.org

Measure gradients, not activations! Enhancing neuronal activity in deep reinforcement learning

Jiashun Liu

Zihao Wu

Johan Samir Obando Ceron

Pablo Samuel Castro

Aaron Courville

Ling Pan

2025-05-01

arXiv (publié)

doi.org

arxiv.org

Don't flatten, tokenize! Unlocking the key to SoftMoE's efficacy in deep RL

Ghada Sokar

Johan Samir Obando Ceron

Aaron Courville

Hugo Larochelle

Pablo Samuel Castro

The use of deep neural networks in reinforcement learning (RL) often suffers from performance degradation as model size increases. While sof… (voir plus)t mixtures of experts (SoftMoEs) have recently shown promise in mitigating this issue for online RL, the reasons behind their effectiveness remain largely unknown. In this work we provide an in-depth analysis identifying the key factors driving this performance gain. We discover the surprising result that tokenizing the encoder output, rather than the use of multiple experts, is what is behind the efficacy of SoftMoEs. Indeed, we demonstrate that even with an appropriately scaled single expert, we are able to maintain the performance gains, largely thanks to tokenization.

2025-01-22

ICLR.cc/2025/Conference (spotlight)

doi.org

openreview.net

Neuroplastic Expansion in Deep Reinforcement Learning

Jiashun Liu

Johan Samir Obando Ceron

Aaron Courville

Ling Pan

2025-01-22

ICLR.cc/2025/Conference (poster)

doi.org

openreview.net

Trajectory Balance with Asynchrony: Decoupling Exploration and Learning for Fast, Scalable LLM Post-Training

Brian R. Bartoldson

Siddarth Venkatraman

James Diffenderfer

Moksh J. Jain

Tal Ben-Nun

Seanie Lee

Minsu Kim

Johan Samir Obando Ceron

Yoshua Bengio

Bhavya Kailkhura

2025-01-01

arXiv.org (prépublication)

doi.org

Trajectory Balance with Asynchrony: Decoupling Exploration and Learning for Fast, Scalable LLM Post-Training

Brian R. Bartoldson

Siddarth Venkatraman

James Diffenderfer

Moksh J. Jain

Tal Ben-Nun

Seanie Lee

Minsu Kim

Johan Samir Obando Ceron

Yoshua Bengio

Bhavya Kailkhura

2025-01-01

arXiv.org (prépublication)

doi.org

In value-based deep reinforcement learning, a pruned network is a good network

Johan Samir Obando Ceron

Aaron Courville

Pablo Samuel Castro

Recent work has shown that deep reinforcement learning agents have difficulty in effectively using their network parameters. We leverage pri… (voir plus)or insights into the advantages of sparse training techniques and demonstrate that gradual magnitude pruning enables {value-based} agents to maximize parameter effectiveness. This results in networks that yield dramatic performance improvements over traditional networks, using only a small fraction of the full network parameters. Our code is publicly available, see Appendix A for details.

2024-07-08

Proceedings of the 41st International Conference on Machine Learning (publié)

proceedings.mlr.press

openreview.net

Hackathon | Créer une IA plus sécuritaire pour la santé mentale des jeunes

Éclaireurs autochtones en IA

Avantage IA

Johan Samir Obando Ceron

Publications

Hackathon | Créer une IA plus sécuritaire pour la santé mentale des jeunes

Éclaireurs autochtones en IA

Avantage IA

Mots-clés populaires:

Johan Samir Obando Ceron

Publications