Michael Rizvi-Martel

The Illusion of Superposition? A Principled Analysis of Latent Thinking in Language Models

Latent reasoning via continuous chain-of-thoughts (Latent CoT) has emerged as a promising alternative to discrete CoT reasoning. Operating i… (voir plus)n continuous space increases expressivity and has been hypothesized to enable superposition: the ability to maintain multiple candidate solutions simultaneously within a single representation. Despite theoretical arguments, it remains unclear whether language models actually leverage superposition when reasoning using latent CoTs. We investigate this question across three regimes: a training-free regime that constructs latent thoughts as convex combinations of token embeddings, a fine-tuned regime where a base model is adapted to produce latent thoughts, and a from-scratch regime where a model is trained entirely with latent thoughts to solve a given task. Using Logit Lens and entity-level probing to analyze internal representations, we find that only models trained from scratch exhibit signs of using superposition. In the training-free and fine-tuned regimes, we find that the superposition either collapses or is not used at all, with models discovering shortcut solutions instead. We argue that this is due to two complementary phenomena: i) pretraining on natural language data biases models to commit to a token in the last layers ii) capacity has a huge effect on which solutions a model favors. Together, our results offer a unified explanation for when and why superposition arises in continuous chain-of-thought reasoning, and identify the conditions under which it collapses.

2026-04-06

arXiv (prépublication)

doi.org

arxiv.org

On the Role of Depth in the Expressivity of RNNs

Maude Lizaire

Michael Rizvi-Martel

Éric Dupuis

Guillaume Rabusseau

The benefits of depth in feedforward neural networks (FNNs) are well known: composing multiple layers of linear transformations with nonline… (voir plus)ar activations enables complex computations. While similar effects are expected in recurrent neural networks (RNNs), it remains unclear how depth interacts with recurrence to shape expressive power. Here, we formally show that depth increases RNNs’ memory capacity efficiently with respect to parameters, enhancing expressivity both by enabling more complex input transformations and improving the retention of past information. We extend our analysis to 2RNNs, a generalization of RNNs with multiplicative interactions between inputs and hidden states. Unlike RNNs, which remain linear without nonlinear activations, 2RNNs perform polynomial transformations whose maximal degree grows with depth. We further show that multiplicative interactions cannot, in general, be replaced by layerwise nonlinearities. Finally, we validate these insights empirically on synthetic and real-world tasks.

2026-02-02

International Conference on Artificial Intelligence and Statistics (spotlight)

doi.org

openreview.net

A Tensor Decomposition Perspective on Second-order RNNS

Maude Lizaire

Michael Rizvi-Martel

Marawan Gamal Abdel Hameed

Guillaume Rabusseau

Second-order Recurrent Neural Networks (2RNNs) extend RNNs by leveraging second-order interactions for sequence modelling. These models are … (voir plus)provably more expressive than their first-order counterparts and have connections to well-studied models from formal language theory. However, their large parameter tensor makes computations intractable. To circumvent this issue, one approach known as MIRNN consists in limiting the type of interactions used by the model. Another is to leverage tensor decomposition to diminish the parameter count. In this work, we study the model resulting from parameterizing 2RNNs using the CP decomposition, which we call CPRNN. Intuitively, the rank of the decomposition should reduce expressivity. We analyze how rank and hidden size affect model capacity and show the relationships between RNNs, 2RNNs, MIRNNs, and CPRNNs based on these parameters. We support these results empirically with experiments on the Penn Treebank dataset which demonstrate that, with a fixed parameter budget, CPRNNs outperforms RNNs, 2RNNs, and MIRNNs with the right choice of rank and hidden size.

2024-04-30

ICML.cc/2024/Conference (spotlight)

doi.org

proceedings.mlr.press

Mila Techaide 2026

Propulsion d'entrepreneurs scientifiques

Avantage IA : productivité dans la fonction publique

Michael Rizvi-Martel

Publications

Mila Techaide 2026

Propulsion d'entrepreneurs scientifiques

Avantage IA : productivité dans la fonction publique

Mots-clés populaires:

Michael Rizvi-Martel

Publications