
Elvis Dohmatob

Associate Academic Member
Associate Professor, Concordia University, Department of Computer Science and Software Engineering
Researcher, Meta Facebook AI Research (FAIR)
Research Topics
Algorithmic Fairness
Optimization
Adversarial Robustness
Machine Learning Theory

Current Students

PhD - Concordia
PhD - Concordia
Research Master's - Concordia

Publications

Efficient Refusal Ablation in LLM through Optimal Transport
Safety-aligned language models refuse harmful requests through learned refusal behaviors encoded in their internal representations. Recent activation-based jailbreaking methods circumvent these safety mechanisms by applying orthogonal projections to remove refusal directions, but these approaches treat refusal as a one-dimensional phenomenon and ignore the rich distributional structure of model activations. We introduce a principled framework based on optimal transport theory that transforms the entire distribution of harmful activations to match harmless ones. By combining PCA with closed-form Gaussian optimal transport, we achieve efficient computation in high-dimensional representation spaces while preserving essential geometric structure. Across six models (Llama-2, Llama-3.1, Qwen-2.5; 7B-32B parameters), our method achieves up to 11% higher attack success rates than state-of-the-art baselines while maintaining comparable perplexity, demonstrating superior preservation of model capabilities. Critically, we discover that layer-selective intervention (applying optimal transport to 1-2 carefully chosen layers at approximately 40-60% network depth) substantially outperforms full-network interventions, revealing that refusal mechanisms may be localized rather than distributed. Our analysis provides new insights into the geometric structure of safety representations and suggests that current alignment methods may be vulnerable to distributional attacks beyond simple direction removal.
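The closed-form Gaussian optimal transport step can be illustrated with the standard map `T(x) = m_t + A (x - m_s)`, where `A = S_s^{-1/2} (S_s^{1/2} S_t S_s^{1/2})^{1/2} S_s^{-1/2}` and `(m_s, S_s)`, `(m_t, S_t)` are the means and covariances of the source and target clouds. The minimal sketch below fits this map between two point clouds standing in for harmful and harmless activations. The function names and the small ridge term `eps` are assumptions, and the PCA reduction and layer selection described in the abstract are deliberately left out; this is not the paper's implementation.

```python
from typing import Callable

import numpy as np


def sym_sqrt(m: np.ndarray) -> np.ndarray:
    """Square root of a symmetric positive semi-definite matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(m)
    vals = np.clip(vals, 0.0, None)  # guard against tiny negative eigenvalues from round-off
    return (vecs * np.sqrt(vals)) @ vecs.T


def gaussian_ot_map(src: np.ndarray, tgt: np.ndarray, eps: float = 1e-6) -> Callable[[np.ndarray], np.ndarray]:
    """Fit the closed-form OT map between Gaussian fits of two (n, d) point clouds.

    Returns x -> m_t + A (x - m_s); A is symmetric and satisfies A S_s A = S_t.
    """
    d = src.shape[1]
    m_s, m_t = src.mean(axis=0), tgt.mean(axis=0)
    # Hypothetical ridge term eps keeps the source covariance invertible.
    s_s = np.cov(src, rowvar=False) + eps * np.eye(d)
    s_t = np.cov(tgt, rowvar=False) + eps * np.eye(d)
    root = sym_sqrt(s_s)
    inv_root = np.linalg.inv(root)
    a = inv_root @ sym_sqrt(root @ s_t @ root) @ inv_root
    return lambda x: m_t + (x - m_s) @ a.T
```

Because the map is affine, pushing the source cloud through it reproduces the target mean exactly and the target covariance up to the `eps` regularization.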
Understanding Softmax Attention Layers: Exact Mean-Field Analysis on a Toy Problem
Elvis Dohmatob
Self-attention has emerged as a fundamental component driving the success of modern transformer architectures, which power large language models and various applications. However, a theoretical understanding of how such models actually work is still under active development. The recent work of Marion et al. (2025) introduced the so-called "single-location regression" problem, which can provably be solved by a simplified self-attention layer but not by linear models, thereby demonstrating a striking functional separation. A rigorous analysis of self-attention with softmax for this problem is challenging due to the coupled nature of the model. In the present work, we use ideas from the classical random energy model in statistical physics to analyze softmax self-attention on the single-location problem. Our analysis yields exact analytic expressions for the population risk in terms of the overlaps between the learned model parameters and those of an oracle. Moreover, we derive a detailed description of the gradient descent dynamics for these overlaps and prove that, under broad conditions, the dynamics converge to the unique oracle attractor. Our work not only advances our understanding of self-attention but also provides key theoretical ideas that are likely to find use in further analyses of even more complex transformer architectures.
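A simplified softmax self-attention predictor of the kind discussed here can be sketched as follows. The sketch assumes a single key vector `k` that scores every token and a value vector `v` that reads out the attended tokens, so the prediction is the softmax-weighted average of `x_l . v`. This parameterization and the function names are illustrative assumptions, not necessarily the exact model of Marion et al. (2025) or of this paper.

```python
import numpy as np


def softmax(z: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over a 1-D array."""
    z = z - z.max()  # shifting by the max does not change the result
    e = np.exp(z)
    return e / e.sum()


def attention_readout(x: np.ndarray, k: np.ndarray, v: np.ndarray) -> float:
    """Softmax attention pooling followed by a linear readout.

    x: (L, d) token matrix; k: (d,) key vector; v: (d,) value vector.
    Returns sum_l softmax(x @ k)_l * (x_l . v).
    """
    weights = softmax(x @ k)  # one attention weight per token
    return float(weights @ (x @ v))
```

When `k` aligns strongly with a feature carried only by the relevant token, the weights concentrate on that token and the output approaches that token's readout; with `k = 0` the layer simply averages all tokens.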
Improving the Scaling Laws of Synthetic Data with Deliberate Practice
Reyhane Askari-Hemmat
Elvis Dohmatob
Pietro Astolfi
Melissa Hall
Jakob Verbeek
Adriana Romero-Soriano
Inspired by the principle of deliberate practice in human learning, we propose Deliberate Practice for Synthetic Data Generation (DP), a novel framework that improves sample efficiency through dynamic synthetic data generation. Prior work has shown that scaling synthetic data is inherently challenging, as naively adding new data leads to diminishing returns. To address this, pruning has been identified as a key mechanism for improving scaling, enabling models to focus on the most informative synthetic samples. Rather than generating a large dataset and pruning it afterward, DP efficiently approximates the direct generation of informative samples. We theoretically show how training on challenging, informative examples improves scaling laws and empirically validate that DP achieves better scaling performance with significantly fewer training samples and iterations. On ImageNet-100, DP generates 3.4x fewer samples and requires six times fewer iterations, while on ImageNet-1k, it generates 8x fewer samples with a 30 percent reduction in iterations, all while achieving superior performance compared to prior work.
The Pitfalls of Memorization: When Memorization Hurts Generalization
Elvis Dohmatob
David Lopez-Paz
Neural networks often learn simple explanations that fit the majority of the data while memorizing exceptions that deviate from these explanations. This behavior leads to poor generalization when the learned explanations rely on spurious correlations. In this work, we formalize the interplay between memorization and generalization, showing that spurious correlations lead to particularly poor generalization when they are combined with memorization. Memorization can reduce training loss to zero, leaving no incentive to learn robust, generalizable patterns. To address this, we propose memorization-aware training (MAT), which uses held-out predictions as a signal of memorization to shift a model's logits. MAT encourages learning robust patterns invariant across distributions, improving generalization under distribution shifts.
Beyond Model Collapse: Scaling Up with Synthesized Data Requires Verification
Yunzhen Feng
Elvis Dohmatob
Pu Yang
Francois Charton
Julia Kempe
Large Language Models (LLMs) are increasingly trained on data generated by other LLMs, either because generated text and images become part of the pre-training corpus, or because synthesized data is used as a replacement for expensive human annotation. This raises concerns about model collapse, a drop in performance when models' training sets include generated data. Considering that it is easier for both humans and machines to tell good examples from bad ones than to generate high-quality samples, we investigate the use of verification on synthesized data to prevent model collapse. We provide a theoretical characterization using Gaussian mixtures, linear classifiers, and linear verifiers to derive conditions with measurable proxies to assess whether the verifier can effectively select synthesized data that leads to optimal performance. We experiment with two practical tasks (computing matrix eigenvalues with transformers and news summarization with LLMs), which both exhibit model collapse when trained on generated data, and show that verifiers, even imperfect ones, can indeed be harnessed to prevent model collapse and that our proposed proxy measure strongly correlates with performance.
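Selecting synthesized data with a linear verifier can be sketched in a few lines. The sketch assumes binary labels in {-1, +1} and keeps a synthesized pair only when the sign of the verifier's score agrees with its label. The function name and this acceptance rule are hypothetical illustrations of the idea, not the paper's exact selection procedure.

```python
import numpy as np


def verify_and_filter(x: np.ndarray, y_synth: np.ndarray, v: np.ndarray):
    """Keep synthesized pairs whose label agrees with a linear verifier sign(x . v).

    x: (n, d) inputs; y_synth: (n,) labels in {-1, +1}; v: (d,) verifier weights.
    Returns the filtered (x, y) arrays.
    """
    keep = np.sign(x @ v) == y_synth  # boolean mask of accepted samples
    return x[keep], y_synth[keep]
```

Even if the verifier's weights are a noisy copy of the true labeling direction, samples whose labels the generator corrupted are mostly rejected, so the retained set is substantially cleaner than the raw synthesized set.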
An Effective Theory of Bias Amplification
Arjun Subramonian
Samuel J. Bell
Levent Sagun
Elvis Dohmatob
Machine learning models can capture and amplify biases present in data, leading to disparate test performance across social groups. To better understand, evaluate, and mitigate these biases, we need a deeper theoretical understanding of how model design choices and data distribution properties contribute to bias. In this work, we contribute a precise analytical theory in the context of ridge regression, both with and without random projections, where the former (ridge regression with random projections) models feedforward neural networks in a simplified regime. Our theory offers a unified and rigorous explanation of machine learning bias, providing insights into phenomena such as bias amplification and minority-group bias in various feature and parameter regimes. For example, we observe that there may be an optimal regularization penalty or training time for avoiding bias amplification, and that there can be differences in test error between groups that increased parameterization does not alleviate. Importantly, our theoretical predictions align with empirical observations reported in the literature on machine learning bias. We validate our theory extensively on synthetic and semi-synthetic datasets.
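The basic ingredients of this setting, a single ridge model fit on pooled data and evaluated separately per group, can be illustrated with a toy example. The closed-form solver below minimizes `||X w - y||^2 / n + lam ||w||^2`. The two-group data generator, the group sizes, and all names are hypothetical; the sketch only shows how disparate per-group test error arises and does not reproduce the paper's theory (which also covers random projections).

```python
import numpy as np


def ridge_fit(x: np.ndarray, y: np.ndarray, lam: float) -> np.ndarray:
    """Closed-form ridge regression: argmin_w ||x w - y||^2 / n + lam ||w||^2."""
    n, d = x.shape
    return np.linalg.solve(x.T @ x / n + lam * np.eye(d), x.T @ y / n)


def group_test_errors(w: np.ndarray, groups: dict) -> dict:
    """Mean squared test error per group; groups maps name -> (x_test, y_test)."""
    return {name: float(np.mean((xg @ w - yg) ** 2)) for name, (xg, yg) in groups.items()}
```

For example, fitting on 900 samples from a majority group and 100 from a minority group whose true weights differ yields a model close to the majority's weights, so the minority group's test error is much larger.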
Strong Model Collapse
Elvis Dohmatob
Yunzhen Feng
Arjun Subramonian
Julia Kempe
Within the scaling laws paradigm, which underpins the training of large neural networks like ChatGPT and Llama, we consider a supervised regression setting and establish the existence of a strong form of the model collapse phenomenon, a critical performance degradation due to synthetic data in the training corpus. Our results show that even the smallest fraction of synthetic data (e.g., as little as 1% of the total training dataset) can still lead to model collapse: larger and larger training sets do not enhance performance. We further investigate whether increasing model size, an approach aligned with current trends in training large language models, exacerbates or mitigates model collapse. In a simplified regime where neural networks are approximated via random projections of tunable size, we both theoretically and empirically show that larger models can amplify model collapse. Interestingly, our theory also indicates that, beyond the interpolation threshold (which can be extremely high for very large datasets), larger models may mitigate the collapse, although they do not entirely prevent it. Our theoretical findings are empirically verified through experiments on language models and feed-forward neural networks for images.
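The plateau in which larger training sets stop helping can be reproduced in a toy linear-regression sketch. It assumes isotropic Gaussian inputs, a real labeling model `w_true`, and a hypothetical "synthetic" labeler `w_synth` that labels a fixed fraction of every training set. Because that fraction stays constant as `n` grows, the least-squares solution converges to `w_true + frac_synth * (w_synth - w_true)` instead of `w_true`, so the excess error stops decreasing. This only illustrates the phenomenon; it is not the paper's random-projection analysis.

```python
import numpy as np


def mixed_ridge_error(n, frac_synth, w_true, w_synth, lam=1e-6, noise=0.1, seed=0):
    """Fit ridge on n samples, a fraction of which are labeled by a shifted synthetic model.

    Returns the excess test error ||w_hat - w_true||^2, which equals the expected
    squared prediction error against w_true for isotropic Gaussian inputs.
    """
    rng = np.random.default_rng(seed)
    d = w_true.shape[0]
    n_synth = int(frac_synth * n)
    x = rng.normal(size=(n, d))
    # The first n_synth rows get synthetic labels, the rest get real labels.
    w_rows = np.where(np.arange(n)[:, None] < n_synth, w_synth, w_true)
    y = np.sum(x * w_rows, axis=1) + noise * rng.normal(size=n)
    w_hat = np.linalg.solve(x.T @ x / n + lam * np.eye(d), x.T @ y / n)
    return float(np.sum((w_hat - w_true) ** 2))
```

With `w_true = 1` and `w_synth = 2` in 10 dimensions and 10% synthetic labels, the excess error settles near `0.1^2 * 10 = 0.1` whether `n` is a thousand or a hundred thousand, while the fully real-data fit keeps improving.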
Dark control: The default mode network as a reinforcement learning agent
Elvis Dohmatob
The default mode network (DMN) is believed to subserve the baseline mental activity in humans. Its higher energy consumption compared to other brain networks and its intimate coupling with conscious awareness both point to an unknown overarching function. Many research streams support an evolutionarily adaptive role in envisioning experience to anticipate the future. In the present work, we propose a process model that tries to explain how the DMN may implement continuous evaluation and prediction of the environment to guide behavior. The main purpose of DMN activity, we argue, may be described by Markov decision processes that optimize action policies via value estimates through vicarious trial and error. Our formal perspective on DMN function naturally accommodates, as special cases, previous interpretations based on (a) predictive coding, (b) semantic associations, and (c) a sentinel role. Moreover, this process model for the neural optimization of complex behavior in the DMN offers parsimonious explanations for recent experimental findings in animals and humans.
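Optimizing a policy through value estimates in a Markov decision process can be made concrete with value iteration, one standard way to compute values by repeatedly simulating the consequences of actions with an internal model rather than acting in the world. This is a generic textbook sketch on a hypothetical environment; the paper's process model of DMN function is not claimed to use this exact algorithm.

```python
from typing import Callable, List, Sequence, Tuple


def value_iteration(
    n_states: int,
    actions: Sequence[str],
    transition: Callable[[int, str], List[Tuple[float, int]]],
    reward: Callable[[int, str], float],
    gamma: float = 0.9,
    tol: float = 1e-8,
) -> Tuple[List[float], List[str]]:
    """Compute optimal state values and a greedy policy for a small finite MDP.

    transition(s, a) returns a list of (probability, next_state) pairs;
    reward(s, a) returns the immediate reward for taking action a in state s.
    """
    values = [0.0] * n_states

    def q_value(s: int, a: str) -> float:
        # One-step lookahead: immediate reward plus discounted expected next value.
        return reward(s, a) + gamma * sum(p * values[s2] for p, s2 in transition(s, a))

    while True:
        delta = 0.0
        for s in range(n_states):
            best = max(q_value(s, a) for a in actions)
            delta = max(delta, abs(best - values[s]))
            values[s] = best  # in-place (Gauss-Seidel) update
        if delta < tol:
            break
    policy = [max(actions, key=lambda a: q_value(s, a)) for s in range(n_states)]
    return values, policy
```

On a three-state chain where only the last state is rewarding (reward 1 per step, `gamma = 0.9`), the greedy policy moves right from every state and the values come out as 8.1, 9 and 10, shrinking by a factor of `gamma` with each step of distance from the reward.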