
Elvis Dohmatob

Associate Academic Member
Associate Professor, Concordia University, Department of Computer Science and Software Engineering
Researcher, Meta Facebook AI Research (FAIR)
Research Topics
Algorithmic Fairness
Optimization
Adversarial Robustness
Machine Learning Theory

Current Students

PhD - Concordia
Research Master's - Concordia

Publications

The Pitfalls of Memorization: When Memorization Hinders Generalization
Neural networks often learn simple explanations that fit the majority of the data while memorizing exceptions that deviate from these explanations. This leads to poor generalization when the learned explanations are spurious. In this work, we formalize
An Effective Theory of Bias Amplification
Arjun Subramonian
Samuel J. Bell
Levent Sagun
Machine learning models may capture and amplify biases present in data, leading to disparate test performance across social groups. To better understand, evaluate, and mitigate these possible biases, a deeper theoretical understanding of how model design choices and data distribution properties could contribute to bias is needed. In this work, we contribute a precise analytical theory in the context of ridge regression, both with and without random projections, where the former models neural networks in a simplified regime. Our theory offers a unified and rigorous explanation of machine learning bias, providing insights into phenomena such as bias amplification and minority-group bias in various feature and parameter regimes. For example, we demonstrate that there may be an optimal regularization penalty or training time to avoid bias amplification, and there can be fundamental differences in test error between groups that do not vanish with increased parameterization. Importantly, our theoretical predictions align with several empirical observations reported in the literature. We extensively empirically validate our theory on diverse synthetic and semi-synthetic datasets.
Strong Model Collapse
Yunzhen Feng
Arjun Subramonian
Julia Kempe
Within the scaling laws paradigm, which underpins the training of large neural networks like ChatGPT and Llama, we consider a supervised regression setting and establish the existence of a strong form of the model collapse phenomenon, a critical performance degradation due to synthetic data in the training corpus. Our results show that even the smallest fraction of synthetic data (e.g., as little as 1% of the total training dataset) can still lead to model collapse: larger and larger training sets do not enhance performance. We further investigate whether increasing model size, an approach aligned with current trends in training large language models, exacerbates or mitigates model collapse. In a simplified regime where neural networks are approximated via random projections of tunable size, we both theoretically and empirically show that larger models can amplify model collapse. Interestingly, our theory also indicates that, beyond the interpolation threshold (which can be extremely high for very large datasets), larger models may mitigate the collapse, although they do not entirely prevent it. Our theoretical findings are empirically verified through experiments on language models and feed-forward neural networks for images.