Razvan Pascanu

Affiliate Member
Principal Research Scientist, Google DeepMind
Research Topics
Few-Shot Learning
Continual Learning
Representation Learning
Reinforcement Learning
Deep Learning
Geometric Deep Learning
Lifelong Learning
Generalization
Mechanistic Interpretability
Optimization
Neural Networks
Graph Neural Networks
Deep Neural Networks
Recurrent Neural Networks
Machine Learning Theory

Publications

Unpacking Softmax: How Temperature Drives Representation Collapse, Compression, and Generalization
Wojciech Masarczyk
Mateusz Ostaszewski
Tin Sum Cheng
Tomasz Trzciński
Aurélien Lucchi
The softmax function is a fundamental building block of deep neural networks, commonly used to define output distributions in classification tasks or attention weights in transformer architectures. Despite its widespread use and proven effectiveness, its influence on learning dynamics and learned representations remains poorly understood, limiting our ability to optimize model behavior. In this paper, we study the pivotal role of the softmax function in shaping the model's representation. We introduce the concept of rank deficit bias - a phenomenon in which softmax-based deep networks find solutions of rank much lower than the number of classes. This bias depends on the softmax function's logits norm, which is implicitly influenced by hyperparameters or directly modified by softmax temperature. Furthermore, we demonstrate how to exploit the softmax dynamics to learn compressed representations or to enhance their performance on out-of-distribution data. We validate our findings across diverse architectures and real-world datasets, highlighting the broad applicability of temperature tuning in improving model performance. Our work provides new insights into the mechanisms of softmax, enabling better control over representation learning in deep neural networks.
Plasticity as the Mirror of Empowerment
David Abel
Michael Bowling
Andre Barreto
Will Dabney
Shi Dong
Steven Hansen
Anna Harutyunyan
Clare Lyle
Georgios Piliouras
Jonathan Richens
Mark Rowland
Tom Schaul
Satinder Singh
On the generalization of language models from in-context learning and finetuning: a controlled study
Andrew Lampinen
Arslan Chaudhry
Stephanie C.Y. Chan
Cody Wild
Diane Wan
Alex Ku
Jorg Bornschein
Murray Shanahan
James L McClelland
LLMs are Greedy Agents: Effects of RL Fine-tuning on Decision-Making Abilities
Thomas Schmied
Jorg Bornschein
Jordi Grau-Moya
Markus Wulfmeier
Why do LLMs attend to the first token?
Federico Barbero
Álvaro Arroyo
Xiangming Gu
Christos Perivolaropoulos
Michael M. Bronstein
Petar Veličković
NoProp: Training Neural Networks without Back-propagation or Forward-propagation
Qinyu Li
Yee Whye Teh
How do language models learn facts? Dynamics, curricula and hallucinations
Nicolas Zucchet
Jorg Bornschein
Stephanie Chan
Andrew Lampinen
Soham De