Découvrez le dernier rapport d'impact de Mila, qui met en lumière les réalisations exceptionnelles des membres de notre communauté au cours de la dernière année.
Rapport et guide politique GPAI: Vers une réelle égalité en IA
Rejoignez-nous à Mila le 26 novembre pour le lancement du rapport et du guide politique qui présente des recommandations concrètes pour construire des écosystèmes d'IA inclusifs.
Nous utilisons des témoins pour analyser le trafic et l’utilisation de notre site web, afin de personnaliser votre expérience. Vous pouvez désactiver ces technologies à tout moment, mais cela peut restreindre certaines fonctionnalités du site. Consultez notre Politique de protection de la vie privée pour en savoir plus.
Paramètre des cookies
Vous pouvez activer et désactiver les types de cookies que vous souhaitez accepter. Cependant certains choix que vous ferez pourraient affecter les services proposés sur nos sites (ex : suggestions, annonces personnalisées, etc.).
Cookies essentiels
Ces cookies sont nécessaires au fonctionnement du site et ne peuvent être désactivés. (Toujours actif)
Cookies analyse
Acceptez-vous l'utilisation de cookies pour mesurer l'audience de nos sites ?
Multimedia Player
Acceptez-vous l'utilisation de cookies pour afficher et vous permettre de regarder les contenus vidéo hébergés par nos partenaires (YouTube, etc.) ?
Publications
Feature learning as alignment: a structural property of gradient descent in non-linear neural networks
Understanding the mechanisms through which neural networks extract statistics from input-label pairs through feature learning is one of the … (voir plus)most important unsolved problems in supervised learning. Prior works demonstrated that the gram matrices of the weights (the neural feature matrices, NFM) and the average gradient outer products (AGOP) become correlated during training, in a statement known as the neural feature ansatz (NFA). Through the NFA, the authors introduce mapping with the AGOP as a general mechanism for neural feature learning. However, these works do not provide a theoretical explanation for this correlation or its origins. In this work, we further clarify the nature of this correlation, and explain its emergence. We show that this correlation is equivalent to alignment between the left singular structure of the weight matrices and the newly defined pre-activation tangent features at each layer. We further establish that the alignment is driven by the interaction of weight changes induced by SGD with the pre-activation features, and analyze the resulting dynamics analytically at early times in terms of simple statistics of the inputs and labels. We prove the derivative alignment occurs with high probability in specific high dimensional settings. Finally, motivated by the observation that the NFA is driven by this centered correlation, we introduce a simple optimization rule that dramatically increases the NFA correlations at any given layer and improves the quality of features learned.
Neuronal network activity is thought to be structured around the activation of assemblies, or low-dimensional manifolds describing states of… (voir plus) activity. Both views describe neurons acting not independently, but in concert, likely facilitated by strong recurrent excitation between them. The role of inhibition in these frameworks – if considered at all – is often reduced to blanket inhibition with no specificity with respect to which excitatory neurons are targeted. We analyzed the structure of excitation and inhibition in the MICrONS 1mm3 dataset, an electron microscopic reconstruction of a piece of cortical tissue. We found that excitation was structured around a feed-forward flow in non-random motifs of seven or more neurons. This revealed a structure of information flow from a small number of sources to a larger number of potential targets that became only visible when larger motifs were considered instead of individual pairs. Inhibitory neurons targeted and were targeted by neurons in specific sequential positions of these motifs. Additionally, disynaptic inhibition was strongest between target motifs excited by the same group of source neurons, implying competition between them. The structure of this inhibition was also highly specific and symmetrical, contradicting the idea of non-specific blanket inhibition. None of these trends are detectable in only pairwise connectivity, demonstrating that inhibition is specifically structured by these large motifs. Further, we found that these motifs represent higher order connectivity patterns which are present, but to a lesser extent in a recently released, detailed computational model, and not at all in a distance-dependent control. These findings have important implications for how synaptic plasticity reorganizes neocortical connectivity to implement learning and for the specific role of inhibition in this process.
Deep learning has proven to be effective in a wide variety of loss minimization problems. However, many applications of interest, like minim… (voir plus)izing projected Bellman error and min-max optimization, cannot be modelled as minimizing a scalar loss function but instead correspond to solving a variational inequality (VI) problem. This difference in setting has caused many practical challenges as naive gradient-based approaches from supervised learning tend to diverge and cycle in the VI case. In this work, we propose a principled surrogate-based approach compatible with deep learning to solve VIs. We show that our surrogate-based approach has three main benefits: (1) under assumptions that are realistic in practice (when hidden monotone structure is present, interpolation, and sufficient optimization of the surrogates), it guarantees convergence, (2) it provides a unifying perspective of existing methods, and (3) is amenable to existing deep learning optimizers like ADAM. Experimentally, we demonstrate our surrogate-based approach is effective in min-max optimization and minimizing projected Bellman error. Furthermore, in the deep reinforcement learning case, we propose a novel variant of TD(0) which is more compute and sample efficient.
Machine unlearning aims to solve the problem of removing the influence of selected training examples from a learned model. Despite the incre… (voir plus)asing attention to this problem, it remains an open research question how to evaluate unlearning in large language models (LLMs), and what are the critical properties of the data to be unlearned that affect the quality and efficiency of unlearning. This work formalizes a metric to evaluate unlearning quality in generative models, and uses it to assess the trade-offs between unlearning quality and performance. We demonstrate that unlearning out-of-distribution examples requires more unlearning steps but overall presents a better trade-off overall. For in-distribution examples, however, we observe a rapid decay in performance as unlearning progresses. We further evaluate how example's memorization and difficulty affect unlearning under a classical gradient ascent-based approach.
"Just Accepted" papers have undergone full peer review and have been accepted for publication in Radiology: Artificial Intelligence. This ar… (voir plus)ticle will undergo copyediting, layout, and proof review before it is published in its final version. Please note that during production of the final copyedited article, errors may be discovered which could affect the content. Purpose To develop a deep learning tool for the automatic segmentation of the spinal cord and intramedullary lesions in spinal cord injury (SCI) on T2-weighted MRI scans. Materials and Methods This retrospective study included MRI data acquired between July 2002 and February 2023 from 191 patients with SCI (mean age, 48.1 years ± 17.9 [SD]; 142 males). The data consisted of T2-weighted MRI acquired using different scanner manufacturers with various image resolutions (isotropic and anisotropic) and orientations (axial and sagittal). Patients had different lesion etiologies (traumatic, ischemic, and hemorrhagic) and lesion locations across the cervical, thoracic and lumbar spine. A deep learning model, SCIseg, was trained in a three-phase process involving active learning for the automatic segmentation of intramedullary SCI lesions and the spinal cord. The segmentations from the proposed model were visually and quantitatively compared with those from three other open-source methods (PropSeg, DeepSeg and contrast-agnostic, all part of the Spinal Cord Toolbox). Wilcoxon signed-rank test was used to compare quantitative MRI biomarkers of SCI (lesion volume, lesion length, and maximal axial damage ratio) derived from the manual reference standard lesion masks and biomarkers obtained automatically with SCIseg segmentations. Results SCIseg achieved a Dice score of 0.92 ± 0.07 (mean ± SD) and 0.61 ± 0.27 for spinal cord and SCI lesion segmentation, respectively. There was no evidence of a difference between lesion length (P = .42) and maximal axial damage ratio (P = .16) computed from manually annotated lesions and the lesion segmentations obtained using SCIseg. Conclusion SCIseg accurately segmented intramedullary lesions on a diverse dataset of T2-weighted MRI scans and extracted relevant lesion biomarkers (namely, lesion volume, lesion length, and maximal axial damage ratio). SCIseg is open-source and accessible through the Spinal Cord Toolbox (v6.2 and above). Published under a CC BY 4.0 license.