Le Studio d'IA pour le climat de Mila vise à combler l’écart entre la technologie et l'impact afin de libérer le potentiel de l'IA pour lutter contre la crise climatique rapidement et à grande échelle.
Le programme a récemment publié sa première note politique, intitulée « Considérations politiques à l’intersection des technologies quantiques et de l’intelligence artificielle », réalisée par Padmapriya Mohan.
Hugo Larochelle nommé directeur scientifique de Mila
Professeur associé à l’Université de Montréal et ancien responsable du laboratoire de recherche en IA de Google à Montréal, Hugo Larochelle est un pionnier de l’apprentissage profond et fait partie des chercheur·euses les plus respecté·es au Canada.
Nous utilisons des témoins pour analyser le trafic et l’utilisation de notre site web, afin de personnaliser votre expérience. Vous pouvez désactiver ces technologies à tout moment, mais cela peut restreindre certaines fonctionnalités du site. Consultez notre Politique de protection de la vie privée pour en savoir plus.
Paramètre des cookies
Vous pouvez activer et désactiver les types de cookies que vous souhaitez accepter. Cependant certains choix que vous ferez pourraient affecter les services proposés sur nos sites (ex : suggestions, annonces personnalisées, etc.).
Cookies essentiels
Ces cookies sont nécessaires au fonctionnement du site et ne peuvent être désactivés. (Toujours actif)
Cookies analyse
Acceptez-vous l'utilisation de cookies pour mesurer l'audience de nos sites ?
Multimedia Player
Acceptez-vous l'utilisation de cookies pour afficher et vous permettre de regarder les contenus vidéo hébergés par nos partenaires (YouTube, etc.) ?
As pre-trained language models grow in size, full fine-tuning their parameters on task adaptation data becomes increasingly impractical. To … (voir plus)address this challenge, some methods for low-rank adaptation of language models have been proposed, e.g. LoRA, which incorporates trainable low-rank decomposition matrices into only some parameters of the pre-trained model, called adapters. This approach significantly reduces the number of trainable parameters compared to fine-tuning all parameters or adapters. In this work, we look at low-rank adaptation method from the lens of data privacy. We show theoretically that the low-rank adaptation used in LoRA is equivalent to fine-tuning adapters with noisy batch gradients - just like what DPSGD algorithm does. We also quantify the variance of the injected noise as a decreasing function of adaptation rank. By establishing a Berry-Esseen type bound on the total variation distance between the injected noise distribution and a Gaussian noise distribution with the same variance, we show that the dynamics of low-rank adaptation is very close to when DPSGD is performed w.r.t the adapters. Following our theoretical findings and approved by our experimental results, we show that low-rank adaptation provides robustness to membership inference attacks w.r.t the fine-tuning data.
A significant approach in natural language processing involves large-scale pre-training on general domain data followed by adaptation to spe… (voir plus)cific tasks or domains. As models grow in size, full fine-tuning all parameters becomes increasingly impractical. To address this, some methods for low-rank task adaptation of language models have been proposed, e.g. LoRA and FLoRA. These methods keep the pre-trained model weights fixed and incorporate trainable low-rank decomposition matrices into some layers of the transformer architecture, called adapters. This approach significantly reduces the number of trainable parameters required for downstream tasks compared to full fine-tuning all parameters. In this work, we look at low-rank adaptation from the lens of data privacy. We show theoretically that the low-rank adaptation used in LoRA and FLoRA is equivalent to injecting some random noise into the batch gradients w.r.t the adapter parameters coming from their full fine-tuning, and we quantify the variance of the injected noise. By establishing a Berry-Esseen type bound on the total variation distance between the noise distribution and a Gaussian distribution with the same variance, we show that the dynamics of LoRA and FLoRA are very close to differentially private full fine-tuning the adapters, which suggests that low-rank adaptation implicitly provides privacy w.r.t the fine-tuning data. Finally, using Johnson-Lindenstrauss lemma, we show that when augmented with gradient clipping, low-rank adaptation is almost equivalent to differentially private full fine-tuning adapters with a fixed noise scale.
As pre-trained language models grow in size, full fine-tuning their parameters on task adaptation data becomes increasingly impractical. To … (voir plus)address this challenge, some methods for low-rank adaptation of language models have been proposed, e.g. LoRA, which incorporates trainable low-rank decomposition matrices into only some parameters of the pre-trained model, called adapters. This approach significantly reduces the number of trainable parameters compared to fine-tuning all parameters or adapters. In this work, we look at low-rank adaptation method from the lens of data privacy. We show theoretically that the low-rank adaptation used in LoRA is equivalent to fine-tuning adapters with noisy batch gradients - just like what DPSGD algorithm does. We also quantify the variance of the injected noise as a decreasing function of adaptation rank. By establishing a Berry-Esseen type bound on the total variation distance between the injected noise distribution and a Gaussian noise distribution with the same variance, we show that the dynamics of low-rank adaptation is very close to when DPSGD is performed w.r.t the adapters. Following our theoretical findings and approved by our experimental results, we show that low-rank adaptation provides robustness to membership inference attacks w.r.t the fine-tuning data.
Federated learning (FL), which is a decentralized machine learning (ML) approach, often incorporates differential privacy (DP) to provide ri… (voir plus)gorous data privacy guarantees. Previous works attempted to address high structured data heterogeneity in vanilla FL settings through clustering clients (a.k.a clustered FL), but these methods remain sensitive and prone to errors, further exacerbated by the DP noise. This vulnerability makes the previous methods inappropriate for differentially private FL (DPFL) settings with structured data heterogeneity. To address this gap, we propose an algorithm for differentially private clustered FL, which is robust to the DP noise in the system and identifies the underlying clients' clusters correctly. To this end, we propose to cluster clients based on both their model updates and training loss values. Furthermore, for clustering clients' model updates at the end of the first round, our proposed approach addresses the server's uncertainties by employing large batch sizes as well as Gaussian Mixture Models (GMM) to reduce the impact of DP and stochastic noise and avoid potential clustering errors. This idea is efficient especially in privacy-sensitive scenarios with more DP noise. We provide theoretical analysis to justify our approach and evaluate it across diverse data distributions and privacy budgets. Our experimental results show its effectiveness in addressing large structured data heterogeneity in DPFL.
Federated Learning (FL) is a decentralized machine learning (ML) approach that keeps data localized and often incorporates Differential Priv… (voir plus)acy (DP) to enhance privacy guarantees. Similar to previous work on DP in ML, we observed that differentially private federated learning (DPFL) introduces performance disparities, particularly affecting minority groups. Recent work has attempted to address performance fairness in vanilla FL through clustering, but this method remains sensitive and prone to errors, which are further exacerbated by the DP noise in DPFL. To fill this gap, in this paper, we propose a novel clustered DPFL algorithm designed to effectively identify clients' clusters in highly heterogeneous settings while maintaining high accuracy with DP guarantees. To this end, we propose to cluster clients based on both their model updates and training loss values. Our proposed approach also addresses the server's uncertainties in clustering clients' model updates by employing larger batch sizes along with Gaussian Mixture Model (GMM) to alleviate the impact of noise and potential clustering errors, especially in privacy-sensitive scenarios. We provide theoretical analysis of the effectiveness of our proposed approach. We also extensively evaluate our approach across diverse data distributions and privacy budgets and show its effectiveness in mitigating the disparate impact of DP in FL settings with a small computational cost.