Joint-embedding predictive architecture (JEPA) is a self-supervised learning (SSL) paradigm capable of world modeling via action-conditioned prediction. Previously, JEPA world models have been shown to learn action-invariant or action-equivariant representations by predicting one view of an image from another. Unlike JEPA and similar SSL paradigms, animals, including humans, learn to recognize new objects through a sequence of active interactions. To introduce sequential interactions, we propose seq-JEPA, a novel SSL world model equipped with an autoregressive memory module. Seq-JEPA aggregates a sequence of action-conditioned observations into a global representation. This global representation, conditioned on the next action, is used to predict the latent representation of the next observation. We empirically show the advantages of this sequence of action-conditioned observations and examine our sequential modeling paradigm in two settings: (1) predictive learning across saccades, a method inspired by the role of eye movements in embodied vision, which learns self-supervised image representations by processing a sequence of low-resolution visual patches sampled from image saliencies, without any hand-crafted data augmentations; and (2) the invariance-equivariance trade-off, where seq-JEPA's architecture automatically separates invariant and equivariant representations, with the aggregated autoregressor outputs being mostly action-invariant and the encoder output being equivariant. This contrasts with many equivariant SSL methods that expect a single representational space to contain both invariant and equivariant features, potentially creating a trade-off between the two. Empirically, seq-JEPA achieves competitive performance on both invariance- and equivariance-related benchmarks compared to existing methods. Importantly, both invariance- and equivariance-related downstream performance improves as the number of available observations increases.
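The data flow described in the abstract (encode each observation, aggregate the action-conditioned sequence into a global representation, then predict the latent of the next observation given the next action) can be sketched roughly as follows. This is a minimal, untrained illustration and not the authors' implementation: the class name `SeqJepaSketch`, the single-layer tanh encoder, the simple recurrent update standing in for the autoregressive memory module, the random weights, and the mean-squared-error loss are all assumptions made for the example.

```python
import math
import random


def make_matrix(rows, cols, rng):
    """Random weight matrix as a list of rows (hypothetical initialization)."""
    scale = 1.0 / math.sqrt(cols)
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def linear_tanh(weights, vector):
    """Apply tanh(W @ v) using plain Python lists."""
    return [math.tanh(sum(w * x for w, x in zip(row, vector))) for row in weights]


class SeqJepaSketch:
    """Toy forward pass: encode, aggregate over a sequence, predict the next latent."""

    def __init__(self, obs_dim, action_dim, latent_dim, seed=0):
        rng = random.Random(seed)
        self.latent_dim = latent_dim
        self.w_enc = make_matrix(latent_dim, obs_dim, rng)
        # Assumed memory update: reads [previous state, encoded observation, action].
        self.w_mem = make_matrix(latent_dim, 2 * latent_dim + action_dim, rng)
        # Assumed predictor: reads [global representation, next action].
        self.w_pred = make_matrix(latent_dim, latent_dim + action_dim, rng)

    def encode(self, observation):
        """Encoder output for a single observation."""
        return linear_tanh(self.w_enc, observation)

    def aggregate(self, observations, actions):
        """Fold action-conditioned observations into one global representation."""
        state = [0.0] * self.latent_dim
        for obs, act in zip(observations, actions):
            state = linear_tanh(self.w_mem, state + self.encode(obs) + act)
        return state

    def predict_next(self, observations, actions, next_action):
        """Predict the latent of the next observation, conditioned on the next action."""
        aggregate = self.aggregate(observations, actions)
        return linear_tanh(self.w_pred, aggregate + next_action)


def mse(a, b):
    """Mean squared error between two equal-length vectors (assumed loss)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


# Usage on made-up data: three observations, each paired with an action vector.
rng = random.Random(1)
model = SeqJepaSketch(obs_dim=8, action_dim=2, latent_dim=4)
observations = [[rng.random() for _ in range(8)] for _ in range(3)]
actions = [[rng.random() for _ in range(2)] for _ in range(3)]
next_action = [0.5, -0.5]
next_observation = [rng.random() for _ in range(8)]

prediction = model.predict_next(observations, actions, next_action)
target = model.encode(next_observation)  # latent of the observation to be predicted
loss = mse(prediction, target)
```

In this sketch the per-observation encoder output and the aggregated state are kept as two separate vectors, mirroring the abstract's distinction between the encoder output and the aggregated autoregressor output. Nothing here is trained, so it says nothing about which of the two ends up invariant or equivariant.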