Découvrez le dernier rapport d'impact de Mila, qui met en lumière les réalisations exceptionnelles des membres de notre communauté au cours de la dernière année.
Rapport et guide politique GPAI: Vers une réelle égalité en IA
Rejoignez-nous à Mila le 26 novembre pour le lancement du rapport et du guide politique qui présente des recommandations concrètes pour construire des écosystèmes d'IA inclusifs.
Nous utilisons des témoins pour analyser le trafic et l’utilisation de notre site web, afin de personnaliser votre expérience. Vous pouvez désactiver ces technologies à tout moment, mais cela peut restreindre certaines fonctionnalités du site. Consultez notre Politique de protection de la vie privée pour en savoir plus.
Paramètre des cookies
Vous pouvez activer et désactiver les types de cookies que vous souhaitez accepter. Cependant certains choix que vous ferez pourraient affecter les services proposés sur nos sites (ex : suggestions, annonces personnalisées, etc.).
Cookies essentiels
Ces cookies sont nécessaires au fonctionnement du site et ne peuvent être désactivés. (Toujours actif)
Cookies analyse
Acceptez-vous l'utilisation de cookies pour mesurer l'audience de nos sites ?
Multimedia Player
Acceptez-vous l'utilisation de cookies pour afficher et vous permettre de regarder les contenus vidéo hébergés par nos partenaires (YouTube, etc.) ?
Positional and structural encodings (PSE) enable better identifiability of nodes within a graph, as in general graphs lack a canonical node … (voir plus)ordering. This renders PSEs essential tools for empowering modern GNNs, and in particular graph Transformers. However, designing PSEs that work optimally for a variety of graph prediction tasks is a challenging and unsolved problem. Here, we present the graph positional and structural encoder (GPSE), a first-ever attempt to train a graph encoder that captures rich PSE representations for augmenting any GNN. GPSE can effectively learn a common latent representation for multiple PSEs, and is highly transferable. The encoder trained on a particular graph dataset can be used effectively on datasets drawn from significantly different distributions and even modalities. We show that across a wide range of benchmarks, GPSE-enhanced models can significantly improve the performance in certain tasks, while performing on par with those that employ explicitly computed PSEs in other cases. Our results pave the way for the development of large pre-trained models for extracting graph positional and structural information and highlight their potential as a viable alternative to explicitly computed PSEs as well as to existing self-supervised pre-training approaches.
2024-07-08
Proceedings of the 41st International Conference on Machine Learning (publié)
We demonstrate that large language models (LLMs) may learn indicators of document usefulness and modulate their updates accordingly. We intr… (voir plus)oduce random strings ("tags") as indicators of usefulness in a synthetic fine-tuning dataset. Fine-tuning on this dataset leads to **implicit meta-learning (IML)**: in further fine-tuning, the model updates to make more use of text that is tagged as useful. We perform a thorough empirical investigation of this phenomenon, finding (among other things) that (i) it occurs in both pretrained LLMs and those trained from scratch, as well as on a vision task, and (ii) larger models and smaller batch sizes tend to give more IML. We also use probing to examine how IML changes the way models store knowledge in their parameters. Finally, we reflect on what our results might imply about the capabilities, risks, and controllability of future AI systems.
2024-07-08
Proceedings of the 41st International Conference on Machine Learning (publié)
We present a performant, general-purpose gradient-guided nested sampling (GGNS) algorithm, combining the state of the art in differentiable … (voir plus)programming, Hamiltonian slice sampling, clustering, mode separation, dynamic nested sampling, and parallelization. This unique combination allows GGNS to scale well with dimensionality and perform competitively on a variety of synthetic and real-world problems. We also show the potential of combining nested sampling with generative flow networks to obtain large amounts of high-quality samples from the posterior distribution. This combination leads to faster mode discovery and more accurate estimates of the partition function.
2024-07-08
Proceedings of the 41st International Conference on Machine Learning (publié)
Neural Temporal Point Processes (TPPs) have emerged as the primary framework for predicting sequences of events that occur at irregular time… (voir plus) intervals, but their sequential nature can hamper performance for long-horizon forecasts. To address this, we introduce a novel approach that incorporates a diffusion generative model. The model facilitates sequence-to-sequence prediction, allowing multi-step predictions based on historical event sequences. In contrast to previous approaches, our model directly learns the joint probability distribution of types and inter-arrival times for multiple events. The model is composed of two diffusion processes, one for the time intervals and one for the event types. These processes interact through their respective denoising functions, which can take as input intermediate representations from both processes, allowing the model to learn complex interactions. We demonstrate that our proposal outperforms state-of-the-art baselines for long-horizon forecasting of TPPs.
2024-07-08
Proceedings of the 41st International Conference on Machine Learning (publié)
There are a thousand ways to caption an image.
Contrastive Language Pretraining (CLIP) on the other hand, works by mapping an image and its … (voir plus)caption to a single vector -- limiting how well CLIP-like models can represent the diverse ways to describe an image.
In this work, we introduce Llip, Latent Language Image Pretraining, which models the diversity of captions that could match an image.
Llip's vision encoder outputs a set of visual features that are mixed into a final representation by conditioning on information derived from the text.
We show that Llip outperforms non-contextualized baselines like CLIP and SigLIP on a variety of tasks even with large-scale encoders. Llip improves zero-shot classification by an average of 2.9\% zero-shot classification benchmarks with a ViT-G/14 encoder. Specifically, Llip attains a zero-shot top-1 accuracy of 83.5\% on ImageNet outperforming a similarly sized CLIP by 1.4\%. We also demonstrate improvement on zero-shot retrieval on MS-COCO by 6.0\%.
We provide a comprehensive analysis of the components introduced by the method and demonstrate that Llip leads to
richer visual representations.
2024-07-08
Proceedings of the 41st International Conference on Machine Learning (publié)
Score function estimation is the cornerstone of both training and sampling from diffusion generative models. Despite this fact, the most com… (voir plus)monly used estimators are either biased neural network approximations or high variance Monte Carlo estimators based on the conditional score. We introduce a novel nearest neighbour score function estimator which utilizes multiple samples from the training set to dramatically decrease estimator variance. We leverage our low variance estimator in two compelling applications. Training consistency models with our estimator, we report a significant increase in both convergence speed and sample quality. In diffusion models, we show that our estimator can replace a learned network for probability-flow ODE integration, opening promising new avenues of future research. Code will be released upon paper acceptance.
2024-07-08
Proceedings of the 41st International Conference on Machine Learning (publié)