Mila's AI for Climate Studio aims to bridge the gap between technology and impact in order to unlock the potential of AI to tackle the climate crisis quickly and at scale.
The program recently published its first policy brief, "Considérations politiques à l'intersection des technologies quantiques et de l'intelligence artificielle" (Policy Considerations at the Intersection of Quantum Technologies and Artificial Intelligence), written by Padmapriya Mohan.
Hugo Larochelle appointed Scientific Director of Mila
An adjunct professor at Université de Montréal and former head of Google's AI research lab in Montreal, Hugo Larochelle is a pioneer of deep learning and one of the most respected researchers in Canada.
Inspired by human cognition, machine learning systems are gradually revealing advantages of sparser and more modular architectures. Recent work demonstrates that not only do some modular architectures generalize well, but they also lead to better out-of-distribution generalization, scaling properties, learning speed, and interpretability. A key intuition behind the success of such systems is that the data-generating process for most real-world settings is considered to consist of sparse modular connections, and endowing models with similar inductive biases will be helpful. However, the field has been lacking a rigorous quantitative assessment of such systems because these real-world data distributions are complex and unknown. In this work, we provide a thorough assessment of common modular architectures through the lens of simple and known modular data distributions. We highlight the benefits of modularity and sparsity and reveal insights into the challenges faced while optimizing modular systems. In doing so, we propose evaluation metrics that highlight the benefits of modularity, the regimes in which these benefits are substantial, as well as the sub-optimality of current end-to-end learned modular systems as opposed to their claimed potential.
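As a rough illustration of what a "known modular data distribution" might look like (the rule family, module count, and dimensions below are assumptions for illustration, not the paper's exact setup), a minimal sketch in Python:

# Hypothetical sketch of a simple modular data-generating process:
# each sample is produced by exactly one of K linear "rules", selected
# by a context variable, mirroring the sparse modular structure the
# abstract assumes. All names and sizes are illustrative only.
import numpy as np

rng = np.random.default_rng(0)
K, d = 4, 8                                    # number of rules, input dimension
W = rng.normal(size=(K, d, d))                 # one linear module per rule
b = rng.normal(size=(K, d))

def sample(n):
    x = rng.normal(size=(n, d))                # input
    c = rng.integers(0, K, size=n)             # context: which rule is active
    y = np.einsum('nij,nj->ni', W[c], x) + b[c]  # only the selected module fires
    return x, c, y

x, c, y = sample(1024)                         # a modular learner should recover the K rules

Because the generating rules are known, one can measure directly whether a trained model's modules specialize to them, which is the kind of quantitative assessment the abstract describes.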
Recent work has seen the development of general-purpose neural architectures that can be trained to perform tasks across diverse data modalities. General-purpose models typically make few assumptions about the underlying data structure and are known to perform well in the large-data regime. At the same time, there has been growing interest in modular neural architectures that represent the data using sparsely interacting modules. These models can be more robust out-of-distribution, computationally efficient, and capable of sample-efficient adaptation to new data. However, they tend to make domain-specific assumptions about the data and present challenges in how module behavior (i.e., parameterization) and connectivity (i.e., their layout) can be jointly learned. In this work, we introduce a general-purpose yet modular neural architecture called Neural Attentive Circuits (NACs) that jointly learns the parameterization and a sparse connectivity of neural modules without using domain knowledge. NACs are best understood as the combination of two systems that are jointly trained end-to-end: one that determines the module configuration and another that executes it on an input. We demonstrate qualitatively that NACs learn diverse and meaningful module configurations on the NLVR2 dataset without additional supervision. Quantitatively, we show that by incorporating modularity in this way, NACs improve upon a strong non-modular baseline in terms of low-shot adaptation on the CIFAR and CUBs datasets by about 10%, and OOD robustness on Tiny ImageNet-R by about 2.5%. Further, we find that NACs can achieve an 8x speedup at inference time while losing less than 3% performance. Finally, we find NACs to yield competitive results on diverse data modalities spanning point-cloud classification, symbolic processing, and text classification from ASCII bytes, thereby confirming their general-purpose nature.
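To make the idea of jointly learning module parameters and sparse connectivity concrete, here is a simplified, hypothetical sketch; it is not the authors' NAC architecture, and the routing mechanism (softmax over top-k scores) and all names are assumptions chosen only to illustrate sparse module interaction:

# Simplified sketch: modules whose parameters and (sparse) connectivity
# are both learned end-to-end. NOT the NAC reference implementation.
import torch
import torch.nn as nn

class SparseModularLayer(nn.Module):
    def __init__(self, dim=64, n_modules=8, k=2):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
            for _ in range(n_modules)
        )
        self.router = nn.Linear(dim, n_modules)    # learned connectivity scores
        self.k = k

    def forward(self, x):                          # x: (batch, dim)
        scores = self.router(x)                    # per-input affinity to each module
        topk = scores.topk(self.k, dim=-1)         # keep only k modules per input
        weights = torch.softmax(topk.values, dim=-1)
        out = torch.zeros_like(x)
        for j in range(self.k):
            idx = topk.indices[:, j]
            for m in idx.unique().tolist():        # run each selected module on its inputs
                mask = idx == m
                out[mask] = out[mask] + weights[mask, j:j+1] * self.experts[m](x[mask])
        return out

layer = SparseModularLayer()
y = layer(torch.randn(16, 64))                     # only 2 of 8 modules fire per input

Both the router and the modules receive gradients, so configuration and parameterization are learned together, which is the property the abstract emphasizes.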
Fake news with detrimental societal effects has attracted extensive attention and research. Despite early success, the state-of-the-art methods fall short of considering the propagation of news. News propagates at different times through different mediums, including users, comments, and sources, which form the news propagation network. Moreover, the serious problem of data hiding arises, which means that fake news publishers disguise fake news as real to confuse users by deleting comments that refute the rumor or deleting the news itself when it has been spread widely. Existing methods do not consider the propagation of news and fail to identify what matters in the process, which leads to fake news hiding in the propagation network and escaping from detection. Inspired by the propagation of news, we propose a novel fake news detection framework named TaHiD, which models the propagation as a heterogeneous dynamic graph and contains the propagation attention module to measure the influence of different propagation. Experiments demonstrate that TaHiD extracts useful information from the news propagation network and outperforms state-of-the-art methods on several benchmark datasets for fake news detection. Additional studies also show that TaHiD is capable of identifying fake news in the case of data hiding.
Temporal Latent Bottleneck: Synthesis of Fast and Slow Processing Mechanisms in Sequence Learning
Generative flow networks (GFlowNets) are a method for learning a stochastic policy for generating compositional objects, such as graphs or strings, from a given unnormalized density by sequences of actions, where many possible action sequences may lead to the same object. We find previously proposed learning objectives for GFlowNets, flow matching and detailed balance, which are analogous to temporal difference learning, to be prone to inefficient credit propagation across long action sequences. We thus propose a new learning objective for GFlowNets, trajectory balance, as a more efficient alternative to previously used objectives. We prove that any global minimizer of the trajectory balance objective can define a policy that samples exactly from the target distribution. In experiments on four distinct domains, we empirically demonstrate the benefits of the trajectory balance objective for GFlowNet convergence, diversity of generated samples, and robustness to long action sequences and large action spaces.
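A minimal sketch of how a trajectory balance loss is typically computed for one sampled trajectory, assuming a learned scalar log-partition term and per-step forward/backward policy log-probabilities; tensor names are illustrative and this is not the authors' reference implementation:

# Trajectory balance loss for a single trajectory: the squared gap between
# log Z + sum of forward log-probs and log R(x) + sum of backward log-probs.
import torch

def trajectory_balance_loss(log_z, log_pf, log_pb, log_reward):
    # log_pf, log_pb: 1-D tensors of per-step log-probs along the trajectory
    return (log_z + log_pf.sum() - log_reward - log_pb.sum()) ** 2

log_z = torch.tensor(0.0, requires_grad=True)        # learned log Z
log_pf = torch.log(torch.tensor([0.5, 0.25, 0.5]))   # forward policy steps
log_pb = torch.log(torch.tensor([1.0, 0.5, 1.0]))    # backward policy steps
loss = trajectory_balance_loss(log_z, log_pf, log_pb, torch.tensor(0.0))
loss.backward()                                       # gradients flow to log_z (and the policies)

Because the loss couples the whole trajectory in a single term, credit reaches early actions directly rather than being propagated step by step, which is the efficiency argument made above.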
Black-box optimization formulations for biological sequence design have drawn recent attention due to their promising potential impact on the pharmaceutical industry. In this work, we propose to unify two seemingly distinct worlds: likelihood-free inference and black-box optimization, under one probabilistic framework. In tandem, we provide a recipe for constructing various sequence design methods based on this framework. We show how previous optimization approaches can be "reinvented" in our framework, and further propose new probabilistic black-box optimization algorithms. Extensive experiments on sequence design applications illustrate the benefits of the proposed methodology.
The theory of representation learning aims to build methods that provably invert the data generating process with minimal domain knowledge or any source of supervision. Most prior approaches require strong distributional assumptions on the latent variables and weak supervision (auxiliary information such as timestamps) to provide provable identification guarantees. In this work, we show that if one has weak supervision from observations generated by sparse perturbations of the latent variables--e.g. images in a reinforcement learning environment where actions move individual sprites--identification is achievable under unknown continuous latent distributions. We show that if the perturbations are applied only on mutually exclusive blocks of latents, we identify the latents up to those blocks. We also show that if these perturbation blocks overlap, we identify latents up to the smallest blocks shared across perturbations. Consequently, if there are blocks that intersect in one latent variable only, then such latents are identified up to permutation and scaling. We propose a natural estimation procedure based on this theory and illustrate it on low-dimensional synthetic and image-based experiments.
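A hypothetical sketch of the weak-supervision setup described above, where each paired observation arises from perturbing only a small block of the latents; the decoder, latent dimensions, and block choice are assumptions for illustration only:

# Paired observations (x, x') generated by an unknown decoder g, where x'
# differs from x only through a sparse perturbation on one latent block.
import numpy as np

rng = np.random.default_rng(0)
d_latent, d_obs = 6, 32
A = rng.normal(size=(d_obs, d_latent))          # stand-in for the unknown decoder g

def g(z):
    return np.tanh(A @ z)                       # nonlinear mixing of latents

def sample_pair(block):                         # block: indices of perturbed latents
    z = rng.normal(size=d_latent)               # latent distribution is unknown and continuous
    z_prime = z.copy()
    z_prime[block] += rng.normal(size=len(block))   # sparse perturbation on the block
    return g(z), g(z_prime)

x, x_prime = sample_pair(block=[0, 1])          # latents 0 and 1 perturbed together

In this setting the abstract's result says the latents are identifiable up to the perturbed blocks; if different pairs perturb overlapping blocks that intersect in a single latent, that latent is pinned down up to permutation and scaling.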
Learning models that generalize under different distribution shifts in medical imaging has been a long-standing research challenge. There have been several proposals for efficient and robust visual representation learning among vision research practitioners, especially in the sensitive and critical biomedical domain. In this paper, we propose an approach to out-of-distribution generalization for chest X-ray pathologies that uses a simple balanced batch sampling technique. We observed that balanced sampling between the multiple training datasets improves performance over baseline models trained without balancing.
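A minimal sketch of balanced batch sampling across several training datasets, assuming each batch draws an equal number of examples from every source rather than sampling proportionally to dataset size; the dataset names and sizes are placeholders, not the paper's setup:

# Each batch contains the same number of examples from every source dataset.
import random

datasets = {
    "site_A": list(range(50_000)),   # indices into each chest X-ray dataset
    "site_B": list(range(8_000)),
    "site_C": list(range(2_000)),
}

def balanced_batch(batch_size):
    per_source = batch_size // len(datasets)
    batch = []
    for name, indices in datasets.items():
        batch += [(name, i) for i in random.choices(indices, k=per_source)]
    random.shuffle(batch)
    return batch

batch = balanced_batch(24)           # 8 examples from each source, regardless of source size

The design choice is that small datasets are upsampled and large ones downsampled within each batch, so no single source dominates the gradient signal.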
Few-shot learning aims to learn representations that can tackle novel tasks given a small number of examples. Recent studies show that task distribution plays a vital role in the performance of the model. Conventional wisdom is that task diversity should improve the performance of meta-learning. In this work, we find evidence to the contrary; we study different task distributions on a myriad of models and datasets to evaluate the effect of task diversity on meta-learning algorithms. For this experiment, we train on two datasets - Omniglot and miniImageNet - with three broad classes of meta-learning models - Metric-based (i.e., Protonet, Matching Networks), Optimization-based (i.e., MAML, Reptile, and MetaOptNet), and Bayesian meta-learning models (i.e., CNAPs). Our experiments demonstrate that the effect of task diversity on all these algorithms follows a similar trend, and task diversity does not seem to offer any benefits to the learning of the model. Furthermore, we demonstrate that even a handful of tasks, repeated over multiple batches, is sufficient to achieve performance similar to uniform sampling, which calls into question the need for additional tasks to create better models.
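A small sketch of the two task-sampling regimes being compared, assuming tasks are indexed episodes; the pool size and batch size below are illustrative, not the paper's exact configuration:

# Uniform sampling over all tasks vs. repeating a small fixed pool of tasks.
import random

all_tasks = list(range(10_000))                 # e.g. all available N-way K-shot episodes
small_pool = random.sample(all_tasks, 16)       # "a handful" of fixed tasks, reused every batch

def sample_meta_batch(task_source, meta_batch_size=4):
    return [random.choice(task_source) for _ in range(meta_batch_size)]

uniform_batch = sample_meta_batch(all_tasks)    # conventional: maximally diverse tasks
repeated_batch = sample_meta_batch(small_pool)  # limited-diversity regime studied above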