Build foundational skills in responsible artificial intelligence (AI) through self-paced courses led by internationally recognized Mila experts.
The Mila AI Policy Fellowship turns in-depth AI expertise into rigorous public-interest policy. Read the latest publication, Combler la disparité en matière d’expertise : mécanismes de transfert des connaissances pour la réglementation de l’IA (Bridging the expertise gap: knowledge transfer mechanisms for AI regulation), by Moritz von Knebel.
This program supports AI startups at any time of the year. Benefit from cutting-edge resources and tailored support to accelerate the development of your technology.
Games generalize the single-objective optimization paradigm by introducing different objective functions for different players. Optimization in differentiable games often proceeds by simultaneous or alternating gradient updates. In machine learning, games are gaining new importance through formulations such as generative adversarial networks (GANs) and actor-critic systems. However, compared with single-objective optimization, game dynamics are more complex and less well understood. In this paper, we analyze gradient-based methods with momentum on simple games. We prove that alternating updates are more stable than simultaneous updates. Next, we show both theoretically and empirically that alternating gradient updates with a negative momentum term achieve convergence not only on a difficult toy adversarial problem but also on saturating GANs, which are notoriously difficult to train.
2019-04-10
Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics (published)
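The effect described above can be reproduced on the simplest bilinear game, f(x, y) = x·y, where x minimizes and y maximizes. This is a sketch under our own assumptions, not the authors' code. The scalar game, the heavy-ball form of momentum, and the values eta = 0.5 and beta = -0.5 were all chosen for illustration. Simultaneous updates spiral outward, because each step multiplies the squared distance to the equilibrium by exactly 1 + eta². Alternating updates with negative momentum contract toward the equilibrium (0, 0).

```python
import math

def simultaneous_gd(x, y, eta, steps):
    """Simultaneous gradient descent-ascent on f(x, y) = x * y."""
    for _ in range(steps):
        # Both players read the same old iterate: df/dx = y, df/dy = x.
        x, y = x - eta * y, y + eta * x
    return x, y

def alternating_gd_momentum(x, y, eta, beta, steps):
    """Alternating gradient descent-ascent with heavy-ball momentum beta (beta < 0 here)."""
    x_prev, y_prev = x, y
    for _ in range(steps):
        # The minimizing player moves first...
        x_new = x - eta * y + beta * (x - x_prev)
        x_prev, x = x, x_new
        # ...then the maximizing player responds to the already-updated x.
        y_new = y + eta * x + beta * (y - y_prev)
        y_prev, y = y, y_new
    return x, y

if __name__ == "__main__":
    xs, ys = simultaneous_gd(1.0, 1.0, eta=0.5, steps=100)
    xa, ya = alternating_gd_momentum(1.0, 1.0, eta=0.5, beta=-0.5, steps=1000)
    print(math.hypot(xs, ys))  # sqrt(2) * 1.25**50 up to rounding: iterates spiral away from (0, 0)
    print(math.hypot(xa, ya))  # very close to 0: iterates converge to the equilibrium
```

Setting beta=0 in the second function gives plain alternating updates. For this game and step size, those iterates neither diverge nor converge: they keep orbiting the equilibrium. That sits between the two cases above and is consistent with the stability ordering the abstract describes.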
Imitation learning is an effective alternative approach to learning a policy when the reward function is sparse. In this paper, we consider a challenging setting where the agent and the expert use different actions. We assume that the agent has access to a sparse reward function and to state-only expert observations. We propose a method that gradually balances the imitation learning cost against the reinforcement learning objective. In addition, this method adapts the agent's policy based on either mimicking expert behavior or maximizing the sparse reward. We show, through navigation scenarios, that (i) the agent is able to efficiently leverage sparse rewards to outperform standard state-only imitation learning, (ii) it can learn a policy even when its actions differ from the expert's, and (iii) its performance is not bounded by that of the expert, thanks to its optimized use of sparse rewards.
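The abstract does not specify how the balance between the two objectives evolves during training. As a purely hypothetical illustration, a linearly annealed weight can shift the agent from imitation toward the sparse-reward objective. The names `mixing_weight`, `combined_objective` and `anneal_steps` are ours, and `rl_loss` stands for any quantity to minimize, such as the negative expected return. The paper's actual rule may well be adaptive rather than a fixed schedule.

```python
def mixing_weight(step, anneal_steps):
    """Hypothetical linear schedule: 0 at the start, 1 after anneal_steps updates."""
    return min(1.0, step / anneal_steps)

def combined_objective(imitation_loss, rl_loss, step, anneal_steps=10_000):
    """Blend the two losses: early training leans on imitation, later on the sparse reward."""
    alpha = mixing_weight(step, anneal_steps)
    return (1.0 - alpha) * imitation_loss + alpha * rl_loss

if __name__ == "__main__":
    for step in (0, 5_000, 20_000):
        print(step, combined_objective(imitation_loss=2.0, rl_loss=4.0, step=step))
```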
In recent years, the multi-armed bandit (MAB) framework has attracted considerable attention in a wide range of applications, from recommender systems and information retrieval to healthcare and finance. This is owing to its strong performance combined with attractive properties, such as the ability to learn from limited feedback. The field is currently flourishing, as novel problem settings and algorithms motivated by practical applications are being built on top of the classical bandit problem. This article provides a comprehensive review of the main recent developments in real-life applications of the multi-armed bandit. Specifically, we introduce a taxonomy of common MAB-based applications and summarize the state of the art in each of these domains. Furthermore, we identify important current trends and offer new perspectives on the future of this fast-growing field.
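For readers new to the setting, the classical stochastic bandit loop that these applications build on can be sketched with UCB1. UCB1 is one standard algorithm among the many the review covers; choosing it here is our decision, not the review's. Arms are simulated as Bernoulli rewards with hypothetical success probabilities.

```python
import math
import random

def ucb1(arm_probs, horizon, seed=0):
    """Run UCB1 on simulated Bernoulli arms; return pull counts and empirical means."""
    rng = random.Random(seed)
    n_arms = len(arm_probs)
    counts = [0] * n_arms
    means = [0.0] * n_arms
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1  # play every arm once to initialise its estimate
        else:
            # Optimism in the face of uncertainty: empirical mean plus a confidence bonus.
            arm = max(range(n_arms),
                      key=lambda a: means[a] + math.sqrt(2 * math.log(t) / counts[a]))
        reward = 1.0 if rng.random() < arm_probs[arm] else 0.0
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # incremental mean update
    return counts, means

if __name__ == "__main__":
    counts, means = ucb1([0.2, 0.5, 0.8], horizon=2000, seed=1)
    print(counts)  # most pulls should go to the best arm (index 2)
```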
We present a novel recurrent neural network (RNN)-based model that combines the remembering ability of unitary evolution RNNs with the ability of gated RNNs to effectively forget redundant or irrelevant information in their memory. We achieve this by extending restricted orthogonal evolution RNNs with a gating mechanism similar to that of gated recurrent unit (GRU) RNNs, with a reset gate and an update gate. Our model outperforms long short-term memory (LSTM), gated recurrent units, and vanilla unitary or orthogonal RNNs on several long-term-dependency benchmark tasks. We empirically show that both orthogonal and unitary RNNs lack the ability to forget, an ability that plays an important role in RNNs. We provide competitive results, along with an analysis of our model, on many natural sequential tasks. These include question answering, speech spectrum prediction and character-level language modeling, as well as synthetic tasks that involve long-term dependencies, such as algorithmic, denoising and copying tasks.
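Why an orthogonal transition cannot forget, and how a gate restores that ability, can be seen in a two-dimensional toy. Without input, an orthogonal RNN computes h_t = U h_{t-1}, so the norm of the hidden state never changes. A GRU-style update gate can instead decay the old state. This sketch rests on several assumptions of ours. The gates are fixed scalars rather than learned sigmoid outputs, U is a 2x2 rotation, and the candidate uses a real-valued modReLU-style activation. It illustrates the mechanism, not the paper's exact cell.

```python
import math

def rotate(h, theta):
    """Apply a 2x2 rotation, the simplest orthogonal transition matrix U."""
    c, s = math.cos(theta), math.sin(theta)
    return [c * h[0] - s * h[1], s * h[0] + c * h[1]]

def norm(h):
    return math.hypot(h[0], h[1])

def mod_relu(v, bias):
    """Real-valued modReLU-style activation: shrink each magnitude by -bias (bias <= 0), keep the sign."""
    return [math.copysign(max(abs(a) + bias, 0.0), a) for a in v]

def orthogonal_step(h, theta=0.3):
    # Plain orthogonal RNN with no input: ||U h|| == ||h||, so nothing is ever erased.
    return rotate(h, theta)

def gated_orthogonal_step(h, x, r, z, theta=0.3, bias=-0.1):
    """One GRU-style step around the orthogonal transition.

    x is the input already projected to 2-D (it stands in for W x); r and z are the
    reset and update gates, fixed scalars here instead of learned sigmoid outputs.
    """
    uh = rotate(h, theta)
    candidate = mod_relu([xi + r * ui for xi, ui in zip(x, uh)], bias)
    return [z * hi + (1.0 - z) * ci for hi, ci in zip(h, candidate)]

if __name__ == "__main__":
    h_orth = h_gated = [1.0, 0.0]
    for _ in range(100):
        h_orth = orthogonal_step(h_orth)
        # Reset gate closed and no input: the candidate is zero, so the update gate decays h.
        h_gated = gated_orthogonal_step(h_gated, x=[0.0, 0.0], r=0.0, z=0.5)
    print(norm(h_orth))   # stays at 1 up to rounding: the memory is never erased
    print(norm(h_gated))  # halves every step: the cell has forgotten h_0
```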
This paper provides a taxonomy for the licensing of data in the fields of artificial intelligence and machine learning. Its goal is to build towards a common framework for data licensing, akin to the licensing of open-source software. Two noted benefits of the proposed approach are increased transparency and the resolution of conceptual ambiguities in existing licensing language. These benefits may in turn help foster fairer and more efficient markets for data, by providing clearer tools and concepts that better define how data can be used in AI and ML. The approach is summarized in a new family of data license language, the Montreal Data License (MDL). Alongside this new license, the authors and their collaborators have developed a web-based tool to generate license language embodying the taxonomies articulated in the paper.
Continual learning is the ability of an agent to learn online from a non-stationary and never-ending stream of data. A key requirement for such a never-ending learning process is overcoming catastrophic forgetting of previously seen data, a problem from which neural networks are well known to suffer. Solutions developed so far often relax the continual learning problem to the easier task-incremental setting, where the data stream is divided into tasks with clear boundaries. In this paper, we remove this assumption and move to the more challenging online setting, where we assume no information about tasks in the data stream. We start from the idea that each learning step should not increase the losses of previously learned examples, which we enforce by constraining the optimization process. The number of constraints then grows linearly with the number of examples, which is a serious limitation. We develop a solution that selects a fixed number of constraints and uses them to approximate the feasible region defined by the original constraints. We compare our approach against methods that rely on task boundaries to select a fixed set of examples. Our approach shows comparable or even better results, especially when the boundaries are blurry or the data distributions are imbalanced.
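One way to picture "selecting a fixed number of constraints" is to keep the stored examples whose loss gradients point in the most different directions. The kept constraints then cover the feasible region as widely as possible. The sketch below is a simplified greedy heuristic of our own, and `select_diverse` is a hypothetical name. The paper's selection criterion may differ in its details. Gradients are represented as plain Python lists.

```python
import math

def cosine(u, v):
    """Cosine similarity between two gradient vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0

def select_diverse(grads, k):
    """Greedily pick up to k indices whose gradients are as mutually dissimilar as possible."""
    if k <= 0 or not grads:
        return []
    selected = [0]  # start from the first (e.g. oldest) stored example
    while len(selected) < min(k, len(grads)):
        best, best_score = None, math.inf
        for i in range(len(grads)):
            if i in selected:
                continue
            # A candidate's redundancy is its highest similarity to anything already kept.
            score = max(cosine(grads[i], grads[j]) for j in selected)
            if score < best_score:
                best, best_score = i, score
        selected.append(best)
    return selected

if __name__ == "__main__":
    # Index 1 nearly duplicates index 0, so it is the one left out.
    print(select_diverse([[1.0, 0.0], [1.0, 0.01], [0.0, 1.0], [-1.0, 0.0]], k=3))
```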
Machine learning models of music typically break the task of composition down into a chronological process, composing a piece of music in a single pass from beginning to end. In contrast, human composers write music in a nonlinear fashion, scribbling motifs here and there and often revisiting choices previously made. We explore the use of blocked Gibbs sampling as an analogue to the human approach, and introduce Coconet, a convolutional neural network in the NADE family of generative models. Despite ostensibly sampling from the same distribution as the NADE ancestral sampling procedure, we find that a blocked Gibbs approach significantly improves sample quality. We provide evidence that this is because some conditional distributions are poorly modeled. Moreover, we show that even the cheap approximate blocked Gibbs procedure from Yao et al. (2014) yields better samples than ancestral sampling. We demonstrate the versatility of our method on unconditioned polyphonic music generation.
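The sampling loop described above can be outlined independently of any particular model. This is a minimal sketch under our own assumptions. `conditional` stands for any trained network that returns a distribution over tokens for one erased position, given the visible ones. The piece is flattened to a single token sequence, and the linear annealing of the erase probability from `p_max` toward `p_min` is a hypothetical schedule. Each step erases a random block of positions and resamples every erased position independently, conditioned only on the positions that were kept. That independence is what makes this the cheap approximate variant rather than an exact joint resampling of the block.

```python
import random

def blocked_gibbs(conditional, length, vocab, n_steps, p_max=0.9, p_min=0.05, seed=0):
    """Approximate blocked Gibbs sampling over a token sequence.

    conditional(context, i) is a hypothetical model call: given the sequence with
    erased positions set to None, it returns a dict {token: probability} for position i.
    """
    rng = random.Random(seed)
    seq = [rng.choice(vocab) for _ in range(length)]  # random initial draft
    for n in range(n_steps):
        # Linearly anneal the erase probability from p_max toward p_min.
        p = max(p_min, p_max - n * (p_max - p_min) / n_steps)
        block = {i for i in range(length) if rng.random() < p}
        if not block:
            continue
        # Erase the block; every erased position sees the same visible context.
        context = [None if i in block else tok for i, tok in enumerate(seq)]
        for i in sorted(block):
            probs = conditional(context, i)
            tokens = list(probs)
            seq[i] = rng.choices(tokens, weights=[probs[t] for t in tokens])[0]
    return seq

def toy_conditional(context, i, vocab=("C", "E", "G")):
    # Stand-in for a trained network: favour the visible token to the left, if any.
    left = context[i - 1] if i > 0 else None
    if left is None:
        return {t: 1.0 / len(vocab) for t in vocab}
    return {t: (0.8 if t == left else 0.1) for t in vocab}

if __name__ == "__main__":
    print(blocked_gibbs(toy_conditional, length=16, vocab=["C", "E", "G"], n_steps=50))
```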
Histological atlases of the cerebral cortex, such as those made famous by Brodmann and von Economo, are invaluable for understanding human brain microstructure and its relationship with functional organization in the brain. However, existing atlases are limited to small numbers of manually annotated samples from a single cerebral hemisphere, measured from 2D histological sections. We present the first whole-brain quantitative 3D laminar atlas of the human cerebral cortex. This atlas was derived from a 3D histological model of the human brain at 20-micron isotropic resolution (BigBrain), using a convolutional neural network to automatically segment the cortical layers in both hemispheres. Our approach overcomes many of the historical challenges of measuring histological thickness in 2D, and the resulting laminar atlas provides an unprecedented level of precision and detail. We used this BigBrain cortical atlas to test two questions: whether thickness gradients previously reported from MRI in sensory and motor processing cortices are present in a histological atlas of cortical thickness, and which cortical layers contribute to these gradients. Cortical thickness increased along sensory processing hierarchies, driven primarily by layers III, V and VI. In contrast, fronto-motor cortices showed the opposite pattern, with decreases in total and pyramidal layer thickness. These findings illustrate how this laminar atlas will provide a link between single-neuron morphology, mesoscale cortical layering, macroscopic cortical thickness and, ultimately, functional neuroanatomy.
The problem of inferring unobserved values in a partially observed trajectory of a stochastic process can be framed as a structured prediction problem. Traditionally, inference is conducted using heuristic-based Monte Carlo methods. This work considers learning such heuristics by leveraging a connection between policy optimization in reinforcement learning and approximate inference. In particular, we learn the proposal distributions used in importance samplers by casting this task as a variational inference problem. We then rewrite the variational lower bound as a policy optimization problem, similar to Weber et al. (2015), which allows us to transfer techniques from reinforcement learning. We apply this technique to a simple stochastic process as a proof of concept. We show that, while the approach is viable, scaling inference to rare observations will require more engineering effort.
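An importance sampler draws samples from a proposal q and reweights them by p/q to estimate expectations under the target p. The sketch below makes the role of the proposal concrete, using a one-dimensional Gaussian target and Gaussian proposals. Both are our own simplifying assumptions: the paper works with trajectories of a stochastic process and learns q. The effective sample size (ESS) shows why a well-matched proposal is worth learning. A proposal equal to the target gives the maximal ESS of n, and a poorly placed one collapses it.

```python
import math
import random

def log_normal_pdf(x, mu, sigma):
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))

def snis(f, n, proposal_mu, proposal_sigma, target_mu=0.0, target_sigma=1.0, seed=0):
    """Self-normalized importance sampling estimate of E_p[f(x)], plus the ESS."""
    rng = random.Random(seed)
    xs = [rng.gauss(proposal_mu, proposal_sigma) for _ in range(n)]
    log_w = [log_normal_pdf(x, target_mu, target_sigma)
             - log_normal_pdf(x, proposal_mu, proposal_sigma) for x in xs]
    m = max(log_w)  # subtract the max before exponentiating, for numerical stability
    w = [math.exp(lw - m) for lw in log_w]
    total = sum(w)
    estimate = sum(wi * f(x) for wi, x in zip(w, xs)) / total
    ess = total ** 2 / sum(wi * wi for wi in w)
    return estimate, ess

if __name__ == "__main__":
    square = lambda x: x * x  # E_p[x^2] = 1 for the standard normal target
    print(snis(square, 2000, proposal_mu=0.0, proposal_sigma=1.5))  # ESS close to n
    print(snis(square, 2000, proposal_mu=3.0, proposal_sigma=0.5))  # ESS collapses
```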
In model-based reinforcement learning, the agent alternates between model learning and planning. These two components are inextricably intertwined: if the model cannot provide sensible long-term predictions, the planner will exploit model flaws, which can yield catastrophic failures. This paper focuses on building a model that reasons about the long-term future and demonstrates how to use it for efficient planning and exploration. To this end, we build a latent-variable autoregressive model by leveraging recent ideas in variational inference. We argue that forcing latent variables to carry future information through an auxiliary task substantially improves long-term predictions. Moreover, by planning in the latent space, we ensure that the planner's solution stays within regions where the model is valid. An exploration strategy can be devised by searching for trajectories that are unlikely under the model. Our method achieves higher reward faster than baselines on a variety of tasks and environments, in both the imitation learning and model-based reinforcement learning settings.
The stochastic multi-armed bandit problem is a well-known model for studying the exploration-exploitation trade-off. It has significant potential applications in adaptive clinical trials, which allow dynamic changes in the probabilities with which treatments are allocated to patients. However, most bandit learning algorithms are designed to minimize the expected regret. While this approach is useful in many areas, in clinical trials it can be sensitive to outlier data, especially when the sample size is small. In this paper, we define and study a new robustness criterion for bandit problems. Specifically, we consider optimizing a function of the distribution of returns as a regret measure, which gives practitioners more flexibility to define an appropriate regret measure. The learning algorithm we propose for this type of problem is a modification of the BESA algorithm [Baransi et al., 2014] that considers a more general version of regret. We present a regret bound for our approach and evaluate it empirically both on synthetic problems and on a dataset from the clinical trial literature. Our approach compares favorably to a suite of standard bandit algorithms.
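BESA's core step compares two arms on equal footing. The arm with more observations is sub-sampled, without replacement, down to the size of the other arm's history, and the arm with the better statistic on these equal-sized samples is played. The sketch below keeps that step but lets the caller plug in the statistic, in the spirit of optimizing a function of the return distribution. The lower-tail CVaR used here and the names `besa_duel` and `cvar` are our own illustrative assumptions, not the paper's exact criterion. Only the two-arm case is shown; BESA handles more arms by running such duels in a tournament.

```python
import math
import random
from statistics import mean

def cvar(samples, alpha=0.25):
    """Mean of the worst ceil(alpha * n) returns (lower-tail conditional value at risk)."""
    k = max(1, math.ceil(alpha * len(samples)))
    return mean(sorted(samples)[:k])

def besa_duel(hist_a, hist_b, score=mean, rng=None):
    """Return 0 to play arm A or 1 to play arm B; both histories must be non-empty."""
    rng = rng or random.Random()
    n = min(len(hist_a), len(hist_b))
    # Sub-sample both histories to the smaller size (only a reshuffle for the smaller one).
    sub_a = rng.sample(hist_a, n)
    sub_b = rng.sample(hist_b, n)
    score_a, score_b = score(sub_a), score(sub_b)
    if score_a == score_b:
        # Break ties in favour of the less-explored arm.
        return 0 if len(hist_a) <= len(hist_b) else 1
    return 0 if score_a > score_b else 1

if __name__ == "__main__":
    risky = [0.0, 1.0] * 5   # higher mean, but with frequent bad outcomes
    steady = [0.4] * 10      # lower mean, no bad outcomes
    print(besa_duel(risky, steady, score=mean, rng=random.Random(0)))  # 0: picks the risky arm
    print(besa_duel(risky, steady, score=cvar, rng=random.Random(0)))  # 1: picks the steady arm
```

The usage lines show how swapping the statistic changes the decision. A mean-based duel favours the arm with the higher average, while a CVaR-based duel favours the arm whose worst outcomes are better. This is the kind of flexibility the abstract describes for settings such as small clinical trials.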