Ce programme soutient les startups spécialisées en IA à tout moment de l'année. Bénéficiez de ressources de pointe et d'un accompagnement sur mesure pour accélérer le développement de votre technologie.
Développez des compétences fondamentales en intelligence artificielle (IA) responsable grâce à des cours autodirigés, animés par des expert·e·s de Mila reconnu·e·s à l’échelle internationale.
Le Fellowship Mila en politiques de l'IA transforme l'expertise approfondie en IA en politiques rigoureuses d'intérêt public. Découvrez la dernière publication Combler la disparité en matière d’expertise : mécanismes de transfert des connaissances pour la réglementation de l’IA par Moritz von Knebel.
Nous utilisons des témoins pour analyser le trafic et l’utilisation de notre site web, afin de personnaliser votre expérience. Vous pouvez désactiver ces technologies à tout moment, mais cela peut restreindre certaines fonctionnalités du site. Consultez notre Politique de protection de la vie privée pour en savoir plus.
Paramètre des cookies
Vous pouvez activer et désactiver les types de cookies que vous souhaitez accepter. Cependant certains choix que vous ferez pourraient affecter les services proposés sur nos sites (ex : suggestions, annonces personnalisées, etc.).
Cookies essentiels
Ces cookies sont nécessaires au fonctionnement du site et ne peuvent être désactivés. (Toujours actif)
Cookies analyse
Acceptez-vous l'utilisation de cookies pour mesurer l'audience de nos sites ?
Lecteur Multimédia
Acceptez-vous l'utilisation de cookies pour afficher et vous permettre de regarder les contenus vidéo hébergés par nos partenaires (YouTube, etc.) ?
Publications
Speech and Speaker Recognition from Raw Waveform with SincNet
Deep neural networks can learn complex and abstract representations, that are progressively obtained by combining simpler ones. A recent tre… (voir plus)nd in speech and speaker recognition consists in discovering these representations starting from raw audio samples directly. Differently from standard hand-crafted features such as MFCCs or FBANK, the raw waveform can potentially help neural networks discover better and more customized representations. The high-dimensional raw inputs, however, can make training significantly more challenging. This paper summarizes our recent efforts to develop a neural architecture that efficiently processes speech from audio waveforms. In particular, we propose SincNet, a novel Convolutional Neural Network (CNN) that encourages the first layer to discover meaningful filters by exploiting parametrized sinc functions. In contrast to standard CNNs, which learn all the elements of each filter, only low and high cutoff frequencies of band-pass filters are directly learned from data. This inductive bias offers a very compact way to derive a customized front-end, that only depends on some parameters with a clear physical meaning. Our experiments, conducted on both speaker and speech recognition, show that the proposed architecture converges faster, performs better, and is more computationally efficient than standard CNNs.
The capacity of meta-learning algorithms to quickly adapt to a variety of tasks, including ones they did not experience during meta-training… (voir plus), has been a key factor in the recent success of these methods on few-shot learning problems. This particular advantage of using meta-learning over standard supervised or reinforcement learning is only well founded under the assumption that the adaptation phase does improve the performance of our model on the task of interest. However, in the classical framework of meta-learning, this constraint is only mildly enforced, if not at all, and we only see an improvement on average over a distribution of tasks. In this paper, we show that the adaptation in an algorithm like MAML can significantly decrease the performance of an agent in a meta-reinforcement learning setting, even on a range of meta-training tasks.
Recurrent transition networks for character locomotion
Félix Harvey
Christopher Pal
We present a novel approach, based on deep recurrent neural networks, to automatically generate transition animations given a past context o… (voir plus)f a few frames, a target character state and optionally local terrain information. The proposed Recurrent Transition Network (RTN) is trained without any gait, phase, contact or action labels. Our system produces realistic and fluid transitions that rival the quality of Motion Capture-based animations, even without any inverse-kinematics post-process. Our system could accelerate the creation of transition variations for large coverage or even replace transition nodes in a game's animation graph. The RTN also shows impressive results on a temporal super-resolution task.
This paper presents a new method for learning typed entailment graphs from text. We extract predicate-argument structures from multiple-sour… (voir plus)ce news corpora, and compute local distributional similarity scores to learn entailments between predicates with typed arguments (e.g., person contracted disease). Previous work has used transitivity constraints to improve local decisions, but these constraints are intractable on large graphs. We instead propose a scalable method that learns globally consistent similarity scores based on new soft constraints that consider both the structures across typed entailment graphs and inside each graph. Learning takes only a few hours to run over 100K predicates and our results show large improvements over local similarity scores on two entailment data sets. We further show improvements over paraphrases and entailments from the Paraphrase Database, and prior state-of-the-art entailment graphs. We show that the entailment graphs improve performance in a downstream task.
2018-11-30
Transactions of the Association for Computational Linguistics (publié)
Entropy regularization is commonly used to improve policy optimization in reinforcement learning. It is believed to help with \emph{explorat… (voir plus)ion} by encouraging the selection of more stochastic policies. In this work, we analyze this claim using new visualizations of the optimization landscape based on randomly perturbing the loss function. We first show that even with access to the exact gradient, policy optimization is difficult due to the geometry of the objective function. Then, we qualitatively show that in some environments, a policy with higher entropy can make the optimization landscape smoother, thereby connecting local optima and enabling the use of larger learning rates. This paper presents new tools for understanding the optimization landscape, shows that policy entropy serves as a regularizer, and highlights the challenge of designing general-purpose policy optimization algorithms.
To achieve general artificial intelligence, reinforcement learning (RL) agents should learn not only to optimize returns for one specific ta… (voir plus)sk but also to constantly build more complex skills and scaffold their knowledge about the world, without forgetting what has already been learned. In this paper, we discuss the desired characteristics of environments that can support the training and evaluation of lifelong reinforcement learning agents, review existing environments from this perspective, and propose recommendations for devising suitable environments in the future.
We present two architectures for multi-task learning with neural sequence models. Our approach allows the relationships between different ta… (voir plus)sks to be learned dynamically, rather than using an ad-hoc pre-defined structure as in previous work. We adopt the idea from message-passing graph neural networks and propose a general \textbf{graph multi-task learning} framework in which different tasks can communicate with each other in an effective and interpretable way. We conduct extensive experiments in text classification and sequence labeling to evaluate our approach on multi-task learning and transfer learning. The empirical results show that our models not only outperform competitive baselines but also learn interpretable and transferable patterns across tasks.
We demonstrate the use of conditional autoregressive generative models (van den Oord et al., 2016a) over a discrete latent space (van den Oo… (voir plus)rd et al., 2017b) for forward planning with MCTS. In order to test this method, we introduce a new environment featuring varying difficulty levels, along with moving goals and obstacles. The combination of high-quality frame generation and classical planning approaches nearly matches true environment performance for our task, demonstrating the usefulness of this method for model-based planning in dynamic environments.
The number of visually impaired or blind (VIB) people in the world is estimated at several hundred million. Based on a series of interviews … (voir plus)with the VIB and developers of assistive technology, this paper provides a survey of machine-learning based mobile applications and identifies the most relevant applications. We discuss the functionality of these apps, how they align with the needs and requirements of the VIB users, and how they can be improved with techniques such as federated learning and model compression. As a result of this study we identify promising future directions of research in mobile perception, micro-navigation, and content-summarization.
Conditional text-to-image generation approaches commonly focus on generating a single image in a single step. One practical extension beyond… (voir plus) one-step generation is an interactive system that generates an image iteratively, conditioned on ongoing linguistic input / feedback. This is significantly more challenging as such a system must understand and keep track of the ongoing context and history. In this work, we present a recurrent image generation model which takes into account both the generated output up to the current step as well as all past instructions for generation. We show that our model is able to generate the background, add new objects, apply simple transformations to existing objects, and correct previous mistakes. We believe our approach is an important step toward interactive generation.