Publications
Multiresolution Recurrent Neural Networks: An Application to Dialogue Response Generation
We introduce the multiresolution recurrent neural network, which extends the sequence-to-sequence framework to model natural language generation as two parallel discrete stochastic processes: a sequence of high-level coarse tokens, and a sequence of natural language tokens. There are many ways to estimate or learn the high-level coarse tokens, but we argue that a simple extraction procedure is sufficient to capture a wealth of high-level discourse semantics. Such a procedure allows training the multiresolution recurrent neural network by maximizing the exact joint log-likelihood over both sequences. In contrast to the standard log-likelihood objective w.r.t. natural language tokens (word perplexity), optimizing the joint log-likelihood biases the model towards modeling high-level abstractions. We apply the proposed model to the task of dialogue response generation in two challenging domains: the Ubuntu technical support domain, and Twitter conversations. On Ubuntu, the model outperforms competing approaches by a substantial margin, achieving state-of-the-art results according to both automatic evaluation metrics and a human evaluation study. On Twitter, the model appears to generate more relevant and on-topic responses according to automatic evaluation metrics. Finally, our experiments demonstrate that the proposed model is more adept at overcoming the sparsity of natural language and is better able to capture long-term structure.
2017-02-11
AAAI Conference on Artificial Intelligence (published)
Long-term automated monitoring of residential or small industrial properties is an important task within the broader scope of human activity recognition. We present a device-free wifi-based localization system for smart indoor spaces, developed in a collaboration between McGill University and Aerîal Technologies. The system relies on existing wifi network signals and semi-supervised learning, in order to automatically detect entrance into a residential unit, and track the location of a moving subject within the sensing area. The implemented real-time monitoring platform works by detecting changes in the characteristics of the wifi signals collected via existing off-the-shelf wifi-enabled devices in the environment. This platform has been deployed in several apartments in the Montreal area, and the results obtained show the potential of this technology to turn any regular home with an existing wifi network into a smart home equipped with intruder alarm and room-level location detector. The machine learning component has been devised so as to minimize the need for user annotation and overcome temporal instabilities in the input signals. We use a semi-supervised learning framework which works in two phases. First, we build a base learner for mapping wifi signals to different physical locations in the environment from a small amount of labeled data; during its lifetime, the learner automatically re-trains when the uncertainty level rises significantly, without the need for further supervision. This paper describes the technical and practical issues arising in the design and implementation of such a system for real residential units, and illustrates its performance during on-going deployment.
2017-02-10
AAAI Conference on Artificial Intelligence (published)
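The wifi localization abstract above mentions a learner that re-trains itself when its uncertainty level rises significantly. The abstract does not say how uncertainty is measured, so the following is only a minimal sketch under an assumed criterion: the mean Shannon entropy of the learner's predicted location probabilities over a recent window. The function names and the threshold value are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def should_retrain(recent_predictions, threshold):
    """Return True when the mean predictive entropy over a window exceeds threshold.

    recent_predictions: list of probability vectors over candidate locations.
    threshold: hypothetical tuning parameter, not taken from the paper.
    """
    if not recent_predictions:
        return False
    total = sum(entropy(p) for p in recent_predictions)
    return total / len(recent_predictions) > threshold


# Illustrative (made-up) predictions over three rooms.
confident = [[0.97, 0.02, 0.01], [0.95, 0.03, 0.02]]
uncertain = [[0.40, 0.35, 0.25], [0.34, 0.33, 0.33]]

print(should_retrain(confident, threshold=0.5))
print(should_retrain(uncertain, threshold=0.5))
```

With these made-up inputs, the confident window stays below the threshold and the uncertain window exceeds it; a deployed system would need a threshold calibrated on its own signals.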
We introduce the adversarially learned inference (ALI) model, which jointly learns a generation network and an inference network using an adversarial process. The generation network maps samples from stochastic latent variables to the data space while the inference network maps training examples in data space to the space of latent variables. An adversarial game is cast between these two networks and a discriminative network is trained to distinguish between joint latent/data-space samples from the generative network and joint samples from the inference network. We illustrate the ability of the model to learn mutually coherent inference and generation networks through the inspections of model samples and reconstructions and confirm the usefulness of the learned representations by obtaining a performance competitive with state-of-the-art on the semi-supervised SVHN and CIFAR10 tasks.
2017-02-05
International Conference on Learning Representations (poster)
In this paper, we propose to equip Generative Adversarial Networks with the ability to produce direct energy estimates for samples.
Specifically, we propose a flexible adversarial training framework, and prove this framework not only ensures the generator converges to the true data distribution, but also enables the discriminator to retain the density information at the global optimum.
We derive the analytic form of the induced solution, and analyze the properties.
In order to make the proposed framework trainable in practice, we introduce two effective approximation techniques.
Empirically, the experimental results closely match our theoretical analysis, verifying that the discriminator is able to recover the energy of the data distribution.
In this paper we propose a novel model for the unconditional audio generation task that generates one audio sample at a time. We show that our model, which profits from combining memory-less modules, namely autoregressive multilayer perceptrons, and stateful recurrent neural networks in a hierarchical structure, is able to capture the underlying sources of variation in the temporal domain over very long time spans on three datasets of different nature. Human evaluation of the generated samples indicates that our model is preferred over competing models. We also show how each component of the model contributes to the exhibited performance.
In this paper, we construct and train end-to-end neural network-based dialogue systems using an updated version of the recent Ubuntu Dialogue Corpus, a dataset containing almost 1 million multi-turn dialogues, with a total of over 7 million utterances and 100 million words. This dataset is interesting because of its size, long context lengths, and technical nature; thus, it can be used to train large models directly from data with minimal feature engineering, which can be both time consuming and expensive. We provide baselines in two different environments: one where models are trained to maximize the log-likelihood of a generated utterance conditioned on the context of the conversation, and one where models are trained to select the correct next response from a list of candidate responses. These are both evaluated on a recall task that we call Next Utterance Classification (NUC), as well as other generation-specific metrics. Finally, we provide a qualitative error analysis to help determine the most promising directions for future research on the Ubuntu Dialogue Corpus, and for end-to-end dialogue systems in general.
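The abstract above describes Next Utterance Classification only as a recall task over a list of candidate responses. One common way to score such a task is Recall@k: an example counts as a hit when the true response is among the k highest-scored candidates. The sketch below assumes that form; the function names and the toy scores are hypothetical and not taken from the paper.

```python
def recall_at_k(scores, true_index, k):
    """Return True if the true response is among the k highest-scored candidates."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return true_index in ranked[:k]


def mean_recall_at_k(examples, k):
    """Average Recall@k over (scores, true_index) pairs."""
    hits = sum(recall_at_k(scores, true_index, k) for scores, true_index in examples)
    return hits / len(examples)


# Toy data: each example is (model scores for 3 candidates, index of the true response).
examples = [
    ([0.1, 0.7, 0.2], 1),
    ([0.5, 0.3, 0.2], 2),
    ([0.2, 0.3, 0.5], 1),
]

print(mean_recall_at_k(examples, k=1))
print(mean_recall_at_k(examples, k=2))
```

On this toy data one of the three examples is a hit at k=1 and two are hits at k=2, which shows how the metric loosens as k grows.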
We present an approach to training neural networks to generate sequences using actor-critic methods from reinforcement learning (RL). Current log-likelihood training methods are limited by the discrepancy between their training and testing modes, as models must generate tokens conditioned on their previous guesses rather than the ground-truth tokens. We address this problem by introducing a critic network that is trained to predict the value of an output token, given the policy of an actor network. This results in a training procedure that is much closer to the test phase, and allows us to directly optimize for a task-specific score such as BLEU. Crucially, since we leverage these techniques in the supervised learning setting rather than the traditional RL setting, we condition the critic network on the ground-truth output. We show that our method leads to improved performance on both a synthetic task, and for German-English machine translation. Our analysis paves the way for such methods to be applied in natural language generation tasks, such as machine translation, caption generation, and dialogue modelling.
The standard approach to supervised classification involves the minimization of a log-loss as an upper bound to the classification error. While this is a tight bound early on in the optimization, it overemphasizes the influence of incorrectly classified examples far from the decision boundary. Updating the upper bound during the optimization leads to improved classification rates while transforming the learning into a sequence of minimization problems. In addition, in the context where the classifier is part of a larger system, this modification makes it possible to link the performance of the classifier to that of the whole system, allowing the seamless introduction of external constraints.
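The bound mentioned in the abstract above can be checked numerically. The sketch below assumes a binary classifier with labels in {-1, +1} and a margin m = y * f(x), which is a simplification chosen here for illustration and not stated in the abstract. In that setting the base-2 log-loss log2(1 + exp(-m)) is never smaller than the 0-1 error, and it grows roughly linearly for badly misclassified examples, which is the overemphasis the abstract refers to. It does not implement the paper's updated bound.

```python
import math


def log_loss_base2(margin):
    """Base-2 logistic loss for a binary example with margin = y * f(x)."""
    return math.log2(1.0 + math.exp(-margin))


def zero_one_loss(margin):
    """Classification error: 1 when the example is on the wrong side (or on the boundary)."""
    return 1.0 if margin <= 0 else 0.0


# Margins from badly misclassified (-10) to confidently correct (+3).
margins = [-10.0, -1.0, 0.0, 0.5, 3.0]

for m in margins:
    print(m, zero_one_loss(m), log_loss_base2(m))
```

The 0-1 loss is capped at 1, while the log-loss for the margin of -10 is more than ten times larger, so a few such examples can dominate the objective.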
Learning tasks such as those involving genomic data often pose a serious challenge: the number of input features can be orders of magnitude larger than the number of training examples, making it difficult to avoid overfitting, even when using the known regularization techniques. We focus here on tasks in which the input is a description of the genetic variation specific to a patient, the single nucleotide polymorphisms (SNPs), yielding millions of ternary inputs. Improving the ability of deep learning to handle such datasets could have an important impact in precision medicine, where high-dimensional data regarding a particular patient is used to make predictions of interest. Even though the amount of data for such tasks is increasing, this mismatch between the number of examples and the number of inputs remains a concern. Naive implementations of classifier neural networks involve a huge number of free parameters in their first layer: each input feature is associated with as many parameters as there are hidden units. We propose a novel neural network parametrization which considerably reduces the number of free parameters. It is based on the idea that we can first learn or provide a distributed representation for each input feature (e.g. for each position in the genome where variations are observed), and then learn (with another neural network called the parameter prediction network) how to map a feature's distributed representation to the vector of parameters specific to that feature in the classifier neural network (the weights which link the value of the feature to each of the hidden units). We show experimentally on a population stratification task of interest to medical studies that the proposed approach can significantly reduce both the number of parameters and the error rate of the classifier.
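The parameter saving described in the abstract above can be illustrated with simple counting. The sketch below assumes the parameter prediction network is a single linear layer with a bias, and uses made-up sizes (one million features, 100 hidden units, 10-dimensional feature representations); none of these choices come from the paper.

```python
def naive_first_layer_params(n_features, n_hidden):
    """One weight per (input feature, hidden unit) pair."""
    return n_features * n_hidden


def predicted_first_layer_params(n_features, n_hidden, embed_dim, learn_embeddings):
    """Free parameters when first-layer weights are predicted from feature representations.

    Assumes a single linear prediction layer with bias (an illustrative simplification).
    """
    predictor = embed_dim * n_hidden + n_hidden
    embeddings = n_features * embed_dim if learn_embeddings else 0
    return predictor + embeddings


# Hypothetical sizes, for illustration only.
n_features, n_hidden, embed_dim = 1_000_000, 100, 10

naive = naive_first_layer_params(n_features, n_hidden)
provided = predicted_first_layer_params(n_features, n_hidden, embed_dim, learn_embeddings=False)
learned = predicted_first_layer_params(n_features, n_hidden, embed_dim, learn_embeddings=True)

print(naive, provided, learned)
```

Under these assumptions the count depends on the number of features only through the feature representations, and not at all when those representations are provided and not learned.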