Un incubateur à temps plein de 4 mois à Mila, conçu spécifiquement pour les fondateurs et fondatrices de la deep tech issus de parcours d'élite en STIM.
Avantage IA : productivité dans la fonction publique
Apprenez à tirer parti de l’IA générative pour soutenir et améliorer votre productivité au travail. La prochaine cohorte se déroulera en ligne les 28 et 30 avril 2026.
Nous utilisons des témoins pour analyser le trafic et l’utilisation de notre site web, afin de personnaliser votre expérience. Vous pouvez désactiver ces technologies à tout moment, mais cela peut restreindre certaines fonctionnalités du site. Consultez notre Politique de protection de la vie privée pour en savoir plus.
Paramètre des cookies
Vous pouvez activer et désactiver les types de cookies que vous souhaitez accepter. Cependant certains choix que vous ferez pourraient affecter les services proposés sur nos sites (ex : suggestions, annonces personnalisées, etc.).
Cookies essentiels
Ces cookies sont nécessaires au fonctionnement du site et ne peuvent être désactivés. (Toujours actif)
Cookies analyse
Acceptez-vous l'utilisation de cookies pour mesurer l'audience de nos sites ?
Lecteur Multimédia
Acceptez-vous l'utilisation de cookies pour afficher et vous permettre de regarder les contenus vidéo hébergés par nos partenaires (YouTube, etc.) ?
Publications
Semantic Mapping for View-Invariant Relocalization.
We propose a system for visual simultaneous localization and mapping (SLAM) that combines traditional local appearance-based features with s… (voir plus)emantically meaningful object landmarks to achieve both accurate local tracking and highly view-invariant object-driven relocalization. Our mapping process uses a sampling-based approach to efficiently infer the 3D pose of object landmarks from 2D bounding box object detections. These 3D landmarks then serve as a view-invariant representation which we leverage to achieve camera relocalization even when the viewing angle changes by more than 125 degrees. This level of view-invariance cannot be attained by local appearance-based features (e.g. SIFT) since the same set of surfaces are not even visible when the viewpoint changes significantly. Our experiments show that even when existing methods fail completely for viewpoint changes of more than 70 degrees, our method continues to achieve a relocalization rate of around 90%, with a mean rotational error of around 8 degrees.
2019-05-19
2019 International Conference on Robotics and Automation (ICRA) (publié)
Despite the success of deep learning in speech recognition, multi-dialect speech recognition remains a difficult problem. Although dialect-s… (voir plus)pecific acoustic models are known to perform well in general, they are not easy to maintain when dialect-specific data is scarce and the number of dialects for each language is large. Therefore, a single unified acoustic model (AM) that generalizes well for many dialects has been in demand. In this paper, we propose a novel acoustic modeling technique for accurate multi-dialect speech recognition with a single AM. Our proposed AM is dynamically adapted based on both dialect information and its internal representation, which results in a highly adaptive AM for handling multiple dialects simultaneously. We also propose a simple but effective training method to deal with unseen dialects. The experimental results on large scale speech datasets show that the proposed AM outperforms all the previous ones, reducing word error rates (WERs) by 8.11% relative compared to a single all-dialects AM and by 7.31% relative compared to dialect-specific AMs.
2019-05-11
ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (publié)
Recent character and phoneme-based parametric TTS systems using deep learning have shown strong performance in natural speech generation. Ho… (voir plus)wever, the choice between character or phoneme input can create serious limitations for practical deployment, as direct control of pronunciation is crucial in certain cases. We demonstrate a simple method for combining multiple types of linguistic information in a single encoder, named representation mixing, enabling flexible choice between character, phoneme, or mixed representations during inference. Experiments and user studies on a public audiobook corpus show the efficacy of our approach.
2019-05-11
ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (publié)
Reinforcement learning allows autonomous agents to learn how to act in a stochastic, unknown environment, with which they can interact. Deep… (voir plus) reinforcement learning, in particular, has achieved great success in well-defined application domains, such as Go or chess, in which an agent has to learn how to act and there is a clear success criterion. In this talk, I will focus on the potential role of reinforcement learning as a tool for building knowledge representations in AI agents whose goal is to perform continual learning. I will examine a key concept in reinforcement learning, the value function, and discuss its generalization to support various forms of predictive knowledge. I will also discuss the role of temporally extended actions, and their associated predictive models, in learning procedural knowledge. Finally, I will discuss the challenge of how to evaluate reinforcement learning agents whose goal is not just to control their environment, but also to build knowledge about their world.
2019-05-07
International Joint Conference on Autonomous Agents and Multiagent Systems (publié)
Usability of Virtual Reality Application Through the Lens of the User Community: A Case Study
Wenting Wang
Jinghui Cheng
Jin L.C. Guo
The increasing availability and diversity of virtual reality (VR) applications highlighted the importance of their usability. Function-orien… (voir plus)ted VR applications posed new challenges that are not well studied in the literature. Moreover, user feedback becomes readily available thanks to modern software engineering tools, such as app stores and open source platforms. Using Firefox Reality as a case study, we explored the major types of VR usability issues raised in these platforms. We found that 77% of usability feedbacks can be mapped to Nielsen's heuristics while few were mappable to VR-specific heuristics. This result indicates that Nielsen's heuristics could potentially help developers address the usability of this VR application in its early development stage. This work paves the road for exploring tools leveraging the community effort to promote the usability of function-oriented VR applications.
2019-05-01
Extended Abstracts of the 2019 CHI Conference on Human Factors in Computing Systems (publié)
We present a project that aims to generate images that depict accurate, vivid, and personalized outcomes of climate change using Cycle-Consi… (voir plus)stent Adversarial Networks (CycleGANs). By training our CycleGAN model on street-view images of houses before and after extreme weather events (e.g. floods, forest fires, etc.), we learn a mapping that can then be applied to images of locations that have not yet experienced these events. This visual transformation is paired with climate model predictions to assess likelihood and type of climate-related events in the long term (50 years) in order to bring the future closer in the viewers mind. The eventual goal of our project is to enable individuals to make more informed choices about their climate future by creating a more visceral understanding of the effects of climate change, while maintaining scientific credibility by drawing on climate model projections.
Characterization of the representations learned in intermediate layers of deep networks can provide valuable insight into the nature of a ta… (voir plus)sk and can guide the development of well-tailored learning strategies. Here we study convolutional neural network (CNN)-based acoustic models in the context of automatic speech recognition. Adapting a method proposed by [1], we measure the transferability of each layer between English, Dutch and German to assess their language-specificity. We observed three distinct regions of transferability: (1) the first two layers were entirely transferable between languages, (2) layers 2–8 were also highly transferable but we found some evidence of language specificity, (3) the subsequent fully connected layers were more language specific but could be successfully finetuned to the target language. To further probe the effect of weight freezing, we performed follow-up experiments using freeze-training [2]. Our results are consistent with the observation that CNNs converge ‘bottom up’ during training and demonstrate the benefit of freeze training, especially for transfer learning.
2019-04-30
IEEE International Conference on Acoustics, Speech, and Signal Processing (publié)
Standard methods in deep learning for natural language processing fail to capture the compositional structure of human language that allows … (voir plus)for systematic generalization outside of the training distribution. However, human learners readily generalize in this way, e.g. by applying known grammatical rules to novel words. Inspired by work in neuroscience suggesting separate brain systems for syntactic and semantic processing, we implement a modification to standard approaches in neural machine translation, imposing an analogous separation. The novel model, which we call Syntactic Attention, substantially outperforms standard methods in deep learning on the SCAN dataset, a compositional generalization task, without any hand-engineered features or additional supervision. Our work suggests that separating syntactic from semantic learning may be a useful heuristic for capturing compositional structure.
Despite remarkable successes achieved by modern neural networks in a wide range of applications, these networks perform best in domain-speci… (voir plus)fic stationary environments where they are trained only once on large-scale controlled data repositories. When exposed to non-stationary learning environments, current neural networks tend to forget what they had previously learned, a phenomena known as catastrophic forgetting. Most previous approaches to this problem rely on memory replay buffers which store samples from previously learned tasks, and use them to regularize the learning on new ones. This approach suffers from the important disadvantage of not scaling well to real-life problems in which the memory requirements become enormous. We propose a memoryless method that combines standard supervised neural networks with self-organizing maps to solve the continual learning problem. The role of the self-organizing map is to adaptively cluster the inputs into appropriate task contexts - without explicit labels - and allocate network resources accordingly. Thus, it selectively routes the inputs in accord with previous experience, ensuring that past learning is maintained and does not interfere with current learning. Out method is intuitive, memoryless, and performs on par with current state-of-the-art approaches on standard benchmarks.
The availability of open-source software is playing a remarkable role in the popularization of speech recognition and deep learning. Kaldi, … (voir plus)for instance, is nowadays an established framework used to develop state-of-the-art speech recognizers. PyTorch is used to build neural networks with the Python language and has recently spawn tremendous interest within the machine learning community thanks to its simplicity and flexibility. The PyTorch-Kaldi project aims to bridge the gap between these popular toolkits, trying to inherit the efficiency of Kaldi and the flexibility of PyTorch. PyTorch-Kaldi is not only a simple interface between these software, but it embeds several useful features for developing modern speech recognizers. For instance, the code is specifically designed to naturally plug-in user-defined acoustic models. As an alternative, users can exploit several pre-implemented neural networks that can be customized using intuitive configuration files. PyTorch-Kaldi supports multiple feature and label streams as well as combinations of neural networks, enabling the use of complex neural architectures. The toolkit is publicly-released along with a rich documentation and is designed to properly work locally or on HPC clusters. Experiments, that are conducted on several datasets and tasks, show that PyTorch-Kaldi can effectively be used to develop modern state-of-the-art speech recognizers.
2019-04-16
ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (publié)
With too few samples or too many model parameters, overfitting can inhibit the ability to generalise predictions to new data. Within medical… (voir plus) imaging, this can occur when features are incorrectly assigned importance such as distinct hospital specific artifacts, leading to poor performance on a new dataset from a different institution without those features, which is undesirable. Most regularization methods do not explicitly penalize the incorrect association of these features to the target class and hence fail to address this issue. We propose a regularization method, GradMask, which penalizes saliency maps inferred from the classifier gradients when they are not consistent with the lesion segmentation. This prevents non-tumor related features to contribute to the classification of unhealthy samples. We demonstrate that this method can improve test accuracy between 1-3% compared to the baseline without GradMask, showing that it has an impact on reducing overfitting.