Publications
Self-supervised multimodal learning for group inferences from MRI data: Discovering disorder-relevant brain regions and multimodal links
Speech Emotion Recognition (SER) typically relies on utterance-level solutions. However, emotions conveyed through speech should be considered as discrete speech events with definite temporal boundaries, rather than attributes of the entire utterance. To reflect the fine-grained nature of speech emotions and to unify various fine-grained methods under a single objective, we propose a new task: Speech Emotion Diarization (SED). Just as Speaker Diarization answers the question of “Who speaks when?”, Speech Emotion Diarization answers the question of “Which emotion appears when?”. To facilitate the evaluation of the performance and establish a common benchmark, we introduce the Zaion Emotion Dataset (ZED), an openly accessible speech emotion dataset that includes non-acted emotions recorded in real-life conditions, along with manually annotated boundaries of emotion segments within the utterance. We provide competitive baselines and open-source the code and the pre-trained models.
2023-12-16
2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) (published)
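To make the task formulation concrete, the sketch below shows one possible segment-level representation of a diarization-style output and a simple frame-wise comparison against a reference annotation. The data structure, helper names, and scoring granularity are assumptions for illustration only; they do not reproduce the paper's evaluation metric or the released code.

```python
# Illustrative sketch only: a minimal segment-level representation of
# emotion diarization output and a frame-wise disagreement rate against
# a reference annotation. Names and granularity are assumptions, not the
# metric or code released with the paper.
from dataclasses import dataclass

@dataclass
class EmotionSegment:
    start: float   # segment start time in seconds
    end: float     # segment end time in seconds
    emotion: str   # e.g. "neutral", "angry", "happy", "sad"

def label_at(segments, t, default="neutral"):
    """Return the emotion label active at time t, or a default."""
    for seg in segments:
        if seg.start <= t < seg.end:
            return seg.emotion
    return default

def frame_error_rate(reference, hypothesis, duration, step=0.01):
    """Fraction of 10 ms frames whose emotion labels disagree."""
    n_frames = int(duration / step)
    errors = sum(
        label_at(reference, i * step) != label_at(hypothesis, i * step)
        for i in range(n_frames)
    )
    return errors / max(n_frames, 1)

# Example: an utterance where anger appears only in the middle.
reference = [EmotionSegment(0.0, 1.2, "neutral"),
             EmotionSegment(1.2, 2.0, "angry"),
             EmotionSegment(2.0, 3.0, "neutral")]
hypothesis = [EmotionSegment(0.0, 1.0, "neutral"),
              EmotionSegment(1.0, 2.1, "angry"),
              EmotionSegment(2.1, 3.0, "neutral")]
print(f"frame error rate: {frame_error_rate(reference, hypothesis, 3.0):.3f}")
```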
TorchAudio is an open-source audio and speech processing library built for PyTorch. It aims to accelerate the research and development of audio and speech technologies by providing well-designed, easy-to-use, and performant PyTorch components. Its contributors routinely engage with users to understand their needs and fulfill them by developing impactful features. Here, we survey TorchAudio’s development principles and contents and highlight key features we include in its latest version (2.1): self-supervised learning pre-trained pipelines and training recipes, high-performance CTC decoders, speech recognition models and training recipes, advanced media I/O capabilities, and tools for performing forced alignment, multi-channel speech enhancement, and reference-less speech assessment. For a selection of these features, through empirical studies, we demonstrate their efficacy and show that they achieve competitive or state-of-the-art performance.
2023-12-16
2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) (published)
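One of the features listed above, pre-trained pipelines, can be exercised in a few lines through the public torchaudio.pipelines API. The sketch below loads a bundled wav2vec 2.0 ASR model and performs a naive greedy CTC decode; the audio file path is a placeholder, and the greedy decoder is a simple stand-in rather than the high-performance CTC decoders the paper describes.

```python
# Minimal sketch of torchaudio's pre-trained pipeline API. "speech.wav" is a
# placeholder path; the greedy decoder below is a stand-in, not torchaudio's
# high-performance CTC decoders.
import torch
import torchaudio

bundle = torchaudio.pipelines.WAV2VEC2_ASR_BASE_960H
model = bundle.get_model()
labels = bundle.get_labels()            # CTC label set; index 0 is the blank "-"

waveform, sample_rate = torchaudio.load("speech.wav")   # placeholder path
if sample_rate != bundle.sample_rate:
    waveform = torchaudio.functional.resample(waveform, sample_rate, bundle.sample_rate)

with torch.inference_mode():
    emissions, _ = model(waveform)      # (batch, frames, num_labels) emission scores

# Greedy CTC decode: best label per frame, collapse repeats, drop blanks.
indices = emissions[0].argmax(dim=-1).tolist()
collapsed = [i for i, prev in zip(indices, [None] + indices[:-1]) if i != prev]
transcript = "".join(labels[i] for i in collapsed if labels[i] != "-").replace("|", " ")
print(transcript)
```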
While signed distance fields (SDFs) in theory offer infinite level of detail, they are typically rendered using the sphere tracing algorithm at finite resolutions, which causes the common rasterized image synthesis problem of aliasing. Most existing optimized antialiasing solutions rely on polygon mesh representations; SDF-based geometry can only be directly antialiased with the computationally expensive supersampling or with post-processing filters that may produce undesirable blurriness and ghosting. In this work, we present cone-traced supersampling (CTSS), an efficient and robust spatial antialiasing solution that naturally complements the sphere tracing algorithm, does not require casting additional rays per pixel or offline prefiltering, and can be easily implemented in existing real-time SDF renderers. CTSS performs supersampling along the traced ray near surfaces with partial visibility – object contours – identified by evaluating cone intersections within a pixel's view frustum. We further introduce subpixel edge reconstruction (SER), a technique that extends CTSS to locate and resolve complex pixels with geometric edges in relatively flat regions, which are otherwise undetected by cone intersections. Our combined solution relies on a specialized sampling strategy to minimize the number of shading computations and correlates sample visibility to aggregate the samples. With comparable antialiasing quality at significantly lower computational cost, CTSS is a reliable practical alternative to conventional supersampling.
2023-12-14
IEEE Transactions on Visualization and Computer Graphics (published)
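To make the cone-intersection idea concrete, here is a toy Python sphere tracer that flags a pixel as a contour candidate when the surface enters the pixel's cone even though the central ray misses it; such pixels are where extra samples would be placed along the ray. The scene, cone-radius approximation, and classification are illustrative assumptions, not the authors' CTSS/SER implementation.

```python
# Toy sphere tracer with a cone (pixel-footprint) test: rays that miss the
# surface but whose cone grazes it are flagged as contour candidates. This is
# an illustrative approximation of the partial-visibility test, not the
# paper's CTSS/SER implementation (hit pixels near edges are not flagged).
import math

def sdf_sphere(p, center=(0.0, 0.0, 3.0), radius=1.0):
    """Signed distance from point p to a sphere (the toy scene)."""
    dx, dy, dz = (p[i] - center[i] for i in range(3))
    return math.sqrt(dx * dx + dy * dy + dz * dz) - radius

def sphere_trace(origin, direction, cone_slope, max_steps=128, max_dist=20.0, eps=1e-4):
    """March along the ray; report hit / contour-candidate / miss."""
    t, grazed = 0.0, False
    for _ in range(max_steps):
        p = tuple(origin[i] + t * direction[i] for i in range(3))
        d = sdf_sphere(p)
        if d < eps:
            return ("hit", t)
        if d < cone_slope * t:      # surface passes through the pixel's cone
            grazed = True
        t += d
        if t > max_dist:
            break
    return ("contour", t) if grazed else ("miss", t)

# One ray per pixel of a tiny image plane; cone slope approximates the
# pixel footprint growth per unit depth.
width, height, fov = 8, 8, math.radians(60)
cone_slope = math.tan(fov / 2) / (height / 2)
for y in range(height):
    row = ""
    for x in range(width):
        u = (x + 0.5) / width * 2 - 1
        v = 1 - (y + 0.5) / height * 2
        d = (u * math.tan(fov / 2), v * math.tan(fov / 2), 1.0)
        n = math.sqrt(sum(c * c for c in d))
        status, _ = sphere_trace((0.0, 0.0, 0.0), tuple(c / n for c in d), cone_slope)
        row += {"hit": "#", "contour": "+", "miss": "."}[status]
    print(row)
```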
Reinforcement learning (RL) for partially observable Markov decision processes (POMDPs) is a challenging problem because decisions need to be made based on the entire history of observations and actions. However, in several scenarios, state information is available during the training phase. We are interested in exploiting the availability of this state information during the training phase to efficiently learn a history-based policy using RL. Specifically, we consider actor-critic algorithms, where the actor uses only the history information but the critic uses both history and state. Such algorithms are called asymmetric actor-critic, to highlight the fact that the actor and critic have asymmetric information. Motivated by the recent success of using representation losses in RL for POMDPs [1], we derive similar theoretical results for the asymmetric actor-critic case and evaluate the effectiveness of adding such auxiliary losses in experiments. In particular, we learn a history representation-called an approximate information state (AIS)-and bound the performance loss when acting using AIS.
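The asymmetric structure described above can be sketched as a small PyTorch model: the actor conditions only on a learned summary of the observation-action history, while the critic additionally receives the privileged state available at training time. Dimensions and module names are illustrative placeholders; the AIS auxiliary loss and the full training loop are omitted.

```python
# Minimal sketch of an asymmetric actor-critic: the actor sees only the
# history summary (a GRU over observations and previous actions), while the
# critic also receives the true state available during training. Sizes and
# names are placeholders; the AIS auxiliary loss is not shown.
import torch
import torch.nn as nn

class HistoryEncoder(nn.Module):
    """Summarizes the history of (observation, previous action) pairs."""
    def __init__(self, obs_dim, act_dim, hidden_dim=64):
        super().__init__()
        self.rnn = nn.GRU(obs_dim + act_dim, hidden_dim, batch_first=True)

    def forward(self, obs_seq, prev_act_seq):
        x = torch.cat([obs_seq, prev_act_seq], dim=-1)   # (B, T, obs+act)
        features, _ = self.rnn(x)
        return features[:, -1]                           # summary at the last step

class AsymmetricActorCritic(nn.Module):
    def __init__(self, obs_dim, act_dim, state_dim, hidden_dim=64):
        super().__init__()
        self.history = HistoryEncoder(obs_dim, act_dim, hidden_dim)
        # Actor: history only (all that is available at execution time).
        self.actor = nn.Sequential(nn.Linear(hidden_dim, hidden_dim), nn.Tanh(),
                                   nn.Linear(hidden_dim, act_dim))
        # Critic: history plus privileged state (available only in training).
        self.critic = nn.Sequential(nn.Linear(hidden_dim + state_dim, hidden_dim), nn.Tanh(),
                                    nn.Linear(hidden_dim, 1))

    def forward(self, obs_seq, prev_act_seq, state):
        h = self.history(obs_seq, prev_act_seq)
        action_logits = self.actor(h)                        # policy uses history only
        value = self.critic(torch.cat([h, state], dim=-1))   # critic also sees state
        return action_logits, value

# Example shapes: batch of 4 histories of length 10.
model = AsymmetricActorCritic(obs_dim=8, act_dim=3, state_dim=5)
logits, value = model(torch.randn(4, 10, 8), torch.randn(4, 10, 3), torch.randn(4, 5))
print(logits.shape, value.shape)   # torch.Size([4, 3]) torch.Size([4, 1])
```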
Current Practices in Voice Data Collection and Limitations to Voice AI Research: A National Survey.
Emily Evangelista
Rohan Kale
Desiree McCutcheon
Anais Rameau
Alexander Gelbard
Maria Powell
Michael Johns
Anthony Law
Phillip Song
Matthew Naunheim
Stephanie Watts
Paul C. Bryson
Matthew G. Crowson
Jeremy Pinto
Yael Bensoussan
INTRODUCTION
Accuracy and validity of voice AI algorithms rely on substantial quality voice data. Although considerable amounts of voice data are captured daily in voice centers across North America, there is no standardized protocol for acoustic data management, which limits the usability of these datasets for voice artificial intelligence (AI) research.
OBJECTIVE
The aim was to capture current practices of voice data collection, storage, analysis, and perceived limitations to collaborative voice research.
METHODS
A 30-question online survey was developed with expert guidance from members of voicecollab.ai, an international collaborative of voice AI researchers. The survey was disseminated via REDCap to an estimated 200 practitioners at North American voice centers. Survey questions assessed respondents' current practices in terms of acoustic data collection, storage, and retrieval as well as limitations to collaborative voice research.
RESULTS
Seventy-two respondents completed the survey, of which 81.7% were laryngologists and 18.3% were speech-language pathologists (SLPs). Eighteen percent of respondents reported seeing 40-60 patients with voice disorders weekly, and 55% reported seeing more than 60 (a conservative estimate of over 4,000 patients/week in total). Only 28% of respondents reported using standardized protocols for the collection and storage of acoustic data. Although 87% of respondents conduct voice research, only 38% report doing so on a multi-institutional level. Perceived limitations to conducting collaborative voice research include lack of standardized methodology for collection (30%) and lack of human resources to prepare and label voice data adequately (55%).
CONCLUSION
To conduct large-scale multi-institutional voice research with AI, there is a pertinent need for standardization of acoustic data management, as well as an infrastructure for secure and efficient data sharing.
LEVEL OF EVIDENCE
Level 5 Laryngoscope, 2023.