Mirco Ravanelli

Membre académique associé

Professeur adjoint, Concordia University, École de génie et d'informatique Gina-Cody

Professeur associé, Université de Montréal, Département d'informatique et de recherche opérationnelle

Biographie

Mirco Ravanelli est professeur adjoint à l'Université Concordia, professeur associé à l'Université de Montréal et membre associé de Mila – Institut québécois d’intelligence artificielle. Lauréat du prix Amazon Research 2022, il est expert en apprentissage profond et en IA conversationnelle, et a publié plus de 60 articles dans ces domaines. Il se concentre principalement sur les nouveaux algorithmes d'apprentissage profond, y compris l'apprentissage autosupervisé, continu, multimodal, coopératif et économe en énergie. Mirco Ravanelli a effectué son postdoctorat à Mila, sous la direction du professeur Yoshua Bengio. Il est notamment le fondateur et le chef de file de SpeechBrain, l'une des boîtes à outils en code source ouvert les plus largement adoptées dans le domaine du traitement de la parole et de l'IA conversationnelle.

Étudiants actuels

Arian Morteza

Doctorat - Concordia University

arian.morteza@mila.quebec

Doctorat - Université de Montréal

artem.ploujnikov@mila.quebec

Cordelle Briac

Collaborateur·rice de recherche - Concordia University University

briac.cordelle@mila.quebec

Ritika Dhamija Ritika

Collaborateur·rice de recherche - Concordia University University

dhamija.ritika@mila.quebec

Github

Eleonora Mancini

Stagiaire de recherche - Université de Montréal

Superviseur⋅e principal⋅e :

Cem (Yusuf) Subakan

eleonora.mancini@mila.quebec

Github

Google Scholar

Fırat Öncel

Doctorat - Concordia University

Co-superviseur⋅e :

Laurent Charlin

firat.oncel@mila.quebec

Maîtrise recherche - Concordia University

gianfranco.bertucci@mila.quebec

Site web

Github

Hiba Akhaddar

Maîtrise recherche - Concordia University

hiba.akhaddar@mila.quebec

Jama Mohamud

Doctorat - Université de Montréal

Co-superviseur⋅e :

Yoshua Bengio

hussein-mohamu.jama@mila.quebec

Doctorat - Concordia University

Co-superviseur⋅e :

Cem (Yusuf) Subakan

luca.dellalibera@mila.quebec

Github

Pooneh Mousavi

Doctorat - Concordia University

pooneh.mousavi@mila.quebec

Site web

Github

Google Scholar

Salman Sami Hussain Ali

Collaborateur·rice de recherche - Concordia University University

salman.hussainali@mila.quebec

Github

Seina Assadian

Collaborateur·rice de recherche - Concordia University University

seina.assadian@mila.quebec

Github

Tristan Lueger Lueger

Collaborateur·rice de recherche - Concordia University University

tristan.lueger@mila.quebec

Victor Cruz

Maîtrise recherche - Concordia University

victor.cruz@mila.quebec

Wagner Drew

Baccalauréat - Concordia University

drew.wagner@mila.quebec

Github

Billets de blogue

13 juin 2024

SpeechBrain 1.0 : rendre l’IA conversationnelle accessible à tout le monde

par

Mirco Ravanelli

Lire l'article

Introducing SpeechBrain: A general-purpose PyTorch speech processing toolkit

28 avril 2021

Voici SpeechBrain : Une boîte à outils polyvalente de traitement de la parole basée sur PyTorch

par

Mirco Ravanelli

Loren Lugosch

Lire l'article

Publications

Learning Representations for New Sound Classes With Continual Self-Supervised Learning

Zhepei Wang

Cem (Yusuf) Subakan

Xilin Jiang

Junkai Wu

Efthymios Tzinis

Mirco Ravanelli

Paris Smaragdis

In this article, we work on a sound recognition system that continually incorporates new sound classes. Our main goal is to develop a framew… (voir plus)ork where the model can be updated without relying on labeled data. For this purpose, we propose adopting representation learning, where an encoder is trained using unlabeled data. This learning framework enables the study and implementation of a practically relevant use case where only a small amount of the labels is available in a continual learning context. We also make the empirical observation that a similarity-based representation learning method within this framework is robust to forgetting even if no explicit mechanism against forgetting is employed. We show that this approach obtains similar performance compared to several distillation-based continual learning methods when employed on self-supervised representation learning methods.

2022-01-01

IEEE Signal Processing Letters (publié)

doi.org

arxiv.org

OSSEM: one-shot speaker adaptive speech enhancement using meta learning

Cheng Yu

Szu‐wei Fu

Tsun-An Hsieh

Yu-shan Tsao

Mirco Ravanelli

Although deep learning (DL) has achieved notable progress in speech enhancement (SE), further research is still required for a DL-based SE s… (voir plus)ystem to adapt effectively and efficiently to particular speakers. In this study, we propose a novel meta-learning-based speaker-adaptive SE approach (called OSSEM) that aims to achieve SE model adaptation in a one-shot manner. OSSEM consists of a modified transformer SE network and a speaker-specific masking (SSM) network. In practice, the SSM network takes an enrolled speaker embedding extracted using ECAPA-TDNN to adjust the input noisy feature through masking. To evaluate OSSEM, we designed a modified Voice Bank-DEMAND dataset, in which one utterance from the testing set was used for model adaptation, and the remaining utterances were used for testing the performance. Moreover, we set restrictions allowing the enhancement process to be conducted in real time, and thus designed OSSEM to be a causal SE system. Experimental results first show that OSSEM can effectively adapt a pretrained SE model to a particular speaker with only one utterance, thus yielding improved SE results. Meanwhile, OSSEM exhibits a competitive performance compared to state-of-the-art causal SE systems.

2021-11-10

ArXiv (preprint)

doi.org

arxiv.org

Real-M: Towards Speech Separation on Real Mixtures

Cem (Yusuf) Subakan

Mirco Ravanelli

Samuele Cornell

François Grondin

In recent years, deep learning based source separation has achieved impressive results. Most studies, however, still evaluate separation mod… (voir plus)els on synthetic datasets, while the performance of state-of-the-art techniques on in-the-wild speech data remains an open question. This paper contributes to fill this gap in two ways. First, we release the REAL-M dataset, a crowd-sourced corpus of real-life mixtures. Secondly, we address the problem of performance evaluation of real-life mixtures, where the ground truth is not available. We bypass this issue by carefully designing a blind Scale-Invariant Signal-to-Noise Ratio (SI-SNR) neural estimator. Through a user study, we show that our estimator reliably evaluates the separation performance on real mixtures, i.e. we observe that the performance predictions of the SI-SNR estimator correlate well with human opinions. Moreover, when evaluating popular speech separation models, we observe that the performance trends predicted by our estimator on the REAL-M dataset closely follow the performance trends achieved on synthetic benchmarks.

2021-10-20

ArXiv (preprint)

doi.org

arxiv.org

La recherche en IA au service du monde réel

Boussole des politiques en IA

Vie étudiante et ressources