Mirco Ravanelli

Associate Academic Member

Assistant Professor, Concordia University, Gina Cody School of Engineering and Computer Science

Adjunct Professor, Université de Montréal, Department of Computer Science and Operations Research

Biography

Mirco Ravanelli is an assistant professor at Concordia University, adjunct professor at Université de Montréal and associate member of Mila – Quebec Artificial Intelligence Institute.

Ravanelli is an expert in deep learning and conversational AI and has published over sixty papers in these fields. His contributions were honoured with a 2022 Amazon Research Award.

His research focuses primarily on novel deep learning algorithms, including self-supervised, continual, multimodal, cooperative and energy-efficient learning.

Formerly a postdoctoral fellow at Mila under Yoshua Bengio, he founded and now leads SpeechBrain, one of the most extensively used open-source toolkits in the field of speech processing and conversational AI.

Current Students

Arian Morteza

PhD - Concordia University

arian.morteza@mila.quebec

PhD - Université de Montréal

artem.ploujnikov@mila.quebec

Cordelle Briac

Collaborating researcher - Concordia University

briac.cordelle@mila.quebec

Ritika Dhamija

Collaborating researcher - Concordia University

dhamija.ritika@mila.quebec

Github

Eleonora Mancini

Research Intern - Université de Montréal

Principal supervisor:

Cem (Yusuf) Subakan

eleonora.mancini@mila.quebec

Github

Google Scholar

Fırat Öncel

PhD - Concordia University

Co-supervisor:

Laurent Charlin

firat.oncel@mila.quebec

Master's Research - Concordia University

gianfranco.bertucci@mila.quebec

Website

Github

Hiba Akhaddar

Master's Research - Concordia University

hiba.akhaddar@mila.quebec

Jama Mohamud

PhD - Université de Montréal

Co-supervisor:

Yoshua Bengio

hussein-mohamu.jama@mila.quebec

PhD - Concordia University

Co-supervisor:

Cem (Yusuf) Subakan

luca.dellalibera@mila.quebec

Github

Pooneh Mousavi

PhD - Concordia University

pooneh.mousavi@mila.quebec

Website

Github

Google Scholar

Salman Sami Hussain Ali

Collaborating researcher - Concordia University

salman.hussainali@mila.quebec

Github

Seina Assadian

Collaborating researcher - Concordia University

seina.assadian@mila.quebec

Github

Tristan Lueger

Collaborating researcher - Concordia University

tristan.lueger@mila.quebec

Victor Cruz

Master's Research - Concordia University

victor.cruz@mila.quebec

Wagner Drew

Undergraduate - Concordia University

drew.wagner@mila.quebec

Github

Blog Posts

June 13, 2024

SpeechBrain 1.0: Making Conversational AI Accessible to Everyone

Mirco Ravanelli

Read the article

April 28, 2021

Introducing SpeechBrain: A General-Purpose PyTorch Speech Processing Toolkit

Mirco Ravanelli

Loren Lugosch

Read the article

Publications

Learning Representations for New Sound Classes With Continual Self-Supervised Learning

Zhepei Wang

Cem (Yusuf) Subakan

Xilin Jiang

Junkai Wu

Efthymios Tzinis

Mirco Ravanelli

Paris Smaragdis

In this article, we work on a sound recognition system that continually incorporates new sound classes. Our main goal is to develop a framework where the model can be updated without relying on labeled data. For this purpose, we propose adopting representation learning, where an encoder is trained using unlabeled data. This learning framework enables the study and implementation of a practically relevant use case where only a small amount of the labels is available in a continual learning context. We also make the empirical observation that a similarity-based representation learning method within this framework is robust to forgetting even if no explicit mechanism against forgetting is employed. We show that this approach obtains similar performance compared to several distillation-based continual learning methods when employed on self-supervised representation learning methods.

2022-01-01

IEEE Signal Processing Letters (published)

doi.org

arxiv.org

OSSEM: one-shot speaker adaptive speech enhancement using meta learning

Cheng Yu

Szu‐wei Fu

Tsun-An Hsieh

Yu-shan Tsao

Mirco Ravanelli

Although deep learning (DL) has achieved notable progress in speech enhancement (SE), further research is still required for a DL-based SE system to adapt effectively and efficiently to particular speakers. In this study, we propose a novel meta-learning-based speaker-adaptive SE approach (called OSSEM) that aims to achieve SE model adaptation in a one-shot manner. OSSEM consists of a modified transformer SE network and a speaker-specific masking (SSM) network. In practice, the SSM network takes an enrolled speaker embedding extracted using ECAPA-TDNN to adjust the input noisy feature through masking. To evaluate OSSEM, we designed a modified Voice Bank-DEMAND dataset, in which one utterance from the testing set was used for model adaptation, and the remaining utterances were used for testing the performance. Moreover, we set restrictions allowing the enhancement process to be conducted in real time, and thus designed OSSEM to be a causal SE system. Experimental results first show that OSSEM can effectively adapt a pretrained SE model to a particular speaker with only one utterance, thus yielding improved SE results. Meanwhile, OSSEM exhibits a competitive performance compared to state-of-the-art causal SE systems.

2021-11-10

ArXiv (preprint)

doi.org

arxiv.org

Real-M: Towards Speech Separation on Real Mixtures

Cem (Yusuf) Subakan

Mirco Ravanelli

Samuele Cornell

François Grondin

In recent years, deep learning based source separation has achieved impressive results. Most studies, however, still evaluate separation models on synthetic datasets, while the performance of state-of-the-art techniques on in-the-wild speech data remains an open question. This paper contributes to fill this gap in two ways. First, we release the REAL-M dataset, a crowd-sourced corpus of real-life mixtures. Secondly, we address the problem of performance evaluation of real-life mixtures, where the ground truth is not available. We bypass this issue by carefully designing a blind Scale-Invariant Signal-to-Noise Ratio (SI-SNR) neural estimator. Through a user study, we show that our estimator reliably evaluates the separation performance on real mixtures, i.e. we observe that the performance predictions of the SI-SNR estimator correlate well with human opinions. Moreover, when evaluating popular speech separation models, we observe that the performance trends predicted by our estimator on the REAL-M dataset closely follow the performance trends achieved on synthetic benchmarks.

2021-10-20

ArXiv (preprint)

doi.org

arxiv.org
