Mirco Ravanelli

Associate Academic Member

Assistant Professor, Concordia University, Gina Cody School of Engineering and Computer Science

Adjunct Professor, Université de Montréal, Department of Computer Science and Operations Research

Research Topics

Deep Learning

Website

Google Scholar

Biography

Mirco Ravanelli is an assistant professor at Concordia University, an adjunct professor at Université de Montréal, and an associate member of Mila - Quebec Artificial Intelligence Institute. A recipient of the 2022 Amazon Research Award, he is an expert in deep learning and conversational AI and has published more than 60 papers in these fields. His research focuses primarily on novel deep learning algorithms, including self-supervised, continual, multimodal, cooperative, and energy-efficient learning. He completed his postdoctoral work at Mila under the supervision of Professor Yoshua Bengio. He is also the founder and leader of SpeechBrain, one of the most widely adopted open-source toolkits for speech processing and conversational AI.

Current Students

Hiba Akhaddar

Research Master's - Concordia

Seina Assadian

Research Collaborator - Concordia University

Github

Cordelle Briac

Research Collaborator - Concordia University

Victor Cruz

Research Master's - Concordia

Luca Della Libera

PhD - Concordia

Co-supervisor:

Cem (Yusuf) Subakan

Github

Wagner Drew

Bachelor's - Concordia

Github

Gianfranco Dumoulin Bertucci

Research Master's - Concordia

Website

Github

Maab Elrashid Ahmed Mohamed Elrashid Ahmed Mohamed

PhD - Concordia

Co-supervisor:

PhD - Concordia

Salman Sami Hussain Ali

Research Collaborator - Concordia University

Github

Tristan Lueger

Research Collaborator - Concordia University

Eleonora Mancini

Research Intern - UdeM

Principal supervisor:

PhD - UdeM

Co-supervisor:

PhD - Concordia

PhD - Concordia

Co-supervisor:

Peter Peter

Postdoc - McGill

PhD - UdeM

Ritika Dhamija Ritika

Research Collaborator - Concordia University

Github

Blog Posts

June 13, 2024

SpeechBrain 1.0: Making Conversational AI Accessible to Everyone

by

Mirco Ravanelli

Read the article

April 28, 2021

Introducing SpeechBrain: A general-purpose PyTorch speech processing toolkit

by

Mirco Ravanelli

Loren Lugosch

Read the article

Publications

SpeechBrain-MOABB: An open-source Python library for benchmarking deep neural networks applied to EEG signals

Davide Borra

Francesco Paissan

Mirco Ravanelli

2024-11-01

Computers in Biology and Medicine (published)

doi.org

What Are They Doing? Joint Audio-Speech Co-Reasoning

Yingzhi Wang

Pooneh Mousavi

Artem Ploujnikov

Mirco Ravanelli

2024-09-22

ArXiv (preprint)

doi.org

arxiv.org

Dynamic HumTrans: Humming Transcription Using CNNs and Dynamic Programming

Shubham Gupta

Isaac Neri Gomez-Sarmiento

Faez Amjed Mezdari

Mirco Ravanelli

Cem (Yusuf) Subakan

2024-09-19

Lecture Notes in Computer Science (published)

doi.org

arxiv.org

Explaining Network Decision Provides Insights on the Causal Interaction Between Brain Regions in a Motor Imagery Task

Davide Borra

Mirco Ravanelli

2024-09-19

Lecture Notes in Computer Science (published)

doi.org

Multi-modal Decoding of Reach-to-Grasping from EEG and EMG via Neural Networks

Davide Borra

Matteo Fraternali

Mirco Ravanelli

Elisa Magosso

2024-09-19

Lecture Notes in Computer Science (published)

doi.org

LMAC-TD: Producing Time Domain Explanations for Audio Classifiers

Eleonora Mancini

Francesco Paissan

Mirco Ravanelli

Cem (Yusuf) Subakan

2024-09-13

ArXiv (preprint)

doi.org

arxiv.org

Audio Editing with Non-Rigid Text Prompts

Francesco Paissan

Zhepei Wang

Mirco Ravanelli

Paris Smaragdis

Cem (Yusuf) Subakan

In this paper, we explore audio-editing with non-rigid text edits. We show that the proposed editing pipeline is able to create audio edits that remain faithful to the input audio. We explore text prompts that perform addition, style transfer, and in-painting. We quantitatively and qualitatively show that the edits are able to obtain results which outperform Audio-LDM, a recently released text-prompted audio generation model. Qualitative inspection of the results points out that the edits given by our approach remain more faithful to the input audio in terms of keeping the original onsets and offsets of the audio events.

2024-09-01

Interspeech 2024 (published)

doi.org

arxiv.org

ProGRes: Prompted Generative Rescoring on ASR n-Best

Ada Defne Tur

Adel Moumen

Mirco Ravanelli

2024-08-30

ArXiv (preprint)

doi.org

arxiv.org

Listenable Maps for Audio Classifiers

Francesco Paissan

Mirco Ravanelli

Cem (Yusuf) Subakan

2024-07-08

Proceedings of the 41st International Conference on Machine Learning (published)

doi.org

openreview.net

Open-Source Conversational AI with SpeechBrain 1.0

Mirco Ravanelli

Titouan Parcollet

Adel Moumen

Sylvain de Langen

Cem (Yusuf) Subakan

Peter William VanHarn Plantinga

Yingzhi Wang

Pooneh Mousavi

Luca Della Libera

Artem Ploujnikov

Francesco Paissan

Davide Borra

Salah Zaiem

Zeyu Zhao

Shucong Zhang

Georgios Karakasidis

Sung-Lin Yeh

Pierre Champion

Aku Rouhe

Rudolf Braun

Florian Mai

Juan Pablo Zuluaga

Seyed Mahed Mousavi

Andreas Nautsch

Xuechen Liu

Sangeet Sagar

Jarod Duret

Salima Mdhaffar

G. Laperriere

Renato de Mori

Yannick Estève

SpeechBrain is an open-source Conversational AI toolkit based on PyTorch, focused particularly on speech processing tasks such as speech recognition, speech enhancement, speaker recognition, text-to-speech, and much more. It promotes transparency and replicability by releasing both the pre-trained models and the complete "recipes" of code and algorithms required for training them. This paper presents SpeechBrain 1.0, a significant milestone in the evolution of the toolkit, which now has over 200 recipes for speech, audio, and language processing tasks, and more than 100 models available on Hugging Face. SpeechBrain 1.0 introduces new technologies to support diverse learning modalities, Large Language Model (LLM) integration, and advanced decoding strategies, along with novel models, tasks, and modalities. It also includes a new benchmark repository, offering researchers a unified platform for evaluating models across diverse tasks.

2024-06-29

ArXiv (preprint)

doi.org

arxiv.org

DASB -- Discrete Audio and Speech Benchmark

Pooneh Mousavi

Luca Della Libera

Jarod Duret

Artem Ploujnikov

Cem (Yusuf) Subakan

Mirco Ravanelli

Discrete audio tokens have recently gained considerable attention for their potential to connect audio and language processing, enabling the creation of modern multimodal large language models. Ideal audio tokens must effectively preserve phonetic and semantic content along with paralinguistic information, speaker identity, and other details. While several types of audio tokens have been recently proposed, identifying the optimal tokenizer for various tasks is challenging due to the inconsistent evaluation settings in existing studies. To address this gap, we release the Discrete Audio and Speech Benchmark (DASB), a comprehensive leaderboard for benchmarking discrete audio tokens across a wide range of discriminative tasks, including speech recognition, speaker identification and verification, emotion recognition, keyword spotting, and intent classification, as well as generative tasks such as speech enhancement, separation, and text-to-speech. Our results show that, on average, semantic tokens outperform compression tokens across most discriminative and generative tasks. However, the performance gap between semantic tokens and standard continuous representations remains substantial, highlighting the need for further research in this field.

2024-06-20

ArXiv (preprint)

doi.org

arxiv.org

How Should We Extract Discrete Audio Tokens from Self-Supervised Models?

Pooneh Mousavi

Jarod Duret

Salah Zaiem

Luca Della Libera

Artem Ploujnikov

Cem (Yusuf) Subakan

Mirco Ravanelli

2024-06-15

ArXiv (preprint)

doi.org

arxiv.org
