
Mirco Ravanelli

Associate Academic Member
Assistant Professor, Concordia University, Gina Cody School of Engineering and Computer Science
Adjunct Professor, Université de Montréal, Department of Computer Science and Operations Research
Research Topics
Deep Learning

Biography

Mirco Ravanelli is an assistant professor at Concordia University, an adjunct professor at Université de Montréal, and an associate academic member of Mila – Quebec Artificial Intelligence Institute.

Ravanelli is an expert in deep learning and conversational AI, with more than sixty papers published in these fields. His contributions were honoured with a 2022 Amazon Research Award.

His research focuses primarily on novel deep learning algorithms, including self-supervised, continual, multimodal, cooperative and energy-efficient learning.

Formerly a postdoctoral fellow at Mila under Yoshua Bengio, he founded and now leads SpeechBrain, one of the most widely used open-source toolkits for speech processing and conversational AI.
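As an illustration of how the toolkit is typically used, here is a minimal sketch of loading one of SpeechBrain's publicly released pretrained models and transcribing an audio file; the exact import path (speechbrain.inference in recent releases, speechbrain.pretrained in older ones) and the model identifier depend on the installed version, and the file path is a placeholder.

    # Minimal sketch: load a pretrained SpeechBrain ASR model and transcribe a file.
    # The import path may be speechbrain.pretrained in older releases.
    from speechbrain.inference.ASR import EncoderDecoderASR

    # Downloads a CRDNN+RNNLM model trained on LibriSpeech from the SpeechBrain hub.
    asr_model = EncoderDecoderASR.from_hparams(
        source="speechbrain/asr-crdnn-rnnlm-librispeech",
        savedir="pretrained_models/asr-crdnn-rnnlm-librispeech",
    )

    # "example.wav" is a placeholder path for a local audio file.
    print(asr_model.transcribe_file("example.wav"))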

Current Students

Master's Research - Concordia University
Collaborating researcher - Concordia University
Collaborating researcher - Concordia University
PhD - Concordia University
Undergraduate - Concordia University
Master's Research - Concordia University
PhD - Concordia University
PhD - Concordia University
Collaborating researcher - Concordia University
Collaborating researcher - Concordia University
Collaborating Alumni - Université de Montréal
PhD - Université de Montréal
PhD - Concordia University
PhD - Concordia University
Postdoctorate - McGill University
PhD - Université de Montréal
Collaborating researcher - Concordia University
Master's Research - Concordia University

Publications

Investigating the Effectiveness of Explainability Methods in Parkinson's Detection from Speech
Eleonora Mancini
Francesco Paissan
Paolo Torroni
Speech impairments in Parkinson's disease (PD) provide significant early indicators for diagnosis. While models for speech-based PD detection have shown strong performance, their interpretability remains underexplored. This study systematically evaluates several explainability methods to identify PD-specific speech features, aiming to support the development of accurate, interpretable models for clinical decision-making in PD diagnosis and monitoring. Our methodology involves (i) obtaining attributions and saliency maps using mainstream interpretability techniques, (ii) quantitatively evaluating the faithfulness of these maps and their combinations obtained via union and intersection through a range of established metrics, and (iii) assessing the information conveyed by the saliency maps for PD detection from an auxiliary classifier. Our results reveal that, while explanations are aligned with the classifier, they often fail to provide valuable information for domain experts.
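To make step (i) concrete, the following is a schematic sketch of a gradient-based saliency map for a classifier operating on a log-mel spectrogram; the tiny linear model and input sizes are stand-ins for illustration, not the systems evaluated in the paper.

    # Schematic gradient-based saliency map for a spectrogram classifier.
    # The model below is a random stand-in, not an actual PD-detection model.
    import torch
    from torch import nn

    model = nn.Sequential(nn.Flatten(), nn.Linear(80 * 300, 2))  # placeholder classifier
    model.eval()

    # One log-mel spectrogram: [batch, channels, mel bins, frames].
    spec = torch.randn(1, 1, 80, 300, requires_grad=True)

    score = model(spec)[0, 1]   # logit of the positive (PD) class
    score.backward()            # gradients w.r.t. the input spectrogram

    saliency = spec.grad.abs().squeeze()  # per time-frequency bin attribution, [80, 300]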
A protocol for trustworthy EEG decoding with neural networks
Davide Borra
Elisa Magosso
SpeechBrain-MOABB: An open-source Python library for benchmarking deep neural networks applied to EEG signals
Davide Borra
Francesco Paissan
Listenable Maps for Zero-Shot Audio Classifiers
Francesco Paissan
Luca Della Libera
Interpreting the decisions of deep learning models, including audio classifiers, is crucial for ensuring the transparency and trustworthiness of this technology. In this paper, we introduce LMAC-ZS (Listenable Maps for Audio Classifiers in the Zero-Shot context), which, to the best of our knowledge, is the first decoder-based post-hoc interpretation method for explaining the decisions of zero-shot audio classifiers. The proposed method utilizes a novel loss function that maximizes the faithfulness to the original similarity between a given text-and-audio pair. We provide an extensive evaluation using the Contrastive Language-Audio Pretraining (CLAP) model to showcase that our interpreter remains faithful to the decisions in a zero-shot classification context. Moreover, we qualitatively show that our method produces meaningful explanations that correlate well with different text prompts.
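As a rough sketch of the zero-shot setting this method explains, a CLAP-style classifier scores an audio clip against a set of text prompts by embedding similarity; the encoders below are random placeholders rather than the actual CLAP model.

    # Schematic zero-shot audio classification by text-audio embedding similarity.
    # Both encoders are untrained placeholders used only to show the mechanics.
    import torch
    import torch.nn.functional as F

    audio_encoder = torch.nn.Linear(16000, 512)  # stands in for a real audio encoder
    text_encoder = torch.nn.Embedding(3, 512)    # stands in for a real text encoder

    audio = torch.randn(1, 16000)                # one second of dummy 16 kHz audio
    prompts = torch.tensor([0, 1, 2])            # ids standing in for three text prompts

    a = F.normalize(audio_encoder(audio), dim=-1)
    t = F.normalize(text_encoder(prompts), dim=-1)

    similarity = a @ t.T                         # cosine similarities, shape [1, 3]
    prediction = similarity.argmax(dim=-1)       # index of the best-matching prompt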
What Are They Doing? Joint Audio-Speech Co-Reasoning
Yingzhi Wang
Pooneh Mousavi
Artem Ploujnikov
Dynamic HumTrans: Humming Transcription Using CNNs and Dynamic Programming
Shubham Gupta
Isaac Neri Gomez-Sarmiento
Faez Amjed Mezdari
Explaining Network Decision Provides Insights on the Causal Interaction Between Brain Regions in a Motor Imagery Task
Davide Borra
Multi-modal Decoding of Reach-to-Grasping from EEG and EMG via Neural Networks
Davide Borra
Matteo Fraternali
Elisa Magosso
LMAC-TD: Producing Time Domain Explanations for Audio Classifiers
Eleonora Mancini
Francesco Paissan
Audio Editing with Non-Rigid Text Prompts
Francesco Paissan
Zhepei Wang
Paris Smaragdis
In this paper, we explore audio-editing with non-rigid text edits. We show that the proposed editing pipeline is able to create audio edits that remain faithful to the input audio. We explore text prompts that perform addition, style transfer, and in-painting. We quantitatively and qualitatively show that the edits are able to obtain results which outperform Audio-LDM, a recently released text-prompted audio generation model. Qualitative inspection of the results points out that the edits given by our approach remain more faithful to the input audio in terms of keeping the original onsets and offsets of the audio events.
ProGRes: Prompted Generative Rescoring on ASR n-Best
Ada Defne Tur
Adel Moumen
Listenable Maps for Audio Classifiers
Francesco Paissan