Portrait de Mirco Ravanelli

Mirco Ravanelli

Membre académique associé
Professeur adjoint, Concordia University, École de génie et d'informatique Gina-Cody
Professeur associé, Université de Montréal, Département d'informatique et de recherche opérationnelle
Sujets de recherche
Apprentissage profond

Biographie

Mirco Ravanelli est professeur adjoint à l'Université Concordia, professeur associé à l'Université de Montréal et membre associé de Mila – Institut québécois d’intelligence artificielle. Lauréat du prix Amazon Research 2022, il est expert en apprentissage profond et en IA conversationnelle, et a publié plus de 60 articles dans ces domaines. Il se concentre principalement sur les nouveaux algorithmes d'apprentissage profond, y compris l'apprentissage autosupervisé, continu, multimodal, coopératif et économe en énergie. Mirco Ravanelli a effectué son postdoctorat à Mila, sous la direction du professeur Yoshua Bengio. Il est notamment le fondateur et le chef de file de SpeechBrain, l'une des boîtes à outils en code source ouvert les plus largement adoptées dans le domaine du traitement de la parole et de l'IA conversationnelle.

Étudiants actuels

Maîtrise recherche - Concordia
Collaborateur·rice de recherche - Concordia University
Collaborateur·rice de recherche - Concordia University
Doctorat - Concordia
Co-superviseur⋅e :
Baccalauréat - Concordia
Maîtrise recherche - Concordia
Baccalauréat - Concordia
Doctorat - Concordia
Co-superviseur⋅e :
Doctorat - Concordia
Collaborateur·rice de recherche - Concordia University
Collaborateur·rice de recherche - Concordia University
Collaborateur·rice alumni - UdeM
Superviseur⋅e principal⋅e :
Doctorat - UdeM
Co-superviseur⋅e :
Doctorat - Concordia
Co-superviseur⋅e :
Postdoctorat - McGill
Doctorat - UdeM
Collaborateur·rice de recherche - Concordia University
Maîtrise recherche - Concordia

Publications

Generalization Limits of Graph Neural Networks in Identity Effects Learning
Giuseppe Alessio D'inverno
Simone Brugiapaglia
Graph Neural Networks (GNNs) have emerged as a powerful tool for data-driven learning on various graph domains. They are usually based on a … (voir plus)message-passing mechanism and have gained increasing popularity for their intuitive formulation, which is closely linked to the Weisfeiler-Lehman (WL) test for graph isomorphism to which they have been proven equivalent in terms of expressive power. In this work, we establish new generalization properties and fundamental limits of GNNs in the context of learning so-called identity effects, i.e., the task of determining whether an object is composed of two identical components or not. Our study is motivated by the need to understand the capabilities of GNNs when performing simple cognitive tasks, with potential applications in computational linguistics and chemistry. We analyze two case studies: (i) two-letters words, for which we show that GNNs trained via stochastic gradient descent are unable to generalize to unseen letters when utilizing orthogonal encodings like one-hot representations; (ii) dicyclic graphs, i.e., graphs composed of two cycles, for which we present positive existence results leveraging the connection between GNNs and the WL test. Our theoretical analysis is supported by an extensive numerical study.
Investigating the Effectiveness of Explainability Methods in Parkinson's Detection from Speech
Eleonora Mancini
Francesco Paissan
Paolo Torroni
Speech impairments in Parkinson's disease (PD) provide significant early indicators for diagnosis. While models for speech-based PD detectio… (voir plus)n have shown strong performance, their interpretability remains underexplored. This study systematically evaluates several explainability methods to identify PD-specific speech features, aiming to support the development of accurate, interpretable models for clinical decision-making in PD diagnosis and monitoring. Our methodology involves (i) obtaining attributions and saliency maps using mainstream interpretability techniques, (ii) quantitatively evaluating the faithfulness of these maps and their combinations obtained via union and intersection through a range of established metrics, and (iii) assessing the information conveyed by the saliency maps for PD detection from an auxiliary classifier. Our results reveal that, while explanations are aligned with the classifier, they often fail to provide valuable information for domain experts.
A protocol for trustworthy EEG decoding with neural networks
Davide Borra
Elisa Magosso
SpeechBrain-MOABB: An open-source Python library for benchmarking deep neural networks applied to EEG signals
Davide Borra
Francesco Paissan
Listenable Maps for Zero-Shot Audio Classifiers
Francesco Paissan
Luca Della Libera
Interpreting the decisions of deep learning models, including audio classifiers, is crucial for ensuring the transparency and trustworthines… (voir plus)s of this technology. In this paper, we introduce LMAC-ZS (Listenable Maps for Audio Classifiers in the Zero-Shot context), which, to the best of our knowledge, is the first decoder-based post-hoc interpretation method for explaining the decisions of zero-shot audio classifiers. The proposed method utilizes a novel loss function that maximizes the faithfulness to the original similarity between a given text-and-audio pair. We provide an extensive evaluation using the Contrastive Language-Audio Pretraining (CLAP) model to showcase that our interpreter remains faithful to the decisions in a zero-shot classification context. Moreover, we qualitatively show that our method produces meaningful explanations that correlate well with different text prompts.
What Are They Doing? Joint Audio-Speech Co-Reasoning
Yingzhi Wang
Pooneh Mousavi
Artem Ploujnikov
Dynamic HumTrans: Humming Transcription Using CNNs and Dynamic Programming
Shubham Gupta
Isaac Neri Gomez-Sarmiento
Faez Amjed Mezdari
Explaining Network Decision Provides Insights on the Causal Interaction Between Brain Regions in a Motor Imagery Task
Davide Borra
Multi-modal Decoding of Reach-to-Grasping from EEG and EMG via Neural Networks
Davide Borra
Matteo Fraternali
Elisa Magosso
LMAC-TD: Producing Time Domain Explanations for Audio Classifiers
Eleonora Mancini
Francesco Paissan
Audio Editing with Non-Rigid Text Prompts
Francesco Paissan
Zhepei Wang
Paris Smaragdis
In this paper, we explore audio-editing with non-rigid text edits. We show that the proposed editing pipeline is able to create audio edits … (voir plus)that remain faithful to the input audio. We explore text prompts that perform addition, style transfer, and in-painting. We quantitatively and qualitatively show that the edits are able to obtain results which outperform Audio-LDM, a recently released text-prompted audio generation model. Qualitative inspection of the results points out that the edits given by our approach remain more faithful to the input audio in terms of keeping the original onsets and offsets of the audio events.
ProGRes: Prompted Generative Rescoring on ASR n-Best
Ada Defne Tur
Adel Moumen