Mirco Ravanelli

Dehestani Amirali

Research Intern - Concordia University

Seina Assadian

Research Collaborator - Concordia University

Cordelle Briac

Research Collaborator - Concordia University

Leo Brodeur Brodeur

Research Intern - Concordia

leobrod44@gmail.com

Gallegati Caterina

Research Intern - Concordia

Victor Cruz

Research Master's - Concordia

Luca Della Libera

PhD - Concordia

Co-supervisor:

Wagner Drew

Research Master's - Concordia

Co-supervisor:

Irina Rish

Gianfranco Dumoulin Bertucci

Research Master's - Concordia

Website

nadine.el-mufti@mila.quebec

Nadine El-Mufti

Research Master's - Concordia

Website

Maab Elrashid Ahmed Mohamed

Google Scholar

PhD - Concordia

Co-supervisor:

PhD - Concordia

Alessio Giuseppe Alessio

Research Collaborator - International School for Advanced Studies (Trieste, Italy)

Salman Sami Hussain Ali

Research Collaborator - Concordia University

Haoyu Li

Research Intern - Concordia University

SpeechBrain 1.0: Making Conversational AI Accessible to Everyone

Eleonora Mancini

Alumni Collaborator - UdeM

Principal supervisor:

PhD - UdeM

Co-supervisor:

PhD - Concordia

PhD - Concordia

Co-supervisor:

Peter Peter

Postdoctorate - McGill

PhD - UdeM

Research Intern - Sapienza University of Rome

Blog posts

June 13, 2024

by

Mirco Ravanelli

Read the article

Introducing SpeechBrain: A general-purpose PyTorch speech processing toolkit

April 28, 2021

by

Mirco Ravanelli

Loren Lugosch

Read the article

Publications

ProGRes: Prompted Generative Rescoring on ASR n-Best

Ada Defne Tur

Adel Moumen

2024-12-02

2024 IEEE Spoken Language Technology Workshop (SLT) (published)

Investigating the Effectiveness of Explainability Methods in Parkinson's Detection from Speech

Eleonora Mancini

Francesco Paissan

Paolo Torroni

Speech impairments in Parkinson's disease (PD) provide significant early indicators for diagnosis. While models for speech-based PD detection have shown strong performance, their interpretability remains underexplored. This study systematically evaluates several explainability methods to identify PD-specific speech features, aiming to support the development of accurate, interpretable models for clinical decision-making in PD diagnosis and monitoring. Our methodology involves (i) obtaining attributions and saliency maps using mainstream interpretability techniques, (ii) quantitatively evaluating the faithfulness of these maps and their combinations obtained via union and intersection through a range of established metrics, and (iii) assessing the information conveyed by the saliency maps for PD detection from an auxiliary classifier. Our results reveal that, while explanations are aligned with the classifier, they often fail to provide valuable information for domain experts.

2024-11-12

arXiv (preprint)

A protocol for trustworthy EEG decoding with neural networks

Davide Borra

Elisa Magosso

2024-11-01

Neural Networks (published)

SpeechBrain-MOABB: An open-source Python library for benchmarking deep neural networks applied to EEG signals

Davide Borra

Francesco Paissan

2024-11-01

Computers in Biology and Medicine (published)

Listenable Maps for Zero-Shot Audio Classifiers

Francesco Paissan

Luca Della Libera

Interpreting the decisions of deep learning models, including audio classifiers, is crucial for ensuring the transparency and trustworthiness of this technology. In this paper, we introduce LMAC-ZS (Listenable Maps for Audio Classifiers in the Zero-Shot context), which, to the best of our knowledge, is the first decoder-based post-hoc interpretation method for explaining the decisions of zero-shot audio classifiers. The proposed method utilizes a novel loss function that maximizes the faithfulness to the original similarity between a given text-and-audio pair. We provide an extensive evaluation using the Contrastive Language-Audio Pretraining (CLAP) model to showcase that our interpreter remains faithful to the decisions in a zero-shot classification context. Moreover, we qualitatively show that our method produces meaningful explanations that correlate well with different text prompts.

2024-09-25

NeurIPS 2024 (poster)

openreview.net

What Are They Doing? Joint Audio-Speech Co-Reasoning

Yingzhi Wang

Pooneh Mousavi

Artem Ploujnikov

In audio and speech processing, tasks usually focus on either the audio or speech modality, even when both sounds and human speech are present in the same audio clip. Recent Auditory Large Language Models (ALLMs) have made it possible to process audio and speech simultaneously within a single model, leading to further considerations of joint audio-speech tasks. In this paper, we establish a novel benchmark to investigate how well ALLMs can perform joint audio-speech processing. Specifically, we introduce Joint Audio-Speech Co-Reasoning (JASCO), a novel task that unifies audio and speech processing, strictly requiring co-reasoning across both modalities. We also release a scene-reasoning dataset called "What Are They Doing". Additionally, we provide deeper insights into the models' behaviors by analyzing their dependence on each modality.

2024-09-22

arXiv (preprint)

Dynamic HumTrans: Humming Transcription Using CNNs and Dynamic Programming

Shubham Gupta

Isaac Neri Gomez-Sarmiento

Faez Amjed Mezdari

2024-09-19

Lecture Notes in Computer Science (published)

Explaining Network Decision Provides Insights on the Causal Interaction Between Brain Regions in a Motor Imagery Task

Davide Borra

2024-09-19

Lecture Notes in Computer Science (published)

Multi-modal Decoding of Reach-to-Grasping from EEG and EMG via Neural Networks

Davide Borra

Matteo Fraternali

Elisa Magosso

2024-09-19

Lecture Notes in Computer Science (published)

Audio Editing with Non-Rigid Text Prompts

Francesco Paissan

Zhepei Wang

Paris Smaragdis

In this paper, we explore audio-editing with non-rigid text edits. We show that the proposed editing pipeline is able to create audio edits that remain faithful to the input audio. We explore text prompts that perform addition, style transfer, and in-painting. We quantitatively and qualitatively show that the edits are able to obtain results which outperform Audio-LDM, a recently released text-prompted audio generation model. Qualitative inspection of the results points out that the edits given by our approach remain more faithful to the input audio in terms of keeping the original onsets and offsets of the audio events.

2024-09-01

Interspeech 2024 (published)

Progres: Prompted Generative Rescoring on ASR N-Best

Ada Defne Tur

Adel Moumen

Large Language Models (LLMs) have shown their ability to improve the performance of speech recognizers by effectively rescoring the n-best hypotheses generated during the beam search process. However, the best way to exploit recent generative instruction-tuned LLMs for hypothesis rescoring is still unclear. This paper proposes a novel method that uses instruction-tuned LLMs to dynamically expand the n-best speech recognition hypotheses with new hypotheses generated through appropriately-prompted LLMs. Specifically, we introduce a new zero-shot method for ASR n-best rescoring, which combines confidence scores, LLM sequence scoring, and prompt-based hypothesis generation. We compare Llama-3-Instruct, GPT-3.5 Turbo, and GPT-4 Turbo as prompt-based generators with Llama-3 as sequence scorer LLM. We evaluated our approach using different speech recognizers and observed significant relative improvement in the word error rate (WER) ranging from 5% to 25%.

2024-08-30

arXiv (preprint)