Publications

Graph Neural Networks Meet Probabilistic Graphical Models: A Survey

Qian Zhang

2025-04-05

IEEE International Conference on Acoustics, Speech, and Signal Processing (published)

doi.org

Hyperedge Representations with Hypergraph Wavelets: Applications to Spatial Transcriptomics

Xingzhi Sun

Charles Xu

João Felipe Rocha

Chen Liu

Benjamin Hollander-Bodie

Laney Goldman

Marcello DiStasio

Michael Perlmutter

Smita Krishnaswamy

In many data-driven applications, higher-order relationships among multiple objects are essential in capturing complex interactions. Hypergraphs, which generalize graphs by allowing edges to connect any number of nodes, provide a flexible and powerful framework for modeling such higher-order relationships. In this work, we introduce hypergraph diffusion wavelets and describe their favorable spectral and spatial properties. We demonstrate their utility for biomedical discovery in spatially resolved transcriptomics by applying the method to represent disease-relevant cellular niches for Alzheimer’s disease.

2025-04-05

ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (published)

doi.org

arxiv.org

Investigating the Effectiveness of Explainability Methods in Parkinson's Detection from Speech

Paolo Torroni

Speech impairments in Parkinson's disease (PD) provide significant early indicators for diagnosis. While models for speech-based PD detection have shown strong performance, their interpretability remains underexplored. This study systematically evaluates several explainability methods to identify PD-specific speech features, aiming to support the development of accurate, interpretable models for clinical decision-making in PD diagnosis and monitoring. Our methodology involves (i) obtaining attributions and saliency maps using mainstream interpretability techniques, (ii) quantitatively evaluating the faithfulness of these maps and their combinations obtained via union and intersection through a range of established metrics, and (iii) assessing the information conveyed by the saliency maps for PD detection from an auxiliary classifier. Our results reveal that, while explanations are aligned with the classifier, they often fail to provide valuable information for domain experts.

2025-04-05

2025 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW) (published)

doi.org

arxiv.org

Latent Representation Learning for Multimodal Brain Activity Translation

Arman Afrasiyabi

Dhananjay Bhaskar

Erica Lindsey Busch

Laurent Caplette

Rahul Singh

Guillaume Lajoie

Nicholas B Turk-Browne

Smita Krishnaswamy

Neuroscience employs diverse neuroimaging techniques, each offering distinct insights into brain activity, from electrophysiological recordings such as EEG, which have high temporal resolution, to hemodynamic modalities such as fMRI, which have increased spatial precision. However, integrating these heterogeneous data sources remains a challenge, which limits a comprehensive understanding of brain function. We present the Spatiotemporal Alignment of Multimodal Brain Activity (SAMBA) framework, which bridges the spatial and temporal resolution gaps across modalities by learning a unified latent space free of modality-specific biases. SAMBA introduces a novel attention-based wavelet decomposition for spectral filtering of electrophysiological recordings, graph attention networks to model functional connectivity between functional brain units, and recurrent layers to capture temporal autocorrelations in brain signals. We show that the training of SAMBA, aside from achieving translation, also learns a rich representation of brain information processing. We showcase this by classifying external stimuli driving brain activity from the representation learned in hidden layers of SAMBA, paving the way for broad downstream applications in neuroscience research and clinical contexts.

2025-04-05

ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (published)

doi.org

arxiv.org

LMAC-TD: Producing Time Domain Explanations for Audio Classifiers

Neural networks are typically black boxes that remain opaque with regard to their decision mechanisms. Several works in the literature have proposed post-hoc explanation methods to alleviate this issue. This paper proposes LMAC-TD, a post-hoc explanation method that trains a decoder to produce explanations directly in the time domain. This methodology builds upon the foundation of L-MAC, Listenable Maps for Audio Classifiers, a method that produces faithful and listenable explanations. We incorporate SepFormer, a popular transformer-based time-domain source separation architecture. We show through a user study that LMAC-TD significantly improves the audio quality of the produced explanations without sacrificing faithfulness.

2025-04-05

ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (published)

doi.org

arxiv.org

Principal Curvatures Estimation with Applications to Single Cell Data

Yanlei Zhang

Lydia Mezrag

Xingzhi Sun

Charles Xu

Kincaid MacDonald

Dhananjay Bhaskar

Smita Krishnaswamy

Guy Wolf

Bastian Rieck

2025-04-05

ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (published)

doi.org

arxiv.org

What Are They Doing? Joint Audio-Speech Co-Reasoning

Yingzhi Wang

Pooneh Mousavi

Artem Ploujnikov

Mirco Ravanelli

In audio and speech processing, tasks usually focus on either the audio or speech modality, even when both sounds and human speech are present in the same audio clip. Recent Auditory Large Language Models (ALLMs) have made it possible to process audio and speech simultaneously within a single model, leading to further considerations of joint audio-speech tasks. In this paper, we establish a novel benchmark to investigate how well ALLMs can perform joint audio-speech processing. Specifically, we introduce Joint Audio-Speech Co-Reasoning (JASCO), a novel task that unifies audio and speech processing, strictly requiring co-reasoning across both modalities. We also release a scene-reasoning dataset called "What Are They Doing". Additionally, we provide deeper insights into the models' behaviors by analyzing their dependence on each modality.

2025-04-05

ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (published)

doi.org

arxiv.org

Accelerated learning of a noninvasive human brain-computer interface via manifold geometry

Erica Lindsey Busch

E. Chandra Fincke

Guillaume Lajoie

Smita Krishnaswamy

Nicholas B Turk-Browne

2025-04-02

bioRxiv (preprint)

doi.org

Spinal Cord Tract Integrity in Degenerative Cervical Myelopathy

Newton Cho

Abdul Al-Shawwa

W. Bradley Jacobs

Nathan Evaniew

Jacques Bouchard

Steve Casha

Stephan duPlessis

Peter Lewkonia

Fred Nicholls

Alex Soroceanu

Ganesh Swamy

Kenneth C. Thomas

Michael M. H. Yang

Julien Cohen-Adad

David W. Cadotte

Degenerative cervical myelopathy (DCM) is the most common cause of spinal dysfunction globally. Despite surgical intervention, motor dysfunction may persist in many patients. The purpose of this study was to comprehensively examine specific spinal cord tract changes in patients with DCM, to better understand potential substrates for compensatory recovery of function. Cervical spinal cord MRI scans with diffusion tensor imaging were performed in patients with DCM and in healthy volunteers. Spinal Cord Toolbox was used to register the PAM50 template, which includes a probabilistic atlas of the white matter tracts of the spinal cord, to the imaging data. Fractional anisotropy (FA) was extracted for each tract at C3 above the level of maximal compression and compared between patients with DCM and healthy volunteers and between patients with mild vs moderate to severe DCM. We included 25 patients with DCM (13 mild and 12 moderate to severe) and 6 healthy volunteers. FA was significantly reduced in DCM subjects relative to healthy volunteers for the lateral corticospinal tract (mild DCM vs healthy ∆ = −0.13, P = .018; moderate to severe DCM vs healthy ∆ = −0.11, P = .047), fasciculus gracilis (mild DCM vs healthy ∆ = −0.16, P = .010; moderate to severe DCM vs healthy ∆ = −0.13, P = .039), and fasciculus cuneatus (mild DCM vs healthy ∆ = −0.16, P = .007; moderate to severe DCM vs healthy ∆ = −0.15, P = .012). There were no differences in FA for all tracts between mild and moderate-to-severe DCM subjects. Patients with DCM had altered diffusion tensor imaging signal in their lateral corticospinal tract, fasciculus gracilis, and fasciculus cuneatus in comparison with healthy volunteers. These findings indicate that DCM is characterized by injury to these structures, which suggests that other tracts within the cord could potentially act as substrates for compensatory motor recovery.

2025-04-02

Neurosurgery (unknown)

doi.org

DeepSeek-R1 Thoughtology: Let's think about LLM Reasoning

Sara Vera Marjanović

Arkil Patel

Vaibhav Adlakha

Milad Aghajohari

Parishad BehnamGhader

Amirhossein Kazemnejad

Gaurav Kamath

Marius Mosbach

Karolina Stanczak

Siva Reddy

Large Reasoning Models like DeepSeek-R1 mark a fundamental shift in how LLMs approach complex problems. Instead of directly producing an answer for a given input, DeepSeek-R1 creates detailed multi-step reasoning chains, seemingly "thinking" about a problem before providing an answer. This reasoning process is publicly available to the user, creating endless opportunities for studying the reasoning behaviour of the model and opening up the field of Thoughtology. Starting from a taxonomy of DeepSeek-R1's basic building blocks of reasoning, our analyses on DeepSeek-R1 investigate the impact and controllability of thought length, management of long or confusing contexts, cultural and safety concerns, and the status of DeepSeek-R1 vis-à-vis cognitive phenomena, such as human-like language processing and world modelling. Our findings paint a nuanced picture. Notably, we show DeepSeek-R1 has a 'sweet spot' of reasoning, where extra inference time can impair model performance. Furthermore, we find a tendency for DeepSeek-R1 to persistently ruminate on previously explored problem formulations, obstructing further exploration. We also note strong safety vulnerabilities of DeepSeek-R1 compared to its non-reasoning counterpart, which can also compromise safety-aligned LLMs.

2025-04-01

ArXiv (preprint)

arxiv.org

A Truncated Newton Method for Optimal Transport

Mete Kemertas

Amir-massoud Farahmand

Allan D. Jepson

2025-04-01

ArXiv (preprint)

doi.org

arxiv.org

Addressing Missing Modality Challenges in MRI Images: A Comprehensive Review

Reza Azad

Mohammad Dehghanmanshadi

Nika Khosravi

Julien Cohen-Adad

Dorit Merhof

2025-03-31

Computational Visual Media (published)

doi.org
