Publications

RescueSpeech: A German Corpus for Speech Recognition in Search and Rescue Domain

Sangeet Sagar

Bernd Kiefer

Ivana Kruijff-Korbayová

Josef van Genabith

Despite recent advances in speech recognition, accurately transcribing conversational and emotional speech in noisy and reverberant acoustic environments remains difficult. This poses a particular challenge in the search and rescue (SAR) domain, where transcribing conversations among rescue team members is crucial to support real-time decision-making. The scarcity of speech data and the associated background noise in SAR scenarios make it difficult to deploy robust speech recognition systems. To address this issue, we have created and made publicly available a German speech dataset called RescueSpeech. This dataset includes real speech recordings from simulated rescue exercises. Additionally, we have released competitive training recipes and pre-trained models. Our study highlights that the performance attained by state-of-the-art methods in this challenging scenario is still far from reaching an acceptable level.

2023-12-16

2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) (published)

doi.org

arxiv.org

Self-supervised multimodal learning for group inferences from MRI data: Discovering disorder-relevant brain regions and multimodal links

Alex Fedorov

Eloy Geenjaar

Lei Wu

Tristan Sylvain

Thomas P. DeRamus

Margaux Luck

Maria Misiura

Girish Mittapalle

(Rex) Devon Hjelm

Sergey M. Plis

Vince D. Calhoun

2023-12-16

NeuroImage (published)

doi.org

Speech Emotion Diarization: Which Emotion Appears When?

Yingzhi Wang

Mirco Ravanelli

Alaa Nfissi

Alya Yacoubi

Speech Emotion Recognition (SER) typically relies on utterance-level solutions. However, emotions conveyed through speech should be considered discrete speech events with definite temporal boundaries, rather than attributes of the entire utterance. To reflect the fine-grained nature of speech emotions and to unify various fine-grained methods under a single objective, we propose a new task: Speech Emotion Diarization (SED). Just as Speaker Diarization answers the question of “Who speaks when?”, Speech Emotion Diarization answers the question of “Which emotion appears when?”. To facilitate performance evaluation and establish a common benchmark, we introduce the Zaion Emotion Dataset (ZED), an openly accessible speech emotion dataset that includes non-acted emotions recorded in real-life conditions, along with manually annotated boundaries of emotion segments within each utterance. We provide competitive baselines and open-source the code and the pre-trained models.

2023-12-16

2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) (published)

doi.org

arxiv.org
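The diarization framing above means the output is a list of timed emotion segments rather than one label per utterance. A minimal sketch, assuming a hypothetical model has already produced one emotion label per fixed-length frame; the function name, the 20 ms frame duration, and the label strings are illustrative and not taken from the released ZED code or baselines.

```python
from typing import List, Tuple

Segment = Tuple[float, float, str]  # (start_seconds, end_seconds, emotion)


def frames_to_segments(labels: List[str], frame_dur: float) -> List[Segment]:
    """Merge runs of identical per-frame emotion labels into timed segments."""
    segments: List[Segment] = []
    start = 0  # index of the first frame in the current run
    for i in range(1, len(labels) + 1):
        # Close the run at the end of the input or where the label changes.
        if i == len(labels) or labels[i] != labels[start]:
            segments.append((start * frame_dur, i * frame_dur, labels[start]))
            start = i
    return segments


# Usage: hypothetical 20 ms frames predicted by some frame-level classifier.
frames = ["neutral", "neutral", "happy", "happy", "happy", "neutral"]
print(frames_to_segments(frames, 0.02))
```

Each segment answers "which emotion appears when" for one stretch of the utterance; a real system would typically also smooth or drop very short runs before emitting segments.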

TorchAudio 2.1: Advancing Speech Recognition, Self-Supervised Learning, and Audio Processing Components for Pytorch

Jeff Hwang

Moto Hira

Caroline Chen

Xiaohui Zhang

Zhaoheng Ni

Guangzhi Sun

Pingchuan Ma

Ruizhe Huang

Vineel Pratap

Yuekai Zhang

Anurag Kumar

Chin-Yun Yu

Chuang Zhu

Chunxi Liu

Jacob Kahn

Mirco Ravanelli

Peng Sun

Shinji Watanabe

Yangyang Shi

Yumeng Tao

Robin Scheibler

Samuele Cornell

Sean Kim

Stavros Petridis

TorchAudio is an open-source audio and speech processing library built for PyTorch. It aims to accelerate the research and development of audio and speech technologies by providing well-designed, easy-to-use, and performant PyTorch components. Its contributors routinely engage with users to understand their needs and fulfill them by developing impactful features. Here, we survey TorchAudio’s development principles and contents and highlight key features included in its latest version (2.1): self-supervised learning pre-trained pipelines and training recipes, high-performance CTC decoders, speech recognition models and training recipes, advanced media I/O capabilities, and tools for performing forced alignment, multi-channel speech enhancement, and reference-less speech assessment. Through empirical studies of a selection of these features, we demonstrate their efficacy and show that they achieve competitive or state-of-the-art performance.

2023-12-16

2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) (published)

doi.org

arxiv.org
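The CTC decoders mentioned above all rest on the same CTC collapsing rule: merge repeated frame predictions, then drop the blank token. The plain-Python sketch below shows only that rule via greedy (best-path) decoding; it is not TorchAudio's decoder API, which additionally supports beam search with lexicons and language models. The blank index 0 and the toy vocabulary are assumptions for illustration.

```python
from typing import List


def ctc_greedy_decode(frame_ids: List[int], blank: int = 0) -> List[int]:
    """Best-path CTC decoding: collapse repeated frame predictions, then drop blanks."""
    out: List[int] = []
    prev = None
    for t in frame_ids:
        # Emit a token only when it differs from the previous frame and is not blank.
        if t != prev and t != blank:
            out.append(t)
        prev = t
    return out


# Usage: hypothetical vocabulary where index 0 is the CTC blank symbol.
vocab = ["-", "c", "a", "t"]
best_path = [1, 1, 0, 2, 3, 3]  # argmax token per frame
print("".join(vocab[i] for i in ctc_greedy_decode(best_path)))
```

The blank lets CTC represent genuinely repeated tokens: in `[1, 0, 1]` the blank separates the two `1`s, so both survive the collapse.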

FoMo-Bench: a multi-modal, multi-scale and multi-task Forest Monitoring Benchmark for remote sensing foundation models

Nikolaos Ioannis Bountos

Arthur Ouaknine

David Rolnick

2023-12-15

ArXiv (preprint)

doi.org

arxiv.org

Genetic landscape of an in vivo protein interactome

Savandara Besse

Tatsuya Sakaguchi

Louis Gauthier

Zahra Sahaf

Olivier Péloquin

Lidice Gonzalez

Xavier Castellanos-Girouard

Nazli Koçatug

Chloé Matta

Julie Hussin

Stephen W. Michnick

Adrian W.R. Serohijos

2023-12-15

bioRxiv (preprint)

doi.org

scCross: A Deep Generative Model for Unifying Single-cell Multi-omics with Seamless Integration, Cross-modal Generation, and In-silico Exploration

Xiuhui Yang

Koren K. Mann

Hao Wu

Jun Ding

Single-cell multi-omics illuminate intricate cellular states, yielding transformative insights into cellular dynamics and disease. Yet, while the potential of this technology is vast, the integration of its multifaceted data presents challenges. Some modalities have not reached the robustness or clarity of established scRNA-seq. Together with data scarcity for newer modalities and the intricacies of integration, these challenges limit our ability to realize the full benefits of single-cell omics. We introduce scCross, a tool that combines variational autoencoder and generative adversarial network principles with the Mutual Nearest Neighbors (MNN) technique for modality alignment. This combination enables seamless integration of varied single-cell multi-omics data. Beyond multi-omics data integration, scCross excels in single-cell cross-modal data generation, multi-omics data simulation, and in-silico cellular perturbation. With these capabilities, scCross aims to advance single-cell research through the nuanced integration, generation, and simulation of complex multi-omics data.

2023-12-15

bioRxiv (preprint)

doi.org
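The Mutual Nearest Neighbors (MNN) idea used for modality alignment above can be illustrated in a few lines: a cell in one modality and a cell in the other are paired only if each is among the other's k nearest neighbours. A minimal brute-force sketch, assuming both modalities are already embedded in a shared space (for example by encoders; this is an assumption, not scCross's actual pipeline) and using Euclidean distance; the function names are hypothetical.

```python
import math
from typing import List, Sequence, Set, Tuple

Point = Sequence[float]


def _k_nearest(query: Point, ref: List[Point], k: int) -> Set[int]:
    """Indices of the k points in `ref` closest to `query` (Euclidean, brute force)."""
    ranked = sorted((math.dist(query, r), j) for j, r in enumerate(ref))
    return {j for _, j in ranked[:k]}


def mutual_nearest_neighbors(a: List[Point], b: List[Point], k: int = 1) -> Set[Tuple[int, int]]:
    """Return pairs (i, j) where b[j] is among a[i]'s k nearest points in b
    and a[i] is among b[j]'s k nearest points in a."""
    a_to_b = [_k_nearest(x, b, k) for x in a]
    b_to_a = [_k_nearest(y, a, k) for y in b]
    return {(i, j) for i, nbrs in enumerate(a_to_b) for j in nbrs if i in b_to_a[j]}


# Usage: toy 2-D embeddings for cells from two modalities (hypothetical values).
rna = [(0.0, 0.0), (10.0, 10.0)]
atac = [(0.5, 0.0), (10.0, 9.0)]
print(mutual_nearest_neighbors(rna, atac))
```

Requiring the relation in both directions is what makes MNN pairs conservative anchors: a point that is merely close to a dense cluster in the other modality is not paired unless that cluster also "chooses" it back. The brute-force search is quadratic and only suitable for small examples.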

Temporal encoding in deep reinforcement learning agents

Dongyan Lin

Ann Zixiang Huang

Blake Richards

2023-12-15

Scientific Reports (published)

doi.org

Cone-Traced Supersampling with Subpixel Edge Reconstruction

Andrei Chubarau

Yangyang Zhao

Ruby Rao

Derek Nowrouzezahrai

Paul Kry

While signed distance fields (SDFs) in theory offer an infinite level of detail, they are typically rendered using the sphere tracing algorithm at finite resolutions, which causes aliasing, a common problem in rasterized image synthesis. Most existing optimized antialiasing solutions rely on polygon mesh representations; SDF-based geometry can only be directly antialiased with computationally expensive supersampling or with post-processing filters that may produce undesirable blurriness and ghosting. In this work, we present cone-traced supersampling (CTSS), an efficient and robust spatial antialiasing solution that naturally complements the sphere tracing algorithm, does not require casting additional rays per pixel or offline prefiltering, and can be easily implemented in existing real-time SDF renderers. CTSS performs supersampling along the traced ray near surfaces with partial visibility (object contours), identified by evaluating cone intersections within a pixel's view frustum. We further introduce subpixel edge reconstruction (SER), a technique that extends CTSS to locate and resolve complex pixels with geometric edges in relatively flat regions, which cone intersections would otherwise miss. Our combined solution relies on a specialized sampling strategy to minimize the number of shading computations and correlates sample visibility to aggregate the samples. With comparable antialiasing quality at significantly lower computational cost, CTSS is a reliable practical alternative to conventional supersampling.

2023-12-14

IEEE Transactions on Visualization and Computer Graphics (published)

doi.org
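The cone-intersection test described above can be sketched on top of plain sphere tracing: at each step the pixel's cone has radius t·tan(half-angle), and if the SDF value drops below that radius while the central ray never hits, the surface must partially cover the pixel footprint. The sketch below flags only that miss-side contour case and omits the paper's sampling strategy, visibility correlation, and SER; the sphere scene, parameter names, and thresholds are assumptions for illustration.

```python
import math
from typing import Callable, Optional, Tuple

Vec = Tuple[float, float, float]


def sphere_sdf(p: Vec, center: Vec = (0.0, 0.0, 5.0), radius: float = 1.0) -> float:
    """Signed distance from p to a sphere (negative inside). Hypothetical test scene."""
    return math.dist(p, center) - radius


def trace_pixel(origin: Vec, direction: Vec, sdf: Callable[[Vec], float],
                half_angle: float, max_steps: int = 128,
                eps: float = 1e-4, t_max: float = 100.0) -> Tuple[Optional[float], bool]:
    """Sphere-trace one unit-length ray; return (hit distance or None, contour_candidate).

    contour_candidate is True when the central ray misses but, at some step,
    the SDF value was smaller than the pixel cone's radius at that distance,
    i.e. geometry overlaps the pixel footprint without covering its centre.
    """
    tan_a = math.tan(half_angle)
    t = 0.0
    cone_overlap = False
    for _ in range(max_steps):
        p = tuple(o + t * d for o, d in zip(origin, direction))
        dist = sdf(p)
        if dist < eps:
            return t, False  # central ray hits; edge-side hits are not handled here
        if dist < t * tan_a:
            cone_overlap = True  # surface enters this pixel's cone
        t += dist  # safe step: no surface is closer than `dist`
        if t > t_max:
            break
    return None, cone_overlap


# Usage: a ray grazing the sphere just outside its silhouette.
s = 0.205
print(trace_pixel((0.0, 0.0, 0.0), (s, 0.0, math.sqrt(1 - s * s)), sphere_sdf, 0.02))
```

Because sphere tracing already evaluates the SDF at every step, the cone test adds only one comparison per step, which is consistent with the abstract's point that no extra rays are cast per pixel.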

Feasibility of cognitive neuroscience data collection during a speleological expedition

Anita Paas

Hugo R. Jourde

Arnaud Brignol

Marie-Anick Savard

Zseyvfin Eyqvelle

Samuel Bassetto

Giovanni Beltrame

Emily B.J. Coffey