Konstantinos Drossos

Alumni

Publications

MaD TwinNet: Masker-Denoiser Architecture with Twin Networks for Monaural Sound Source Separation

Stylianos Ioannis Mimilakis

Dmitriy Serdyuk

Gerald Schuller

Tuomas Virtanen

Yoshua Bengio

Monaural singing voice separation task focuses on the prediction of the singing voice from a single channel music mixture signal. Current state of the art (SOTA) results in monaural singing voice separation are obtained with deep learning based methods. In this work we present a novel recurrent neural approach that learns long-term temporal patterns and structures of a musical piece. We build upon the recently proposed Masker-Denoiser (MaD) architecture and we enhance it with the Twin Networks, a technique to regularize a recurrent generative network using a backward running copy of the network. We evaluate our method using the Demixing Secret Dataset and we obtain an increment to signal-to-distortion ratio (SDR) of 0.37 dB and to signal-to-interference ratio (SIR) of 0.23 dB, compared to previous SOTA results.

2018-07-07

2018 International Joint Conference on Neural Networks (IJCNN) (published)

doi.org

arxiv.org

Monaural Singing Voice Separation with Skip-Filtering Connections and Recurrent Inference of Time-Frequency Mask

Stylianos Ioannis Mimilakis

Konstantinos Drossos

Joao Felipe Santos

Gerald Schuller

Tuomas Virtanen

Yoshua Bengio

Singing voice separation based on deep learning relies on the usage of time-frequency masking. In many cases the masking process is not a learnable function or is not encapsulated into the deep learning optimization. Consequently, most of the existing methods rely on a post processing step using the generalized Wiener filtering. This work proposes a method that learns and optimizes (during training) a source-dependent mask and does not need the aforementioned post processing step. We introduce a recurrent inference algorithm, a sparse transformation step to improve the mask generation process, and a learned denoising filter. Obtained results show an increase of 0.49 dB for the signal to distortion ratio and 0.30 dB for the signal to interference ratio, compared to previous state-of-the-art approaches for monaural singing voice separation.

2018-04-14

2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (published)

doi.org

arxiv.org
