Cem (Yusuf) Subakan

Associate Academic Member
Assistant Professor, Université Laval, Department of Computer Science and Software Engineering
Affiliate Assistant Professor, Concordia University, Gina Cody School of Engineering and Computer Science

Biography

Cem Subakan is an assistant professor in the Department of Computer Science and Software Engineering at Université Laval, and an affiliate assistant professor in the Department of Computer Science and Software Engineering at Concordia University. He is also an associate academic member of Mila – Quebec Artificial Intelligence Institute. After receiving his PhD in computer science from the University of Illinois at Urbana-Champaign (UIUC), Subakan completed a postdoc at Mila. He serves as a reviewer for many conferences, including NeurIPS, ICML, ICLR, ICASSP, and MLSP, as well as for journals such as IEEE Signal Processing Letters and IEEE Transactions on Audio, Speech, and Language Processing. His principal research interest is machine learning for speech and audio. More specifically, he works on deep learning for source separation and speech enhancement under realistic conditions, neural network interpretability, continual learning, and multi-modal learning.

Subakan was awarded the Best Student Paper Award at the 2017 IEEE International Workshop on Machine Learning for Signal Processing (MLSP), and also received a Saburo Muroga Fellowship from UIUC's Department of Computer Science. He is a core contributor to the SpeechBrain project, where he leads the speech separation component.

Current Students

Research Intern, Université de Montréal (co-supervisor)
Master's Research, Université Laval (co-supervisor)
PhD, Concordia University (principal supervisor)
PhD, Université Laval (co-supervisor)

Publications

Real-M: Towards Speech Separation on Real Mixtures
Samuele Cornell
François Grondin
In recent years, deep learning-based source separation has achieved impressive results. Most studies, however, still evaluate separation models on synthetic datasets, while the performance of state-of-the-art techniques on in-the-wild speech data remains an open question. This paper helps fill this gap in two ways. First, we release the REAL-M dataset, a crowd-sourced corpus of real-life mixtures. Second, we address the problem of performance evaluation on real-life mixtures, where the ground truth is not available. We bypass this issue by carefully designing a blind Scale-Invariant Signal-to-Noise Ratio (SI-SNR) neural estimator. Through a user study, we show that our estimator reliably evaluates separation performance on real mixtures: its performance predictions correlate well with human opinions. Moreover, when evaluating popular speech separation models, we observe that the performance trends predicted by our estimator on the REAL-M dataset closely follow those achieved on synthetic benchmarks.
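
For reference, the metric the blind estimator learns to predict is standard: given a reference source s and an estimate ŝ, SI-SNR projects the estimate onto the reference, s_target = (⟨ŝ, s⟩ / ‖s‖²) s, and reports 10 log10(‖s_target‖² / ‖ŝ − s_target‖²). A minimal PyTorch sketch of this oracle computation (not the paper's neural estimator, which predicts the value without access to the reference) could look as follows:

```python
import torch

def si_snr(est, ref, eps=1e-8):
    """Scale-Invariant SNR in dB between an estimated and a reference signal."""
    # Remove DC offsets so the measure is invariant to constant shifts.
    est, ref = est - est.mean(), ref - ref.mean()
    # Scale-invariant target: projection of the estimate onto the reference.
    s_target = (torch.dot(est, ref) / (torch.dot(ref, ref) + eps)) * ref
    e_noise = est - s_target
    return 10 * torch.log10(s_target.pow(2).sum() / (e_noise.pow(2).sum() + eps))

# Example: a slightly noisy copy of a random "source" scores around 20 dB.
ref = torch.randn(16000)
est = ref + 0.1 * torch.randn(16000)
print(si_snr(est, ref))
```
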
On the Effectiveness of Two-Step Learning for Latent-Variable Models
Latent-variable generative models offer a principled solution for modeling and sampling from complex probability distributions. Implementing a joint training objective with a complex prior, however, can be a tedious task, as one is typically required to derive and code a specific cost function for each new type of prior distribution. In this work, we propose a general framework for learning latent-variable generative models in a two-step fashion. In the first step of the framework, we train an autoencoder; in the second step, we fit a prior model on the resulting latent distribution. This two-step approach offers a convenient alternative to joint training, as it allows for a straightforward combination of existing models without the hassle of deriving new cost functions or coding joint training objectives. Through a set of experiments, we demonstrate that two-step learning achieves performance similar to joint training, and in some cases results in even more accurate modeling.
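
As a concrete illustration of the recipe, the sketch below trains a plain autoencoder and then fits a Gaussian mixture prior on its latent codes; the architecture, latent size, mixture prior, and the random `train_x` tensor are illustrative assumptions rather than the paper's exact setup:

```python
import torch
import torch.nn as nn
from sklearn.mixture import GaussianMixture

train_x = torch.rand(1024, 784)  # placeholder for real training data

# Step 1: train a plain autoencoder with a reconstruction loss.
encoder = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 32))
decoder = nn.Sequential(nn.Linear(32, 256), nn.ReLU(), nn.Linear(256, 784))
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)

for epoch in range(20):
    opt.zero_grad()
    loss = nn.functional.mse_loss(decoder(encoder(train_x)), train_x)
    loss.backward()
    opt.step()

# Step 2: fit a prior on the learned latent codes, here a Gaussian mixture;
# no prior-specific joint cost function needs to be derived for this choice.
with torch.no_grad():
    z = encoder(train_x).numpy()
prior = GaussianMixture(n_components=10).fit(z)

# Sampling: draw latents from the fitted prior and decode them.
z_new, _ = prior.sample(16)
samples = decoder(torch.from_numpy(z_new).float())
```
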
Continual Learning of New Sound Classes Using Generative Replay
Zhepei Wang
Efthymios Tzinis
Paris Smaragdis
Continual learning consists of incrementally training a model on a sequence of datasets and testing on the union of all of them. In this paper, we examine continual learning for the problem of sound classification, in which we wish to refine already-trained models to learn new sound classes. In practice, one does not want to maintain all past training data and retrain from scratch, but naively updating a model with new data(sets) results in a degradation of already-learned tasks, which is referred to as "catastrophic forgetting." We develop a generative replay procedure for generating training audio spectrogram data in place of keeping older training datasets. We show that, by incrementally refining a classifier with generative replay, a generator that is 4% of the size of all previous training data matches the performance of refining the classifier while keeping 20% of all previous training data. We thus conclude that we can extend a trained sound classifier to learn new classes without having to keep previously used datasets.
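
The replay idea can be sketched as follows, with stated assumptions: `generator` stands in for a generative model of past training spectrograms with a hypothetical `sample(n)` method, and `old_classifier` is a frozen copy of the model from before the update, used to label the pseudo-samples. This is a schematic of generative replay in general, not the paper's exact procedure:

```python
import torch
import torch.nn.functional as F

def refine_with_replay(classifier, old_classifier, generator, new_loader, opt):
    """One class-incremental round with generative replay (illustrative sketch)."""
    old_classifier.eval()
    for new_x, new_y in new_loader:
        # Instead of stored old data, draw pseudo-samples of past classes from
        # the generator and label them with the frozen previous classifier.
        with torch.no_grad():
            replay_x = generator.sample(new_x.shape[0])
            replay_y = old_classifier(replay_x).argmax(dim=1)
        # Mix replayed old-class data into every batch of new-class data.
        x = torch.cat([new_x, replay_x])
        y = torch.cat([new_y, replay_y])
        opt.zero_grad()
        F.cross_entropy(classifier(x), y).backward()
        opt.step()
```
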