Publications

SpeechBrain: A General-Purpose Speech Toolkit

Mirco Ravanelli

Titouan Parcollet

Peter William VanHarn Plantinga

Aku Rouhe

Samuele Cornell

Loren Lugosch

Cem Subakan

Nauman Dawalatabad

Abdelwahab HEBA

Jianyuan Zhong

Ju-Chieh Chou

Sung-Lin Yeh

Szu-Wei Fu

Chien-Feng Liao

Elena Rastorgueva

François Grondin

William Aris

Hwidong Na

Yan Gao

Renato De Mori … (+1 more)

Yoshua Bengio

SpeechBrain is an open-source and all-in-one speech toolkit. It is designed to facilitate the research and development of neural speech processing technologies by being simple, flexible, user-friendly, and well-documented. This paper describes the core architecture designed to support several tasks of common interest, allowing users to naturally conceive, compare and share novel speech processing pipelines. SpeechBrain achieves competitive or state-of-the-art performance in a wide range of speech benchmarks. It also provides training recipes, pretrained models, and inference scripts for popular speech datasets, as well as tutorials which allow anyone with basic Python proficiency to familiarize themselves with speech technologies.

2021-06-08

ArXiv (preprint)

Understanding Capacity Saturation in Incremental Learning

Shenyang Huang

Vincent Francois-Lavet

Guillaume Rabusseau

2021-06-08

Canadian Conference on AI (published)

Correcting Momentum in Temporal Difference Learning

Emmanuel Bengio

Joelle Pineau

Doina Precup

A common optimization tool used in deep reinforcement learning is momentum, which consists of accumulating and discounting past gradients and reapplying them at each iteration. We argue that, unlike in supervised learning, momentum in Temporal Difference (TD) learning accumulates gradients that become doubly stale: not only does the gradient of the loss change due to parameter updates, the loss itself changes due to bootstrapping. We first show that this phenomenon exists, and then propose a first-order correction term to momentum. We show that this correction term improves sample efficiency in policy evaluation by correcting target value drift. An important insight of this work is that deep RL methods are not always best served by directly importing techniques from the supervised setting.

2021-06-07

ArXiv (preprint)

openreview.net
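To make the "doubly stale" argument concrete, the following is a minimal sketch of ordinary, uncorrected heavy-ball momentum applied to semi-gradient TD(0) with a tabular value function. It does not implement the paper's first-order correction term. The function name `td0_with_momentum`, the two-state chain, and the hyperparameters are illustrative assumptions.

```python
def td0_with_momentum(episodes, alpha=0.05, beta=0.5, gamma=0.9):
    """Semi-gradient TD(0) with plain heavy-ball momentum (no correction).

    Hypothetical deterministic chain: state 0 -> state 1 (reward 0),
    state 1 -> terminal (reward 1). Returns tabular estimates [V(0), V(1)].
    """
    w = [0.0, 0.0]  # one weight per state (one-hot features)
    m = [0.0, 0.0]  # momentum buffer: discounted sum of past gradients
    # (state, reward, next_state); None marks the terminal state
    transitions = [(0, 0.0, 1), (1, 1.0, None)]
    for _ in range(episodes):
        for s, r, s_next in transitions:
            # The bootstrapped target reads the *current* weights, so the loss
            # itself moves whenever w moves. Gradients already stored in m were
            # computed against older parameters AND older targets.
            v_next = 0.0 if s_next is None else w[s_next]
            delta = r + gamma * v_next - w[s]
            # Semi-gradient of 0.5 * delta**2 w.r.t. w, holding the target fixed.
            grad = [0.0, 0.0]
            grad[s] = -delta
            for i in range(2):
                m[i] = beta * m[i] + grad[i]
                w[i] -= alpha * m[i]
    return w
```

With gamma = 0.9 the true values on this chain are V(1) = 1 and V(0) = 0.9 * V(1) = 0.9. On such a tiny deterministic problem the stale momentum still converges; the abstract's claim concerns sample efficiency, which a toy chain like this cannot demonstrate.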

CMIM: Cross-Modal Information Maximization For Medical Imaging

Tristan Sylvain

Francis Dutil

Tess Berthier

Lisa Di Jorio

Margaux Luck

(Rex) Devon Hjelm

Yoshua Bengio

In hospitals, data are siloed to specific information systems that make the same information available under different modalities such as the different medical imaging exams the patient undergoes (CT scans, MRI, PET, Ultrasound, etc.) and their associated radiology reports. This offers unique opportunities to obtain and use at train-time those multiple views of the same information that might not always be available at test-time. In this paper, we propose an innovative framework that makes the most of available data by learning good representations of a multi-modal input that are resilient to modality dropping at test-time, using recent advances in mutual information maximization. By maximizing cross-modal information at train time, we are able to outperform several state-of-the-art baselines in two different settings: medical image classification and segmentation. In particular, our method is shown to have a strong impact on the inference-time performance of weaker modalities.

2021-06-06

IEEE International Conference on Acoustics, Speech, and Signal Processing (published)
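The abstract does not say which mutual-information estimator CMIM uses. As a generic illustration of cross-modal information maximization, here is a minimal sketch of the widely used InfoNCE contrastive loss between paired embeddings from two modalities. The function name, the dot-product critic, and the temperature parameter are assumptions for illustration, not the paper's objective.

```python
import math


def info_nce_loss(z_a, z_b, temperature=1.0):
    """Mean InfoNCE loss between two lists of embedding vectors.

    z_a[i] and z_b[i] are assumed to embed the same sample seen through two
    modalities (the positive pair); z_b[j] with j != i act as negatives.
    log(n) - loss is a lower bound on the mutual information between views,
    so minimizing this loss maximizes that bound.
    """
    n = len(z_a)
    total = 0.0
    for i in range(n):
        # Dot-product critic scores of z_a[i] against every candidate in z_b.
        scores = [sum(p * q for p, q in zip(z_a[i], z_b[j])) / temperature
                  for j in range(n)]
        # Numerically stable log-sum-exp over the candidates.
        top = max(scores)
        log_norm = top + math.log(sum(math.exp(s - top) for s in scores))
        # Cross-entropy of picking the true partner among n candidates.
        total += log_norm - scores[i]
    return total / n
```

Embeddings whose true pairs score highest give a lower loss than mismatched pairings, which is the signal a cross-modal objective of this kind trains on.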

Double-Linear Thompson Sampling for Context-Attentive Bandits

Djallel Bouneffouf

Raphael Feraud

Sohini Upadhyay

Yasaman Khazaeni

In this paper, we analyze and extend an online learning framework known as Context-Attentive Bandit, motivated by various practical applications, from medical diagnosis to dialog systems, where due to observation costs only a small subset of a potentially large number of context variables can be observed at each iteration; however, the agent has the freedom to choose which variables to observe. We derive a novel algorithm, called Context-Attentive Thompson Sampling (CATS), which builds upon the Linear Thompson Sampling approach, adapting it to the Context-Attentive Bandit setting. We provide a theoretical regret analysis and an extensive empirical evaluation demonstrating the advantages of the proposed approach over several baseline methods on a variety of real-life datasets.

2021-06-06

ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (published)
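Since CATS is described as building on Linear Thompson Sampling, a sketch of that base algorithm may help. This is standard Linear Thompson Sampling with a separate Bayesian linear model per arm. It is not CATS itself, which additionally chooses which context variables to observe. The class name, the identity (ridge) prior, the per-arm model layout, and the posterior scale `v` are all assumptions.

```python
import numpy as np


class LinearThompsonSampling:
    """Standard Linear Thompson Sampling with one linear model per arm."""

    def __init__(self, n_arms, dim, v=0.5, seed=0):
        self.v = v  # scale of the posterior sampling covariance (assumed)
        self.rng = np.random.default_rng(seed)
        # B_a = I + sum x x^T (identity prior); f_a = sum reward * x
        self.B = [np.eye(dim) for _ in range(n_arms)]
        self.f = [np.zeros(dim) for _ in range(n_arms)]

    def select(self, x):
        """Sample one parameter vector per arm and play the best-scoring arm."""
        scores = []
        for B, f in zip(self.B, self.f):
            B_inv = np.linalg.inv(B)
            mu = B_inv @ f  # posterior mean (ridge-regression estimate)
            cov = self.v ** 2 * (B_inv + B_inv.T) / 2  # symmetrize for numerical safety
            theta = self.rng.multivariate_normal(mu, cov)
            scores.append(float(x @ theta))
        return int(np.argmax(scores))

    def update(self, arm, x, reward):
        """Rank-one update of the chosen arm's sufficient statistics."""
        self.B[arm] += np.outer(x, x)
        self.f[arm] += reward * x
```

Sampling from the posterior, rather than adding an explicit confidence bonus, is what drives exploration here: arms with uncertain estimates occasionally draw optimistic parameters and get tried.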

Toward Skills Dialog Orchestration with Online Learning

Djallel Bouneffouf

Raphael Feraud

Sohini Upadhyay

Mayank Agarwal

Yasaman Khazaeni

Building multi-domain AI agents is a challenging task and an open problem in the area of AI. Within the domain of dialog, the ability to orchestrate multiple independently trained dialog agents, or skills, to create a unified system is of particular significance. In this work, we study the task of online posterior dialog orchestration, where we define posterior orchestration as the task of selecting a subset of skills which most appropriately answer a user input using features extracted from both the user input and the individual skills. To account for the various costs associated with extracting skill features, we consider online posterior orchestration under a skill execution budget. We formalize this setting as Context Attentive Bandit with Observations (CABO), a variant of context attentive bandits, and evaluate it on proprietary conversational datasets.

2021-06-06

ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (published)

Multimodal dynamics modeling for off-road autonomous vehicles

Jean-François Tremblay

Travis Manderson

Aurélio Noca

Gregory Dudek

David Meger

Dynamics modeling in outdoor and unstructured environments is difficult because different elements in the environment interact with the robot in ways that can be hard to predict. Leveraging multiple sensors to perceive maximal information about the robot’s environment is thus crucial when building a model to perform predictions about the robot’s dynamics with the goal of doing motion planning. We design a model capable of long-horizon motion predictions, leveraging vision, lidar and proprioception, which is robust to arbitrarily missing modalities at test time. We demonstrate in simulation that our model is able to leverage vision to predict traction changes. We then test our model using a challenging real-world dataset of a robot navigating through a forest, performing predictions in trajectories unseen during training. We try different modality combinations at test time and show that, while our model performs best when all modalities are present, it still performs better than the baseline when receiving only raw vision input and no proprioception, as well as when receiving only proprioception. Overall, our study demonstrates the importance of leveraging multiple sensors when doing dynamics modeling in outdoor conditions.

2021-06-05

2021 IEEE International Conference on Robotics and Automation (ICRA) (published)

Encoder-Decoder Neural Architecture Optimization for Keyword Spotting

Tong Mo

Bang Liu

2021-06-04

ArXiv (preprint)

Hierarchical Video Generation for Complex Data

Lluis Castrejon

Nicolas Ballas

Aaron Courville

2021-06-04

ArXiv (preprint)

SAND-mask: An Enhanced Gradient Masking Strategy for the Discovery of Invariances in Domain Generalization

Soroosh Shahtalebi

Jean-Christophe Gagnon-Audet

Touraj Laleh

Mojtaba Faramarzi

Kartik Ahuja

A major bottleneck in the real-world applications of machine learning models is their failure to generalize to unseen domains whose data distribution is not i.i.d. with respect to the training domains. This failure often stems from learning non-generalizable features in the training domains that are spuriously correlated with the label of the data. To address this shortcoming, there has been growing interest in learning good explanations that are hard to vary, which is studied under the notion of Out-of-Distribution (OOD) Generalization. The search for good explanations that are invariant across different domains can be seen as finding local (global) minima in the loss landscape that hold true across all of the training domains. In this paper, we propose a masking strategy, which determines a continuous weight based on the agreement of gradients that flow in each edge of the network, in order to control the amount of update received by the edge in each step of optimization. In particular, our proposed technique, referred to as "Smoothed-AND (SAND)-masking", not only validates the agreement in the direction of gradients but also promotes the agreement among their magnitudes to further ensure the discovery of invariances across training domains. SAND-mask is validated over the DomainBed benchmark for domain generalization and significantly improves the state-of-the-art accuracy on the Colored MNIST dataset while providing competitive results on other domain generalization datasets.

2021-06-04

ArXiv (preprint)
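SAND-mask refines the hard AND-mask from "learning explanations that are hard to vary", which keeps a parameter's averaged gradient only when its sign agrees across training domains. The sketch below implements that hard, binary AND-mask, not SAND-mask. SAND-mask replaces the 0/1 gate with a continuous weight that also accounts for gradient magnitudes; its exact formula is given in the paper and is not reproduced here. The function name and the threshold `tau` (fraction of sign agreement required) are illustrative assumptions.

```python
def and_mask_gradient(env_grads, tau=1.0):
    """Hard AND-mask over per-domain gradients (NOT the smoothed SAND-mask).

    env_grads: list of per-domain gradient vectors (lists of floats), one per
    training domain, all of the same length.
    tau: minimum sign agreement in [0, 1]; 1.0 requires all domains to agree.
    Returns the mean gradient with disagreeing components zeroed out.
    """
    n_env = len(env_grads)
    n_params = len(env_grads[0])
    masked = []
    for j in range(n_params):
        grads_j = [g[j] for g in env_grads]
        # Sign of each domain's gradient: +1, 0 or -1.
        signs = [(g > 0) - (g < 0) for g in grads_j]
        agreement = abs(sum(signs)) / n_env  # 1.0 means all signs agree
        mean_grad = sum(grads_j) / n_env
        # Binary gate: SAND-mask would use a continuous weight here instead.
        masked.append(mean_grad if agreement >= tau else 0.0)
    return masked
```

With two domains whose gradients agree in sign on the first and third parameters but disagree on the second, only the second component is zeroed, so that edge receives no update in this step.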

Continual Learning in Deep Networks: an Analysis of the Last Layer

Timothee LESORT

Thomas George