Publications

HighRes-net: Multi-Frame Super-Resolution by Recursive Fusion

Michel Deudon

Alfredo Kalaitzis

Md Rifat Arefin

Israel Goytom

Zhichao Lin

Kris Sankaran

Vincent Michalski

S Ebrahimi Kahou

Julien Cornebise

2019-09-24

(published)

Learning Neural Causal Models from Unknown Interventions

Nan Rosemary Ke

Christopher Pal

Promising results have driven a recent surge of interest in continuous optimization methods for Bayesian network structure learning from observational data. However, there are theoretical limitations on the identifiability of underlying structures obtained from observational data alone. Interventional data provides much richer information about the underlying data-generating process. However, the extension and application of methods designed for observational data to include interventions is not straightforward and remains an open problem. In this paper we provide a general framework based on continuous optimization and neural networks to create models for the combination of observational and interventional data. The proposed method is even applicable in the challenging and realistic case that the identity of the intervened-upon variable is unknown. We examine the proposed method in the setting of graph recovery both de novo and from a partially-known edge set. We establish strong benchmark results on several structure learning tasks, including structure recovery of both synthetic graphs and standard graphs from the Bayesian Network Repository.

2019-09-24

arXiv (preprint)


Selfish Emergent Communication

Michael Noukhovitch

Travis LaCroix

Aaron Courville

2019-09-24

(published)

SPECTRA: Sparse Entity-centric Transitions

Rim Assouel

Learning an agent that interacts with objects is ubiquitous in many RL tasks. In most of them the agent's actions have sparse effects: only a small subset of objects in the visual scene will be affected by the action taken. We introduce SPECTRA, a model for learning slot-structured transitions from raw visual observations that embodies this sparsity assumption. Our model is composed of a perception module that decomposes the visual scene into a set of latent object representations (i.e. slot-structured) and a transition module that predicts the next latent set slot-wise and in a sparse way. We show that learning a perception module jointly with a sparse slot-structured transition model not only biases the model towards more entity-centric perceptual groupings but also enables an intrinsic exploration strategy that aims at maximizing the number of objects changed in the agent's trajectory.

2019-09-24

(published)

On summarized validation curves and generalization

Mohammad Hashir

Joseph Paul Cohen

2019-09-24

(published)

Learning Sparse Mixture of Experts for Visual Question Answering

Vardaan Pahuja

Jie Fu

Christopher Pal

2019-09-18

arXiv (preprint)

Revisit Policy Optimization in Matrix Form

Sitao Luan

Xiao-Wen Chang

Doina Precup

2019-09-18

arXiv (preprint)

Learning Problem-agnostic Speech Representations from Multiple Self-supervised Tasks

Santiago Pascual

Mirco Ravanelli

Joan Parets I Serra

Antonio Bonafonte

Learning good representations without supervision is still an open issue in machine learning, and is particularly challenging for speech signals, which are often characterized by long sequences with a complex hierarchical structure. Some recent works, however, have shown that it is possible to derive useful speech representations by employing a self-supervised encoder-discriminator approach. This paper proposes an improved self-supervised method, where a single neural encoder is followed by multiple workers that jointly solve different self-supervised tasks. The needed consensus across different tasks naturally imposes meaningful constraints on the encoder, helping it discover general representations and minimizing the risk of learning superficial ones. Experiments show that the proposed approach can learn transferable, robust, and problem-agnostic features that carry relevant information from the speech signal, such as speaker identity, phonemes, and even higher-level features such as emotional cues. In addition, a number of design choices make the encoder easily exportable, facilitating its direct usage or adaptation to different problems.

2019-09-14

Interspeech 2019 (published)


Neural Architecture Search for Class-incremental Learning

Shenyang Huang

Vincent François-Lavet

Guillaume Rabusseau

In class-incremental learning, a model learns continuously from a sequential data stream in which new classes occur. Existing methods often rely on static architectures that are manually crafted. These methods can be prone to capacity saturation because a neural network's ability to generalize to new concepts is limited by its fixed capacity. To understand how to expand a continual learner, we focus on the neural architecture design problem in the context of class-incremental learning: at each time step, the learner must optimize its performance on all classes observed so far by selecting the most competitive neural architecture. To tackle this problem, we propose Continual Neural Architecture Search (CNAS): an autoML approach that takes advantage of the sequential nature of class-incremental learning to efficiently and adaptively identify strong architectures in a continual learning setting. We employ a task network to perform the classification task and a reinforcement learning agent as the meta-controller for architecture search. In addition, we apply network transformations to transfer weights from the previous learning step and to reduce the size of the architecture search space, thus saving a large amount of computational resources. We evaluate CNAS on the CIFAR-100 dataset under varied incremental learning scenarios with limited computational power (1 GPU). Experimental results demonstrate that CNAS outperforms architectures that are optimized for the entire dataset. In addition, CNAS is at least an order of magnitude more efficient than naively using existing autoML methods.

2019-09-13

arXiv (preprint)

Torchmeta: A Meta-Learning library for PyTorch

The constant introduction of standardized benchmarks in the literature has helped accelerate the recent advances in meta-learning research. They offer a way to get a fair comparison between different algorithms, and the wide range of datasets available allows full control over the complexity of this evaluation. However, for a large majority of code available online, the data pipeline is often specific to one dataset, and testing on another dataset requires significant rework. We introduce Torchmeta, a library built on top of PyTorch that enables seamless and consistent evaluation of meta-learning algorithms on multiple datasets, by providing data-loaders for most of the standard benchmarks in few-shot classification and regression, with a new meta-dataset abstraction. It also features some extensions for PyTorch to simplify the development of models compatible with meta-learning algorithms. The code is available here: this https URL

2019-09-13

arXiv (preprint)

Learning Speaker Representations with Mutual Information

Mirco Ravanelli

Learning good representations is of crucial importance in deep learning. Mutual Information (MI) or similar measures of statistical dependence are promising tools for learning these representations in an unsupervised way. Even though the mutual information between two random variables is hard to measure directly in high dimensional spaces, some recent studies have shown that an implicit optimization of MI can be achieved with an encoder-discriminator architecture similar to that of Generative Adversarial Networks (GANs). In this work, we learn representations that capture speaker identities by maximizing the mutual information between the encoded representations of chunks of speech randomly sampled from the same sentence. The proposed encoder relies on the SincNet architecture and transforms raw speech waveform into a compact feature vector. The discriminator is fed by either positive samples (of the joint distribution of encoded chunks) or negative samples (from the product of the marginals) and is trained to separate them. We report experiments showing that this approach effectively learns useful speaker representations, leading to promising results on speaker identification and verification tasks. Our experiments consider both unsupervised and semi-supervised settings and compare the performance achieved with different objective functions.

2019-09-12

Interspeech 2019 (published)
