Aaron Courville

Reza Bayat

PhD - Université de Montréal

Co-supervisor :

Pascal Vincent

Anirudh Buvanesh

PhD - Université de Montréal

Principal supervisor :

Laurent Charlin

Razvan Ciuca

Master's Research - Université de Montréal

Juan Duque

PhD - Université de Montréal

PhD - Université de Montréal

Arian Hosseini

PhD - Université de Montréal

Uday Kapur

PhD - Université de Montréal

Amr Khalifa

PhD - Université de Montréal

Samuel Lavoie

PhD - Université de Montréal

Zhixuan Lin

PhD - Université de Montréal

PhD - Université de Montréal

Principal supervisor :

PhD - Université de Montréal

Principal supervisor :

PhD - Université de Montréal

PhD - Université de Montréal

Co-supervisor :

Rishabh Agarwal

Andrei Nicolicioiu

PhD - Université de Montréal

Michael Noukhovitch

PhD - Université de Montréal

Johan Samir Obando Ceron

PhD - Université de Montréal

Co-supervisor :

Collaborating researcher - Université de Montréal

Dereck Piché

Master's Research - Université de Montréal

Khaled Rouissi

Master's Research - Université de Montréal

Esra'a Saleh

PhD - Université de Montréal

Principal supervisor :

Glen Berseth

Vedant Shah

PhD - Université de Montréal

PhD - Université de Montréal

Yusong Wu

PhD - Université de Montréal

Principal supervisor :

Anna (Cheng-Zhi) Huang

Sujin Yun

PhD - Université de Montréal

Xiaofeng Zhang

PhD - Université de Montréal

Dinghuai Zhang

PhD - Université de Montréal

Co-supervisor :

Publications

On Bonus-Based Exploration Methods in the Arcade Learning Environment

Adrien Ali Taiga

William Fedus

Marlos C. Machado

Bellemare Marc-Emmanuel

Research on exploration in reinforcement learning, as applied to Atari 2600 game-playing, has emphasized tackling difficult exploration problems such as Montezuma's Revenge (Bellemare et al., 2016). Recently, bonus-based exploration methods, which explore by augmenting the environment reward, have reached above-human average performance on such domains. In this paper we reassess popular bonus-based exploration methods within a common evaluation framework. We combine Rainbow (Hessel et al., 2018) with different exploration bonuses and evaluate its performance on Montezuma's Revenge, Bellemare et al.'s set of hard exploration games with sparse rewards, and the whole Atari 2600 suite. We find that while exploration bonuses lead to higher score on Montezuma's Revenge they do not provide meaningful gains over the simpler

2019-12-31

ICLR.cc/2020/Conference (poster)

openreview.net

AN ENSEMBLE APPROACH FOR DETECTING MACHINE FAILURE FROM SOUND Technical

Faruk Ahmed

Phong Cao Nguyen

We develop an ensemble-based approach for our submission to the anomaly detection challenge at DCASE 2020. The main members of our ensemble are auto-encoders (with reconstruction error as the signal), classifiers (with negative predictive confidence as the signal), mismatch of the time-shifted signal with its Fourier-phase-shifted version, and a Gaussian mixture model on a set of common short-term features extracted from the waveform. The scores are passed through an exponential non-linearity and weighted to provide the final score, where the weighting and scaling hyper-parameters are learned on the development set. Our ensemble improves over the baseline on the development set.

2019-12-31

(published)

www.semanticscholar.org

Graph Density-Aware Losses for Novel Compositions in Scene Graph Generation

Boris Knyazev

Harm de Vries

Cătălina Cangea

Graham W. Taylor

Eugene Belilovsky

Scene graph generation (SGG) aims to predict graph-structured descriptions of input images, in the form of objects and relationships between them. This task is becoming increasingly useful for progress at the interface of vision and language. Here, it is important - yet challenging - to perform well on novel (zero-shot) or rare (few-shot) compositions of objects and relationships. In this paper, we identify two key issues that limit such generalization. Firstly, we show that the standard loss used in this task is unintentionally a function of scene graph density. This leads to the neglect of individual edges in large sparse graphs during training, even though these contain diverse few-shot examples that are important for generalization. Secondly, the frequency of relationships can create a strong bias in this task, such that a blind model predicting the most frequent relationship achieves good performance. Consequently, some state-of-the-art models exploit this bias to improve results. We show that such models can suffer the most in their ability to generalize to rare compositions, evaluating two different models on the Visual Genome dataset and its more recent, improved version, GQA. To address these issues, we introduce a density-normalized edge loss, which provides more than a two-fold improvement in certain generalization metrics. Compared to other works in this direction, our enhancements require only a few lines of code and no added computational cost. We also highlight the difficulty of accurately evaluating models using existing metrics, especially on zero/few shots, and introduce a novel weighted metric.

2019-12-31

Proceedings of the British Machine Vision Conference 2020 (published)

Learning Classical Planning Transition Functions by Deep Neural Networks

Michaela Urbanovská

Ian G Goodfellow

2019-12-31

(published)

www.semanticscholar.org

Université de Montréal Balancing Signals for Semi-Supervised Sequence Learning

Ya Xu

Christopher Pal

Training recurrent neural networks (RNNs) on long sequences using backpropagation through time (BPTT) remains a fundamental challenge. It has been shown that adding a local unsupervised loss term into the optimization objective makes the training of RNNs on long sequences more effective. While the importance of an unsupervised task can in principle be controlled by a coefficient in the objective function, the gradients with respect to the unsupervised loss term still influence all the hidden state dimensions, which might cause important information about the supervised task to be degraded or erased. Compared to existing semi-supervised sequence learning methods, this thesis focuses upon a traditionally overlooked mechanism – an architecture with explicitly designed private and shared hidden units designed to mitigate the detrimental influence of the auxiliary unsupervised loss over the main supervised task. We achieve this by dividing the RNN hidden space into a private space for the supervised task and a shared space for both the supervised and unsupervised tasks. We present extensive experiments with the proposed framework on several long sequence modeling benchmark datasets. Results indicate that the proposed framework can yield performance gains in RNN models where long term dependencies are notoriously challenging to deal with.

2019-12-31

(published)

www.semanticscholar.org

Unsupervised Learning of Dense Visual Representations

Pedro O. Pinheiro

Amjad Almahairi

Ryan Y. Benmalek

Florian Golemo

Contrastive self-supervised learning has emerged as a promising approach to unsupervised visual representation learning. In general, these methods learn global (image-level) representations that are invariant to different views (i.e., compositions of data augmentation) of the same image. However, many visual understanding tasks require dense (pixel-level) representations. In this paper, we propose View-Agnostic Dense Representation (VADeR) for unsupervised learning of dense representations. VADeR learns pixelwise representations by forcing local features to remain constant over different viewing conditions. Specifically, this is achieved through pixel-level contrastive learning: matching features (that is, features that describe the same location of the scene on different views) should be close in an embedding space, while non-matching features should be apart. VADeR provides a natural representation for dense prediction tasks and transfers well to downstream tasks. Our method outperforms ImageNet supervised pretraining (and strong unsupervised baselines) in multiple dense prediction tasks.

2019-12-31

Advances in Neural Information Processing Systems 33 (NeurIPS 2020) (published)

CLOSURE: Assessing Systematic Generalization of CLEVR Models

Dzmitry Bahdanau

Harm de Vries

Timothy J. O'Donnell

Shikhar Murty

Philippe Beaudoin

The CLEVR dataset of natural-looking questions about 3D-rendered scenes has recently received much attention from the research community. A number of models have been proposed for this task, many of which achieved very high accuracies of around 97-99%. In this work, we study how systematic the generalization of such models is, that is, to what extent they are capable of handling novel combinations of known linguistic constructs. To this end, we test models' understanding of referring expressions based on matching object properties (such as "the object that is the same size as the red ball") in novel contexts. Our experiments on the thereby constructed CLOSURE benchmark show that state-of-the-art models often do not exhibit systematicity after being trained on CLEVR. Surprisingly, we find that an explicitly compositional Neural Module Network model also generalizes badly on CLOSURE, even when it has access to the ground-truth programs at test time. We improve the NMN's systematic generalization by developing a novel Vector-NMN module architecture with vector-valued inputs and outputs. Lastly, we investigate the extent to which few-shot transfer learning can help models that are pretrained on CLEVR to adapt to CLOSURE. Our few-shot learning experiments contrast the adaptation behavior of the models with intermediate discrete programs with that of the end-to-end continuous models.

2019-12-11

ArXiv (preprint)

Selective Brain Damage: Measuring the Disparate Impact of Model Pruning

Sara Hooker

Yann Dauphin

Andrea Frome

Neural network pruning techniques have demonstrated it is possible to remove the majority of weights in a network with surprisingly little degradation to test set accuracy. However, this measure of performance conceals significant differences in how different classes and images are impacted by pruning. We find that certain examples, which we term pruning identified exemplars (PIEs), and classes are systematically more impacted by the introduction of sparsity. Removing PIE images from the test-set greatly improves top-1 accuracy for both pruned and non-pruned models. These hard-to-generalize-to images tend to be mislabelled, of lower image quality, depict multiple objects or require fine-grained classification. These findings shed light on previously unknown trade-offs, and suggest that a high degree of caution should be exercised before pruning is used in sensitive domains.

2019-11-12

arXiv.org (preprint)

openreview.net

What Do Compressed Deep Neural Networks Forget

Sara Hooker

Gregory Clark

Yann Dauphin

Andrea Frome

Deep neural network pruning and quantization techniques have demonstrated it is possible to achieve high levels of compression with surprisingly little degradation to test set accuracy. However, this measure of performance conceals significant differences in how different classes and images are impacted by model compression techniques. We find that models with radically different numbers of weights have comparable top-line performance metrics but diverge considerably in behavior on a narrow subset of the dataset. This small subset of data points, which we term Pruning Identified Exemplars (PIEs), are systematically more impacted by the introduction of sparsity. Compression disproportionately impacts model performance on the underrepresented long-tail of the data distribution. PIEs over-index on atypical or noisy images that are far more challenging for both humans and algorithms to classify. Our work provides intuition into the role of capacity in deep neural networks and the trade-offs incurred by compression. An understanding of this disparate impact is critical given the widespread deployment of compressed models in the wild.

2019-11-12

ArXiv (preprint)

Deep Generative Modeling of LiDAR Data

Lucas Caccia

Herke van Hoof

Joelle Pineau

Building models capable of generating structured output is a key challenge for AI and robotics. While generative models have been explored on many types of data, little work has been done on synthesizing lidar scans, which play a key role in robot mapping and localization. In this work, we show that one can adapt deep generative models for this task by unravelling lidar scans into a 2D point map. Our approach can generate high quality samples, while simultaneously learning a meaningful latent representation of the data. We demonstrate significant improvements against state-of-the-art point cloud generation methods. Furthermore, we propose a novel data representation that augments the 2D signal with absolute positional information. We show that this helps robustness to noisy and imputed input; the learned model can recover the underlying lidar scan from seemingly uninformative data.

2019-11-02

2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (published)

Batch Weight for Domain Adaptation With Mass Shift

Mikolaj Binkowski

R Devon Hjelm

Unsupervised domain transfer is the task of transferring or translating samples from a source distribution to a different target distribution. Current solutions to unsupervised domain transfer often operate on data on which the modes of the distribution are well-matched, for instance have the same frequencies of classes between source and target distributions. However, these models do not perform well when the modes are not well-matched, as would be the case when samples are drawn independently from two different, but related, domains. This mode imbalance is problematic as generative adversarial networks (GANs), a successful approach in this setting, are sensitive to mode frequency, which results in a mismatch of semantics between source samples and generated samples of the target distribution. We propose a principled method of re-weighting training samples to correct for such mass shift between the transferred distributions, which we call batch weight. We also provide a rigorous probabilistic setting for domain transfer and a new simplified objective for training transfer networks, an alternative to complex, multi-component loss functions used in the current state-of-the-art image-to-image translation models. The new objective stems from the discrimination of joint distributions and enforces cycle-consistency in an abstract, high-level, rather than pixel-wise, sense. Lastly, we experimentally show the effectiveness of the proposed methods in several image-to-image translation tasks.

2019-11-01

2019 IEEE/CVF International Conference on Computer Vision (ICCV) (published)

Improved Conditional VRNNs for Video Prediction

Lluis Castrejon

Nicolas Ballas

Predicting future frames for a video sequence is a challenging generative modeling task. Promising approaches include probabilistic latent variable models such as the Variational Auto-Encoder. While VAEs can handle uncertainty and model multiple possible future outcomes, they have a tendency to produce blurry predictions. In this work we argue that this is a sign of underfitting. To address this issue, we propose to increase the expressiveness of the latent distributions and to use higher capacity likelihood models. Our approach relies on a hierarchy of latent variables, which defines a family of flexible prior and posterior distributions in order to better model the probability of future sequences. We validate our proposal through a series of ablation experiments and compare our approach to current state-of-the-art latent variable models. Our method performs favorably under several metrics in three different datasets.

2019-11-01

2019 IEEE/CVF International Conference on Computer Vision (ICCV) (published)