(Rex) Devon Hjelm

Sergey Plis

Vince D. Calhoun

2023-12-16

NeuroImage (published)

Value function estimation using conditional diffusion models for control

Bogdan Mazoure

Walter Talbott

Miguel Ángel Bautista

Alexander T Toshev

Joshua M. Susskind

2023-06-09

ArXiv (preprint)

PatchBlender: A Motion Prior for Video Transformers

Gabriele Prato

Yale Song

Janarthanan Rajendran

Neel Joshi

Sarath Chandar

2022-11-11

ArXiv (preprint)

Robust Contrastive Learning against Noisy Views

Ching-Yao Chuang

Xin Wang

Vibhav Vineet

Neel Joshi

Antonio Torralba

Stefanie Jegelka

Yale Song

Contrastive learning relies on an assumption that positive pairs contain related views that share certain underlying information about an in… (see more)stance, e.g., patches of an image or co-occurring multimodal signals of a video. What if this assumption is violated? The literature suggests that contrastive learning produces suboptimal representations in the presence of noisy views, e.g., false positive pairs with no apparent shared information. In this work, we pro-pose a new contrastive loss function that is robust against noisy views. We provide rigorous theoretical justifications by showing connections to robust symmetric losses for noisy binary classification and by establishing a new contrastive bound for mutual information maximization based on the Wasserstein distance measure. The proposed loss is completely modality-agnostic and a simple drop-in replacement for the InfoNCE loss, which makes it easy to apply to ex-isting contrastive frameworks. We show that our approach provides consistent improvements over the state-of-the-art on image, video, and graph contrastive learning bench-marks that exhibit a variety of real-world noise patterns.

2022-06-18

2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (published)

Robust Contrastive Learning against Noisy Views

Ching-Yao Chuang

Xin Wang

Vibhav Vineet

Neel Joshi

Antonio Torralba

Stefanie Jegelka

Ya-heng Song

Contrastive learning relies on an assumption that positive pairs contain related views that share certain underlying information about an in… (see more)stance, e.g., patches of an image or co-occurring multimodal signals of a video. What if this assumption is violated? The literature suggests that contrastive learning produces suboptimal representations in the presence of noisy views, e.g., false positive pairs with no apparent shared information. In this work, we pro-pose a new contrastive loss function that is robust against noisy views. We provide rigorous theoretical justifications by showing connections to robust symmetric losses for noisy binary classification and by establishing a new contrastive bound for mutual information maximization based on the Wasserstein distance measure. The proposed loss is completely modality-agnostic and a simple drop-in replacement for the InfoNCE loss, which makes it easy to apply to ex-isting contrastive frameworks. We show that our approach provides consistent improvements over the state-of-the-art on image, video, and graph contrastive learning bench-marks that exhibit a variety of real-world noise patterns.

2022-01-12

ArXiv (preprint)

CMIM: Cross-Modal Information Maximization For Medical Imaging

Tristan Sylvain

Francis Dutil

Tess Berthier

Lisa Di Jorio

Margaux Luck

Yoshua Bengio

In hospitals, data are siloed to specific information systems that make the same information available under different modalities such as th… (see more)e different medical imaging exams the patient undergoes (CT scans, MRI, PET, Ultrasound, etc.) and their associated radiology reports. This offers unique opportunities to obtain and use at train-time those multiple views of the same information that might not always be available at test-time.In this paper, we propose an innovative framework that makes the most of available data by learning good representations of a multi-modal input that are resilient to modality dropping at test-time, using recent advances in mutual information maximization. By maximizing cross-modal information at train time, we are able to outperform several state-of-the-art baselines in two different settings, medical image classification, and segmentation. In particular, our method is shown to have a strong impact on the inference-time performance of weaker modalities.

2021-06-06

IEEE International Conference on Acoustics, Speech, and Signal Processing (published)

Understanding by Understanding Not: Modeling Negation in Language Models

Negation is a core construction in natural language. Despite being very successful on many tasks, state-of-the-art pre-trained language mode… (see more)ls often handle negation incorrectly. To improve language models in this regard, we propose to augment the language modeling objective with an unlikelihood objective that is based on negated generic sentences from a raw text corpus. By training BERT with the resulting combined objective we reduce the mean top 1 error rate to 4% on the negated LAMA dataset. We also see some improvements on the negated NLI benchmarks.

2021-06-01

Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (published)

Object-Centric Image Generation from Layouts

Tristan Sylvain

Pengchuan Zhang

Yoshua Bengio

Shikhar Sharma

2021-05-18

Proceedings of the AAAI Conference on Artificial Intelligence (published)

DATA-EFFICIENT REINFORCEMENT LEARNING

Philip Bachman

Data efficiency poses a major challenge for deep reinforcement learning. We approach this issue from the perspective of self-supervised repr… (see more)esentation learning, leveraging reward-free exploratory data to pretrain encoder networks. We employ a novel combination of latent dynamics modelling and goal-reaching objectives, which exploit the inherent structure of data in reinforcement learning. We demonstrate that our method scales well with network capacity and pretraining data. When evaluated on the Atari 100k data-efficiency benchmark, our approach significantly outperforms previous methods combining unsupervised pretraining with task-specific finetuning, and approaches human-level performance.

Data-Efficient Reinforcement Learning with Self-Predictive Representations

Rishab Goel

Philip Bachman

While deep reinforcement learning excels at solving tasks where large amounts of data can be collected through virtually unlimited interacti… (see more)on with the environment, learning from limited interaction remains a key challenge. We posit that an agent can learn more efficiently if we augment reward maximization with self-supervised objectives based on structure in its visual input and sequential interaction with the environment. Our method, Self-Predictive Representations (SPR), trains an agent to predict its own latent state representations multiple steps into the future. We compute target representations for future states using an encoder which is an exponential moving average of the agent’s parameters and we make predictions using a learned transition model. On its own, this future prediction objective outperforms prior methods for sample-efficient deep RL from pixels. We further improve performance by adding data augmentation to the future prediction loss, which forces the agent’s representations to be consistent across multiple views of an observation. Our full self-supervised objective, which combines future prediction and data augmentation, achieves a median human-normalized score of 0.415 on Atari in a setting limited to 100k steps of environment interaction, which represents a 55% relative improvement over the previous state-of-the-art. Notably, even in this limited data regime, SPR exceeds expert human scores on 7 out of 26 games. We’ve made the code associated with this work available at https://github.com/mila-iqia/spr.

2021-01-01

ICLR (published)

Predicting Unreliable Predictions by Shattering a Neural Network

Xu Ji

Razvan Pascanu

Andrea Vedaldi

Balaji Lakshminarayanan

Yoshua Bengio

Piecewise linear neural networks can be split into subfunctions, each with its own activation pattern, domain, and empirical error. Empirica… (see more)l error for the full network can be written as an expectation over empirical error of subfunctions. Constructing a generalization bound on subfunction empirical error indicates that the more densely a subfunction is surrounded by training samples in representation space, the more reliable its predictions are. Further, it suggests that models with fewer activation regions generalize better, and models that abstract knowledge to a greater degree generalize better, all else equal. We propose not only a theoretical framework to reason about subfunction error bounds but also a pragmatic way of approximately evaluating it, which we apply to predicting which samples the network will not successfully generalize to. We test our method on detection of misclassiﬁcation and out-of-distribution samples, ﬁnding that it performs competitively in both cases. In short, some network activation patterns are associated with higher reliability than others, and these can be identiﬁed using subfunction error bounds.

2021-01-01

arXiv.org (preprint)

Pretraining Representations for Data-Efficient Reinforcement Learning

Philip Bachman

Data efficiency is a key challenge for deep reinforcement learning. We address this problem by using unlabeled data to pretrain an encoder w… (see more)hich is then finetuned on a small amount of task-specific data. To encourage learning representations which capture diverse aspects of the underlying MDP, we employ a combination of latent dynamics modelling and unsupervised goal-conditioned RL. When limited to 100k steps of interaction on Atari games (equivalent to two hours of human experience), our approach significantly surpasses prior work combining offline representation pretraining with task-specific finetuning, and compares favourably with other pretraining methods that require orders of magnitude more data. Our approach shows particular promise when combined with larger models as well as more diverse, task-aligned observational data -- approaching human-level performance and data-efficiency on Atari in our best setting.