
(Rex) Devon Hjelm

Affiliate Member
Research Scientist, Apple MLR
Research Topics
Causality
Deep Learning
Generative Models
Information Theory
Online Learning
Probabilistic Models
Reasoning
Reinforcement Learning
Representation Learning

Current Students

PhD - Université de Montréal
Co-supervisor:

Publications

On the Modeling Capabilities of Large Language Models for Sequential Decision Making
Martin Klissarov
Alexander T Toshev
Bogdan Mazoure
Grounding Multimodal Large Language Models in Actions
Andrew Szot
Bogdan Mazoure
Harsh Agrawal
Zsolt Kira
Alexander T Toshev
Multimodal Large Language Models (MLLMs) have demonstrated a wide range of capabilities across many domains, including Embodied AI. In this work, we study how to best ground a MLLM into different embodiments and their associated action spaces, including both continuous and discrete actions. For continuous actions, a set of learned tokenizations that capture an action at various resolutions allows for sufficient modeling precision, yielding the best performance on downstream tasks. For discrete actions, semantically aligning these actions with the native output token space of the MLLM leads to the strongest performance. We arrive at these lessons via a thorough study of seven action grounding approaches on five different environments, encompassing over 114 embodied tasks.
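The approach above hinges on turning continuous actions into token sequences at several resolutions. As a rough, illustrative sketch only (the paper uses learned tokenizations, whereas the uniform binning, bin counts, and function name below are my own assumptions), multi-resolution discretization of an action vector could look like this:

```python
import numpy as np

def tokenize_action(action, low, high, resolutions=(16, 64, 256)):
    """Map a continuous action vector to a sequence of integer tokens.

    Each resolution contributes one coarse-to-fine token per action
    dimension via uniform binning; this stands in for the learned
    tokenizations described in the abstract.
    """
    action = np.clip(action, low, high)
    normalized = (action - low) / (high - low)  # rescale to [0, 1]
    tokens = []
    for bins in resolutions:
        # Quantize each dimension into `bins` uniform buckets.
        ids = np.minimum((normalized * bins).astype(int), bins - 1)
        tokens.extend(ids.tolist())
    return tokens

# Example: a 2-D action in [-1, 1]^2 becomes 6 tokens (3 resolutions x 2 dims).
print(tokenize_action(np.array([0.3, -0.7]), low=-1.0, high=1.0))
```

Finer resolutions add precision at the cost of longer token sequences, which is the precision trade-off the abstract alludes to.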
Generative Models for Decision Making
Bogdan Mazoure
Lisa Lee
Roberta Raileanu
Yilun Du
Walter Talbott
Katherine Metcalf
Alexander T Toshev
Generative Artificial Intelligence (AI) has made significant advancements in recent years, particularly with the development of large language and diffusion models. These generative models have demonstrated impressive capabilities in various tasks, such as text generation and image and audio synthesis. Concurrently, Reinforcement Learning (RL) has made significant strides in solving complex sequential decision-making problems with the help of external knowledge sources. However, there remains untapped potential in combining generative models with RL algorithms to tackle real-world challenges, particularly to improve sample efficiency of tabula rasa training by introducing priors from related domains such as visual question-answering, image captioning and image generation. This workshop aims to bring together researchers and practitioners from the fields of generative AI and reinforcement learning to explore the latest advances, methodologies, and applications. By fostering collaborations between these two domains, we intend to unlock new opportunities for addressing complex problems that lie at the intersection of both fields.
Large Language Models as Generalizable Policies for Embodied Tasks
Andrew Szot
Max Schwarzer
Harsh Agrawal
Bogdan Mazoure
Walter Talbott
Rin Metcalf
Natalie Mackraz
Alexander T Toshev
Poly-View Contrastive Learning
Amitis Shidani
Jason Ramapuram
Russell Webb
Eeshan Gunesh Dhekane
Dan Busbridge
Self-supervised multimodal learning for group inferences from MRI data: Discovering disorder-relevant brain regions and multimodal links
Alex Fedorov
Eloy Geenjaar
Lei Wu
Tristan Sylvain
Thomas P. DeRamus
Margaux Luck
Maria Misiura
Girish Mittapalle
Sergey M. Plis
Vince D. Calhoun
Value function estimation using conditional diffusion models for control
Bogdan Mazoure
Walter Talbott
Miguel Ángel Bautista
Alexander T Toshev
Joshua M. Susskind
PatchBlender: A Motion Prior for Video Transformers
Gabriele Prato
Yale Song
Janarthanan Rajendran
Neel Joshi
Robust Contrastive Learning against Noisy Views
Ching-Yao Chuang
Xin Wang
Vibhav Vineet
Neel Joshi
Antonio Torralba
Stefanie Jegelka
Yale Song
Contrastive learning relies on an assumption that positive pairs contain related views that share certain underlying information about an instance, e.g., patches of an image or co-occurring multimodal signals of a video. What if this assumption is violated? The literature suggests that contrastive learning produces suboptimal representations in the presence of noisy views, e.g., false positive pairs with no apparent shared information. In this work, we propose a new contrastive loss function that is robust against noisy views. We provide rigorous theoretical justifications by showing connections to robust symmetric losses for noisy binary classification and by establishing a new contrastive bound for mutual information maximization based on the Wasserstein distance measure. The proposed loss is completely modality-agnostic and a simple drop-in replacement for the InfoNCE loss, which makes it easy to apply to existing contrastive frameworks. We show that our approach provides consistent improvements over the state-of-the-art on image, video, and graph contrastive learning benchmarks that exhibit a variety of real-world noise patterns.
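Since the abstract describes the proposed loss as a drop-in replacement for InfoNCE, a minimal sketch of that baseline helps fix the interface being replaced; the tensor shapes, temperature value, and function name below are assumptions for illustration, not code from the paper:

```python
import torch
import torch.nn.functional as F

def info_nce_loss(queries, keys, temperature=0.1):
    """Standard InfoNCE over a batch of paired views.

    queries, keys: (batch, dim) embeddings of two augmented views;
    row i of `keys` is the positive for row i of `queries`, and all
    other rows in the batch serve as negatives.
    """
    queries = F.normalize(queries, dim=-1)
    keys = F.normalize(keys, dim=-1)
    logits = queries @ keys.t() / temperature          # (batch, batch) cosine similarities
    labels = torch.arange(queries.size(0), device=queries.device)
    return F.cross_entropy(logits, labels)             # positives lie on the diagonal
```

Because the proposed noise-robust loss keeps this same interface, it can replace a call to a function like this without touching the rest of an existing contrastive training loop.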
Understanding by Understanding Not: Modeling Negation in Language Models
Negation is a core construction in natural language. Despite being very successful on many tasks, state-of-the-art pre-trained language models often handle negation incorrectly. To improve language models in this regard, we propose to augment the language modeling objective with an unlikelihood objective that is based on negated generic sentences from a raw text corpus. By training BERT with the resulting combined objective we reduce the mean top-1 error rate to 4% on the negated LAMA dataset. We also see some improvements on the negated NLI benchmarks.
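The combined objective described above adds an unlikelihood term to the usual language-modeling loss for negated sentences. A minimal sketch, assuming a token-level mask marks the negated positions (the masking scheme, weighting, and function name are my assumptions, not the paper's implementation):

```python
import torch
import torch.nn.functional as F

def combined_lm_loss(logits, targets, unlikelihood_mask, alpha=1.0):
    """Cross-entropy likelihood plus a token-level unlikelihood penalty.

    logits: (batch, seq, vocab) language-model outputs.
    targets: (batch, seq) gold token ids.
    unlikelihood_mask: (batch, seq) bool, True at positions whose target
    token should be made *less* likely (e.g. drawn from negated sentences).
    """
    log_probs = F.log_softmax(logits, dim=-1)
    token_logp = log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)

    # Standard likelihood term on ordinary positions.
    nll = -(token_logp * (~unlikelihood_mask).float()).sum()

    # Unlikelihood term: penalize probability mass on masked targets
    # via -log(1 - p(target)).
    p = token_logp.exp()
    ul = -torch.log1p(-p.clamp(max=1.0 - 1e-6)) * unlikelihood_mask.float()

    return (nll + alpha * ul.sum()) / targets.numel()
```

The unlikelihood term goes to zero when the model assigns negligible probability to the masked target, so it only acts where the model is currently making the mistake the objective is meant to correct.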