Publications

Simulation-Free Schrödinger Bridges via Score and Flow Matching

Alexander Tong

Nikolay Malkin

Kilian FATRAS

Lazar Atanackovic

Yanlei Zhang

Guillaume Huguet

Guy Wolf

Yoshua Bengio

We present simulation-free score and flow matching ([SF]…

2023-06-19

ICML.cc/2023/Workshop/Frontiers4LCD (published)

Sarath Chandar Anbil Parthipan

Thompson Sampling for Improved Exploration in GFlowNets

Jarrid Rector-Brooks

Kanika Madan

Moksh J. Jain

Maksym Korablyov

Cheng-Hao Liu

Nikolay Malkin

Yoshua Bengio

Generative flow networks (GFlowNets) are amortized variational inference algorithms that treat sampling from a distribution over composition… (see more)al objects as a sequential decision-making problem with a learnable action policy. Unlike other algorithms for hierarchical sampling that optimize a variational bound, GFlowNet algorithms can stably run off-policy, which can be advantageous for discovering modes of the target distribution. Despite this flexibility in the choice of behaviour policy, the optimal way of efficiently selecting trajectories for training has not yet been systematically explored. In this paper, we view the choice of trajectories for training as an active learning problem and approach it using Bayesian techniques inspired by methods for multi-armed bandits. The proposed algorithm, Thompson sampling GFlowNets (TS-GFN), maintains an approximate posterior distribution over policies and samples trajectories from this posterior for training. We show in two domains that TS-GFN yields improved exploration and thus faster convergence to the target distribution than the off-policy exploration strategies used in past work.

2023-06-19

ICML.cc/2023/Workshop/SPIGM (poster)

Visual Chain-of-Thought Diffusion Models

William Harvey

Frank Wood

Recent progress with conditional image diffusion models has been stunning, and this holds true whether we are speaking about models conditio… (see more)ned on a text description, a scene layout, or a sketch. Unconditional image diffusion models are also improving but lag behind, as do diffusion models which are conditioned on lower-dimensional features like class labels. We propose to close the gap between conditional and unconditional models using a two-stage sampling procedure. In the first stage we sample an embedding describing the semantic content of the image. In the second stage we sample the image conditioned on this embedding and then discard the embedding. Doing so lets us leverage the power of conditional diffusion models on the unconditional generation task, which we show improves FID by 25 - 50% compared to standard unconditional generation.

2023-06-19

ICML.cc/2023/Workshop/SPIGM (poster)

Evolving Computation Graphs

Andreea Deac

Jian Tang

Graph neural networks (GNNs) have demonstrated success in modeling relational data, especially for data that exhibits homophily: when a conn… (see more)ection between nodes tends to imply that they belong to the same class. However, while this assumption is true in many relevant situations, there are important real-world scenarios that violate this assumption, and this has spurred research into improving GNNs for these cases. In this work, we propose Evolving Computation Graphs (ECGs), a novel method for enhancing GNNs on heterophilic datasets. Our approach builds on prior theoretical insights linking node degree, high homophily, and inter vs intra-class embedding similarity by rewiring the GNNs' computation graph towards adding edges that connect nodes that are likely to be in the same class. We utilise weaker classifiers to identify these edges, ultimately improving GNN performance on non-homophilic data as a result. We evaluate ECGs on a diverse set of recently-proposed heterophilous datasets and demonstrate improvements over the relevant baselines. ECG presents a simple, intuitive and elegant approach for improving GNN performance on heterophilic datasets without requiring prior domain knowledge.

2023-06-18

ICML.cc/2023/Workshop/TAGML (poster)

AI Clinics on Mobile (AICOM): Universal AI Doctors for the Underserved and Hard-to-Reach

Tianyi Yang

Tianze Yang

Na An

Ao Kong

Shaoshan Liu

Xue (Steve) Liu

2023-06-17

ArXiv (preprint)

Doubly Right Object Recognition: A Why Prompt for Visual Rationales

Chengzhi Mao

Revant Teotia

Amrutha Sundar

Sachit Menon

Junfeng Yang

Xin Wang

Carl Vondrick

Many visual recognition models are evaluated only on their classification accuracy, a metric for which they obtain strong performance. In th… (see more)is paper, we investigate whether computer vision models can also provide correct rationales for their predictions. We propose a “doubly right” object recognition benchmark, where the metric requires the model to simultaneously produce both the right labels as well as the right rationales. We find that state-of-the-art visual models, such as CLIP, often provide incorrect rationales for their categorical predictions. However, by transferring the rationales from language models into visual representations through a tailored dataset, we show that we can learn a “why prompt,” which adapts large visual representations to produce correct rationales. Visualizations and empirical experiments show that our prompts significantly improve performance on doubly right object recognition, in addition to zero-shot transfer to unseen tasks and datasets.

2023-06-17

2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (published)

Open-Set Likelihood Maximization for Few-Shot Learning

Malik Boudiaf

Etienne Bennequin

Myriam Tami

Antoine Toubhans

Pablo Piantanida

Celine Hudelot

Ismail Ben Ayed

We tackle the Few-Shot Open-Set Recognition (FSOSR) problem, i.e. classifying instances among a set of classes for which we only have a few … (see more)labeled samples, while simultaneously detecting instances that do not belong to any known class. We explore the popular transductive setting, which leverages the unlabelled query instances at inference. Motivated by the observation that existing transductive methods perform poorly in open-set scenarios, we propose a generalization of the maximum likelihood principle, in which latent scores down-weighing the influence of potential outliers are introduced alongside the usual parametric model. Our formulation embeds supervision constraints from the support set and additional penalties discouraging overconfident predictions on the query set. We proceed with a block-coordinate descent, with the latent scores and parametric model co-optimized alternately, thereby benefiting from each other. We call our resulting formulation Open-Set Likelihood Optimization (OSLO). OSLO is interpretable and fully modular; it can be applied on top of any pre-trained model seamlessly. Through extensive experiments, we show that our method surpasses existing inductive and transductive methods on both aspects of open-set recognition, namely inlier classification and outlier detection. Code is available at https://github.com/ebennequin/few-shot-open-set.

2023-06-17

2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (published)

Self-Supervised Image-to-Point Distillation via Semantically Tolerant Contrastive Loss

Anas Mahmoud

Jordan S. K. Hu

Tianshu Kuai

Ali Harakeh

Liam Paull

Steven L. Waslander

An effective framework for learning 3D representations for perception tasks is distilling rich self-supervised image features via contrastiv… (see more)e learning. However, image-to-point representation learning for autonomous driving datasets faces two main challenges: 1) the abundance of self-similarity, which results in the contrastive losses pushing away semantically similar point and image regions and thus disturbing the local semantic structure of the learned representations, and 2) severe class imbalance as pretraining gets dominated by over-represented classes. We propose to alleviate the self-similarity problem through a novel semantically tolerant image-to-point contrastive loss that takes into consideration the semantic distance between positive and negative image regions to minimize contrasting semantically similar point and image regions. Additionally, we address class imbalance by designing a class-agnostic balanced loss that approximates the degree of class imbalance through an aggregate sample-to-samples semantic similarity measure. We demonstrate that our semantically-tolerant contrastive loss with class balancing improves state-of-the-art 2D-to-3D representation learning in all evaluation settings on 3D semantic segmentation. Our method consistently outperforms state-of-the-art 2D-to-3D representation learning frameworks across a wide range of 2D self-supervised pretrained models.

2023-06-17

2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (published)

Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture

Mahmoud Assran

Quentin Duval

Ishan Misra

Piotr Bojanowski

Pascal Vincent

Michael Rabbat

Yann LeCun

Nicolas Ballas

This paper demonstrates an approach for learning highly semantic image representations without relying on hand-crafted data-augmentations. W… (see more)e introduce the Image-based Joint-Embedding Predictive Architecture (I-JEPA), a non-generative approach for self-supervised learning from images. The idea behind I-JEPA is simple: from a single context block, predict the representations of various target blocks in the same image. A core design choice to guide I-JEPA towards producing semantic representations is the masking strategy; specifically, it is crucial to (a) sample target blocks with sufficiently large scale (semantic), and to (b) use a sufficiently informative (spatially distributed) context block. Empirically, when combined with Vision Transformers, we find I-JEPA to be highly scalable. For instance, we train a ViT-Huge/14 on ImageNet using 16 A100 GPUs in under 72 hours to achieve strong downstream performance across a wide range of tasks, from linear classification to object counting and depth prediction.

2023-06-17

2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (published)

Simulated Annealing in Early Layers Leads to Better Generalization

Amir M. Sarfi

Zahra Karimpour

Muawiz Chaudhary

Nasir M. Khalid

Mirco Ravanelli

Sudhir Mudur

Eugene Belilovsky

Recently, a number of iterative learning methods have been introduced to improve generalization. These typically rely on training for longer… (see more) periods of time in exchange for improved generalization. LLF (later-layer-forgetting) is a state-of-the-art method in this category. It strengthens learning in early layers by periodically re-initializing the last few layers of the network. Our principal innovation in this work is to use Simulated annealing in EArly Layers (SEAL) of the network in place of re-initialization of later layers. Essentially, later layers go through the normal gradient descent process, while the early layers go through short stints of gradient ascent followed by gradient descent. Extensive experiments on the popular Tiny-ImageNet dataset benchmark and a series of transfer learning and few-shot learning tasks show that we outperform LLF by a significant margin. We further show that, compared to normal training, LLF features, although improving on the target task, degrade the transfer learning performance across all datasets we explored. In comparison, our method outperforms LLF across the same target datasets by a large margin. We also show that the prediction depth of our method is significantly lower than that of LLF and normal training, indicating on average better prediction performance. 11The code to reproduce our results is publicly available at: https://github.com/amiiir-sarfi/SEAL

2023-06-17

2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (published)