Jovana Mitrovic

Improving fine-grained understanding in image-text pre-training

Ioana Bica

Anastasija Ilić

Matthias Bauer

Goker Erdogan

Matko Bošnjak

Christos Kaplanis

Alexey A. Gritsenko

Matthias Minderer

Charles Blundell

Razvan Pascanu

Jovana Mitrovic

We introduce SPARse Fine-grained Contrastive Alignment (SPARC), a simple method for pretraining more fine-grained multimodal representations from image-text pairs. Given that multiple image patches often correspond to single words, we propose to learn a grouping of image patches for every token in the caption. To achieve this, we use a sparse similarity metric between image patches and language tokens and compute for each token a language-grouped vision embedding as the weighted average of patches. The token and language-grouped vision embeddings are then contrasted through a fine-grained sequence-wise loss that only depends on individual samples and does not require other batch samples as negatives. This enables more detailed information to be learned in a computationally inexpensive manner. SPARC combines this fine-grained loss with a contrastive loss between global image and text embeddings to learn representations that simultaneously encode global and local information. We thoroughly evaluate our proposed method and show improved performance over competing approaches both on image-level tasks relying on coarse-grained information, e.g., classification, and on region-level tasks relying on fine-grained information, e.g., retrieval, object detection, and segmentation. Moreover, SPARC improves model faithfulness and captioning in foundational vision-language models.

2024-07-08

Proceedings of the 41st International Conference on Machine Learning (published)

doi.org

arxiv.org
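
The grouping step described in the SPARC abstract above can be illustrated with a minimal sketch for a single image-text pair. This is not the authors' implementation: the abstract does not specify the sparsification rule or the exact loss, so the min-max scaling, the 1/P threshold, the symmetric cross-entropy, and the temperature value are all assumptions made for illustration, as are the function names.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def l2_normalize(v):
    norm = math.sqrt(dot(v, v)) or 1.0  # avoid division by zero
    return [x / norm for x in v]


def sparse_alignment(token, patches):
    """Sparse weights over patches for one token.

    Assumed rule (not stated in the abstract): min-max scale the
    similarities, zero out those below 1/P, renormalize to sum to 1.
    """
    sims = [dot(token, p) for p in patches]
    lo, hi = min(sims), max(sims)
    if hi == lo:  # degenerate case: fall back to uniform weights
        return [1.0 / len(patches)] * len(patches)
    scaled = [(s - lo) / (hi - lo) for s in sims]
    threshold = 1.0 / len(patches)
    kept = [s if s >= threshold else 0.0 for s in scaled]
    total = sum(kept)  # > 0 because the best patch scales to 1.0
    return [k / total for k in kept]


def grouped_embedding(token, patches):
    """Language-grouped vision embedding: weighted average of patches."""
    weights = sparse_alignment(token, patches)
    dim = len(patches[0])
    return [sum(w * p[d] for w, p in zip(weights, patches)) for d in range(dim)]


def diagonal_cross_entropy(rows):
    """Mean cross-entropy where row i should pick column i."""
    loss = 0.0
    for i, row in enumerate(rows):
        m = max(row)
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        loss += log_sum_exp - row[i]
    return loss / len(rows)


def fine_grained_loss(tokens, patches, temperature=0.1):
    """Sequence-wise contrastive loss within one sample.

    Each token is contrasted against the grouped embeddings of the
    other tokens in the same caption, so no other batch samples are used.
    """
    tokens = [l2_normalize(t) for t in tokens]
    grouped = [l2_normalize(grouped_embedding(t, patches)) for t in tokens]
    logits = [[dot(t, g) / temperature for g in grouped] for t in tokens]
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (diagonal_cross_entropy(logits) + diagonal_cross_entropy(transposed))


# Toy example: two caption tokens, four image patches, 2-d embeddings.
tokens = [[1.0, 0.0], [0.0, 1.0]]
patches = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
weights = sparse_alignment(tokens[0], patches)
loss = fine_grained_loss(tokens, patches)
```

In the toy example, the first token keeps non-zero weight only on the two patches that point in its direction, which is the sparsity the abstract refers to. In the full method this per-sample loss would be combined with a global image-text contrastive loss, which is omitted here.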

Continually learning representations at scale

Alexandre Galashov

Jovana Mitrovic

Dhruva Tirumala

Yee Whye Teh

Timothy Nguyen

Arslan Chaudhry

Razvan Pascanu

2023-01-01

CoLLAs (published)

proceedings.mlr.press

Hierarchical Adversarially Learned Inference

Ishmael Belghazi

Sai Rajeswar

We propose a novel hierarchical generative model with a simple Markovian structure and a corresponding inference model. Both the generative and inference models are trained using the adversarial learning paradigm. We demonstrate that the hierarchical structure supports the learning of progressively more abstract representations as well as providing semantically meaningful reconstructions with different levels of fidelity. Furthermore, we show that minimizing the Jensen-Shannon divergence between the generative and inference network is enough to minimize the reconstruction error. The resulting semantically meaningful hierarchical latent structure discovery is exemplified on the CelebA dataset. There, we show that the features learned by our model in an unsupervised way outperform the best handcrafted features. Furthermore, the extracted features remain competitive when compared to several recent deep supervised approaches on an attribute prediction task on CelebA. Finally, we leverage the model's inference network to achieve state-of-the-art performance on a semi-supervised variant of the MNIST digit classification task.

2018-02-04

ArXiv (preprint)

openreview.net
