Michal Drozdzal

Improving the Scaling Laws of Synthetic Data with Deliberate Practice

Mohammad Pezeshki

Elvis Dohmatob

Pietro Astolfi

Melissa Hall

Jakob Verbeek

Inspired by the principle of deliberate practice in human learning, we propose Deliberate Practice for Synthetic Data Generation (DP), a novel framework that improves sample efficiency through dynamic synthetic data generation. Prior work has shown that scaling synthetic data is inherently challenging, as naively adding new data leads to diminishing returns. To address this, pruning has been identified as a key mechanism for improving scaling, enabling models to focus on the most informative synthetic samples. Rather than generating a large dataset and pruning it afterward, DP efficiently approximates the direct generation of informative samples. We theoretically show how training on challenging, informative examples improves scaling laws and empirically validate that DP achieves better scaling performance with significantly fewer training samples and iterations. On ImageNet-100, DP generates 3.4x fewer samples and requires six times fewer iterations, while on ImageNet-1k, it generates 8x fewer samples with a 30% reduction in iterations, all while achieving superior performance compared to prior work.

2025-10-06

Proceedings of the 42nd International Conference on Machine Learning (published)

proceedings.mlr.press

Increasing the Utility of Synthetic Images through Chamfer Guidance

Nicola Dall'Asen

Xiaofeng Zhang

Melissa Hall

Jakob Verbeek

Conditional image generative models hold considerable promise to produce infinite amounts of synthetic training data. Yet, recent progress in generation quality has come at the expense of generation diversity, limiting the utility of these models as a source of synthetic training data. Although guidance-based approaches have been introduced to improve the utility of generated data by focusing on quality or diversity, the (implicit or explicit) utility functions oftentimes disregard the potential distribution shift between synthetic and real data. In this work, we introduce Chamfer Guidance: a training-free guidance approach which leverages a handful of real exemplar images to characterize the quality and diversity of synthetic data. We show that by leveraging the proposed Chamfer Guidance, we can boost the diversity of the generations w.r.t. a dataset of real images while maintaining or improving the generation quality on ImageNet-1k and standard geo-diversity benchmarks. Our approach achieves state-of-the-art few-shot performance with as few as 2 exemplar real images, obtaining 96.4% in terms of precision, and 86.4% in terms of distributional coverage, which increase to 97.5% and 92.7%, respectively, when using 32 real images. We showcase the benefits of the Chamfer Guidance generation by training downstream image classifiers on synthetic data, achieving an accuracy boost of up to 15% in-distribution over the baselines, and up to 16% out-of-distribution. Furthermore, our approach does not require using the unconditional model, and thus obtains a 31% reduction in FLOPs w.r.t. classifier-free-guidance-based approaches at sampling time.

2025-08-14

arXiv (preprint)

Improving the Scaling Laws of Synthetic Data with Deliberate Practice

Mohammad Pezeshki

Elvis Dohmatob

Pietro Astolfi

Melissa Hall

Jakob Verbeek

2025-05-01

ICML.cc/2025/Conference (oral presentation)

openreview.net

Multi-Modal Language Models as Text-to-Image Model Evaluators

Jiahui Chen

Candace Ross

Koustuv Sinha

Melissa Hall

2025-05-01

arXiv (published)

Multi-Modal Language Models as Text-to-Image Model Evaluators

Jiahui Chen

Candace Ross

Koustuv Sinha

Melissa Hall

2025-05-01

arXiv (preprint)

Entropy Rectifying Guidance for Diffusion and Flow Models

Tariq Berrada

Jakob Verbeek

Karteek Alahari

2025-04-18

arXiv (preprint)

Improving the Scaling Laws of Synthetic Data with Deliberate Practice

Mohammad Pezeshki

Elvis Dohmatob

Pietro Astolfi

Melissa Hall

Jakob Verbeek

Inspired by the principle of deliberate practice in human learning, we propose Deliberate Practice for Synthetic Data Generation (DP), a novel framework that improves sample efficiency through dynamic synthetic data generation. Prior work has shown that scaling synthetic data is inherently challenging, as naively adding new data leads to diminishing returns. To address this, pruning has been identified as a key mechanism for improving scaling, enabling models to focus on the most informative synthetic samples. Rather than generating a large dataset and pruning it afterward, DP efficiently approximates the direct generation of informative samples. We theoretically show how training on challenging, informative examples improves scaling laws and empirically validate that DP achieves better scaling performance with significantly fewer training samples and iterations. On ImageNet-100, DP generates 3.4x fewer samples and requires six times fewer iterations, while on ImageNet-1k, it generates 8x fewer samples with a 30% reduction in iterations, all while achieving superior performance compared to prior work.

2025-02-21

arXiv (preprint)

Object-centric Binding in Contrastive Language-Image Pretraining

Rim Assouel

Pietro Astolfi

Recent advances in vision-language models (VLMs) have been driven by contrastive models such as CLIP, which learn to associate visual information with their corresponding text descriptions. However, these models have limitations in understanding complex compositional scenes involving multiple objects and their spatial relationships. To address these challenges, we propose a novel approach that diverges from commonly used strategies, which rely on the design of hard-negative augmentations. Instead, our work focuses on integrating inductive biases into pre-trained CLIP-like models to improve their compositional understanding without using any additional hard-negatives. To that end, we introduce a binding module that connects a scene graph, derived from a text description, with a slot-structured image representation, facilitating a structured similarity assessment between the two modalities. We also leverage relationships as text-conditioned visual constraints, thereby capturing the intricate interactions between objects and their contextual relationships more effectively. Our resulting model not only enhances the performance of CLIP-based models in multi-object compositional understanding but also paves the way towards more accurate and sample-efficient image-text matching of complex scenes.

2025-02-19

arXiv (preprint)

Boosting Latent Diffusion with Perceptual Objectives

Tariq Berrada

Pietro Astolfi

Jakob Verbeek

Melissa Hall

Marton Havasi

Yohann Benchetrit

Karteek Alahari

2025-01-22

ICLR.cc/2025/Conference (poster)