Reyhane Askari Hemmat

Xiaochuang Han

Nicolas Ballas

Adriana Romero

State-of-the-art video generative models produce promising visual content yet often violate basic physics principles, limiting their utility. While some attribute this deficiency to insufficient physics understanding from pre-training, we find that the shortfall in physics plausibility also stems from suboptimal inference strategies. We therefore introduce WMReward and treat improving the physics plausibility of video generation as an inference-time alignment problem. In particular, we leverage the strong physics prior of a latent world model (here, VJEPA-2) as a reward to search and steer multiple candidate denoising trajectories, enabling test-time compute to be scaled for better generation performance. Empirically, our approach substantially improves physics plausibility across image-conditioned, multiframe-conditioned, and text-conditioned generation settings, as validated by a human preference study. Notably, in the ICCV 2025 Perception Test PhysicsIQ Challenge, we achieve a final score of 62.64%, winning first place and outperforming the previous state of the art by 7.42%. Our work demonstrates the viability of using latent world models to improve the physics plausibility of video generation, beyond this specific instantiation or parameterization.
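The abstract describes using a world-model reward to search over multiple candidate denoising trajectories. The simplest form of such reward-guided search is best-of-N selection, sketched below under stated assumptions: the "generator" is a hypothetical noisy free-fall trajectory, and `plausibility_reward` is a toy stand-in for a VJEPA-2-based score. This is not the WMReward implementation, whose search and steering procedure may differ.

```python
import random
from typing import Callable, List

Trajectory = List[float]


def sample_trajectory(rng: random.Random, steps: int = 8) -> Trajectory:
    """Hypothetical toy 'generator': heights of a falling object plus per-frame noise."""
    g, dt = 9.8, 0.1
    return [10.0 - 0.5 * g * (t * dt) ** 2 + rng.gauss(0.0, 0.2) for t in range(steps)]


def plausibility_reward(traj: Trajectory) -> float:
    """Toy stand-in for a world-model reward (assumption, not VJEPA-2).

    Free fall has constant second differences, so penalize how much they
    vary; 0 is a perfect score.
    """
    acc = [traj[i + 1] - 2 * traj[i] + traj[i - 1] for i in range(1, len(traj) - 1)]
    mean = sum(acc) / len(acc)
    return -sum((a - mean) ** 2 for a in acc)


def best_of_n(
    sample: Callable[[random.Random], Trajectory],
    reward: Callable[[Trajectory], float],
    n: int,
    seed: int = 0,
) -> Trajectory:
    """Draw n candidates and keep the one the reward scores highest."""
    rng = random.Random(seed)
    candidates = [sample(rng) for _ in range(n)]
    return max(candidates, key=reward)


# More samples (more test-time compute) cannot lower the best reward here,
# because the first 4 candidates are shared between the two runs.
r4 = plausibility_reward(best_of_n(sample_trajectory, plausibility_reward, n=4))
r16 = plausibility_reward(best_of_n(sample_trajectory, plausibility_reward, n=16))
print(r4 <= r16)  # True
```

Best-of-N spends compute only at the end of generation; steering intermediate denoising steps with the reward, as the abstract also mentions, would apply a similar scoring idea earlier in the trajectory.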

2026-01-14

ArXiv (preprint)

Why Less is More (Sometimes): A Theory of Data Curation

Elvis Dopgima Dohmatob

Mohammad Pezeshki

2025-11-04

ArXiv (preprint)

Improving the Physics of Video Generation with VJEPA-2 Reward Signal

Jianhao Yuan

Xiaofeng Zhang

Felix Friedrich

Nicolas Beltran-Velez

Melissa Hall

Xiaochuang Han

Nicolas Ballas

Adriana Romero

2025-10-21

ArXiv (preprint)

Multi-Modal Language Models as Text-to-Image Model Evaluators

Jiahui Chen

Candace Ross

Koustuv Sinha

Melissa Hall

Adriana Romero

2025-04-30

ArXiv (preprint)

EvalGIM: A Library for Evaluating Generative Image Models

Melissa Hall

Oscar Mañas

Mark Ibrahim

Candace Ross

Pietro Astolfi

Tariq Berrada

Marton Havasi

Yohann Benchetrit

Karen Ullrich

Carolina Braga

Abhishek Charnalia

Maeve Ryan

Michael G. Rabbat

Jakob Verbeek

Adriana Romero

As the use of text-to-image generative models increases, so does the adoption of automatic benchmarking methods used in their evaluation. However, while metrics and datasets abound, there are few unified benchmarking libraries that provide a framework for performing evaluations across many datasets and metrics. Furthermore, the rapid introduction of increasingly robust benchmarking methods requires that evaluation libraries remain flexible to new datasets and metrics. Finally, there remains a gap in synthesizing evaluations into actionable takeaways about model performance. To enable unified, flexible, and actionable evaluations, we introduce EvalGIM (pronounced "EvalGym"), a library for evaluating generative image models. EvalGIM provides broad support for datasets and metrics used to measure quality, diversity, and consistency of text-to-image generative models. In addition, EvalGIM is designed with flexibility for user customization as a top priority and contains a structure that allows plug-and-play additions of new datasets and metrics. To enable actionable evaluation insights, we introduce "Evaluation Exercises" that highlight takeaways for specific evaluation questions. The Evaluation Exercises contain easy-to-use and reproducible implementations of two state-of-the-art evaluation methods of text-to-image generative models: consistency-diversity-realism Pareto Fronts and disaggregated measurements of performance disparities across groups. EvalGIM also contains Evaluation Exercises that introduce two new analysis methods for text-to-image generative models: robustness analyses of model rankings and balanced evaluations across different prompt styles. We encourage text-to-image model exploration with EvalGIM and invite contributions at https://github.com/facebookresearch/EvalGIM/.
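Consistency-diversity-realism Pareto Fronts compare model configurations across several metrics at once. The core step is to keep only configurations that no other configuration beats on every metric, and it can be sketched as below. This is not EvalGIM's API. The configuration labels (read as hypothetical guidance-scale settings) and their scores are made up, and higher is assumed to be better for all three metrics.

```python
from typing import Dict, List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as good as b on every metric and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scores: Dict[str, Sequence[float]]) -> List[str]:
    """Labels of configurations not dominated by any other configuration."""
    return [
        name
        for name, s in scores.items()
        if not any(dominates(other, s) for other in scores.values())
    ]


# Hypothetical (consistency, diversity, realism) scores per configuration.
scores = {
    "cfg=2": (0.60, 0.80, 0.50),
    "cfg=4": (0.70, 0.70, 0.60),
    "cfg=6": (0.72, 0.55, 0.62),
    "cfg=8": (0.71, 0.50, 0.61),  # beaten by cfg=6 on all three metrics
}
print(pareto_front(scores))  # ['cfg=2', 'cfg=4', 'cfg=6']
```

A configuration never dominates itself, so the inner loop can safely include it; the quadratic scan is fine for the handful of configurations typically swept in such plots.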

2024-12-12

ArXiv (preprint)

Improving Geo-diversity of Generated Images with Contextualized Vendi Score Guidance

Melissa Hall

Alicia Sun

Candace Ross

Adriana Romero

2024-11-20

Lecture Notes in Computer Science (published)

Deliberate Practice with Synthetic Data

Mohammad Pezeshki

Pietro Astolfi

Melissa Hall

Florian Bordes

Jakob Verbeek

Adriana Romero

2024-10-09

NeurIPS.cc/2024/Workshop/AFM (poster)

openreview.net

Feedback-guided Data Synthesis for Imbalanced Classification

Mohammad Pezeshki

Florian Bordes

Adriana Romero

The current status quo in machine learning is to train on static datasets of real images, which often come from long-tailed distributions. With the recent advances in generative models, researchers have started augmenting these static datasets with synthetic data, reporting moderate performance improvements on classification tasks. We hypothesize that these performance gains are limited by the lack of feedback from the classifier to the generative model; such feedback would steer generation toward samples that are useful for improving the classifier's performance. In this work, we introduce a framework for augmenting static datasets with useful synthetic samples, which leverages one-shot feedback from the classifier to drive the sampling of the generative model. For the framework to be effective, we find that the samples must be close to the support of the real data of the task at hand and be sufficiently diverse. We validate three feedback criteria on long-tailed datasets (ImageNet-LT, Places-LT) as well as a group-imbalanced dataset (NICO++). On ImageNet-LT, we achieve state-of-the-art results, with over
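The abstract does not name its three feedback criteria here. As a hedged illustration, one plausible classifier signal is predictive entropy: samples the classifier is uncertain about may be more informative to add. The sketch below uses that signal to rank already-generated candidates, which is a simplification and an assumption on our part; the paper describes feedback that drives the generative model's sampling itself, not post-hoc filtering.

```python
import math
from typing import List, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a classifier's predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_by_entropy(sample_probs: List[Sequence[float]], k: int) -> List[int]:
    """Indices of the k candidates the classifier is least certain about (hypothetical criterion)."""
    order = sorted(
        range(len(sample_probs)),
        key=lambda i: entropy(sample_probs[i]),
        reverse=True,
    )
    return order[:k]


# Hypothetical softmax outputs of a 3-class classifier on four synthetic samples.
probs = [
    [0.98, 0.01, 0.01],  # confident: little to learn
    [0.40, 0.35, 0.25],
    [0.70, 0.20, 0.10],
    [1 / 3, 1 / 3, 1 / 3],  # maximally uncertain
]
print(select_by_entropy(probs, k=2))  # [3, 1]
```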

2024-09-10

TMLR (accepted)

openreview.net

An Introduction to Vision-Language Modeling

Florian Bordes

Richard Yuanzhe Pang

Anurag Ajay

Alexander C. Li

Adrien Bardes

Suzanne Petryk

Oscar Mañas

Zhiqiu Lin

Anas Mahmoud

Bargav Jayaraman

Mark Ibrahim

Melissa Hall

Yunyang Xiong

Jonathan Lebensold

Candace Ross

Srihari Jayakumar

Chuan Guo

Diane Bouchacourt

Haider Al-Tahan

Karthik Padthe

Vasu Sharma

Huijuan Xu

Hu Xu

Xiaoqing Ellen Tan

Megan Richards

Samuel Lavoie

Pietro Astolfi

Jun Chen

Kushal Tirumala

Rim Assouel

Mazda Moayeri

Arjang Talattof

Kamalika Chaudhuri

Zechun Liu

Xilun Chen

Quentin Garrido

Karen Ullrich

Aishwarya Agrawal

Kate Saenko

Asli Celikyilmaz

Vikas Chandra

2024-05-26

arXiv (preprint)

LEAD: Min-Max Optimization from a Physical Perspective

Amartya Mitra

Guillaume Lajoie

Ioannis Mitliagkas

Adversarial formulations have rekindled interest in two-player min-max games. A central obstacle in the optimization of such games is the rotational dynamics that hinder their convergence. In this paper, we show that game optimization shares dynamic properties with particle systems subject to multiple forces, and that one can leverage tools from physics to improve optimization dynamics. Inspired by the physical framework, we propose LEAD, an optimizer for min-max games. Next, using Lyapunov stability theory from dynamical systems as well as spectral analysis, we study LEAD’s convergence properties in continuous and discrete time settings for a class of quadratic min-max games to demonstrate linear convergence to the Nash equilibrium. Finally, we empirically evaluate our method on synthetic setups and CIFAR-10 image generation to demonstrate improvements in GAN training.
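The rotational dynamics mentioned above already appear in the simplest bilinear game f(x, y) = xy, where x minimizes and y maximizes. The sketch below runs plain simultaneous gradient descent-ascent, not LEAD, whose update rule the abstract does not give. Each step maps (x, y) to (x - ηy, y + ηx), which rotates the iterate and multiplies its squared norm by exactly 1 + η², so it spirals away from the equilibrium at (0, 0).

```python
import math
from typing import Tuple


def simultaneous_gda(x: float, y: float, eta: float, steps: int) -> Tuple[float, float]:
    """Simultaneous gradient descent-ascent on the bilinear game f(x, y) = x * y.

    Player x minimizes f (its gradient is y); player y maximizes f (its
    gradient is x). Both gradients are evaluated at the current point
    before either player moves.
    """
    for _ in range(steps):
        x, y = x - eta * y, y + eta * x
    return x, y


x, y = simultaneous_gda(1.0, 0.0, eta=0.1, steps=100)
# Norm after k steps is (1 + eta**2) ** (k / 2) times the starting norm.
print(math.hypot(x, y), (1 + 0.1**2) ** 50)
```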

2022-12-31

Trans. Mach. Learn. Res. (published)

openreview.net

Negative Momentum for Improved Game Dynamics

Gauthier Gidel

Rémi Le Priol

Games generalize the single-objective optimization paradigm by introducing different objective functions for different players. Differentiable games are often optimized by simultaneous or alternating gradient updates. In machine learning, games are gaining new importance through formulations like generative adversarial networks (GANs) and actor-critic systems. However, compared to single-objective optimization, game dynamics are more complex and less understood. In this paper, we analyze gradient-based methods with momentum on simple games. We prove that alternating updates are more stable than simultaneous updates. Next, we show both theoretically and empirically that alternating gradient updates with a negative momentum term achieve convergence not only on a difficult toy adversarial problem but also on notoriously difficult-to-train saturating GANs.
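A minimal sketch of the setting in this abstract, assuming the bilinear toy game f(x, y) = xy and heavy-ball momentum added to alternating updates. The step size η = 0.5 and momentum β = -0.2 are illustrative choices, not values taken from the paper.

```python
import math
from typing import Tuple


def alternating_momentum(
    x: float, y: float, eta: float, beta: float, steps: int
) -> Tuple[float, float]:
    """Alternating gradient descent-ascent with heavy-ball momentum on f(x, y) = x * y.

    x moves first; y then takes its ascent step using the updated x.
    beta < 0 gives negative momentum; beta = 0 is plain alternating updates.
    """
    x_prev, y_prev = x, y  # zero initial velocity
    for _ in range(steps):
        x_new = x - eta * y + beta * (x - x_prev)       # descent step for the min player
        y_new = y + eta * x_new + beta * (y - y_prev)   # ascent step sees the fresh x_new
        x_prev, y_prev, x, y = x, y, x_new, y_new
    return x, y


plain = alternating_momentum(1.0, 1.0, eta=0.5, beta=0.0, steps=1000)
negative = alternating_momentum(1.0, 1.0, eta=0.5, beta=-0.2, steps=1000)
print(math.hypot(*plain), math.hypot(*negative))
```

With β = 0 the iterates stay on a closed orbit (for this game, x² + y² - ηxy is exactly conserved), whereas β = -0.2 drives them to the equilibrium (0, 0). For η = 0.5, a short calculation on the characteristic polynomial ((λ - 1)(λ - β))² + η²λ³ = 0 gives a slowest contraction factor of roughly 0.97 per step.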

2019-04-10

Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics (published)