Publications

The Promise of RL for Autoregressive Image Editing

Ge Ya Luo

Juan A. Rodriguez

Sai Rajeswar

We explore three strategies to enhance performance on a wide range of image editing tasks: supervised fine-tuning (SFT), reinforcement learn… (voir plus)ing (RL), and Chain-of-Thought (CoT) reasoning. In order to study all these components in one consistent framework, we adopt an autoregressive multimodal model that processes textual and visual tokens in a unified manner. We find RL combined with a large multi-modal LLM verifier to be the most effective of these strategies. As a result, we release EARL: Editing with Autoregression and RL, a strong RL-based image editing model that performs competitively on a diverse range of edits compared to strong baselines, despite using much less training data. Thus, EARL pushes the frontier of autoregressive multimodal models on image editing. We release our code, training data, and trained models at https://github.com/mair-lab/EARL.

2025-08-01

ArXiv (prépublication)

arxiv.org

The Promise of RL for Autoregressive Image Editing

Saba Ahmadi

Rabiul Awal

Ankur Sikarwar

Amirhossein Kazemnejad

Ge Ya Luo

Juan A. Rodriguez

Sai Rajeswar

We explore three strategies to enhance performance on a wide range of image editing tasks: supervised fine-tuning (SFT), reinforcement learn… (voir plus)ing (RL), and Chain-of-Thought (CoT) reasoning. In order to study all these components in one consistent framework, we adopt an autoregressive multimodal model that processes textual and visual tokens in a unified manner. We find RL combined with a large multi-modal LLM verifier to be the most effective of these strategies. As a result, we release EARL: Editing with Autoregression and RL, a strong RL-based image editing model that performs competitively on a diverse range of edits compared to strong baselines, despite using much less training data. Thus, EARL pushes the frontier of autoregressive multimodal models on image editing. We release our code, training data, and trained models at https://github.com/mair-lab/EARL.

2025-08-01

ArXiv (prépublication)

doi.org

arxiv.org

The Promise of RL for Autoregressive Image Editing

Saba Ahmadi

Rabiul Awal

Ankur Sikarwar

Amirhossein Kazemnejad

Ge Ya Luo

Juan A. Rodriguez

Sai Rajeswar

We explore three strategies to enhance performance on a wide range of image editing tasks: supervised fine-tuning (SFT), reinforcement learn… (voir plus)ing (RL), and Chain-of-Thought (CoT) reasoning. In order to study all these components in one consistent framework, we adopt an autoregressive multimodal model that processes textual and visual tokens in a unified manner. We find RL combined with a large multi-modal LLM verifier to be the most effective of these strategies. As a result, we release EARL: Editing with Autoregression and RL, a strong RL-based image editing model that performs competitively on a diverse range of edits compared to strong baselines, despite using much less training data. Thus, EARL pushes the frontier of autoregressive multimodal models on image editing. We release our code, training data, and trained models at https://github.com/mair-lab/EARL.

2025-08-01

arXiv (publié)

doi.org

arxiv.org

The Promise of RL for Autoregressive Image Editing

Saba Ahmadi

Rabiul Awal

Ankur Sikarwar

Amirhossein Kazemnejad

Ge Ya Luo

Juan A. Rodriguez

Sai Rajeswar

We explore three strategies to enhance performance on a wide range of image editing tasks: supervised fine-tuning (SFT), reinforcement learn… (voir plus)ing (RL), and Chain-of-Thought (CoT) reasoning. In order to study all these components in one consistent framework, we adopt an autoregressive multimodal model that processes textual and visual tokens in a unified manner. We find RL combined with a large multi-modal LLM verifier to be the most effective of these strategies. As a result, we release EARL: Editing with Autoregression and RL, a strong RL-based image editing model that performs competitively on a diverse range of edits compared to strong baselines, despite using much less training data. Thus, EARL pushes the frontier of autoregressive multimodal models on image editing. We release our code, training data, and trained models at https://github.com/mair-lab/EARL.

2025-08-01

ArXiv (prépublication)

doi.org

arxiv.org

Towards an Interpretable Machine Learning Model for Predicting Antimicrobial Resistance

Mohamed Mediouni

Vladimir Makarenkov

Abdoulaye Banire Diallo

2025-08-01

Journal of Global Antimicrobial Resistance (publié)

doi.org

WebMMU: A Benchmark for Multimodal Multilingual Website Understanding and Code Generation

Juan A. Rodriguez

Sai Rajeswar

We present WebMMU, a multilingual benchmark that evaluates three core web tasks: (1) website visual question answering, (2) code editing inv… (voir plus)olving HTML/CSS/JavaScript, and (3) mockup-to-code generation. Unlike prior benchmarks that treat these tasks separately, WebMMU unifies them using expert-annotated, real-world web data to assess models' abilities in complex multi-step reasoning, precise element grounding, and functional UI comprehension and coding. Our evaluation shows that while multimodal large language models (MLLMs) perform well on basic information extraction, they struggle with reasoning and grounding, editing code to preserve functionality, and generating design-to-code that maintains hierarchy and supports multilingual content. These findings reveal key limitations in current MLLMs and underscore the need for improved multimodal and cross-lingual reasoning to build future web agents capable of automating diverse web development tasks.

2025-08-01

arXiv (publié)

doi.org

WeDesign: Generative AI-Facilitated Community Consultations for Urban Public Space Design

Rashid A. Mushkani

Hugo Berard

Shin (Alexandre) Koseki

Community consultations are integral to urban planning processes intended to incorporate diverse stakeholder perspectives. However, limited … (voir plus)resources, visual and spoken language barriers, and uneven power dynamics frequently constrain inclusive decision-making. This paper examines how generative text-to-image methods, specifically Stable Diffusion XL integrated into a custom platform (WeDesign), may support equitable consultations. A half-day workshop in Montreal involved five focus groups, each consisting of architects, urban designers, AI specialists, and residents from varied demographic groups. Additional data was gathered through semi-structured interviews with six urban planning professionals. Participants indicated that immediate visual outputs facilitated creativity and dialogue, yet noted issues in visualizing specific needs of marginalized groups, such as participants with reduced mobility, accurately depicting local architectural elements, and accommodating bilingual prompts. Participants recommended the development of an open-source platform incorporating in-painting tools, multilingual support, image voting functionalities, and preference indicators. The results indicate that generative AI can broaden participation and enable iterative interactions but requires structured facilitation approaches. The findings contribute to discussions on generative AI's role and limitations in participatory urban design.

2025-08-01

arXiv (publié)

doi.org

Whither symbols in the era of advanced neural networks?

Thomas L. Griffiths

Brenden M. Lake

R. Thomas McCoy

Ellie Pavlick

Taylor Webb

Some of the strongest evidence that human minds should be thought about in terms of symbolic systems has been the way they combine ideas, pr… (voir plus)oduce novelty, and learn quickly. We argue that modern neural networks -- and the artificial intelligence systems built upon them -- exhibit similar abilities. This undermines the argument that the cognitive processes and representations used by human minds are symbolic, although the fact that these neural networks are typically trained on data generated by symbolic systems illustrates that such systems play an important role in characterizing the abstract problems that human minds have to solve. This argument leads us to offer a new agenda for research on the symbolic basis of human thought.

2025-08-01

arXiv (publié)

doi.org

arxiv.org

Zero-Shot Anomaly Detection with Dual-Branch Prompt Selection

Zihan Wang

Samira Ebrahimi Kahou

Narges Armanfard

Zero-shot anomaly detection (ZSAD) enables identifying and localizing defects in unseen categories by relying solely on generalizable featur… (voir plus)es rather than requiring any labeled examples of anomalies. However, existing ZSAD methods, whether using fixed or learned prompts, struggle under domain shifts because their training data are derived from limited training domains and fail to generalize to new distributions. In this paper, we introduce PILOT, a framework designed to overcome these challenges through two key innovations: (1) a novel dual-branch prompt learning mechanism that dynamically integrates a pool of learnable prompts with structured semantic attributes, enabling the model to adaptively weight the most relevant anomaly cues for each input image; and (2) a label-free test-time adaptation strategy that updates the learnable prompt parameters using high-confidence pseudo-labels from unlabeled test data. Extensive experiments on 13 industrial and medical benchmarks demonstrate that PILOT achieves state-of-the-art performance in both anomaly detection and localization under domain shift.

2025-08-01

arXiv (publié)

doi.org

arxiv.org

Zero-Shot Anomaly Detection with Dual-Branch Prompt Selection

Zihan Wang

Samira Ebrahimi Kahou

Narges Armanfard

Zero-shot anomaly detection (ZSAD) enables identifying and localizing defects in unseen categories by relying solely on generalizable featur… (voir plus)es rather than requiring any labeled examples of anomalies. However, existing ZSAD methods, whether using fixed or learned prompts, struggle under domain shifts because their training data are derived from limited training domains and fail to generalize to new distributions. In this paper, we introduce PILOT, a framework designed to overcome these challenges through two key innovations: (1) a novel dual-branch prompt learning mechanism that dynamically integrates a pool of learnable prompts with structured semantic attributes, enabling the model to adaptively weight the most relevant anomaly cues for each input image; and (2) a label-free test-time adaptation strategy that updates the learnable prompt parameters using high-confidence pseudo-labels from unlabeled test data. Extensive experiments on 13 industrial and medical benchmarks demonstrate that PILOT achieves state-of-the-art performance in both anomaly detection and localization under domain shift.

2025-08-01

ArXiv (prépublication)

arxiv.org

Zero-Shot Anomaly Detection with Dual-Branch Prompt Learning

Zihan Wang

Samira Ebrahimi Kahou

Narges Armanfard

Zero-shot anomaly detection (ZSAD) enables identifying and localizing defects in unseen categories by relying solely on generalizable featur… (voir plus)es rather than requiring any labeled examples of anomalies. However, existing ZSAD methods, whether using fixed or learned prompts, struggle under domain shifts because their training data are derived from limited training domains and fail to generalize to new distributions. In this paper, we introduce PILOT, a framework designed to overcome these challenges through two key innovations: (1) a novel dual-branch prompt learning mechanism that dynamically integrates a pool of learnable prompts with structured semantic attributes, enabling the model to adaptively weight the most relevant anomaly cues for each input image; and (2) a label-free test-time adaptation strategy that updates the learnable prompt parameters using high-confidence pseudo-labels from unlabeled test data. Extensive experiments on 13 industrial and medical benchmarks demonstrate that PILOT achieves state-of-the-art performance in both anomaly detection and localization under domain shift.

2025-08-01

ArXiv (prépublication)

arxiv.org

Advancing science- and evidence-based AI policy.

Rishi Bommasani

Sanjeev Arora

Jennifer Chayes

Yejin Choi

Mariano-Florentino Cuéllar

Li Fei-Fei

Daniel E. Ho

Dan Jurafsky

Sanmi Koyejo

Hima Lakkaraju

Arvind Narayanan

Alondra Nelson

Emma Pierson

Joelle Pineau

Scott R. Singer

Gael Varoquaux

Suresh Venkatasubramanian

Ion Stoica

Percy Liang

Dawn Song

Policy must be informed by, but also facilitate the generation of, scientific evidence.

2025-07-31

ArXiv (prépublication)

doi.org

arxiv.org

Hackathon | Créer une IA plus sécuritaire pour la santé mentale des jeunes

Éclaireurs autochtones en IA

Avantage IA

Publications

Hackathon | Créer une IA plus sécuritaire pour la santé mentale des jeunes

Éclaireurs autochtones en IA

Avantage IA

Mots-clés populaires:

Publications