Publications

RefAgent: A Multi-agent LLM-based Framework for Automatic Software Refactoring

Maxime Lamothe

2025-10-31

arXiv (published)

Simulate intelligently: Causal incremental reinforcement learning for streamlined industrial chemical process design optimization

Eslam G. Al-Sakkari

Ahmed Ragab

Mohamed Ali

Hanane Dagdougui

Daria C. Boffito

2025-10-31

Journal of Environmental Chemical Engineering (published)

doi.org

The role of Large Language Models in IoT security: A systematic review of advances, challenges, and opportunities

Saeid Jamshidi

Negar Shahabi

Amin Nikanjam

Kawser Wazed Nafi

Foutse Khomh

Carol Fung

2025-10-31

Internet of Things (published)

doi.org

Understanding the role of depth in the neural tangent kernel for overparameterized neural networks

William St-Arnaud

Margarida Carvalho

Golnoosh Farnadi

2025-10-31

arXiv (published)

doi.org

arxiv.org

WebMMU: A Benchmark for Multimodal Multilingual Website Understanding and Code Generation

Christopher Pal

Juan A. Rodriguez

Sai Rajeswar

We present WebMMU, a multilingual benchmark that evaluates three core web tasks: (1) website visual question answering, (2) code editing involving HTML/CSS/JavaScript, and (3) mockup-to-code generation. Unlike prior benchmarks that treat these tasks separately, WebMMU unifies them using expert-annotated, real-world web data to assess models' abilities in complex multi-step reasoning, precise element grounding, and functional UI comprehension and coding. Our evaluation shows that while multimodal large language models (MLLMs) perform well on basic information extraction, they struggle with reasoning and grounding, editing code to preserve functionality, and generating design-to-code that maintains hierarchy and supports multilingual content. These findings reveal key limitations in current MLLMs and underscore the need for improved multimodal and cross-lingual reasoning to build future web agents capable of automating diverse web development tasks.

2025-10-31

Conference on Empirical Methods in Natural Language Processing (published)

doi.org

openreview.net

Why Less is More (Sometimes): A Theory of Data Curation

Elvis Dopgima Dohmatob

Mohammad Pezeshki

Reyhane Askari Hemmat

2025-10-31

arXiv (published)

doi.org

arxiv.org

Discovery of Sustainable Refrigerants through Physics-Informed RL Fine-Tuning of Sequence Models

Most refrigerants currently used in air-conditioning systems, such as hydrofluorocarbons, are potent greenhouse gases and are being phased down. Large-scale molecular screening has been applied to the search for alternatives, but in practice only about 300 refrigerants are known, and only a few additional candidates have been suggested without experimental validation. This scarcity of reliable data limits the effectiveness of purely data-driven methods. We present Refgen, a generative pipeline that integrates machine learning with physics-grounded inductive biases. Alongside fine-tuning for valid molecular generation, Refgen incorporates predictive models for critical properties, equations of state, thermochemical polynomials, and full vapor compression cycle simulations. These models enable reinforcement learning fine-tuning under thermodynamic constraints, enforcing consistency and guiding discovery toward molecules that balance efficiency, safety, and environmental impact. By embedding physics into the learning process, Refgen leverages scarce data effectively and enables de novo refrigerant discovery beyond the known set of compounds.

2025-10-30

EurIPS.cc/2025/Workshop/SIMBIOCHEM (published)

doi.org

openreview.net

Curly Flow Matching for Learning Non-gradient Field Dynamics

Katarina Petrović

Lazar Atanackovic

Viggo Moro

Kacper Kapuśniak

İsmail İlkan Ceylan

Michael M. Bronstein

Avishek Bose

Alexander Tong

Modeling the transport dynamics of natural processes from population-level observations is a ubiquitous problem in the natural sciences. Such models rely on key assumptions about the underlying process in order to enable faithful learning of governing dynamics that mimic the actual system behavior. The de facto assumption in current approaches relies on the principle of least action that results in gradient field dynamics and leads to trajectories minimizing an energy functional between two probability measures. However, many real-world systems, such as cell cycles in single-cell RNA, are known to exhibit non-gradient, periodic behavior, which fundamentally cannot be captured by current state-of-the-art methods such as flow and bridge matching. In this paper, we introduce Curly Flow Matching (Curly-FM), a novel approach that is capable of learning non-gradient field dynamics by designing and solving a Schrödinger bridge problem with a non-zero drift reference process (in stark contrast to typical zero-drift reference processes), which is constructed using inferred velocities in addition to population snapshot data. We showcase Curly-FM by solving the trajectory inference problems for single cells, computational fluid dynamics, and ocean currents with approximate velocities. We demonstrate that Curly-FM can learn trajectories that better match both the reference process and population marginals. Curly-FM expands flow matching models beyond the modeling of populations and towards the modeling of known periodic behavior in physical systems. Our code repository is accessible at: https://github.com/kpetrovicc/curly-flow-matching.git

2025-10-29

arXiv (preprint)

doi.org

arxiv.org

Gistify! Codebase-Level Understanding via Runtime Execution

Hyunji Lee

Minseon Kim

Chinmay Singh

Matheus Pereira

Atharv Sonwane

Isadora White

Elias Stengel-Eskin

Mohit Bansal

Zhengyan Shi

Alessandro Sordoni

Marc-Alexandre Côté

Xingdi Yuan

Lucas Caccia

As coding agents are increasingly deployed in large codebases, the need to automatically design challenging, codebase-level evaluations is central. We propose Gistify, a task where a coding LLM must create a single, minimal, self-contained file that can reproduce a specific functionality of a codebase. The coding LLM is given full access to a codebase along with a specific entrypoint (e.g., a Python command), and the generated file must replicate the output of the same command run under the full codebase, while containing only the essential components necessary to execute the provided command. Success on Gistify requires structural understanding of the codebase, accurate modeling of its execution flow, and the ability to produce potentially large code patches. Our findings show that current state-of-the-art models struggle to reliably solve Gistify tasks, especially ones with long execution traces.

2025-10-29

arXiv (preprint)

doi.org

arxiv.org

Multi-Representation Attention Framework for Underwater Bioacoustic Denoising and Recognition

Amine Razig

Youssef Soulaymani

Loubna Benabbou

Pierre Cauchy

2025-10-28

arXiv (preprint)

doi.org

arxiv.org

Scaling Latent Reasoning via Looped Language Models

Ruiming Zhu

Zixuan Wang

Kai Hua

Tianyu Zhang

Ziniu Li

Haoran Que

Boyi Wei

Zixin Wen

Fan Yin

He Xing

Li Li

Jiajun Shi

Kaijing Ma

Shanda Li

Taylor Kergan

Andrew C. Smith

Xin Qu

Mude Hui

Bohong Wu

Qiyang Min

Hongzhi Huang

Xun Zhou

Wei Ye

Jiaheng Liu

Jian Yang

Yunfeng Shi

Chenghua Lin

Enduo Zhao

Tianle Cai

Ge Zhang

Wenhao Huang

Yoshua Bengio

Jason K. Eshraghian

Modern LLMs are trained to "think" primarily via explicit text generation, such as chain-of-thought (CoT), which defers reasoning to post-training and under-leverages pre-training data. We present and open-source Ouro, named after the recursive Ouroboros, a family of pre-trained Looped Language Models (LoopLM) that instead build reasoning into the pre-training phase through (i) iterative computation in latent space, (ii) an entropy-regularized objective for learned depth allocation, and (iii) scaling to 7.7T tokens. Ouro 1.4B and 2.6B models achieve superior performance that matches the results of up to 12B SOTA LLMs across a wide range of benchmarks. Through controlled experiments, we show this advantage stems not from increased knowledge capacity, but from superior knowledge manipulation capabilities. We also show that LoopLM yields reasoning traces more aligned with final outputs than explicit CoT. We hope our results show the potential of LoopLM as a novel scaling direction in the reasoning era. Our model is available here: http://ouro-llm.github.io.

2025-10-28

arXiv (preprint)

doi.org

arxiv.org

Assessing Programming Task Difficulty for Efficient Evaluation of Large Language Models

Florian Tambon

Amin Nikanjam

Cyrine Zid

Foutse Khomh

Giuliano Antoniol

2025-10-27

ACM Transactions on Software Engineering and Methodology (published)

doi.org

arxiv.org
