Publications

Semantic Commit: Helping Users Update Intent Specifications for AI Memory at Scale

Priyan Vaithilingam

Munyeong Kim

Frida-Cecilia Acosta-Parenteau

Daniel Lee

Amine Mhedhbi

Elena L. Glassman

Ian Arawjo

2025-09-26

Proceedings of the 38th Annual ACM Symposium on User Interface Software and Technology (published)

doi.org

arxiv.org

Text to Robotic Assembly of Multi Component Objects using 3D Generative AI and Vision Language Models

Alexander Htet Kyaw

Richa Gupta

Dhruv Shah

Anoop K. Sinha

Kory Mathewson

Stefanie Pender

Sachin Chitta

Yotto Koga

Faez Ahmed

Lawrence Sass

Randall Davis

Advances in 3D generative AI have enabled the creation of physical objects from text prompts, but challenges remain in creating objects involving multiple component types. We present a pipeline that integrates 3D generative AI with vision-language models (VLMs) to enable the robotic assembly of multi-component objects from natural language. Our method leverages VLMs for zero-shot, multi-modal reasoning about geometry and functionality to decompose AI-generated meshes into multi-component 3D models using predefined structural and panel components. We demonstrate that a VLM can determine which mesh regions need panel components in addition to structural components based on object functionality. Evaluation across test objects shows that users preferred the VLM-generated assignments 90.6% of the time, compared to 59.4% for rule-based and 2.5% for random assignment. Lastly, the system allows users to refine component assignments through conversational feedback, enabling greater human control and agency in making physical objects with generative AI and robotics.

2025-09-26

NeurIPS.cc/2025/Creative_AI_Track (published)

doi.org

openreview.net

BluePrint: A Social Media User Dataset for LLM Persona Evaluation and Training

Aurélien Buck-Kaeffer

Je Qin Chooi

Dan Zhao

Maximilian Puelma Touzel

Kellin Pelrine

Jean-François Godbout

Reihaneh Rabbany

Zachary Yang

Large language models (LLMs) offer promising capabilities for simulating social media dynamics at scale, enabling studies that would be ethically or logistically challenging with human subjects. However, the field lacks standardized data resources for fine-tuning and evaluating LLMs as realistic social media agents. We address this gap by introducing SIMPACT, the SIMulation-oriented Persona and Action Capture Toolkit, a privacy-respecting framework for constructing behaviorally grounded social media datasets suitable for training agent models. We formulate next-action prediction as a task for training and evaluating LLM-based agents and introduce metrics at both the cluster and population levels to assess behavioral fidelity and stylistic realism. As a concrete implementation, we release BluePrint, a large-scale dataset built from public Bluesky data focused on political discourse. BluePrint clusters anonymized users into personas of aggregated behaviours, capturing authentic engagement patterns while safeguarding privacy through pseudonymization and removal of personally identifiable information. The dataset includes a sizable action set of 12 social media interaction types (likes, replies, reposts, etc.), each instance tied to the posting activity preceding it. This supports the development of agents that model social media users by exploiting context dependence not only in language but also in interaction behaviours. By standardizing data and evaluation protocols, SIMPACT provides a foundation for advancing rigorous, ethically responsible social media simulations. BluePrint serves as both an evaluation benchmark for political discourse modeling and a template for building domain-specific datasets to study challenges such as misinformation and polarization.

2025-09-26

ArXiv (preprint)

doi.org

arxiv.org

Active Attacks: Red-teaming LLMs via Adaptive Environments

Taeyoung Yun

Pierre-Luc St-Charles

Jinkyoo Park

Yoshua Bengio

Minsu Kim

2025-09-25

ArXiv (preprint)

doi.org

arxiv.org

Continual Pre-training of MoEs: How robust is your router?

Benjamin Therien

Charles-Etienne Joseph

Zain Sarwar

Ashwinee Panda

Anirban Das

Shi-Xiong Zhang

Stephen Rawls

Sambit Sahu

Eugene Belilovsky

Irina Rish

2025-09-25

TMLR (accepted)

doi.org

openreview.net

Investigating Faithfulness in Large Audio Language Models

Lovenya Jain

Pooneh Mousavi

Mirco Ravanelli

Yusuf Cem Sübakan

Faithfulness measures whether chain-of-thought (CoT) representations accurately reflect a model's decision process and can be used as reliable explanations. Prior work has shown that CoTs from text-based LLMs are often unfaithful. This question has not been explored for large audio-language models (LALMs), where faithfulness is critical for safety-sensitive applications. Reasoning in LALMs is also more challenging, as models must first extract relevant clues from audio before reasoning over them. In this paper, we investigate the faithfulness of CoTs produced by several LALMs by applying targeted interventions, including paraphrasing, filler token injection, early answering, and introducing mistakes, on two challenging reasoning datasets: SAKURA and MMAR. Across these interventions, datasets, and tasks, our experiments suggest that LALMs generally produce CoTs that appear to be faithful to their underlying decision processes.

2025-09-25

ArXiv (preprint)

doi.org

arxiv.org

Acute respiratory distress syndrome in patients with cancer: the YELENNA prospective multinational observational cohort study.

Peter Schellongowski

Michael Darmon

Philipp Eller

Laveena Munshi

Tobias Liebregts

Victoria Metaxa

Luca Montini

Tobias Lahmer

Andry Van de Louw

Martin Balik

Peter Pickkers

Pleun Hemelaar

Hemang Yadav

Andreas Barratt-Due

Thomas Karvunidis

Jordi Riera

Gennaro Martucci

Ignacio Martin-Loeches

Pedro Castro

Nina Buchtele

Virginie Lemiale

Stefan Hatzl

Guillaume Dumas

Thomas Staudinger

Elie Azoulay

Gottfried Heinz

G. Sengölge

Christian Zauner

Elisabeth Lobmeyr

Alexis Maillard

G. De Pascale

G. Panarello

Philippe R. Bauer

M. Flaksa

Brozek

Fabio S. Taccone

I. Crippa

Sandra García-Roche

Cándido Díaz-Lagares

Andrés Pacheco

A. Téllez

I. Loeches

2025-09-24

Intensive Care Medicine (published)

doi.org

Chromatin landscape and enhancer-gene interaction differences between three cardiac cell types

Chukwuemeka George Anene-Nzelu

Yan Zhu

Jean‐Christophe Grenier

Raphaël Poujol

Svenja Koslowski

Olivier Tastet

Chang Jie Mick Lee

Matthew Ackers‐Johnson

Roger Foo

Julie Hussin

Genome-wide association studies (GWAS) have identified numerous single nucleotide polymorphisms (SNPs) associated with specific traits and diseases; however, uncovering the truly disease-relevant SNPs remains challenging. One limitation in prioritizing disease-relevant SNPs from GWAS is that most identified SNPs are non-coding, making it difficult to unravel their mechanism of action. Nevertheless, mapping non-coding SNPs to enhancers is a validated approach to link SNPs to their target genes through the analysis of enhancer-gene interactions (EGIs) and thus provide insight into their mechanism of action. While previous studies linking cardiac disease-relevant SNPs to enhancers and their target genes have focused on the principal cardiac cell type, cardiomyocytes (CMs), other non-CM cell types have been largely overlooked and have only recently gained attention. We hypothesize that characterizing cell-type-specific EGIs for these non-CMs, namely cardiac fibroblasts (CFs), endothelial cells (ECs), and smooth muscle cells (SMCs), and then mapping cardiac-disease-associated non-coding SNPs to those enhancers will identify novel disease-relevant genes and provide insights for future mechanistic research. To identify the landscape of cell-type-specific EGIs in these cardiac cells, we employed the Activity-by-Contact (ABC) model, which integrates assay for transposase-accessible chromatin sequencing (ATAC-seq), H3K27ac chromatin immunoprecipitation with sequencing (ChIP-seq), and high-throughput chromosome conformation capture with H3K27ac immunoprecipitation (H3K27ac HiChIP) data to identify EGIs. We have identified the landscape of cell-type-specific EGIs in these cardiac cells. Furthermore, chromatin accessibility profiles (ATAC-seq) were more similar between CFs and SMCs than between CFs and ECs or between SMCs and ECs.
Finally, overlapping the identified EGIs with cardiac-disease-associated non-coding variants allowed us to identify a QT-interval-associated SNP that maps to the enhancer region of an EC-specific EGI.

2025-09-24

bioRxiv (preprint)

doi.org

HEIST: A Graph Foundation Model for Spatial Transcriptomics and Proteomics Data

Hiren Madhu

João Felipe Rocha

Tinglin Huang

Siddharth Viswanath

Smita Krishnaswamy

Rex Ying

2025-09-24

ArXiv (preprint)

doi.org

arxiv.org

Neither Valid Nor Reliable? Investigating the Use of LLMs as Judges

Khaoula Chehbouni

Mohammed Haddou

Jackie CK Cheung

Golnoosh Farnadi

2025-09-24

NeurIPS.cc/2025/Position_Paper_Track (accepted)

openreview.net

Rigor in AI: Doing Rigorous AI Work Requires a Broader, Responsible AI-Informed Conception of Rigor

A.R. Olteanu

Su Lin Blodgett

Agathe Balayn

Angelina Wang

Fernando Diaz

Flavio Calmon

Margaret Mitchell

Michael Ekstrand

Reuben Binns

Solon Barocas

In AI research and practice, rigor remains largely understood in terms of methodological rigor, such as whether mathematical, statistical, or computational methods are correctly applied. We argue that this narrow conception of rigor has contributed to the concerns raised by the responsible AI community, including overblown claims about AI capabilities. Our position is that a broader conception of what rigorous AI research and practice should entail is needed. We believe such a conception, in addition to a more expansive understanding of (1) methodological rigor, should include aspects related to (2) what background knowledge informs what to work on (epistemic rigor); (3) how disciplinary, community, or personal norms, standards, or beliefs influence the work (normative rigor); (4) how clearly articulated the theoretical constructs in use are (conceptual rigor); (5) what is reported and how (reporting rigor); and (6) how well-supported the inferences from existing evidence are (interpretative rigor). In doing so, we also aim to provide useful language and a framework for much-needed dialogue about the AI community's work among researchers, policymakers, journalists, and other stakeholders.

2025-09-24

NeurIPS.cc/2025/Position_Paper_Track (accepted)

doi.org

openreview.net

Benchmarking Machine Learning Potentials for Crystal Structure Relaxation

Kowen Woo

Prashant Govindarajan

A. Chandar

High-throughput materials discovery workflows require rapid and accurate relaxation of crystal structures to identify thermodynamically stable phases among thousands to millions of candidate structures. Yet current machine learning interatomic potential (MLIP) benchmarks focus predominantly on energy prediction rather than structure relaxation, creating a critical evaluation gap for models designed to accelerate optimization. Additionally, these benchmarks rely on datasets consisting mainly of known stable or near-stable materials, thus failing to capture the challenges of unexplored chemical spaces. We address these limitations by introducing a benchmark that evaluates state-of-the-art MLIPs and a one-shot relaxation model on the relaxation of crystal structures generated via a reinforcement learning pipeline. We compare energy lowering and average maximum force computed via DFT, as well as relaxation runtime. We also contrast direct force-prediction strategies with conservative energy-differentiation approaches to determine which paradigm delivers better relaxation performance. Our results indicate a clear disconnect between MLIP energy prediction and force convergence in relaxation, challenging current benchmarking approaches.

2025-09-23

NeurIPS.cc/2025/Workshop/AI4Science (poster)

openreview.net
