Publications

Semantic Commit: Helping Users Update Intent Specifications for AI Memory at Scale
Priyan Vaithilingam
Daniel Lee
Elena L. Glassman
Text to Robotic Assembly of Multi Component Objects using 3D Generative AI and Vision Language Models
Alexander Htet Kyaw
Richa Gupta
Dhruv Shah
Anoop K. Sinha
Kory Mathewson
Stefanie Pender
Sachin Chitta
Yotto Koga
Faez Ahmed
Lawrence Sass
Randall Davis
Advances in 3D generative AI have enabled the creation of physical objects from text prompts, but challenges remain in creating objects involving multiple component types. We present a pipeline that integrates 3D generative AI with vision-language models (VLMs) to enable the robotic assembly of multi-component objects from natural language. Our method leverages VLMs for zero-shot, multi-modal reasoning about geometry and functionality to decompose AI-generated meshes into multi-component 3D models using predefined structural and panel components. We demonstrate that a VLM is capable of determining which mesh regions need panel components in addition to structural components based on object functionality. Evaluation across test objects shows that users preferred the VLM-generated assignments 90.6% of the time, compared to 59.4% for rule-based and 2.5% for random assignment. Lastly, the system allows users to refine component assignments through conversational feedback, enabling greater human control and agency in making physical objects with generative AI and robotics.
$\texttt{BluePrint}$: A Social Media User Dataset for LLM Persona Evaluation and Training
Large language models (LLMs) offer promising capabilities for simulating social media dynamics at scale, enabling studies that would be ethically or logistically challenging with human subjects. However, the field lacks standardized data resources for fine-tuning and evaluating LLMs as realistic social media agents. We address this gap by introducing SIMPACT, the SIMulation-oriented Persona and Action Capture Toolkit, a privacy-respecting framework for constructing behaviorally-grounded social media datasets suitable for training agent models. We formulate next-action prediction as a task for training and evaluating LLM-based agents and introduce metrics at both the cluster and population levels to assess behavioral fidelity and stylistic realism. As a concrete implementation, we release BluePrint, a large-scale dataset built from public Bluesky data focused on political discourse. BluePrint clusters anonymized users into personas of aggregated behaviours, capturing authentic engagement patterns while safeguarding privacy through pseudonymization and removal of personally identifiable information. The dataset includes a sizable action set of 12 social media interaction types (likes, replies, reposts, etc.), each instance tied to the posting activity preceding it. This supports the development of agents that model social media users through context dependence, not only in language but also in interaction behaviours. By standardizing data and evaluation protocols, SIMPACT provides a foundation for advancing rigorous, ethically responsible social media simulations. BluePrint serves as both an evaluation benchmark for political discourse modeling and a template for building domain-specific datasets to study challenges such as misinformation and polarization.
Active Attacks: Red-teaming LLMs via Adaptive Environments
Pierre-Luc St-Charles
Jinkyoo Park
Continual Pre-training of MoEs: How robust is your router?
Zain Sarwar
Ashwinee Panda
Anirban Das
Shi-Xiong Zhang
Stephen Rawls
Sambit Sahu
Investigating Faithfulness in Large Audio Language Models
Mirco Ravanelli
Yusuf Cem Sübakan
Faithfulness measures whether chain-of-thought (CoT) representations accurately reflect a model's decision process and can be used as reliable explanations. Prior work has shown that CoTs from text-based LLMs are often unfaithful. This question has not been explored for large audio-language models (LALMs), where faithfulness is critical for safety-sensitive applications. Reasoning in LALMs is also more challenging, as models must first extract relevant clues from audio before reasoning over them. In this paper, we investigate the faithfulness of CoTs produced by several LALMs by applying targeted interventions, including paraphrasing, filler token injection, early answering, and introducing mistakes, on two challenging reasoning datasets: SAKURA and MMAR. Across these interventions, datasets, and tasks, our experiments suggest that LALMs generally produce CoTs that appear to be faithful to their underlying decision processes.
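As an illustration of one intervention named in this abstract, early answering can be sketched as truncating the chain of thought at each step and checking whether the final answer still matches the answer given with the full chain. This is a minimal sketch, not the authors' evaluation code: `answer_fn` is a hypothetical stand-in for querying a model (for an LALM the question would also carry the audio input), and `toy_answer_fn` is an invented example model.

```python
from typing import Callable, List, Sequence

# Hypothetical model interface: (question, reasoning steps so far) -> answer.
AnswerFn = Callable[[str, Sequence[str]], str]


def early_answering_curve(
    question: str, cot_steps: Sequence[str], answer_fn: AnswerFn
) -> List[bool]:
    """For each prefix length k, report whether answering after only the
    first k reasoning steps gives the same answer as the full chain."""
    full_answer = answer_fn(question, cot_steps)
    matches = []
    for k in range(len(cot_steps) + 1):
        truncated = cot_steps[:k]
        matches.append(answer_fn(question, truncated) == full_answer)
    return matches


def toy_answer_fn(question: str, steps: Sequence[str]) -> str:
    """Invented toy model: answers "B" only once the concluding step is seen."""
    return "B" if any("therefore B" in s for s in steps) else "A"


steps = ["a dog barks in the clip", "dogs are animals", "therefore B"]
curve = early_answering_curve("Which option matches the sound?", steps, toy_answer_fn)
print(curve)
```

In this toy case the answer changes only once the last step is included, so the answer depends on the reasoning. A model whose truncated answers already match the full answer would suggest that its stated reasoning is less tied to its decision.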
Acute respiratory distress syndrome in patients with cancer: the YELENNA prospective multinational observational cohort study.
Peter Schellongowski
Michael Darmon
Philipp Eller
Laveena Munshi
Tobias Liebregts
Victoria Metaxa
Luca Montini
Tobias Lahmer
Andry Van de Louw
Martin Balik
Peter Pickkers
Pleun Hemelaar
Hemang Yadav
Andreas Barratt-Due
Thomas Karvunidis
Jordi Riera
Gennaro Martucci
Ignacio Martin-Loeches
Pedro Castro
Nina Buchtele
Virginie Lemiale
Stefan Hatzl
Thomas Staudinger
Elie Azoulay
Gottfried Heinz
G. Sengölge
Christian Zauner
Elisabeth Lobmeyr
Alexis Maillard
G. De Pascale
G. Panarello
Philippe R. Bauer
M. Flaksa
Brozek
Fabio S. Taccone
I. Crippa
Andreas Barrat-Due
Sandra García-Roche
Cándido Díaz-Lagares
Andrés Pacheco
A. Téllez
I. Loeches
Chromatin landscape and enhancer-gene interaction differences between three cardiac cell types
Chukwuemeka George Anene-Nzelu
Yan Zhu
Jean‐Christophe Grenier
Raphaël Poujol
Svenja Koslowski
Olivier Tastet
Chang Jie Mick Lee
Matthew Ackers‐Johnson
Roger Foo
Genome-wide association studies (GWAS) have identified numerous single nucleotide polymorphisms (SNPs) associated with specific traits and diseases; however, uncovering the true disease-relevant SNPs remains challenging. One limitation for prioritizing true disease-relevant SNPs from GWAS is that most of the identified SNPs are non-coding, making it difficult to unravel their mechanism of action. Nevertheless, mapping non-coding SNPs to enhancers is a validated approach to link SNPs to their target genes through the analysis of enhancer-gene interactions (EGIs) and thus provide insight into their mechanism of action. While previous studies linking cardiac disease-relevant SNPs to enhancers and their target genes have focused on the principal cardiac cell type, cardiomyocytes (CMs), the analysis of other non-CM cell types has been largely ignored and has only gained attention recently. We hypothesize that characterizing cell-type-specific EGIs for these non-CMs, namely cardiac fibroblasts (CFs), endothelial cells (ECs), and smooth muscle cells (SMCs), followed by mapping cardiac-disease-associated non-coding SNPs to those enhancers will identify novel disease-relevant genes and provide insights for future mechanistic research. To identify the landscape of cell-type-specific EGIs in these cardiac cells, we have employed the Activity-by-Contact (ABC) model. It integrates assay for transposase-accessible chromatin sequencing (ATAC-seq), H3K27ac chromatin immunoprecipitation with sequencing (ChIP-seq), and high-throughput chromosome conformation capture with H3K27ac immunoprecipitation (H3K27ac HiChIP) data to identify EGIs. We have identified the landscape of cell-type-specific EGIs in these cardiac cells. Furthermore, a higher similarity of the chromatin accessibility profile (ATAC-seq) between CF and SMC, compared to CF and EC, and SMC and EC was observed. Finally, overlapping identified EGIs with cardiac-disease-associated non-coding variants has allowed the identification of a QT-interval-associated SNP that is mapped to the enhancer region of an EC-specific EGI.
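To make the ABC scoring idea in this abstract concrete, the sketch below follows the commonly published formulation of the model as we understand it: an element's activity is the geometric mean of its accessibility and H3K27ac signals, and its score for a gene is activity times contact, normalized over all candidate elements near that gene. It is an illustrative sketch, not the authors' pipeline: the function name, the dictionary layout, and all numbers are made up, and the contact values here stand in for the H3K27ac HiChIP contact frequencies.

```python
from math import sqrt
from typing import Dict, List


def abc_scores(elements: List[dict]) -> Dict[str, float]:
    """Compute ABC scores for the candidate elements of a single gene.

    Each element is a dict with hypothetical keys:
      'name', 'accessibility' (e.g. ATAC-seq signal),
      'h3k27ac' (ChIP-seq signal), 'contact' (contact frequency with the gene).
    """
    # Activity x Contact for every candidate element.
    weighted = {
        e["name"]: sqrt(e["accessibility"] * e["h3k27ac"]) * e["contact"]
        for e in elements
    }
    total = sum(weighted.values())
    if total == 0:
        return {name: 0.0 for name in weighted}
    # Normalize so the scores for one gene sum to 1.
    return {name: value / total for name, value in weighted.items()}


# Invented example: two candidate enhancers for one gene.
scores = abc_scores([
    {"name": "enhancer_A", "accessibility": 4.0, "h3k27ac": 9.0, "contact": 0.5},
    {"name": "enhancer_B", "accessibility": 1.0, "h3k27ac": 1.0, "contact": 1.0},
])
print(scores)
```

An element is typically called an enhancer-gene interaction when its score exceeds a chosen threshold. That threshold is a tuning choice, and the abstract does not state which value was used.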
HEIST: A Graph Foundation Model for Spatial Transcriptomics and Proteomics Data
Hiren Madhu
João Felipe Rocha
Tinglin Huang
Rex Ying
Neither Valid Nor Reliable? Investigating the Use of LLMs as Judges
Mohammed Haddou
Jackie CK Cheung
Rigor in AI: Doing Rigorous AI Work Requires a Broader, Responsible AI-Informed Conception of Rigor
A.R. Olteanu
Agathe Balayn
Angelina Wang
Flavio Calmon
Margaret Mitchell
Michael Ekstrand
Reuben Binns
Solon Barocas
In AI research and practice, rigor remains largely understood in terms of methodological rigor -- such as whether mathematical, statistical, or computational methods are correctly applied. We argue that this narrow conception of rigor has contributed to the concerns raised by the responsible AI community, including overblown claims about AI capabilities. Our position is that a broader conception of what rigorous AI research and practice should entail is needed. We believe such a conception -- in addition to a more expansive understanding of (1) methodological rigor -- should include aspects related to (2) what background knowledge informs what to work on (epistemic rigor); (3) how disciplinary, community, or personal norms, standards, or beliefs influence the work (normative rigor); (4) how clearly articulated the theoretical constructs under use are (conceptual rigor); (5) what is reported and how (reporting rigor); and (6) how well-supported the inferences from existing evidence are (interpretative rigor). In doing so, we also aim to provide useful language and a framework for much-needed dialogue about the AI community's work by researchers, policymakers, journalists, and other stakeholders.
Benchmarking Machine Learning Potentials for Crystal Structure Relaxation
High-throughput materials discovery workflows require rapid and accurate relaxation of crystal structures to identify thermodynamically stable phases among thousands to millions of candidate structures. Yet current machine learning interatomic potential (MLIP) benchmarks focus predominantly on energy prediction rather than structure relaxation, creating a critical evaluation gap for models designed to accelerate optimization. Additionally, these benchmarks are trained on datasets consisting mainly of known stable or near-stable materials, thus failing to capture the challenges of unexplored chemical spaces. We address these limitations by introducing a benchmark that evaluates state-of-the-art MLIPs and a one-shot relaxation model on structure relaxation with crystals generated via a reinforcement learning pipeline. We compare energy lowering and average maximum force computed via DFT, as well as relaxation runtime. We also contrast direct force-prediction strategies against conservative energy-differentiation approaches to determine which paradigm delivers superior relaxation performance. Our results indicate that there is a clear disconnect between MLIP energy prediction and force convergence in relaxation, challenging current benchmarking approaches.