Publications

Beyond Embeddings: Interpretable Feature Extraction for Binary Code Similarity
Charles E. Gagnon
Steven H. H. Ding
Philippe Charland
Benjamin C. M. Fung
Planner Aware Path Learning in Diffusion Language Models Training
Fred Zhangzhi Peng
Zachary Bezemek
Shuibai Zhang
Anru R. Zhang
Michael M. Bronstein
Avishek Bose
Planning with Unified Multimodal Models
Zhilong Zhang
Yang Yu
With the powerful reasoning capabilities of large language models (LLMs) and vision-language models (VLMs), many recent works have explored using them for decision-making. However, most of these approaches rely solely on language-based reasoning, which limits their ability to reason and make informed decisions. Recently, a promising new direction has emerged with unified multimodal models (UMMs), which support both multimodal inputs and outputs. We believe such models have greater potential for decision-making by enabling reasoning through generated visual content. To this end, we propose Uni-Plan, a planning framework built on UMMs. Within this framework, a single model simultaneously serves as the policy, dynamics model, and value function. In addition, to avoid hallucinations in dynamics predictions, we present a novel approach, self-discriminated filtering, in which the generative model serves as a self-discriminator to filter out invalid dynamics predictions. Experiments on long-horizon planning tasks show that Uni-Plan substantially improves success rates compared to VLM-based methods, while also showing strong data scalability: it requires no expert demonstrations and achieves better performance under the same training-data size. This work lays a foundation for future research in reasoning and decision-making with UMMs.
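The self-discriminated filtering idea can be sketched in a few lines; the function and parameter names below, and the fixed acceptance threshold, are illustrative assumptions rather than the paper's exact formulation:

```python
def self_discriminated_filter(candidates, score_fn, threshold=0.5):
    """Filter dynamics predictions using the model's own judgment.

    `candidates` are generated next-state predictions; `score_fn` is
    the same generative model queried as a discriminator, returning a
    validity probability in [0, 1]. Predictions scoring below the
    threshold are discarded as likely hallucinations.
    """
    return [c for c in candidates if score_fn(c) >= threshold]
```

In practice the scoring call would go back to the UMM itself, which is what makes the filtering "self-discriminated": no separate verifier model is trained.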
Robust Fine-Tuning from Non-Robust Pretrained Models: Mitigating Suboptimal Transfer With Adversarial Scheduling
Yann Batiste Pequignot
Ola Ahmad
Frédéric Precioso
Fine-tuning pretrained models is a standard and effective workflow in modern machine learning. However, robust fine-tuning (RFT), which aims to simultaneously achieve adaptation to a downstream task and robustness to adversarial examples, remains challenging. Despite the abundance of non-robust pretrained models in open-source repositories, their potential for RFT is less understood. We address this knowledge gap by systematically examining RFT from such non-robust models. Our experiments reveal that fine-tuning non-robust models with a robust objective, even under small perturbations, can lead to poor performance, a phenomenon that we dub suboptimal transfer. In challenging scenarios (e.g., difficult tasks, high perturbation), the resulting performance can be so low that it may be considered a transfer failure. We find that fine-tuning with a robust objective impedes task adaptation at the beginning of training and ultimately prevents optimal transfer. To address this, we propose a novel heuristic, Epsilon-Scheduling, a schedule over the perturbation strength used during training that promotes optimal transfer. Additionally, we introduce expected robustness, a metric that captures performance across a range of perturbations, providing a more comprehensive evaluation of the accuracy-robustness trade-off for diverse models at test time. Extensive experiments on a wide range of configurations (six pretrained models and five datasets) show that Epsilon-Scheduling successfully prevents suboptimal transfer and consistently improves expected robustness.
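The abstract describes Epsilon-Scheduling only as a schedule over perturbation strength, and expected robustness as an average over a perturbation range. A minimal sketch, assuming a linear warm-up shape and uniform weighting (both are assumptions, not the paper's exact choices):

```python
def epsilon_schedule(step, total_steps, eps_max, warmup_frac=0.5):
    """Ramp the adversarial perturbation budget from 0 up to eps_max.

    Keeping epsilon small early lets the model adapt to the task
    before the robust objective dominates (hypothetical shape).
    """
    warmup_steps = int(total_steps * warmup_frac)
    if step < warmup_steps:
        return eps_max * step / warmup_steps
    return eps_max


def expected_robustness(accuracy_at_eps):
    """Average accuracy over a range of perturbation strengths.

    `accuracy_at_eps` maps epsilon -> test accuracy; the mean
    summarizes the accuracy-robustness trade-off in one number.
    """
    return sum(accuracy_at_eps.values()) / len(accuracy_at_eps)
```

During adversarial fine-tuning, `epsilon_schedule(step, ...)` would replace the fixed epsilon passed to the attack (e.g. PGD) at each training step.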
Semantic Commit: Helping Users Update Intent Specifications for AI Memory at Scale
Priyan Vaithilingam
Daniel Lee
Elena L. Glassman
Text to Robotic Assembly of Multi Component Objects using 3D Generative AI and Vision Language Models
Alexander Htet Kyaw
Richa Gupta
Dhruv Shah
Anoop K. Sinha
Kory Mathewson
Stefanie Pender
Sachin Chitta
Yotto Koga
Faez Ahmed
Lawrence Sass
Randall Davis
Advances in 3D generative AI have enabled the creation of physical objects from text prompts, but challenges remain in creating objects involving multiple component types. We present a pipeline that integrates 3D generative AI with vision-language models (VLMs) to enable the robotic assembly of multi-component objects from natural language. Our method leverages VLMs for zero-shot, multimodal reasoning about geometry and functionality to decompose AI-generated meshes into multi-component 3D models using predefined structural and panel components. We demonstrate that a VLM can determine which mesh regions need panel components in addition to structural components, based on object functionality. Evaluation across test objects shows that users preferred the VLM-generated assignments 90.6% of the time, compared to 59.4% for rule-based and 2.5% for random assignment. Lastly, the system allows users to refine component assignments through conversational feedback, enabling greater human control and agency in making physical objects with generative AI and robotics.
BluePrint: A Social Media User Dataset for LLM Persona Evaluation and Training
Large language models (LLMs) offer promising capabilities for simulating social media dynamics at scale, enabling studies that would be ethically or logistically challenging with human subjects. However, the field lacks standardized data resources for fine-tuning and evaluating LLMs as realistic social media agents. We address this gap by introducing SIMPACT, the SIMulation-oriented Persona and Action Capture Toolkit, a privacy-respecting framework for constructing behaviorally grounded social media datasets suitable for training agent models. We formulate next-action prediction as a task for training and evaluating LLM-based agents and introduce metrics at both the cluster and population levels to assess behavioral fidelity and stylistic realism. As a concrete implementation, we release BluePrint, a large-scale dataset built from public Bluesky data focused on political discourse. BluePrint clusters anonymized users into personas of aggregated behaviours, capturing authentic engagement patterns while safeguarding privacy through pseudonymization and the removal of personally identifiable information. The dataset covers 12 social media interaction types (likes, replies, reposts, etc.), with each instance tied to the posting activity preceding it. This supports the development of agents whose behaviour is context-dependent not only in language but also in their social media interactions, so that they better model real users. By standardizing data and evaluation protocols, SIMPACT provides a foundation for advancing rigorous, ethically responsible social media simulations. BluePrint serves both as an evaluation benchmark for political discourse modeling and as a template for building domain-specific datasets to study challenges such as misinformation and polarization.
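The next-action prediction task lends itself to a simple evaluation harness. A hedged sketch, where the field names (`context`, `next_action`) and the truncated action list are assumptions about the dataset schema, not its published format:

```python
# The dataset defines 12 interaction types; three shown for illustration.
ACTION_TYPES = ["like", "reply", "repost"]


def next_action_accuracy(examples, predict):
    """Score an agent on next-action prediction.

    Each example pairs the posting activity preceding an action
    (`context`) with the action the user actually took next
    (`next_action`). `predict` is any callable mapping a context to
    one of the action types.
    """
    correct = sum(predict(ex["context"]) == ex["next_action"] for ex in examples)
    return correct / len(examples)
```

An LLM-based agent would implement `predict` by prompting the model with the persona and the preceding posts, then parsing its chosen action.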
Active Attacks: Red-teaming LLMs via Adaptive Environments
Pierre-Luc St-Charles
Jinkyoo Park
Continual Pre-training of MoEs: How robust is your router?
Zain Sarwar
Ashwinee Panda
Anirban Das
Shi-Xiong Zhang
Stephen Rawls
Sambit Sahu
Investigating Faithfulness in Large Audio Language Models
Mirco Ravanelli
Yusuf Cem Sübakan
Faithfulness measures whether chain-of-thought (CoT) representations accurately reflect a model's decision process and can therefore serve as reliable explanations. Prior work has shown that CoTs from text-based LLMs are often unfaithful. This question has not been explored for large audio-language models (LALMs), where faithfulness is critical for safety-sensitive applications. Reasoning in LALMs is also more challenging, as models must first extract relevant clues from the audio before reasoning over them. In this paper, we investigate the faithfulness of CoTs produced by several LALMs by applying targeted interventions, including paraphrasing, filler token injection, early answering, and mistake insertion, on two challenging reasoning datasets: SAKURA and MMAR. Across these interventions, datasets, and tasks, our experiments suggest that LALMs generally produce CoTs that appear faithful to their underlying decision processes.
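One of the listed interventions, early answering, checks whether the model's answer is already fixed before the reasoning finishes. A minimal sketch of that probe, with illustrative names (`answer_fn` stands in for querying the LALM with a truncated CoT, which is an assumption about the setup):

```python
def early_answering_consistency(answer_fn, cot_steps, fractions=(0.25, 0.5, 0.75)):
    """Early-answering intervention for CoT faithfulness.

    Force an answer from truncated prefixes of the chain of thought.
    If a partial CoT already yields the same answer as the full CoT,
    the later reasoning steps may not causally drive the decision,
    which would suggest the CoT is post-hoc rather than faithful.
    """
    full_answer = answer_fn(cot_steps)
    results = {}
    for frac in fractions:
        k = int(len(cot_steps) * frac)
        # True means the truncated CoT already reproduces the answer.
        results[frac] = answer_fn(cot_steps[:k]) == full_answer
    return results
```

A faithful model should flip its answer when early or mid-chain steps are removed; an unfaithful one answers identically regardless of how much reasoning it has seen.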
Acute respiratory distress syndrome in patients with cancer: the YELENNA prospective multinational observational cohort study.
Peter Schellongowski
Michael Darmon
Philipp Eller
Laveena Munshi
Tobias Liebregts
Victoria Metaxa
Luca Montini
Tobias Lahmer
Andry Van de Louw
Martin Balik
Peter Pickkers
Pleun Hemelaar
Hemang Yadav
Andreas Barratt-Due
Thomas Karvunidis
Jordi Riera
Gennaro Martucci
Ignacio Martin-Loeches
Pedro Castro
Nina Buchtele
Virginie Lemiale
Stefan Hatzl
Thomas Staudinger
Elie Azoulay
Gottfried Heinz
G. Sengölge
Christian Zauner
Elisabeth Lobmeyr
Alexis Maillard
G. De Pascale
G. Panarello
Philippe R. Bauer
M. Flaksa
Brozek
Fabio S. Taccone
I. Crippa
Andreas Barratt-Due
Sandra García-Roche
Cándido Díaz-Lagares
Andrés Pacheco
A. Téllez
I. Loeches
HEIST: A Graph Foundation Model for Spatial Transcriptomics and Proteomics Data
Hiren Madhu
João Felipe Rocha
Tinglin Huang
Rex Ying