Publications

Path-filtering in path-integral simulations of open quantum systems using GFlowNets
Jeremy Lackman-Mincoff
Moksh J. Jain
Nikolay Malkin
Lena Simine
On the Modeling Capabilities of Large Language Models for Sequential Decision Making
Martin Klissarov
Alexander T Toshev
Bogdan Mazoure
Beyond FVD: Enhanced Evaluation Metrics for Video Generation Quality
Ge Ya Luo
Gian Favero
Zhi Hao Luo
Alexia Jolicoeur-Martineau
The Fréchet Video Distance (FVD) is a widely adopted metric for evaluating video generation distribution quality. However, its effectiveness relies on critical assumptions. Our analysis reveals three significant limitations: (1) the non-Gaussianity of the Inflated 3D Convnet (I3D) feature space; (2) the insensitivity of I3D features to temporal distortions; (3) the impractical sample sizes required for reliable estimation. These findings undermine FVD's reliability and show that FVD falls short as a standalone metric for video generation evaluation. After extensive analysis of a wide range of metrics and backbone architectures, we propose JEDi, the JEPA Embedding Distance, based on features derived from a Joint Embedding Predictive Architecture, measured using Maximum Mean Discrepancy with a polynomial kernel. Our experiments on multiple open-source datasets show clear evidence that it is a superior alternative to the widely used FVD metric, requiring only 16% of the samples to reach its steady value, while increasing alignment with human evaluation by 34% on average.
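A minimal sketch of the distance family named in the abstract: Maximum Mean Discrepancy with a polynomial kernel over two sets of feature embeddings. The kernel degree, feature dimensions, and random stand-ins for JEPA features below are assumptions, not the paper's actual configuration.

import numpy as np

def polynomial_kernel(X, Y, degree=3, coef0=1.0):
    # k(x, y) = (<x, y> / d + coef0) ** degree, with d the feature dimension
    gamma = 1.0 / X.shape[1]
    return (gamma * X @ Y.T + coef0) ** degree

def mmd2_polynomial(X, Y, degree=3):
    # Unbiased estimate of squared MMD between feature sets X (reference) and Y (generated)
    m, n = len(X), len(Y)
    Kxx = polynomial_kernel(X, X, degree)
    Kyy = polynomial_kernel(Y, Y, degree)
    Kxy = polynomial_kernel(X, Y, degree)
    term_xx = (Kxx.sum() - np.trace(Kxx)) / (m * (m - 1))  # drop diagonal terms
    term_yy = (Kyy.sum() - np.trace(Kyy)) / (n * (n - 1))
    return term_xx + term_yy - 2.0 * Kxy.mean()

# Toy usage with random stand-ins for real vs. generated video embeddings.
rng = np.random.default_rng(0)
real = rng.normal(size=(64, 512))
fake = rng.normal(loc=0.1, size=(64, 512))
print(mmd2_polynomial(real, fake))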
Efficient Design-and-Control Automation with Reinforcement Learning and Adaptive Exploration
Jiajun Fan
Hongyao Tang
Michael Przystupa
Mariano Phielipp
Santiago Miret
Seeking good designs is a central goal of many important domains, such as robotics, integrated circuits (IC), medicine, and materials science. These design problems are expensive, time-consuming, and traditionally performed by human experts. Moreover, the barriers to domain knowledge make it challenging to propose a universal solution that generalizes to different design problems. In this paper, we propose a new method called Efficient Design and Stable Control (EDiSon) for automatic design and control in different design problems. The key ideas of our method are (1) interactive sequential modeling of the design and control process and (2) adaptive exploration and design replay. To decompose the difficulty of learning design and control as a whole, we leverage sequential modeling for both the design process and the control process, with a design policy that generates step-by-step design proposals and a control policy that optimizes the objective by operating the design. With deep reinforcement learning (RL), the policies learn to find good designs by maximizing a reward signal that evaluates the quality of designs. Furthermore, we propose an adaptive exploration and replay mechanism based on a design memory that maintains the high-quality designs generated so far. By choosing between constructing a design from scratch and replaying a design from memory to refine it, EDiSon balances the trade-off between exploration and exploitation in the design space and stabilizes the learning of the control policy. In our experiments, we evaluate the method on robotic morphology design and Tetris-based design tasks. Our framework has the potential to significantly accelerate the discovery of optimized designs across diverse domains, including automated materials discovery, by improving exploration of the design space while ensuring efficiency.
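An illustrative sketch, not the paper's implementation, of the replay-versus-scratch mechanism the abstract describes: a bounded design memory keeps the best designs found so far, and each iteration either starts from scratch or refines a replayed design. The toy bit-string objective and random mutation stand in for the learned design and control policies.

import random

class DesignMemory:
    # Bounded archive of the best (reward, design) pairs found so far.
    def __init__(self, capacity=20):
        self.capacity = capacity
        self.entries = []

    def add(self, design, reward):
        self.entries.append((reward, design))
        self.entries.sort(key=lambda e: e[0], reverse=True)
        del self.entries[self.capacity:]

    def sample(self):
        return random.choice(self.entries)[1]

def reward_fn(design):
    # Toy objective: number of 1s in a bit-string "design".
    return sum(design)

def mutate(design):
    # Local refinement: flip one bit of a replayed design.
    i = random.randrange(len(design))
    return design[:i] + [1 - design[i]] + design[i + 1:]

memory = DesignMemory()
replay_prob = 0.5  # knob balancing exploration (scratch) vs. exploitation (replay)
for step in range(200):
    if memory.entries and random.random() < replay_prob:
        design = mutate(memory.sample())                     # replay and refine
    else:
        design = [random.randint(0, 1) for _ in range(16)]   # build from scratch
    memory.add(design, reward_fn(design))

best_reward, best_design = memory.entries[0]
print(best_reward, best_design)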
fPLSA: Learning Semantic Structures in Document Collections Using Foundation Models
Weijia Xu
Nebojsa Jojic
HoneyComb: A Flexible LLM-Based Agent System for Materials Science
Huan Zhang
Yu Song
Ziyu Hou
Santiago Miret
The emergence of specialized large language models (LLMs) has shown promise in addressing complex tasks in materials science. Many LLMs, however, often struggle with the distinct complexities of materials science tasks, such as computational challenges, and rely heavily on outdated implicit knowledge, leading to inaccuracies and hallucinations. To address these challenges, we introduce HoneyComb, the first LLM-based agent system specifically designed for materials science. HoneyComb leverages a reliable, high-quality materials science knowledge base (MatSciKB) and a sophisticated tool hub (ToolHub) tailored specifically for materials science to enhance its reasoning and computational capabilities. MatSciKB is a curated, structured knowledge collection based on reliable literature, while ToolHub employs an Inductive Tool Construction method to generate, decompose, and refine API tools for materials science. Additionally, HoneyComb leverages a retriever module that adaptively selects the appropriate knowledge source or tools for specific tasks, thereby ensuring accuracy and relevance. Our results demonstrate that HoneyComb significantly outperforms baseline models across various tasks in materials science, effectively bridging the gap between current LLM capabilities and the specialized needs of this domain. Furthermore, our adaptable framework can be easily extended to other scientific domains, highlighting its potential for broad applicability in advancing scientific research and applications.
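A toy sketch of the routing idea described above, not HoneyComb's actual retriever: score a query against entries in a knowledge base and tools in a tool hub, and dispatch to whichever source matches best. The lexical scoring, the sample entry, and the sample tool are all placeholders.

def overlap_score(query, text):
    # Crude lexical similarity: fraction of query words present in the text.
    q = set(query.lower().split())
    return len(q & set(text.lower().split())) / max(len(q), 1)

knowledge_base = {
    "band gap of silicon": "Silicon has an indirect band gap of about 1.1 eV at room temperature.",
}
tool_hub = {
    "convert eV to joules": lambda ev: ev * 1.602e-19,  # tools are callables
}

def retrieve(query):
    # Pick the best-matching knowledge entry or tool for the query.
    candidates = [("knowledge", k, overlap_score(query, k)) for k in knowledge_base]
    candidates += [("tool", k, overlap_score(query, k)) for k in tool_hub]
    kind, key, _ = max(candidates, key=lambda c: c[2])
    return knowledge_base[key] if kind == "knowledge" else tool_hub[key]

print(retrieve("what is the band gap of silicon"))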
MatExpert: Decomposing Materials Discovery By Mimicking Human Experts
Qianggang Ding
Santiago Miret
SymmCD: Symmetry-Preserving Crystal Generation with Diffusion Models
Daniel Levy
Siba Smarak Panigrahi
Sékou-Oumar Kaba
Qiang Zhu
Mikhail Galkin
Santiago Miret
Toward Debugging Deep Reinforcement Learning Programs with RLExplorer
Rached Bouchoucha
Ahmed Haj Yahmed
Darshan Patil
Janarthanan Rajendran
Amin Nikanjam
Deep reinforcement learning (DRL) has shown success in diverse domains such as robotics, computer games, and recommendation systems. However, like any other software system, DRL-based software systems are susceptible to faults that pose unique challenges for debugging and diagnosing. These faults often result in unexpected behavior without explicit failures and error messages, making debugging difficult and time-consuming. Therefore, automating the monitoring and diagnosis of DRL systems is crucial to alleviate the burden on developers. In this paper, we propose RLExplorer, the first fault diagnosis approach for DRL-based software systems. RLExplorer automatically monitors training traces and runs diagnosis routines based on properties of the DRL learning dynamics to detect the occurrence of DRL-specific faults. It then logs the results of these diagnoses as warnings that cover theoretical concepts, recommended practices, and potential solutions to the identified faults. We conducted two sets of evaluations to assess RLExplorer. Our first evaluation of faulty DRL samples from Stack Overflow revealed that our approach can effectively diagnose real faults in 83% of the cases. Our second evaluation of RLExplorer with 15 DRL experts/developers showed that (1) RLExplorer could identify 3.6 times more defects than manual debugging and (2) RLExplorer is easily integrated into DRL applications.
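An illustrative sketch of the kind of trace-based diagnosis routine the abstract describes; RLExplorer's actual checks and thresholds are not reproduced here. The idea is to scan logged training statistics and emit warnings when known DRL fault symptoms appear.

import statistics

def diagnose(returns, entropies, window=50):
    # Inspect recent training traces for common DRL fault symptoms (illustrative thresholds).
    warnings = []
    recent = returns[-window:]
    if len(recent) >= window and statistics.pstdev(recent) < 1e-6:
        warnings.append("Returns are flat: the agent may not be learning "
                        "(check reward scaling and learning rate).")
    if entropies and entropies[-1] < 0.01:
        warnings.append("Policy entropy has collapsed: exploration may have "
                        "stopped too early (consider entropy regularization).")
    if any(r != r for r in recent):  # NaN check
        warnings.append("NaN returns detected: check for exploding gradients.")
    return warnings

# Hypothetical training traces.
returns = [0.0] * 200
entropies = [1.0 - i / 200 for i in range(200)]
for w in diagnose(returns, entropies):
    print("WARNING:", w)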
Multi-Objective Risk Assessment Framework for Exploration Planning Using Terrain and Traversability Analysis
Riana Gagnon Souleiman
Vivek Shankar Vardharajan
DECOLLAGE: 3D Detailization by Controllable, Localized, and Learned Geometry Enhancement
Qimin Chen
Zhiqin Chen
Vladimir Kim
Hao (Richard) Zhang
Siddhartha Chaudhuri
Mitigating Downstream Model Risks via Model Provenance
Keyu Wang
Abdullah Norozi Iranzad
Scott Schaffter
Jonathan Lebensold
Research and industry are rapidly advancing the innovation and adoption of foundation model-based systems, yet the tools for managing these models have not kept pace. Understanding the provenance and lineage of models is critical for researchers, industry, regulators, and public trust. While model cards and system cards were designed to provide transparency, they fall short in key areas: tracing model genealogy, enabling machine readability, offering reliable centralized management systems, and fostering consistent creation incentives. This challenge mirrors issues in software supply chain security, but AI/ML remains at an earlier stage of maturity. Addressing these gaps requires industry-standard tooling that can be adopted by foundation model publishers, open-source model innovators, and major distribution platforms. We propose a machine-readable model specification format to simplify the creation of model records, thereby reducing error-prone human effort, notably when a new model inherits most of its design from a foundation model. Our solution explicitly traces relationships between upstream and downstream models, enhancing transparency and traceability across the model lifecycle. To facilitate adoption, we introduce the unified model record (UMR) repository, a semantically versioned system that automates the publication of model records to multiple formats (PDF, HTML, LaTeX) and provides a hosted web interface (https://modelrecord.com/). This proof of concept aims to set a new standard for managing foundation models, bridging the gap between innovation and responsible model management.
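A minimal sketch of what a machine-readable model record with explicit upstream lineage could look like; the field names below are illustrative assumptions and do not reproduce the UMR specification.

from dataclasses import dataclass, field, asdict
import json

@dataclass
class ModelRecord:
    # Illustrative fields only; not the actual UMR schema.
    name: str
    version: str
    upstream: list = field(default_factory=list)  # parent models this one derives from
    license: str = "unspecified"
    training_data: str = "unspecified"

foundation = ModelRecord(name="base-model", version="1.0.0",
                         training_data="public web corpus")
finetuned = ModelRecord(name="domain-model", version="0.1.0",
                        upstream=[f"{foundation.name}@{foundation.version}"],
                        license="apache-2.0",
                        training_data="in-domain fine-tuning set")

# Serialize to a machine-readable format so lineage can be traced automatically.
print(json.dumps(asdict(finetuned), indent=2))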