Publications

On the Modeling Capabilities of Large Language Models for Sequential Decision Making

Martin Klissarov

(Rex) Devon Hjelm

Alexander T Toshev

Bogdan Mazoure

2024-10-08

ArXiv (prépublication)

Beyond FVD: Enhanced Evaluation Metrics for Video Generation Quality

Ge Ya Luo

Gian Favero

Zhi Hao Luo

Alexia Jolicoeur-Martineau

Chris Pal

The Fr\'echet Video Distance (FVD) is a widely adopted metric for evaluating video generation distribution quality. However, its effectivene… (voir plus)ss relies on critical assumptions. Our analysis reveals three significant limitations: (1) the non-Gaussianity of the Inflated 3D Convnet (I3D) feature space; (2) the insensitivity of I3D features to temporal distortions; (3) the impractical sample sizes required for reliable estimation. These findings undermine FVD's reliability and show that FVD falls short as a standalone metric for video generation evaluation. After extensive analysis of a wide range of metrics and backbone architectures, we propose JEDi, the JEPA Embedding Distance, based on features derived from a Joint Embedding Predictive Architecture, measured using Maximum Mean Discrepancy with polynomial kernel. Our experiments on multiple open-source datasets show clear evidence that it is a superior alternative to the widely used FVD metric, requiring only 16% of the samples to reach its steady value, while increasing alignment with human evaluation by 34%, on average.

2024-10-07

ArXiv (prépublication)

Efficient Design-and-Control Automation with Reinforcement Learning and Adaptive Exploration

Jiajun Fan

Hongyao Tang

Michael Przystupa

Mariano Phielipp

Santiago Miret

Glen Berseth

Seeking good designs is a central goal of many important domains, such as robotics, integrated circuits (IC), medicine, and materials scienc… (voir plus)e. These design problems are expensive, time-consuming, and traditionally performed by human experts. Moreover, the barriers to domain knowledge make it challenging to propose a universal solution that generalizes to different design problems. In this paper, we propose a new method called Efficient Design and Stable Control (EDiSon) for automatic design and control in different design problems. The key ideas of our method are (1) interactive sequential modeling of the design and control process and (2) adaptive exploration and design replay. To decompose the difficulty of learning design and control as a whole, we leverage sequential modeling for both the design process and control process, with a design policy to generate step-by-step design proposals and a control policy to optimize the objective by operating the design. With deep reinforcement learning (RL), the policies learn to find good designs by maximizing a reward signal that evaluates the quality of designs. Furthermore, we propose an adaptive exploration and replay mechanism based on a design memory that maintains high-quality designs generated so far. By regulating between constructing a design from scratch or replaying a design from memory to refine it, EDiSon balances the trade-off between exploration and exploitation in the design space and stabilizes the learning of the control policy. In the experiments, we evaluate our method in robotic morphology design and Tetris-based design tasks. Our framework has the potential to significantly accelerate the discovery of optimized designs across diverse domains, including automated materials discovery, by improving the exploration in design space while ensuring efficiency.

2024-10-07

NeurIPS.cc/2024/Workshop/AI4Mat (publié)

fPLSA: Learning Semantic Structures in Document Collections Using Foundation Models

Weijia Xu

Nebojsa Jojic

Nicolas Le Roux

2024-10-07

ArXiv (prépublication)

HoneyComb: A Flexible LLM-Based Agent System for Materials Science

Huan Zhang

Yu Song

Ziyu Hou

Santiago Miret

Bang Liu

The emergence of specialized large language models (LLMs) has shown promise in addressing complex tasks in materials science. Many LLMs, how… (voir plus)ever, often struggle with the distinct complexities of materials science tasks, such as computational challenges, and rely heavily on outdated implicit knowledge, leading to inaccuracies and hallucinations. To address these challenges, we introduce HoneyComb, the first LLM-based agent system specifically designed for materials science. HoneyComb leverages a reliable, high-quality materials science knowledge base (MatSciKB) and a sophisticated tool hub (ToolHub) tailored specifically for materials science to enhance its reasoning and computational capabilities. MatSciKB is a curated, structured knowledge collection based on reliable literature, while ToolHub employs an Inductive Tool Construction method to generate, decompose, and refine API tools for materials science. Additionally, HoneyComb leverages a retriever module that adaptively selects the appropriate knowledge source or tools for specific tasks, thereby ensuring accuracy and relevance. Our results demonstrate that HoneyComb significantly outperforms baseline models across various tasks in materials science, effectively bridging the gap between current LLM capabilities and the specialized needs of this domain. Furthermore, our adaptable framework can be easily extended to other scientific domains, highlighting its potential for broad applicability in advancing scientific research and applications.

2024-10-07

NeurIPS.cc/2024/Workshop/AI4Mat (spotlight)

MatExpert: Decomposing Materials Discovery By Mimicking Human Experts

Qianggang Ding

Santiago Miret

Bang Liu

2024-10-07

NeurIPS.cc/2024/Workshop/AI4Mat (publié)

SymmCD: Symmetry-Preserving Crystal Generation with Diffusion Models

Daniel Levy

Siba Smarak Panigrahi

Sékou-Oumar Kaba

Qiang Zhu

Mikhail Galkin

Santiago Miret

Siamak Ravanbakhsh

2024-10-07

NeurIPS.cc/2024/Workshop/AI4Mat (spotlight)

Toward Debugging Deep Reinforcement Learning Programs with RLExplorer

Rached Bouchoucha

Ahmed Haj Yahmed

Darshan Patil

Janarthanan Rajendran

Amin Nikanjam

Sarath Chandar

Foutse Khomh

Deep reinforcement learning (DRL) has shown success in diverse domains such as robotics, computer games, and recommendation systems. However… (voir plus), like any other software system, DRL-based software systems are susceptible to faults that pose unique challenges for debugging and diagnosing. These faults often result in unexpected behavior without explicit failures and error messages, making debugging difficult and time-consuming. Therefore, automating the monitoring and diagnosis of DRL systems is crucial to alleviate the burden on developers. In this paper, we propose RLExplorer, the first fault diagnosis approach for DRL-based software systems. RLExplorer automatically monitors training traces and runs diagnosis routines based on properties of the DRL learning dynamics to detect the occurrence of DRL-specific faults. It then logs the results of these diagnoses as warnings that cover theoretical concepts, recommended practices, and potential solutions to the identified faults. We conducted two sets of evaluations to assess RLExplorer. Our first evaluation of faulty DRL samples from Stack Overflow revealed that our approach can effectively diagnose real faults in 83% of the cases. Our second evaluation of RLExplorer with 15 DRL experts/developers showed that (1) RLExplorer could identify 3.6 times more defects than manual debugging and (2) RLExplorer is easily integrated into DRL applications.

2024-10-06

ArXiv (prépublication)

Multi-Objective Risk Assessment Framework for Exploration Planning Using Terrain and Traversability Analysis

Riana Gagnon Souleiman

Vivek Shankar Vardharajan

Giovanni Beltrame

2024-10-04

ArXiv (prépublication)

DECOLLAGE: 3D Detailization by Controllable, Localized, and Learned Geometry Enhancement

Qimin Chen

Zhiqin Chen

Vladimir Kim

Noam Aigerman

Hao (Richard) Zhang

Hao Zhang 0002

Siddhartha Chaudhuri

2024-10-03

Lecture Notes in Computer Science (publié)

Mitigating Downstream Model Risks via Model Provenance

Keyu Wang

Abdullah Norozi Iranzad

Scott Schaffter

Doina Precup

Jonathan Lebensold

Research and industry are rapidly advancing the innovation and adoption of foundation model-based systems, yet the tools for managing these … (voir plus)models have not kept pace. Understanding the provenance and lineage of models is critical for researchers, industry, regulators, and public trust. While model cards and system cards were designed to provide transparency, they fall short in key areas: tracing model genealogy, enabling machine readability, offering reliable centralized management systems, and fostering consistent creation incentives. This challenge mirrors issues in software supply chain security, but AI/ML remains at an earlier stage of maturity. Addressing these gaps requires industry-standard tooling that can be adopted by foundation model publishers, open-source model innovators, and major distribution platforms. We propose a machine-readable model specification format to simplify the creation of model records, thereby reducing error-prone human effort, notably when a new model inherits most of its design from a foundation model. Our solution explicitly traces relationships between upstream and downstream models, enhancing transparency and traceability across the model lifecycle. To facilitate the adoption, we introduce the unified model record (UMR) repository , a semantically versioned system that automates the publication of model records to multiple formats (PDF, HTML, LaTeX) and provides a hosted web interface (https://modelrecord.com/). This proof of concept aims to set a new standard for managing foundation models, bridging the gap between innovation and responsible model management.

2024-10-03

ArXiv (prépublication)