Publications

Improved Off-policy Reinforcement Learning in Biological Sequence Design

Minsu Kim

Alex Hernandez-Garcia

Jinkyoo Park

Designing biological sequences with desired properties is a significant challenge due to the combinatorially vast search space and the high cost of evaluating each candidate sequence. To address these challenges, reinforcement learning (RL) methods, such as GFlowNets, utilize proxy models for rapid reward evaluation and annotated data for policy training. Although these approaches have shown promise in generating diverse and novel sequences, the limited training data relative to the vast search space often leads to misspecification of the proxy for out-of-distribution inputs. We introduce

2024-10-06

ArXiv (preprint)

doi.org

arxiv.org

Toward Debugging Deep Reinforcement Learning Programs with RLExplorer

Rached Bouchoucha

Ahmed Haj Yahmed

Darshan Patil

Janarthanan Rajendran

Amin Nikanjam

Sarath Chandar

Foutse Khomh

Deep reinforcement learning (DRL) has shown success in diverse domains such as robotics, computer games, and recommendation systems. However, like any other software system, DRL-based software systems are susceptible to faults that pose unique challenges for debugging and diagnosis. These faults often result in unexpected behavior without explicit failures or error messages, making debugging difficult and time-consuming. Therefore, automating the monitoring and diagnosis of DRL systems is crucial to alleviate the burden on developers. In this paper, we propose RLExplorer, the first fault diagnosis approach for DRL-based software systems. RLExplorer automatically monitors training traces and runs diagnosis routines based on properties of the DRL learning dynamics to detect the occurrence of DRL-specific faults. It then logs the results of these diagnoses as warnings that cover theoretical concepts, recommended practices, and potential solutions to the identified faults. We conducted two sets of evaluations to assess RLExplorer. Our first evaluation, on faulty DRL samples from Stack Overflow, revealed that our approach can effectively diagnose real faults in 83% of the cases. Our second evaluation, with 15 DRL experts/developers, showed that (1) RLExplorer could identify 3.6 times more defects than manual debugging and (2) RLExplorer is easily integrated into DRL applications.

2024-10-06

ArXiv (preprint)

doi.org

arxiv.org

Toward Debugging Deep Reinforcement Learning Programs with RLExplorer

Rached Bouchoucha

Ahmed Haj Yahmed

Darshan Patil

Janarthanan Rajendran

Amin Nikanjam

Sarath Chandar

Foutse Khomh

Deep reinforcement learning (DRL) has shown success in diverse domains such as robotics, computer games, and recommendation systems. However, like any other software system, DRL-based software systems are susceptible to faults that pose unique challenges for debugging and diagnosis. These faults often result in unexpected behavior without explicit failures or error messages, making debugging difficult and time-consuming. Therefore, automating the monitoring and diagnosis of DRL systems is crucial to alleviate the burden on developers. In this paper, we propose RLExplorer, the first fault diagnosis approach for DRL-based software systems. RLExplorer automatically monitors training traces and runs diagnosis routines based on properties of the DRL learning dynamics to detect the occurrence of DRL-specific faults. It then logs the results of these diagnoses as warnings that cover theoretical concepts, recommended practices, and potential solutions to the identified faults. We conducted two sets of evaluations to assess RLExplorer. Our first evaluation, on faulty DRL samples from Stack Overflow, revealed that our approach can effectively diagnose real faults in 83% of the cases. Our second evaluation, with 15 DRL experts/developers, showed that (1) RLExplorer could identify 3.6 times more defects than manual debugging and (2) RLExplorer is easily integrated into DRL applications.

2024-10-06

2024 IEEE International Conference on Software Maintenance and Evolution (ICSME) (published)

doi.org

arxiv.org
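The abstract describes diagnosis routines that inspect DRL training traces for faults that produce no explicit error. The sketch below shows what one such routine could look like, assuming the trace consists of per-episode returns and per-update losses. The function name `diagnose_trace`, the window size, the improvement threshold, and both checks are hypothetical illustrations, not RLExplorer's actual API or diagnosis rules.

```python
import math
from typing import List


def diagnose_trace(episode_returns: List[float], losses: List[float],
                   window: int = 10, min_improvement: float = 1e-3) -> List[str]:
    """Return warning messages for a training trace (hypothetical checks)."""
    warnings = []
    # Non-finite losses often point to exploding gradients or invalid inputs.
    if any(not math.isfinite(x) for x in losses):
        warnings.append("loss contains NaN/inf values")
    # Stagnation check: compare the mean return of the last window with the
    # mean return of the window just before it.
    if len(episode_returns) >= 2 * window:
        prev = sum(episode_returns[-2 * window:-window]) / window
        last = sum(episode_returns[-window:]) / window
        if last - prev < min_improvement:
            warnings.append("episode return has stopped improving")
    return warnings
```

For example, `diagnose_trace([1.0] * 20, [0.5, float("nan")])` flags both the non-finite loss and the flat return curve, while a steadily rising return with finite losses yields no warnings. Each check only reads the logged trace, so it can run alongside training without modifying the agent.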

Understanding Web Application Workloads and Their Applications: Systematic Literature Review and Characterization

Roozbeh Aghili

Qiaolin Qin

Heng Li

Foutse Khomh

2024-10-06

2024 IEEE International Conference on Software Maintenance and Evolution (ICSME) (published)

doi.org

arxiv.org

Assessment of the Climate Trace global powerplant CO2 emissions

Kevin R. Gurney

Bilal Aslam

Pawlok Dass

Lech Gawuc

Toby Dylan Hocking

Jarrett J Barber

Anna Kato

Accurate estimation of planetary greenhouse gas (GHG) emissions at the scale of individual emitting activities is a critical need for both scientific and policy applications. Powerplants represent the single largest and most concentrated form of global GHG emissions. Climate Trace, co-founded and promoted by former U.S. Vice President Al Gore, is a new effort using, in part, artificial intelligence (AI) approaches to estimate asset-scale GHG emissions. Climate Trace recently released a database of global powerplant CO2 emissions at the facility scale that uses both AI and non-AI estimation approaches. However, no independent peer-reviewed assessment of this important global emissions database has been made. Here, we compare the Climate Trace powerplant CO2 emissions to an atmospherically calibrated, multi-constraint estimate of powerplant CO2 emissions in the United States. The 3.7% (65) of compared facilities that used an AI-based approach show a mean relative difference (MRD) of −1.1% (SD: 46.4%) in the year 2019. The 96.3% (1726) of the facilities that used a non-AI-based approach show an MRD of −50.0% (SD: 117.7%). Of the non-AI estimated facilities, 151 (8.7%) agree to within ±20%. The large differences between Climate Trace and Vulcan-power emission estimates for these facilities are primarily caused by Climate Trace's use of a national-mean power plant capacity factor (CF), which poorly represents the reported CFs of individual US facilities and leads to very large errors at those same 1726 facilities.

2024-10-04

Environmental Research Letters (published)

doi.org
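To make the reported statistics concrete, the sketch below computes per-facility relative differences, their mean (MRD) and standard deviation, and the number of facilities agreeing within ±20%. It assumes the relative difference is (estimate − reference) / reference in percent; the abstract does not state the exact formula. The `annual_co2` helper is a generic capacity-factor estimate with hypothetical numbers, included only to show why one national-mean CF can badly misstate a plant whose actual CF differs. It is not Climate Trace's or Vulcan's method.

```python
from statistics import mean, stdev


def relative_differences(estimate, reference):
    """Per-facility relative difference in percent (assumed definition)."""
    return [100.0 * (e - r) / r for e, r in zip(estimate, reference)]


def summarize(estimate, reference, tol_pct=20.0):
    rd = relative_differences(estimate, reference)
    return {
        "mrd_pct": mean(rd),                                 # mean relative difference
        "sd_pct": stdev(rd),                                 # sample standard deviation
        "n_within_tol": sum(abs(d) <= tol_pct for d in rd),  # facilities within ±tol
    }


def annual_co2(capacity_mw, capacity_factor, tco2_per_mwh, hours=8760):
    """Illustrative annual CO2 (tonnes) = capacity x CF x hours x emission factor."""
    return capacity_mw * capacity_factor * hours * tco2_per_mwh


# Hypothetical facilities: estimates vs. reference emissions in the same units.
stats = summarize([90.0, 50.0, 130.0], [100.0, 100.0, 100.0])

# Hypothetical plant: a national-mean CF of 0.5 applied to a plant running at CF 0.2.
national = annual_co2(100, 0.5, 1.0)
actual = annual_co2(100, 0.2, 1.0)
overestimate_pct = round(100 * (national - actual) / actual)
```

Here `stats` gives an MRD of −10% with an SD of 40% and one facility within ±20%, and `overestimate_pct` is 150: when the assumed CF is 2.5 times the plant's real one, the emission estimate inflates by the same factor, regardless of how accurate the capacity and emission factor are.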

DiffKillR: Killing and Recreating Diffeomorphisms for Cell Annotation in Dense Microscopy Images

Chen Liu

Danqi Liao

Alejandro Parada-Mayorga

Alejandro Ribeiro

Marcello DiStasio

Smita Krishnaswamy

The proliferation of digital microscopy images, driven by advances in automated whole-slide scanning, presents significant opportunities for biomedical research and clinical diagnostics. However, accurately annotating densely packed information in these images remains a major challenge. To address this, we introduce DiffKillR, a novel framework that reframes cell annotation as the combination of archetype matching and image registration tasks. DiffKillR employs two complementary neural networks: one that learns a diffeomorphism-invariant feature space for robust cell matching and another that computes the precise warping field between cells for annotation mapping. Using a small set of annotated archetypes, DiffKillR efficiently propagates annotations across large microscopy images, reducing the need for extensive manual labeling. More importantly, it is suitable for any type of pixel-level annotation. We discuss the theoretical properties of DiffKillR and validate it on three microscopy tasks, demonstrating its advantages over existing supervised, semi-supervised, and unsupervised methods.

2024-10-04

ArXiv (preprint)

doi.org

arxiv.org

Multi-Objective Risk Assessment Framework for Exploration Planning Using Terrain and Traversability Analysis

Riana Gagnon Souleiman

Vivek Shankar Vardharajan

Giovanni Beltrame

2024-10-04

ArXiv (preprint)

doi.org

arxiv.org

DECOLLAGE: 3D Detailization by Controllable, Localized, and Learned Geometry Enhancement

Qimin Chen

Zhiqin Chen

Vladimir Kim

Noam Aigerman

Hao (Richard) Zhang

Hao Zhang 0002

Siddhartha Chaudhuri

2024-10-03

Lecture Notes in Computer Science (published)

doi.org

arxiv.org

Probabilistic Temporal Prediction of Continuous Disease Trajectories and Treatment Effects Using Neural SDEs

Joshua D. Durso-Finley

Berardino Barile

Jean-Pierre R. Falet

Douglas Arnold

Nick Pawlowski

Tal Arbel

Personalized medicine based on medical images, including predicting future individualized clinical disease progression and treatment response, would have an enormous impact on healthcare and drug development, particularly for diseases (e.g., multiple sclerosis (MS)) with long-term, complex, heterogeneous evolutions and no cure. In this work, we present the first stochastic causal temporal framework to model the continuous temporal evolution of disease progression via Neural Stochastic Differential Equations (NSDE). The proposed causal inference model takes as input the patient's high-dimensional images (MRI) and tabular data, and predicts both factual and counterfactual progression trajectories on different treatments in latent space. The NSDE permits the estimation of high-confidence personalized trajectories and treatment effects. Extensive experiments were performed on a large, multi-centre, proprietary dataset of patient 3D MRI and clinical data acquired during several randomized clinical trials for MS treatments. Our results present the first successful uncertainty-based causal Deep Learning (DL) model to: (a) accurately predict future patient MS disability evolution (e.g., EDSS) and treatment effects leveraging baseline MRI, and (b) permit the discovery of subgroups of patients for which the model has high confidence in their response to treatment, even in clinical trials that did not reach their clinical endpoints.

2024-10-03

Lecture Notes in Computer Science (published)

doi.org

arxiv.org
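The abstract models disease progression as a continuous latent trajectory governed by a neural SDE and compares factual and counterfactual trajectories under different treatments. The sketch below illustrates only the simulation idea, using the Euler–Maruyama scheme for dz = f(z, a) dt + g(z) dW. The hand-written `drift` and `diffusion` functions are hypothetical stand-ins for learned networks, and the scalar latent state and binary treatment flag are simplifying assumptions, not the paper's model.

```python
import numpy as np


def drift(z, treated):
    # Hypothetical stand-in for a learned drift network f(z, a):
    # here, treatment halves the rate of progression.
    rate = 0.5 if treated else 1.0
    return rate * np.ones_like(z)


def diffusion(z):
    # Hypothetical stand-in for a learned diffusion network g(z).
    return 0.2 * np.ones_like(z)


def simulate(z0, treated, t_end=2.0, n_steps=200, n_samples=500, seed=0):
    """Return end states of sampled latent trajectories (Euler-Maruyama)."""
    rng = np.random.default_rng(seed)
    dt = t_end / n_steps
    z = np.full(n_samples, z0, dtype=float)
    for _ in range(n_steps):
        dw = rng.normal(scale=np.sqrt(dt), size=z.shape)  # Brownian increments
        z = z + drift(z, treated) * dt + diffusion(z) * dw
    return z


# Same seed, so both runs share noise draws and form factual/counterfactual pairs.
untreated = simulate(z0=0.0, treated=False)
treated = simulate(z0=0.0, treated=True)
effect = np.mean(untreated - treated)  # estimated treatment effect on the end state
spread = np.std(treated)               # sample-based uncertainty of the end state
```

Sampling many paths gives both a point estimate (`effect`, close to 1.0 here) and a spread (about 0.2·√2 ≈ 0.28), which is the kind of per-trajectory confidence an SDE-based model can attach to a prediction. Reusing the seed is a design choice of this sketch: it isolates the treatment's effect from sampling noise.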

Sparse Bayesian Networks: Efficient Uncertainty Quantification in Medical Image Analysis

Zeinab Abboud

Hervé Lombaert

Samuel Kadoury

Efficiently quantifying predictive uncertainty in medical images remains a challenge. While Bayesian neural networks (BNN) offer predictive uncertainty, they require substantial computational resources to train. Although Bayesian approximations such as ensembles have shown promise, they still suffer from high training and inference costs. Existing approaches mainly address the costs of BNN inference post-training, with little focus on improving training efficiency and reducing parameter complexity. This study introduces a training procedure for a sparse (partial) Bayesian network. Our method selectively assigns a subset of parameters as Bayesian by assessing their deterministic saliency through gradient sensitivity analysis. The resulting network combines deterministic and Bayesian parameters, exploiting the advantages of both representations to achieve high task-specific performance and minimize predictive uncertainty. Demonstrated on multi-label ChestMNIST for classification and on ISIC and LIDC-IDRI for segmentation, our approach achieves competitive performance and predictive uncertainty estimation while reducing the number of Bayesian parameters by over 95%, significantly lowering computational expenses compared to fully Bayesian and ensemble methods.

2024-10-03

Lecture Notes in Computer Science (published)

doi.org

arxiv.org
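The method makes only a small subset of parameters Bayesian, chosen by gradient sensitivity. The NumPy sketch below shows that selection step and how a partially Bayesian weight tensor could be sampled. It assumes saliency is measured as |w · ∂L/∂w| and that each Bayesian parameter follows a Gaussian with a learned log standard deviation. Both choices, the function names, and the toy weights and gradients are illustrative assumptions rather than the paper's exact procedure.

```python
import numpy as np


def select_bayesian_mask(weights, grads, fraction=0.05):
    """Mark the top `fraction` of parameters by saliency as Bayesian."""
    saliency = np.abs(weights * grads)  # first-order sensitivity (assumption)
    k = max(1, int(round(fraction * saliency.size)))
    top = np.argpartition(saliency.ravel(), -k)[-k:]  # indices of the k largest
    mask = np.zeros(saliency.size, dtype=bool)
    mask[top] = True
    return mask.reshape(saliency.shape)


def sample_weights(mu, log_sigma, mask, rng):
    """Sample Bayesian entries from N(mu, sigma^2); keep the rest deterministic."""
    eps = rng.standard_normal(mu.shape)
    return np.where(mask, mu + np.exp(log_sigma) * eps, mu)


rng = np.random.default_rng(0)
w = np.arange(100, dtype=float).reshape(10, 10)  # hypothetical layer weights
g = np.ones_like(w)                              # hypothetical loss gradients
mask = select_bayesian_mask(w, g, fraction=0.05)
log_sigma = np.full(w.shape, -3.0)
w_sample = sample_weights(w, log_sigma, mask, rng)
```

With these toy inputs, `mask` selects the 5 largest-saliency entries (weights 95 to 99), and every other entry of `w_sample` equals `w` exactly. A compact implementation would store variance parameters only for the masked entries, so the overhead scales with the Bayesian fraction rather than the full parameter count; the sketch keeps a full-size `log_sigma` array for simplicity.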

Top-down feedback matters: Functional impact of brainlike connectivity motifs on audiovisual integration

Mashbayar Tugsbayar

Mingze Li

Eilif Benjamin Muller

Blake Richards

2024-10-03

bioRxiv (preprint)

doi.org