Publications

Debiasing Counterfactuals in the Presence of Spurious Correlations

Raghav Mehta

Jean-Pierre R. Falet

Sotirios A. Tsaftaris

Deep learning models can perform well in complex medical imaging classification tasks, even when basing their conclusions on spurious correlations (i.e. confounders), should they be prevalent in the training dataset, rather than on the causal image markers of interest. This would thereby limit their ability to generalize across the population. Explainability based on counterfactual image generation can be used to expose the confounders but does not provide a strategy to mitigate the bias. In this work, we introduce the first end-to-end training framework that integrates both (i) popular debiasing classifiers (e.g. distributionally robust optimization (DRO)) to avoid latching onto the spurious correlations and (ii) counterfactual image generation to unveil generalizable imaging markers of relevance to the task. Additionally, we propose a novel metric, Spurious Correlation Latching Score (SCLS), to quantify the extent of the classifier reliance on the spurious correlation as exposed by the counterfactual images. Through comprehensive experiments on two public datasets (with simulated and real visual artifacts), we demonstrate that the debiasing method: (i) learns generalizable markers across the population, and (ii) successfully ignores spurious correlations and focuses on the underlying disease pathology.

2023-10-09

Clinical Image-Based Procedures, Fairness of AI in Medical Imaging, and Ethical and Philosophical Issues in Medical Imaging (published)

doi.org

openreview.net

On the effectiveness of log representation for log-based anomaly detection

Xingfang Wu

Heng Li

Foutse Khomh

2023-10-09

Empirical Software Engineering (published)

doi.org

arxiv.org

Improving Image-Based Precision Medicine with Uncertainty-Aware Causal Models

Joshua D. Durso-Finley

Jean-Pierre R. Falet

Raghav Mehta

Douglas Arnold

Nick Pawlowski

Tal Arbel

Image-based precision medicine aims to personalize treatment decisions based on an individual's unique imaging features so as to improve their clinical outcome. Machine learning frameworks that integrate uncertainty estimation as part of their treatment recommendations would be safer and more reliable. However, little work has been done in adapting uncertainty estimation techniques and validation metrics for precision medicine. In this paper, we use Bayesian deep learning for estimating the posterior distribution over factual and counterfactual outcomes on several treatments. This allows for estimating the uncertainty for each treatment option and for the individual treatment effects (ITE) between any two treatments. We train and evaluate this model to predict future new and enlarging T2 lesion counts on a large, multi-center dataset of MR brain images of patients with multiple sclerosis, exposed to several treatments during randomized controlled trials. We evaluate the correlation of the uncertainty estimate with the factual error, and, given the lack of ground truth counterfactual outcomes, demonstrate how uncertainty for the ITE prediction relates to bounds on the ITE error. Lastly, we demonstrate how knowledge of uncertainty could modify clinical decision-making to improve individual patient and clinical trial outcomes.

2023-10-08

OpenReview.net/Archive (published)

doi.org

openreview.net

MDFD: Study of Distributed Non-IID Scenarios and Frechet Distance-Based Evaluation

Wei Wang

Mingwei Zhang

Ziwen Wu

Qianxi Chen

Yue Li

With the development of distributed machine learning and federated learning, the solution to the data island problem is promoted. People use computer clusters to train machine learning models on data distributed in different regions. In the early stage of research, researchers usually assumed that the data sets of each node are independent and identically distributed (IID), but this is a strong assumption, which is challenging to meet in practical applications. Therefore, research on non-IID has become a hot spot in recent years. However, there is no uniform standard for designing and evaluating non-IID scenarios. This paper proposes a Frechet distance-independent non-IID distribution dataset metric MDFD. We conducted experiments on different types of distributed machine-learning methods in different non-IID scenarios to verify the effectiveness of MDFD.

2023-10-08

International Conference on Information Photonics (published)

doi.org

Mitigating Calibration Bias Without Fixed Attribute Grouping for Improved Fairness in Medical Imaging Analysis

Changjian Shui

Justin Szeto

Raghav Mehta

Douglas Arnold

Tal Arbel

2023-10-08

OpenReview.net/Archive (published)

doi.org

openreview.net

SDWD: Style Diversity Weighted Distance Evaluates the Intra-Class Data Diversity of Distributed GANs

Wei Wang

Ziwen Wu

Mingwei Zhang

Yue Li

2023-10-08

2023 IEEE International Conference on Image Processing (ICIP) (published)

doi.org

Better Quality Pre-training Data and T5 Models for African Languages

Akintunde Oladipo

Mofetoluwa Adeyemi

Orevaoghene Ahia

Abraham Toluwase Owodunni

Odunayo Ogundepo

David Ifeoluwa Adelani

Jimmy Lin

In this study, we highlight the importance of enhancing the quality of pretraining data in multilingual language models. Existing web crawls have demonstrated quality issues, particularly in the context of low-resource languages. Consequently, we introduce a new multilingual pretraining corpus for

2023-10-07

EMNLP/2023/Conference (accepted)

doi.org

openreview.net

Crystal-GFN: sampling crystals with desirable properties and constraints

Alex Hernandez-Garcia

Alexandre AGM Duval

Accelerating material discovery holds the potential to greatly help mitigate the climate crisis. Discovering new solid-state materials such as electrocatalysts, super-ionic conductors or photovoltaic materials can have a crucial impact, for instance, in improving the efficiency of renewable energy production and storage. In this paper, we introduce Crystal-GFN, a generative model of crystal structures that sequentially samples structural properties of crystalline materials, namely the space group, composition and lattice parameters. This domain-inspired approach enables the flexible incorporation of physical and structural hard constraints, as well as the use of any available predictive model of a desired physicochemical property as an objective function. To design stable materials, one must target the candidates with the lowest formation energy. Here, we use as objective the formation energy per atom of a crystal structure predicted by a new proxy machine learning model trained on MatBench. The results demonstrate that Crystal-GFN is able to sample highly diverse crystals with low (median -3.1 eV/atom) predicted formation energy.

2023-10-07

ArXiv (preprint)

doi.org

arxiv.org

Driving into the Loop: Mapping Automation Bias and Liability Issues for Advanced Driver Assistance Systems

Katie Szilagyi

Jason Millar

AJung Moon

Shalaleh Rismani

2023-10-07

Digital Society (published)

doi.org

Efficient Classification of Long Documents via State-Space Models

Peng Lu

Suyuchen Wang

Mehdi Rezagholizadeh

Bang Liu

Ivan Kobyzev

2023-10-07

EMNLP/2023/Conference (accepted)

openreview.net

EpiK-Eval: Evaluation for Language Models as Epistemic Models

Gabriele Prato

Jerry Huang

Prasanna Parthasarathi

Shagun Sodhani

Sarath Chandar

In the age of artificial intelligence, the role of large language models (LLMs) is becoming increasingly central. Despite their growing prevalence, their capacity to consolidate knowledge from different training documents, a crucial ability in numerous applications, remains unexplored. This paper presents the first study examining the capability of LLMs to effectively combine such information within their parameter space. We introduce EpiK-Eval, a novel question-answering benchmark tailored to evaluate LLMs' proficiency in formulating a coherent and consistent knowledge representation from segmented narratives. Evaluations across various LLMs reveal significant weaknesses in this domain. We contend that these shortcomings stem from the intrinsic nature of prevailing training objectives. Consequently, we advocate for refining the approach towards knowledge consolidation, as it harbors the potential to dramatically improve their overall effectiveness and performance. The findings from this study offer insights for developing more robust and reliable LLMs. Our code and benchmark are available at https://github.com/chandar-lab/EpiK-Eval

2023-10-07

EMNLP/2023/Conference (accepted)

doi.org

openreview.net
