Pedro Vianna

openreview.net

Unsupervised Test-Time Adaptation for Hepatic Steatosis Grading Using Ultrasound B-Mode Images

Michael Eickenberg

An Tang

Guy Cloutier

Ultrasound (US) is considered a key modality for the clinical assessment of hepatic steatosis (i.e., fatty liver) due to its noninvasiveness and availability. Deep learning methods have attracted considerable interest in this field, as they can learn patterns in a collection of images and achieve clinically comparable levels of accuracy in steatosis grading. However, variations in patient populations, acquisition protocols, equipment, and operator expertise across clinical sites can introduce domain shifts that reduce model performance when applied outside the original training setting. In response, unsupervised domain adaptation techniques are being investigated to address these shifts, allowing models to generalize more effectively across diverse clinical environments. In this work, we propose a test-time batch normalization (TTN) technique designed to handle domain shift, especially for changes in label distribution, by adapting selected features of batch normalization (BatchNorm) layers in a trained convolutional neural network model. This approach operates in an unsupervised manner, allowing robust adaptation to new distributions without access to labeled data. The method was evaluated on two abdominal US datasets collected at different institutions, assessing its capability in mitigating domain shift for hepatic steatosis classification. The proposed method reduced the mean absolute error in steatosis grading by 37% and improved the area under the receiver operating characteristic curve (AUC) for steatosis detection from 0.78 to 0.97, compared to nonadapted models. These findings demonstrate the potential of the proposed method to address domain shift in US-based hepatic steatosis diagnosis, minimizing risks associated with deploying trained models in various clinical settings.

2025-03-25

IEEE Transactions on Ultrasonics, Ferroelectrics and Frequency Control (published)

Channel-Selective Normalization for Label-Shift Robust Test-Time Adaptation

Muawiz Chaudhary

Paria Mehrbod

An Tang

Guy Cloutier

Michael Eickenberg

Deep neural networks have useful applications in many different tasks; however, their performance can be severely affected by changes in the data distribution. For example, in the biomedical field, their performance can be affected by changes in the data (different machines, populations) between training and test datasets. To ensure robustness and generalization to real-world scenarios, test-time adaptation (TTA) has recently been studied as an approach to adjust models to a new data distribution during inference. Test-time batch normalization is a simple and popular method that has achieved compelling performance on domain shift benchmarks. It is implemented by recalculating batch normalization statistics on test batches. Prior work has focused on analysis with test data that has the same label distribution as the training data. However, in many practical applications this technique is vulnerable to label distribution shifts, sometimes producing catastrophic failure. This presents a risk in applying test-time adaptation methods in deployment. We propose to tackle this challenge by only selectively adapting channels in a deep network, minimizing drastic adaptation that is sensitive to label shifts. Our selection scheme is based on two principles that we empirically motivate: (1) later layers of networks are more sensitive to label shift, and (2) individual features can be sensitive to specific classes. We apply the proposed technique to three classification tasks, including CIFAR10-C, ImageNet-C, and diagnosis of fatty liver, where we explore both covariate and label distribution shifts. We find that our method brings the benefits of TTA while significantly reducing the risk of failure common in other methods, and that it is robust to the choice of hyperparameters.
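The idea of recalculating batch normalization statistics on test batches, but only for selected channels, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `adapt_mask` argument, and the toy numbers are all assumptions made for illustration, the affine scale and shift of BatchNorm are omitted, and the mask is simply given rather than derived from the selection principles described in the abstract.

```python
from statistics import fmean, pvariance


def batch_stats(batch):
    """Per-channel mean and population variance of a batch (rows = samples)."""
    channels = list(zip(*batch))
    return [fmean(c) for c in channels], [pvariance(c) for c in channels]


def normalize(batch, mean, var, eps=1e-5):
    """BatchNorm-style normalization; the learned scale/shift is omitted."""
    return [
        [(x - m) / (v + eps) ** 0.5 for x, m, v in zip(row, mean, var)]
        for row in batch
    ]


def selective_ttn(batch, train_mean, train_var, adapt_mask):
    """Use test-batch statistics where adapt_mask is True, stored training
    statistics elsewhere. adapt_mask is a hypothetical per-channel selection."""
    test_mean, test_var = batch_stats(batch)
    mean = [t if a else s for t, s, a in zip(test_mean, train_mean, adapt_mask)]
    var = [t if a else s for t, s, a in zip(test_var, train_var, adapt_mask)]
    return normalize(batch, mean, var)


# Toy example: two samples, two channels; only channel 0 is adapted.
test_batch = [[2.0, 10.0], [4.0, 14.0]]
out = selective_ttn(
    test_batch,
    train_mean=[0.0, 0.0],
    train_var=[1.0, 1.0],
    adapt_mask=[True, False],
)
print(out)
```

In this toy case channel 0 is re-centered on the test batch, while channel 1 keeps the training statistics and therefore still reflects the shift in its raw values.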

2025-02-16

Conference on Lifelong Learning Agents (published)

proceedings.mlr.press

Simulating federated learning for steatosis detection using ultrasound images

Yue Qi

Alexandre Cadrin-Chênevert

Katleen Blanchet

Emmanuel Montagnon

Louis-Antoine Mullie

Guy Cloutier

Michael Chassé

An Tang

We aimed to implement four data partitioning strategies evaluated with four federated learning (FL) algorithms and investigate the impact of data distribution on FL model performance in detecting steatosis using B-mode US images. A private dataset (153 patients; 1530 images) and a public dataset (55 patients; 550 images) were included in this retrospective study. The datasets contained patients with metabolic dysfunction-associated fatty liver disease (MAFLD) with biopsy-proven steatosis grades and control individuals without steatosis. We employed four data partitioning strategies to simulate FL scenarios and we assessed four FL algorithms. We investigated the impact of class imbalance and the mismatch between the global and local data distributions on the learning outcome. Classification performance was assessed with area under the receiver operating characteristic curve (AUC) on a separate test set. AUCs were 0.93 (95% CI 0.92, 0.94) for a source-based partitioning scenario with FedAvg, 0.90 (95% CI 0.89, 0.91) for a centralized model, and 0.83 (95% CI 0.81, 0.85) for a model trained in a single-center scenario. When data was perfectly balanced on the global level and each site had an identical data distribution, the model yielded an AUC of 0.90 (95% CI 0.88, 0.92). When each site contained data exclusively from one single class, irrespective of the global data distribution, the AUC fell in the range of 0.34–0.70. FL applied to B-mode US images provides performance comparable to a centralized model and higher than a single-center scenario. Global data imbalance and local data heterogeneity influenced the learning outcome.
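For readers unfamiliar with FedAvg, the aggregation step it is named after can be sketched as a sample-size-weighted average of client parameters. This is a generic illustration, not the study's code: the function name `fedavg`, the flat parameter lists, and the toy client sizes are assumptions, and the local training rounds that precede aggregation are left out.

```python
def fedavg(client_params, client_sizes):
    """Average client parameter vectors, weighting each client by its
    number of training samples (the standard FedAvg aggregation rule)."""
    total = sum(client_sizes)
    n_params = len(client_params[0])
    return [
        sum(params[i] * size for params, size in zip(client_params, client_sizes))
        / total
        for i in range(n_params)
    ]


# Toy example: two sites with hypothetical parameters and sample counts.
site_params = [[1.0, 2.0], [3.0, 4.0]]
site_sizes = [1, 3]
global_params = fedavg(site_params, site_sizes)
print(global_params)  # [2.5, 3.5]
```

Because each site's contribution is proportional to its sample count, a site holding data from a single class can pull the global model toward that class, which is one way to read the sensitivity to local data heterogeneity reported above.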

2024-06-09

Scientific Reports (published)

Generalization of deep learning models for hepatic steatosis grading using B-mode ultrasound images

Yue Qi

Michael Chassé

An Tang

Guy Cloutier

Grayscale ultrasound remains a key modality for screening of hepatic steatosis due to its non-invasiveness and availability. While neural networks have shown promise in this field, their main drawback lies in their inability to generalize to diverse real-world settings. Variations in equipment, acquisition parameters, or population significantly affect model performance. Test-time adaptation, an unsupervised domain adaptation technique, overcomes these limitations by adjusting trained models during inference. Our retrospective study used two datasets collected in separate populations, with different scanners and protocols. We propose an adaptation method, using test-time batch normalization to selectively adjust BatchNorm layers based on test data for predicting steatosis grades. Comparing the non-adapted and adapted models, the mean absolute error (± standard deviation) in grading four severities of steatosis decreased from 0.92 ± 0.21 to 0.64 ± 0.22. Specifically, for detection of steatosis the area under the curve increased from 0.76 ± 0.05 to 0.95 ± 0.02 when using the adapted model. Adapted models show promising results in improving performance compared to base models when testing data differ significantly from training data. Results suggest that the proposed method effectively addresses domain shift in diagnosing fatty liver using ultrasound images, reducing risks associated with deploying trained models.

2024-02-29

The Journal of the Acoustical Society of America (published)

Channel Selection for Test-Time Adaptation Under Distribution Shift

Muawiz Sajjad Chaudhary

An Tang

Guy Cloutier

Michael Eickenberg

To ensure robustness and generalization to real-world scenarios, test-time adaptation has been recently studied as an approach to adjust models to a new data distribution during inference. Test-time batch normalization is a simple and popular method that achieved compelling performance on domain shift benchmarks by recalculating batch normalization statistics on test batches. However, in many practical applications this technique is vulnerable to label distribution shifts. We propose to tackle this challenge by only selectively adapting channels in a deep network, minimizing drastic adaptation that is sensitive to label shifts. We find that adapted models significantly improve the performance compared to the baseline models and counteract unknown label shifts.

2023-10-26

NeurIPS.cc/2023/Workshop/DistShift (poster)

openreview.net

Comparison of Radiologists and Deep Learning for US Grading of Hepatic Steatosis

Sara‐Ivana Calce

Pamela Boustros

Cassandra Larocque-Rigney

Laurent Patry-Beaudoin

Yi Hui Luo

Emre Aslan

John Marinos

Talal Alamri

Kim‐Nhien Vu

Jessica Murphy-Lavallée

Jean-Sébastien Billiard

Emmanuel Montagnon

Hongliang Li

Samuel Kadoury

Bich Nguyen

Shanel Gauthier

Benjamin Therien

Irina Rish

Eugene Belilovsky … (see 4 more)

Michael Chassé

Guy Cloutier

An Tang

Background Screening for nonalcoholic fatty liver disease (NAFLD) is suboptimal due to the subjective interpretation of US images. Purpose To evaluate the agreement and diagnostic performance of radiologists and a deep learning model in grading hepatic steatosis in NAFLD at US, with biopsy as the reference standard. Materials and Methods This retrospective study included patients with NAFLD and control patients without hepatic steatosis who underwent abdominal US and contemporaneous liver biopsy from September 2010 to October 2019. Six readers visually graded steatosis on US images twice, 2 weeks apart. Reader agreement was assessed with use of κ statistics. Three deep learning techniques applied to B-mode US images were used to classify dichotomized steatosis grades. Classification performance of human radiologists and the deep learning model for dichotomized steatosis grades (S0, S1, S2, and S3) was assessed with area under the receiver operating characteristic curve (AUC) on a separate test set. Results The study included 199 patients (mean age, 53 years ± 13 [SD]; 101 men). On the test set (n = 52), radiologists had fair interreader agreement (0.34 [95% CI: 0.31, 0.37]) for classifying steatosis grades S0 versus S1 or higher, while AUCs were between 0.49 and 0.84 for radiologists and 0.85 (95% CI: 0.83, 0.87) for the deep learning model. For S0 or S1 versus S2 or S3, radiologists had fair interreader agreement (0.30 [95% CI: 0.27, 0.33]), while AUCs were between 0.57 and 0.76 for radiologists and 0.73 (95% CI: 0.71, 0.75) for the deep learning model. For S2 or lower versus S3, radiologists had fair interreader agreement (0.37 [95% CI: 0.33, 0.40]), while AUCs were between 0.52 and 0.81 for radiologists and 0.67 (95% CI: 0.64, 0.69) for the deep learning model. Conclusion Deep learning approaches applied to B-mode US images provided comparable performance with human readers for detection and grading of hepatic steatosis.
Published under a CC BY 4.0 license. Supplemental material is available for this article. See also the editorial by Tuthill in this issue.
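The evaluation described above, dichotomizing the S0 to S3 grades at a threshold and scoring with AUC, can be sketched as follows. This is an illustrative sketch only: the function names, the grades, and the scores are made-up assumptions, and the AUC is computed by direct pairwise comparison (the Mann-Whitney formulation) rather than with whatever software the study used.

```python
def dichotomize(grades, threshold):
    """Map ordinal grades (0..3 for S0..S3) to binary labels:
    True when the grade is at or above the threshold."""
    return [g >= threshold for g in grades]


def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive case has the higher score; ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical biopsy grades and model scores for five patients.
grades = [0, 0, 1, 2, 3]
scores = [0.1, 0.4, 0.35, 0.8, 0.9]

auc_s0_vs_rest = auc(scores, dichotomize(grades, 1))   # S0 vs S1 or higher
auc_s01_vs_s23 = auc(scores, dichotomize(grades, 2))   # S0-S1 vs S2-S3
print(auc_s0_vs_rest, auc_s01_vs_s23)
```

The same scores can yield different AUCs at different thresholds, which is why the abstract reports a separate value for each dichotomization.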

2023-09-30

Radiology (unknown)

Automated liver segmentation and steatosis grading using deep learning on B-mode ultrasound images

Merve Kulbay

Pamela Boustros

Sara-Ivana Calce

Cassandra Larocque-Rigney

Laurent Patry-Beaudoin

Yi Hui Luo

Muawiz Chaudhary

Samuel Kadoury

Bich Nguyen

Emmanuel Montagnon

Michael Chassé

An Tang

Guy Cloutier

Early detection of nonalcoholic fatty liver disease (NAFLD) is crucial to avoid further complications. Ultrasound is often used for screening and monitoring of hepatic steatosis; however, it is limited by the subjective interpretation of images. Computer-assisted diagnosis could help radiologists achieve objective grading, and artificial intelligence approaches have been tested across various medical applications. In this study, we evaluated the performance of a two-stage hepatic steatosis detection deep learning framework, with a first step of liver segmentation and a subsequent step of hepatic steatosis classification. We evaluated the models on internal and external datasets, aiming to understand the generalizability of the framework. In the external dataset, our segmentation model achieved a Dice score of 0.92 (95% CI: 0.78, 1.00), and our classification model achieved an area under the receiver operating characteristic curve of 0.84 (95% CI: 0.79, 0.89). Our findings highlight the potential benefits of applying artificial intelligence models in NAFLD assessment.

2023-09-02

IUS (published)