Vladimir Makarenkov

The accuracy of macrobiological community predictions largely depends on the taxonomic scale considered. Extending such predictions to microbial soil communities remains an important challenge. This is due not only to the lack of reliable benchmark data but also to the greater diversity of soil microorganisms compared with other environments. In this study, we use six traditional machine learning regression models and one deep learning regressor to predict relative frequencies of bacterial and fungal communities within the soil microbiome based on environmental factors. We analyze data from two publicly available soil microbiome datasets, (1) data collected by Averill and co-authors and analyzed in a recent Nature Ecology and Evolution article, and (2) data extracted from the NEON database, to estimate the composition of bacterial and fungal communities at the functional (i.e. functional group level) and taxonomic scales (i.e. phylum, class, order, family, and genus levels). Our findings suggest a general pattern across the observed taxonomic scales, according to which the predictability of the soil microbiome increases with taxonomic scale. However, a notable exception occurs when machine learning models are applied to predict bacterial communities at the functional group level for Averill et al.’s data, where all of them fail to provide accurate prediction results. The best overall results obtained include the value of the coefficient of determination

2026-02-24

Scientific Reports (published)
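The soil microbiome study listed above reports its best results through the coefficient of determination, R² = 1 - SS_res / SS_tot, which compares a regressor's squared errors with those of always predicting the mean of the observed values. The following is a minimal sketch of that formula, not the authors' code; the helper name `r_squared` and the relative-frequency values are hypothetical.

```python
from typing import Sequence


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination, R^2 = 1 - SS_res / SS_tot (hypothetical helper)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    mean = sum(y_true) / len(y_true)
    # Residual sum of squares: squared error of the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: squared error of always predicting the mean.
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    return 1 - ss_res / ss_tot


# Hypothetical observed vs. predicted relative frequencies of one taxon in three samples.
observed = [0.25, 0.5, 0.75]
predicted = [0.25, 0.5, 1.0]
print(r_squared(observed, predicted))  # 0.5
```

A value of 1 means perfect prediction, 0 means no better than predicting the mean, and negative values are possible when a model does worse than the mean.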

Similarity-based transfer learning with deep learning networks for accurate CRISPR-Cas9 off-target prediction.

Jérémy Charlier

Zeinab Sherkatghanad

Transfer learning has emerged as a powerful tool for enhancing predictive accuracy in complex tasks, particularly in scenarios where data is limited or imbalanced. This study explores similarity-based pre-evaluation as a methodology for identifying optimal source datasets for transfer learning, addressing the dual challenge of efficient source-target dataset pairing and off-target prediction in CRISPR-Cas9; existing transfer learning applications in gene editing often lack a principled method for source dataset selection. We use cosine, Euclidean, and Manhattan distances to evaluate the similarity between the source and target datasets used in our transfer learning experiments. Four deep learning architectures, namely Multilayer Perceptrons (MLPs), Convolutional Neural Networks (CNNs), Feedforward Neural Networks (FNNs), and Recurrent Neural Networks (RNNs), and two traditional machine learning models, namely Logistic Regression (LR) and Random Forest (RF), were tested and compared in our simulations. The results suggest that similarity scores are reliable indicators for pre-selecting source datasets in CRISPR-Cas9 transfer learning experiments, with cosine distance proving to be a more effective dataset comparison metric than either Euclidean or Manhattan distance. An RNN-GRU, a 5-layer FNN, and two MLP variants provided the best overall prediction results in our simulations. By integrating similarity-based source pre-selection with machine learning outcomes, we propose a dual-layered framework that not only streamlines the transfer learning process but also significantly improves off-target prediction accuracy. The code and data used in this study are freely available at: https://github.com/dagrate/transferlearning_offtargets .

2025-09-30

PLoS Computational Biology (published)
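The CRISPR-Cas9 study listed above ranks candidate source datasets by their cosine, Euclidean, or Manhattan distance to the target dataset. The sketch below assumes, purely for illustration, that each dataset has already been summarized as a fixed-length numeric feature vector; how the study actually builds those vectors is not described here, and the names `rank_sources`, `A`, `B`, and `C` are hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def cosine_distance(u: Vector, v: Vector) -> float:
    """1 - cosine similarity; depends only on the angle between the vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1 - dot / (norm_u * norm_v)


def euclidean_distance(u: Vector, v: Vector) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def manhattan_distance(u: Vector, v: Vector) -> float:
    return sum(abs(a - b) for a, b in zip(u, v))


def rank_sources(target: Vector, sources: Dict[str, Vector],
                 metric: Callable[[Vector, Vector], float]) -> List[str]:
    """Return source dataset names ordered from most to least similar to the target."""
    for name, vec in sources.items():
        if len(vec) != len(target):
            raise ValueError(f"source {name!r} has a different dimension than the target")
    return sorted(sources, key=lambda name: metric(sources[name], target))


# Hypothetical dataset summary vectors.
target = [1.0, 0.0, 1.0, 0.0]
sources = {"A": [1.0, 0.0, 1.0, 0.0], "B": [0.0, 1.0, 0.0, 1.0], "C": [1.0, 1.0, 1.0, 0.0]}
for metric in (cosine_distance, euclidean_distance, manhattan_distance):
    print(metric.__name__, rank_sources(target, sources, metric))
```

Because cosine distance ignores vector magnitude, it can rank sources differently from the two magnitude-sensitive metrics when datasets differ mainly in scale.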

ClustRecNet: A Novel End-to-End Deep Learning Framework for Clustering Algorithm Recommendation

Mohammadreza Bakhtyari

Bogdan Mazoure

Renato Cordeiro De Amorim

Guillaume Rabusseau

2025-09-28

ArXiv (preprint)

Towards an Interpretable Machine Learning Model for Predicting Antimicrobial Resistance

Mohamed Mediouni

Abdoulaye Banire Diallo

2025-07-31

Journal of Global Antimicrobial Resistance (published)

Quantifying antimicrobial resistance in food-producing animals in North America

Mohamed Mediouni

Abdoulaye Banire Diallo

The global misuse of antimicrobial medication has further exacerbated the problem of antimicrobial resistance (AMR), enriching the pool of genetic mechanisms previously adopted by bacteria to evade antimicrobial drugs. AMR can be either intrinsic or acquired. It can be acquired either by selective genetic modification or by horizontal gene transfer, which allows microorganisms to incorporate novel genes from other organisms or environments into their genomes. To prevent antimicrobial misuse, the use of antimicrobials in farm animals has recently been reconsidered in many countries. We present a systematic review of the literature discussing cases of AMR and the related restrictions applied in North American countries (Canada, Mexico, and the USA). The Google Scholar, PubMed, Embase, Web of Science, and Cochrane databases were searched for relevant information on antimicrobial use and resistance in food-producing animals, covering the period from 2015 to 2024. A total of 580 articles addressing antibiotic resistance in food-producing animals in North America met our inclusion criteria. Different AMR rates were identified, depending on the bacterium observed, the antibiotic class used, and the farm animal considered. The highest average AMR rates were observed for pigs (60.63% on average), intermediate rates for cattle (48.94% on average), and the lowest for poultry (28.43% on average). We also found that Cephalosporins, Penicillins, and Tetracyclines are the antibiotic classes with the highest average AMR rates (65.86%, 61.32%, and 58.82%, respectively), whereas the use of Sulfonamides and Quinolones leads to the lowest average AMR rates (21.59% and 28.07%, respectively). Moreover, our analysis of antibiotic-resistant bacteria shows that Streptococcus suis and Staphylococcus aureus have the highest average AMR rates (71.81% and 69.48%, respectively), whereas Campylobacter spp. has the lowest (29.75%). The highest average AMR percentage, 57.46%, was observed in Mexico, followed by Canada (45.22%) and the USA (42.25%); the lower rates in Canada and the USA are most probably due to the AMR control strategies in place there, such as stewardship programs and AMR surveillance bodies. Our review highlights the need for better strategies and regulations to control the spread of AMR in North America.

2025-05-26

Frontiers in Microbiology (published)

Improving clustering quality evaluation in noisy Gaussian mixtures

Renato Cordeiro De Amorim

2025-02-28

ArXiv (preprint)

BayTTA: Uncertainty-aware medical image classification with optimized test-time augmentation using Bayesian model averaging

Zeinab Sherkatghanad

Moloud Abdar

Mohammadreza Bakhtyari

Test-time augmentation (TTA) is a well-known technique employed during the testing phase of computer vision tasks. It involves aggregating multiple augmented versions of the input data. Combining predictions with a simple average is a common and straightforward approach after performing TTA. This paper introduces a novel framework for optimizing TTA, called BayTTA (Bayesian-based TTA), which is based on Bayesian Model Averaging (BMA). First, we generate a model list associated with different variations of the input data created through TTA. Then, we use BMA to combine model predictions weighted by their respective posterior probabilities. Such an approach accounts for model uncertainty and thus enhances the predictive performance of the related machine learning or deep learning model. We evaluate the performance of BayTTA on various public datasets, including three medical image datasets (skin cancer, breast cancer, and chest X-ray images) and two well-known gene editing datasets, CRISPOR and GUIDE-seq. Our experimental results indicate that BayTTA can be effectively integrated into state-of-the-art deep learning models used in medical image analysis, as well as into popular pre-trained CNN models such as VGG-16, MobileNetV2, DenseNet201, ResNet152V2, and InceptionResNetV2, improving their accuracy and robustness.

2024-06-24

ArXiv (preprint)
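BayTTA, as summarized above, replaces the simple average of test-time-augmented predictions with Bayesian Model Averaging, p(y | x) = sum_k p(y | x, M_k) p(M_k | D), where each M_k is the model applied to one augmentation. The sketch below is only an approximation under stated assumptions: it estimates the posterior weights p(M_k | D) from each augmentation's log-likelihood on a held-out validation set under a uniform prior, which may differ from how BayTTA computes them. All function names and data are hypothetical.

```python
import math
from typing import List, Sequence

Probs = Sequence[float]  # class-probability vector for one input


def log_likelihood(probs: Sequence[Probs], labels: Sequence[int], eps: float = 1e-12) -> float:
    """Sum of log p(true label) over a validation set, clipped to avoid log(0)."""
    return sum(math.log(max(p[y], eps)) for p, y in zip(probs, labels))


def bma_weights(val_probs_per_aug: Sequence[Sequence[Probs]], val_labels: Sequence[int]) -> List[float]:
    """Approximate posterior p(M_k | D) per augmentation, assuming a uniform prior."""
    lls = [log_likelihood(probs, val_labels) for probs in val_probs_per_aug]
    top = max(lls)  # subtract the max before exponentiating for numerical stability
    unnormalized = [math.exp(ll - top) for ll in lls]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def bma_predict(test_probs_per_aug: Sequence[Probs], weights: Sequence[float]) -> List[float]:
    """Posterior-weighted average of one test input's predictions across augmentations."""
    n_classes = len(test_probs_per_aug[0])
    return [sum(w * p[c] for w, p in zip(weights, test_probs_per_aug)) for c in range(n_classes)]


# Hypothetical validation predictions (2 samples, 2 classes) for two augmentations.
val_labels = [0, 1]
val_probs_per_aug = [
    [[0.9, 0.1], [0.2, 0.8]],  # augmentation 0: confident and correct
    [[0.5, 0.5], [0.5, 0.5]],  # augmentation 1: uninformative
]
weights = bma_weights(val_probs_per_aug, val_labels)
print(weights)  # augmentation 0 receives the larger weight
print(bma_predict([[0.6, 0.4], [0.2, 0.8]], weights))
```

With uniform weights, `bma_predict` reduces to the simple average that the abstract describes as the common baseline.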

A self-attention-based CNN-Bi-LSTM model for accurate state-of-charge estimation of lithium-ion batteries

Zeinab Sherkatghanad

Amin Ghazanfari

2024-04-30

Journal of Energy Storage (published)

Assessing the emergence time of SARS-CoV-2 zoonotic spillover

Stéphane Samson

Étienne Lord

2024-04-03

PLoS ONE (published)

Inertia-Based Indices to Determine the Number of Clusters in K-Means: An Experimental Evaluation

Andrei Rykov

Renato Cordeiro De Amorim

Boris Mirkin

This paper gives an experimentally supported review and comparison of several indices based on the conventional K-means inertia criterion for determining the number of clusters,

2023-12-31

IEEE Access (published)
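The indices reviewed in this paper build on the K-means inertia criterion W(K), the within-cluster sum of squared distances to the cluster centroids. One classical index of this kind is the Calinski-Harabasz index, CH(K) = [B / (K - 1)] / [W / (n - K)], where B = T - W is the between-cluster part of the total inertia T; whether and how it appears among the indices compared is not stated in the truncated abstract. A minimal pure-Python sketch, with hypothetical helper names; the labels would come from any K-means run, and K is typically chosen to maximize CH.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def centroid(points: Sequence[Point]) -> Point:
    dim = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dim))


def sq_dist(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def within_inertia(points: Sequence[Point], labels: Sequence[int]) -> float:
    """W: sum of squared distances from each point to its own cluster centroid."""
    clusters: Dict[int, List[Point]] = {}
    for p, label in zip(points, labels):
        clusters.setdefault(label, []).append(p)
    total = 0.0
    for members in clusters.values():
        c = centroid(members)
        total += sum(sq_dist(p, c) for p in members)
    return total


def calinski_harabasz(points: Sequence[Point], labels: Sequence[int]) -> float:
    """CH = (B / (K - 1)) / (W / (n - K)); larger values suggest better-separated clusters."""
    n, k = len(points), len(set(labels))
    if not 1 < k < n:
        raise ValueError("CH requires 1 < K < n")
    overall = centroid(points)
    total = sum(sq_dist(p, overall) for p in points)  # T, total inertia
    w = within_inertia(points, labels)
    b = total - w  # B, between-cluster inertia
    if w == 0:
        return float("inf")
    return (b / (k - 1)) / (w / (n - k))


# Hypothetical 2-D data: two tight, well-separated groups.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
print(calinski_harabasz(points, [0, 0, 1, 1]))  # 200.0
print(calinski_harabasz(points, [0, 1, 0, 1]))  # 0.02
```

Because W alone keeps decreasing as K grows, an index like CH counteracts this through the (K - 1) and (n - K) factors, which is what makes it usable for choosing K.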

Cache-Efficient Dynamic Programming MDP Solver

Jaël Champagne Gareau

Guillaume Gosset

Éric Beaudry

2022-12-31

European Conference on Artificial Intelligence (published)

Inferring multiple consensus trees and supertrees using clustering: a review

Gayane S. Barseghyan

Nadia Tahiri

2022-12-31

ArXiv (preprint)