Vladimir Makarenkov

Affiliate Member
Full Professor, UQAM, Department of Computer Science
Research Topics
Clustering
Computational Biology
Deep Learning
Medical Machine Learning

Biography

Vladimir Makarenkov is a full professor and director of the graduate program in bioinformatics at Université du Québec à Montréal (UQAM). He holds a master's degree in applied mathematics from Lomonosov Moscow State University and a PhD in computer science and mathematics from the École des hautes études en sciences sociales (EHESS) in Paris. Before joining the computer science department at UQAM, he completed a three-year postdoctoral fellowship at the Digital Ecology Lab at Université de Montréal.

He is the author of 80 journal articles and 67 conference papers, and the recipient of the prestigious Simon Régnier Prize and the Chikio Hayashi Prize awarded by the International Society for Mathematical Classification. His research focuses on AI, bioinformatics and data mining. This encompasses the design and development of novel unsupervised and supervised machine learning methods, as well as the use of machine learning techniques, including clustering and deep learning, for the analysis of biological and biomedical data.

Makarenkov’s current research also involves developing an automated, deep learning-based system that recommends the best clustering algorithm for a given input dataset. Additionally, he is working on a generic machine learning model to define the concept of a cluster, and on comparing various auto-encoding approaches and clustering algorithms to achieve better clustering results.

Publications

Soil microbiome prediction using traditional machine learning and deep learning models
Zahia Aouabed
Vincent Therrien
Mohamed Achraf Bouaoune
Mohammadreza Bakhtyari
Mohamed Hijri
The accuracy of macrobiological community predictions largely depends on the taxonomic scale considered. Nowadays, the applicability of such predictions remains an important challenge when extended to microbial soil communities. This is not only due to the lack of reliable benchmark data, but also to a greater diversity of the soil microorganisms compared to other environments. In this study, we use six traditional machine learning regression models and one deep learning regressor to predict relative frequencies of bacterial and fungal communities within the soil microbiome based on environmental factors. We analyze the data from two publicly available soil microbiome datasets: (1) Data collected by Averill and co-authors and analyzed in a recent Nature Ecology and Evolution article, and (2) Data extracted from the NEON database, to estimate the composition of bacterial and fungal communities at the functional (i.e. functional group level) and taxonomic scales (i.e. phylum, class, order, family, and genus levels). Our findings suggest the presence of a general pattern across the observed taxonomic scales according to which the predictability of the soil microbiome increases with taxonomic scale. However, a notable exception occurs when machine learning models are applied to predict bacterial communities at the functional group level for Averill et al.’s data when all of them fail to provide accurate prediction results. The best overall results obtained include the value of the coefficient of determination
Similarity-based transfer learning with deep learning networks for accurate CRISPR-Cas9 off-target prediction.
Transfer learning has emerged as a powerful tool for enhancing predictive accuracy in complex tasks, particularly in scenarios where data is limited or imbalanced. This study explores the use of similarity-based pre-evaluation as a methodology to identify optimal source datasets for transfer learning, addressing the dual challenge of efficient source-target dataset pairing and off-target prediction in CRISPR-Cas9, while existing transfer learning applications in the field of gene editing often lack a principled method for source dataset selection. We use cosine, Euclidean, and Manhattan distances to evaluate similarity between the source and target datasets used in our transfer learning experiments. Four deep learning network architectures, i.e. Multilayer Perceptron (MLP), Convolutional Neural Networks (CNNs), Feedforward Neural Networks (FNNs), and Recurrent Neural Networks (RNNs), and two traditional machine learning models, i.e. Logistic Regression (LR) and Random Forest (RF), were tested and compared in our simulations. The results suggest that similarity scores are reliable indicators for pre-selecting source datasets in CRISPR-Cas9 transfer learning experiments, with cosine distance proving to be a more effective dataset comparison metric than either Euclidean or Manhattan distances. An RNN-GRU, a 5-layer FNN, and two MLP variants provided the best overall prediction results in our simulations. By integrating similarity-based source pre-selection with machine learning outcomes, we propose a dual-layered framework that not only streamlines the transfer learning process but also significantly improves off-target prediction accuracy. The code and data used in this study are freely available at: https://github.com/dagrate/transferlearning_offtargets .
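As a rough illustration of the similarity-based pre-selection idea described in this abstract, the sketch below ranks candidate source datasets by their distance to a target dataset under the three metrics named above. It is not the authors' code: the assumption that each dataset is summarized by a single feature vector, the helper `rank_sources`, and the toy values are all hypothetical.

```python
import math

def cosine_distance(u, v):
    """1 minus the cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def euclidean_distance(u, v):
    """Straight-line (L2) distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def manhattan_distance(u, v):
    """Sum of absolute coordinate differences (L1 distance)."""
    return sum(abs(a - b) for a, b in zip(u, v))

def rank_sources(target, sources, distance):
    """Hypothetical helper: source names ordered from closest to farthest."""
    return sorted(sources, key=lambda name: distance(sources[name], target))

# Toy summary vectors (assumed representation, invented values).
target = [1.0, 2.0, 3.0]
sources = {
    "source_a": [2.0, 4.0, 6.0],  # same direction as target, larger magnitude
    "source_b": [1.0, 2.0, 4.0],  # close in magnitude, slightly different direction
}

for metric in (cosine_distance, euclidean_distance, manhattan_distance):
    print(metric.__name__, rank_sources(target, sources, metric))
```

On these toy vectors the cosine distance prefers `source_a` while the Euclidean and Manhattan distances prefer `source_b`, which shows that the choice of metric can change which source dataset is pre-selected.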
ClustRecNet: A Novel End-to-End Deep Learning Framework for Clustering Algorithm Recommendation
Mohammadreza Bakhtyari
Renato Cordeiro De Amorim
Towards an Interpretable Machine Learning Model for Predicting Antimicrobial Resistance
Mohamed Mediouni
Abdoulaye Banire Diallo
Quantifying antimicrobial resistance in food-producing animals in North America
Mohamed Mediouni
Abdoulaye Banire Diallo
The global misuse of antimicrobial medication has further exacerbated the problem of antimicrobial resistance (AMR), enriching the pool of genetic mechanisms previously adopted by bacteria to evade antimicrobial drugs. AMR can be either intrinsic or acquired. It can be acquired either by selective genetic modification or by horizontal gene transfer that allows microorganisms to incorporate novel genes from other organisms or environments into their genomes. To avoid an eventual antimicrobial mistreatment, the use of antimicrobials in farm animals has been recently reconsidered in many countries. We present a systematic review of the literature discussing the cases of AMR and the related restrictions applied in North American countries (including Canada, Mexico, and the USA). The Google Scholar, PubMed, Embase, Web of Science, and Cochrane databases were searched to find plausible information on antimicrobial use and resistance in food-producing animals, covering the time period from 2015 to 2024. A total of 580 articles addressing the issue of antibiotic resistance in food-producing animals in North America met our inclusion criteria. Different AMR rates, depending on the bacterium being observed, the antibiotic class being used, and the farm animal being considered, have been identified. We determined that the highest average AMR rates have been observed for pigs (60.63% on average), the medium for cattle (48.94% on average), and the lowest for poultry (28.43% on average). We also found that Cephalosporins, Penicillins, and Tetracyclines are the antibiotic classes with the highest average AMR rates (65.86%, 61.32%, and 58.82%, respectively), whereas the use of Sulfonamides and Quinolones leads to the lowest average AMR (21.59% and 28.07%, respectively). Moreover, our analysis of antibiotic-resistant bacteria shows that Streptococcus suis (S. suis) and S. aureus provide the highest average AMR rates (71.81% and 69.48%, respectively), whereas Campylobacter spp. provides the lowest one (29.75%). The highest average AMR percentage, 57.46%, was observed in Mexico, followed by Canada at 45.22%, and the USA at 42.25%, which is most probably due to the presence of various AMR control strategies, such as stewardship programs and AMR surveillance bodies, existing in Canada and the USA. Our review highlights the need for better strategies and regulations to control the spread of AMR in North America.
Improving clustering quality evaluation in noisy Gaussian mixtures
Renato Cordeiro De Amorim
BayTTA: Uncertainty-aware medical image classification with optimized test-time augmentation using Bayesian model averaging
Moloud Abdar
Mohammadreza Bakhtyari
Test-time augmentation (TTA) is a well-known technique employed during the testing phase of computer vision tasks. It involves aggregating multiple augmented versions of input data. Combining predictions using a simple average formulation is a common and straightforward approach after performing TTA. This paper introduces a novel framework for optimizing TTA, called BayTTA (Bayesian-based TTA), which is based on Bayesian Model Averaging (BMA). First, we generate a model list associated with different variations of the input data created through TTA. Then, we use BMA to combine model predictions weighted by their respective posterior probabilities. Such an approach allows one to take into account model uncertainty, and thus to enhance the predictive performance of the related machine learning or deep learning model. We evaluate the performance of BayTTA on various public data, including three medical image datasets comprising skin cancer, breast cancer, and chest X-ray images and two well-known gene editing datasets, CRISPOR and GUIDE-seq. Our experimental results indicate that BayTTA can be effectively integrated into state-of-the-art deep learning models used in medical image analysis as well as into some popular pre-trained CNN models such as VGG-16, MobileNetV2, DenseNet201, ResNet152V2, and InceptionResNetV2, leading to the enhancement in their accuracy and robustness performance.
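The contrast this abstract draws between a simple average and a posterior-weighted combination can be sketched as follows. This is a minimal illustration and not the BayTTA implementation: it assumes that each augmented variant comes with a log-evidence score, that priors are uniform, and that posterior weights are obtained by normalizing the exponentiated scores. All function names and numbers are hypothetical.

```python
import math

def posterior_weights(log_evidences):
    """Normalize exp(log evidence) into weights summing to 1 (uniform prior assumed)."""
    shift = max(log_evidences)  # subtract the max for numerical stability
    unnormalized = [math.exp(le - shift) for le in log_evidences]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]

def simple_average(predictions):
    """Plain mean of class-probability vectors, the usual TTA aggregation."""
    n = len(predictions)
    n_classes = len(predictions[0])
    return [sum(p[k] for p in predictions) / n for k in range(n_classes)]

def bma_average(predictions, weights):
    """Weighted mean of class-probability vectors using posterior weights."""
    n_classes = len(predictions[0])
    return [sum(w * p[k] for w, p in zip(weights, predictions))
            for k in range(n_classes)]

# Invented class probabilities from three augmented views of one input.
predictions = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]]
# Invented log-evidence scores; a higher score means a better-supported variant.
log_evidences = [-10.0, -11.0, -14.0]

weights = posterior_weights(log_evidences)
print("simple average:", simple_average(predictions))
print("BMA average:   ", bma_average(predictions, weights))
```

In this toy case the weighted combination leans toward the best-supported variant, whereas the simple average treats all three augmented views equally.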
A self-attention-based CNN-Bi-LSTM model for accurate state-of-charge estimation of lithium-ion batteries
Assessing the emergence time of SARS-CoV-2 zoonotic spillover
Stéphane Samson
Étienne Lord
Inertia-Based Indices to Determine the Number of Clusters in K-Means: An Experimental Evaluation
Andrei Rykov
Renato Cordeiro De Amorim
Boris Mirkin
This paper gives an experimentally supported review and comparison of several indices based on the conventional K-means inertia criterion for determining the number of clusters,
Cache-Efficient Dynamic Programming MDP Solver
Jaël Champagne Gareau
Guillaume Gosset
Éric Beaudry
Inferring multiple consensus trees and supertrees using clustering: a review
Gayane S. Barseghyan
Nadia Tahiri