Aman Verma

Characterizing co-purchased food products with soda, fresh fruits, and fresh vegetables using loyalty card purchasing data in Montréal, Canada, 2015–2017

Hiroshi Mamiya

Kody Crowell

Catherine L. Mah

Amélie Quesnel-Vallée

Aman Verma

David L. Buckeridge

Foods are not purchased in isolation but are normally co-purchased with other food products. The patterns of co-purchasing associations across a large number of food products have rarely been explored to date. Knowledge of such co-purchasing patterns will help evaluate nutrition interventions that might affect the purchasing of multiple food items, while providing insights into food marketing activities that target multiple food items simultaneously. We aimed to quantify the association of food products purchased with each of three food categories of public health importance (soda, fresh fruits, and fresh vegetables) using Association Rule Mining (ARM) followed by longitudinal regression analysis. We obtained transaction data containing grocery purchasing baskets (lists of purchased products) collected from loyalty club members of a major supermarket chain in Montréal, Canada, between 2015 and 2017. There were 72 food groups in these data. ARM was applied to identify food categories co-purchased with soda, fresh fruits, and fresh vegetables. A subset of the co-purchasing associations identified by ARM was further tested with confirmatory logistic regression models that controlled for potential confounders of the associations and for correlated purchasing patterns within shoppers. We analyzed 1,692,716 baskets. Salty snacks showed the strongest co-purchasing association with soda (Relative Risk [RR] = 2.07, 95% Confidence Interval [CI]: 2.06–2.09). Sweet snacks/candies (RR = 1.73, 95% CI: 1.72–1.74) and juices/drinks (RR = 1.71, 95% CI: 1.71–1.73) also showed strong co-purchasing associations with soda. Fresh vegetables and fruits showed considerably different patterns of co-purchasing associations from those of soda, with pre-made salad and stir fry showing a strong association (RR = 3.78, 95% CI: 3.74–3.82 for fresh vegetables and RR = 2.79, 95% CI: 2.76–2.81 for fresh fruits).
The longitudinal regression analysis confirmed these associations after adjustment for the confounders, although the associations were weaker in magnitude. Quantifying the interdependence of food products within shopping baskets provides novel insights for developing nutrition surveillance and interventions targeting multiple food categories while motivating research to identify drivers of such co-purchasing. ARM is a useful analytical approach to identify such cross-food associations from retail transaction data when combined with confirmatory regression analysis to adjust for confounders of such associations.

2025-02-16

The International Journal of Behavioral Nutrition and Physical Activity (published)
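The co-purchasing abstract above relies on ARM-style association measures. The following is a minimal sketch of how such measures can be computed for a single rule. It assumes each basket is represented as a set of food-category labels and defines relative risk as P(target | antecedent) / P(target | no antecedent). The function name `rule_metrics` and the toy baskets are hypothetical. The study's exact definitions and its regression-based adjustment for confounders are not reproduced here.

```python
def rule_metrics(baskets: list[set[str]], antecedent: str, target: str) -> dict[str, float]:
    """Support, confidence, lift and relative risk for the rule antecedent -> target.

    Each basket is the set of food categories bought in one transaction.
    (Hypothetical helper for illustration only.)
    """
    n = len(baskets)
    n_a = sum(1 for b in baskets if antecedent in b)                   # baskets with the antecedent
    n_t = sum(1 for b in baskets if target in b)                       # baskets with the target
    n_at = sum(1 for b in baskets if antecedent in b and target in b)  # baskets with both
    if n_a == 0 or n_a == n or n_t == 0:
        raise ValueError("antecedent must appear in some but not all baskets, and target in at least one")

    confidence = n_at / n_a                  # P(target | antecedent)
    risk_without = (n_t - n_at) / (n - n_a)  # P(target | no antecedent)
    return {
        "support": n_at / n,                 # P(antecedent and target)
        "confidence": confidence,
        "lift": confidence / (n_t / n),      # confidence / P(target)
        "relative_risk": confidence / risk_without if risk_without > 0 else float("inf"),
    }


# Hypothetical toy baskets, not data from the study.
baskets = [
    {"soda", "salty snacks"},
    {"soda", "salty snacks", "juices/drinks"},
    {"soda", "bread"},
    {"salty snacks", "fresh fruits"},
    {"fresh fruits", "fresh vegetables"},
    {"bread", "milk"},
]

metrics = rule_metrics(baskets, "soda", "salty snacks")
print(metrics)
```

In the abstract, associations found this way were treated as candidates and then re-tested with regression models. Unadjusted measures like these cannot account for confounding or for repeated baskets from the same shopper.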

Modeling electronic health record data using an end-to-end knowledge-graph-informed topic model

Yuesong Zou

Ahmad Pesaranghader

Ziyang Song

Aman Verma

David L. Buckeridge

Yue Li

The rapid growth of electronic health record (EHR) datasets opens up promising opportunities to understand human diseases in a systematic way. However, effective extraction of clinical knowledge from EHR data has been hindered by their sparsity and noise. We present GAT-ETM, an end-to-end knowledge-graph-based multimodal embedded topic model. GAT-ETM distills latent disease topics from EHR data by learning embeddings from a constructed medical knowledge graph. We applied GAT-ETM to a large-scale EHR dataset of over 1 million patients and evaluated its performance on EHR reconstruction and drug imputation. GAT-ETM outperformed the alternative methods on both tasks. Moreover, our model learned clinically meaningful graph-informed embeddings of the EHR codes. In addition, our model can discover interpretable and accurate patient representations for patient stratification and drug recommendations. Our code is available at Anonymous GitHub.

2022-10-24

Scientific Reports (published)
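GAT-ETM is described as an embedded topic model. The following is a minimal sketch of an embedded-topic-model decoder. It assumes the standard ETM parameterization, in which each topic's distribution over EHR codes is a softmax of inner products between code embeddings ρ and a topic embedding α_k. A patient's expected code distribution is then the mixture Σ_k θ_k β_k.

In GAT-ETM, the code embeddings would be learned from the medical knowledge graph, presumably with a graph attention network given the model's name. That step is omitted here. The function names, codes, and embedding values below are hypothetical.

```python
import math


def softmax(scores: list[float]) -> list[float]:
    """Numerically stable softmax."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def topic_code_distributions(
    code_emb: dict[str, list[float]], topic_emb: list[list[float]]
) -> list[dict[str, float]]:
    """beta_k(code) = softmax over codes of <rho_code, alpha_k>; one dict per topic."""
    codes = list(code_emb)
    betas = []
    for alpha in topic_emb:
        scores = [sum(r * a for r, a in zip(code_emb[c], alpha)) for c in codes]
        betas.append(dict(zip(codes, softmax(scores))))
    return betas


def code_probability(theta: list[float], betas: list[dict[str, float]], code: str) -> float:
    """p(code | patient) = sum_k theta_k * beta_k(code) for a patient's topic mixture theta."""
    return sum(t * beta[code] for t, beta in zip(theta, betas))


# Hypothetical 2-d embeddings; in GAT-ETM these would come from the knowledge graph.
code_emb = {"E11": [1.0, 0.0], "E10": [0.9, 0.1], "J44": [0.0, 1.0]}
topic_emb = [[2.0, 0.0], [0.0, 2.0]]

betas = topic_code_distributions(code_emb, topic_emb)
print(code_probability([0.7, 0.3], betas, "J44"))
```

Because codes with similar embeddings receive similar scores under every topic, related codes (here `E11` and `E10`) tend to share topics. This is one way graph-informed embeddings can help with sparse codes.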

MixEHR-Guided: A guided multi-modal topic modeling approach for large-scale automatic phenotyping using the electronic health record

Yuri Ahuja

Yuesong Zou

Aman Verma

David L Buckeridge

Yuemei Li

2022-09-30

Journal of Biomedical Informatics (published)

Automatic Phenotyping by a Seed-guided Topic Model

Ziyang Song

Yuanyi Hu

Aman Verma

David L. Buckeridge

Yue Li

Electronic health records (EHRs) provide rich clinical information and opportunities to extract epidemiological patterns to understand and predict patient disease risks with suitable machine learning methods such as topic models. However, existing topic models do not generate identifiable topics that each predict a unique phenotype. One promising direction is to use known phenotype concepts to guide topic inference. We present a seed-guided Bayesian topic model called MixEHR-Seed with three contributions: (1) for each phenotype, we infer a dual form of topic distribution: a seed-topic distribution over a small set of key EHR codes and a regular topic distribution over the entire EHR vocabulary; (2) we model age-dependent disease progression as Markovian dynamic topic priors; (3) we infer seed-guided multi-modal topics over distinct EHR data types. For inference, we developed a variational inference algorithm. Using MixEHR-Seed, we inferred 1569 PheCode-guided phenotype topics from an EHR database in Quebec, Canada, covering 1.3 million patients with up to 20 years of follow-up and 122 million records for 8539 unique diagnostic codes and 1126 unique drug codes. We observed (1) accurate phenotype prediction by the guided topics, (2) clinically relevant PheCode-guided disease topics, and (3) meaningful age-dependent disease prevalence. Source code is available at GitHub: https://github.com/li-lab-mcgill/MixEHR-Seed.

2022-08-13

Knowledge Discovery and Data Mining (published)
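To make the "dual form" of a phenotype topic in MixEHR-Seed more concrete, here is a hedged sketch. It combines a seed-topic distribution, which is nonzero only on a phenotype's key codes, with a regular topic distribution over the full vocabulary through a mixing weight π. This two-component mixture is an assumption for illustration, in the spirit of seeded topic models. MixEHR-Seed's actual generative process, its Markovian age-dependent priors, and its variational inference are not shown. All names, codes, and probabilities below are hypothetical.

```python
def dual_topic_prob(
    code: str, seed_dist: dict[str, float], regular_dist: dict[str, float], pi: float
) -> float:
    """p(code | phenotype) as a mixture of a seed topic and a regular topic.

    seed_dist is supported only on the phenotype's small set of key (seed) codes,
    regular_dist covers the whole EHR vocabulary, and pi weights the seed topic.
    (Hypothetical illustration, not the MixEHR-Seed implementation.)
    """
    if not 0.0 <= pi <= 1.0:
        raise ValueError("pi must lie in [0, 1]")
    return pi * seed_dist.get(code, 0.0) + (1.0 - pi) * regular_dist.get(code, 0.0)


# Hypothetical diabetes-like phenotype; codes and probabilities are illustrative only.
seed = {"E11.9": 0.7, "E11.65": 0.3}
regular = {"E11.9": 0.2, "I10": 0.3, "metformin": 0.5}
vocab = ["E11.9", "E11.65", "I10", "metformin"]

for code in vocab:
    print(code, dual_topic_prob(code, seed, regular, pi=0.5))
```

Restricting the seed component to a handful of known codes anchors each topic to a specific phenotype. The regular component can still pick up co-occurring codes such as comorbidities and drugs.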

Modeling electronic health record data using a knowledge-graph-embedded topic model

Yuesong Zou

Ahmad Pesaranghader

Aman Verma

David L Buckeridge

Yuemei Li

The rapid growth of electronic health record (EHR) datasets opens up promising opportunities to understand human diseases in a systematic way. However, effective extraction of clinical knowledge from EHR data has been hindered by their sparsity and noise. We present KG-ETM, an end-to-end knowledge-graph-based multimodal embedded topic model. KG-ETM distills latent disease topics from EHR data by learning embeddings from medical knowledge graphs. We applied KG-ETM to a large-scale EHR dataset of over 1 million patients and evaluated its performance on EHR reconstruction and drug imputation. KG-ETM outperformed the alternative methods on both tasks. Moreover, our model learned clinically meaningful graph-informed embeddings of the EHR codes. In addition, our model can discover interpretable and accurate patient representations for patient stratification and drug recommendations.

2022-06-02

arXiv (preprint)

Mortality trends and lengths of stay among hospitalized COVID-19 patients in Ontario and Québec (Canada): a population-based cohort study of the first three epidemic waves

Yiqing Xia

Huiting Ma

David L Buckeridge

Marc Brisson

Beate Sander

Adrienne Chan

Aman Verma

Iris Ganser

Nadine Kronfli

Sharmistha Mishra

Mathieu Maheu-Giroux

Epidemic waves of COVID-19 strained hospital resources. We describe temporal trends in mortality risk and length of stay in intensive care units (ICUs) among COVID-19 patients hospitalized during the first three epidemic waves in Canada. We used population-based provincial hospitalization data from Ontario and Québec to examine mortality risk and lengths of ICU stay. For each province, adjusted estimates were obtained using marginal standardization of logistic regression models, adjusting for patient-level characteristics and hospital-level determinants. Using all hospitalizations from Ontario (N=26,541) and Québec (N=23,857), we found that unadjusted in-hospital mortality risks peaked at 31% in the first wave and were lowest, at 6–7%, at the end of the third wave. This general trend remained after controlling for confounders. The odds of in-hospital mortality in the highest hospital occupancy quintile were 1.2 (95% CI: 1.0–1.4; Ontario) and 1.6 (95% CI: 1.3–1.9; Québec) times those of the lowest quintile. Variants of concern were associated with increased in-hospital mortality. Lengths of ICU stay decreased over time, from a mean of 16 days (SD=18) to 15 days (SD=15) in the third wave, but were consistently 3–6 days longer in Ontario than in Québec. In-hospital mortality risks and lengths of ICU stay declined over time in both provinces despite changing patient demographics, suggesting that new therapeutics and treatments, as well as improved clinical protocols, could have contributed to this reduction. Continuous population-based monitoring of patient outcomes in an evolving epidemic is necessary for health system preparedness and response.

2021-12-07

medRxiv (preprint)
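The COVID-19 study reports adjusted estimates obtained by marginal standardization of logistic regression models. The following is a minimal sketch of that step, assuming the model has already been fitted:

1. Set the exposure of interest to a fixed level for every patient.
2. Average the predicted probabilities across patients.
3. Compare the resulting averages between exposure levels.

The coefficients, the covariate names (such as `high_occupancy`), and the cohort below are hypothetical. Model fitting and confidence intervals are omitted.

```python
import math


def predicted_risk(intercept: float, coefs: dict[str, float], patient: dict[str, float]) -> float:
    """Inverse-logit of the linear predictor for one patient."""
    eta = intercept + sum(coefs[name] * value for name, value in patient.items())
    return 1.0 / (1.0 + math.exp(-eta))


def standardized_risk(
    intercept: float,
    coefs: dict[str, float],
    cohort: list[dict[str, float]],
    exposure: str,
    level: float,
) -> float:
    """Average predicted risk if every patient in the cohort had `exposure` set to `level`."""
    total = 0.0
    for patient in cohort:
        counterfactual = {**patient, exposure: level}  # keep other covariates as observed
        total += predicted_risk(intercept, coefs, counterfactual)
    return total / len(cohort)


# Hypothetical fitted coefficients and cohort, not estimates from the study.
intercept = -3.0
coefs = {"high_occupancy": 0.4, "age_over_70": 1.5}
cohort = [
    {"high_occupancy": 0, "age_over_70": 0},
    {"high_occupancy": 1, "age_over_70": 1},
    {"high_occupancy": 0, "age_over_70": 1},
    {"high_occupancy": 1, "age_over_70": 0},
]

risk_low = standardized_risk(intercept, coefs, cohort, "high_occupancy", 0)
risk_high = standardized_risk(intercept, coefs, cohort, "high_occupancy", 1)
print(f"standardized risks: {risk_low:.3f} vs {risk_high:.3f}; ratio {risk_high / risk_low:.2f}")
```

Averaging over the observed covariate distribution yields population-level adjusted risks rather than odds ratios. This is why standardized mortality risks can be compared directly across waves or provinces.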

Supervised multi-specialist topic model with applications on large-scale electronic health record data

Ziyang Song

Xavier Sumba Toral

Yixin Xu

Aihua Liu

Liming Guo

Guido Powell

Aman Verma

David Buckeridge

Ariane Marelli

Yue Li

Motivation: Electronic health record (EHR) data provide a new avenue to elucidate disease comorbidities and latent phenotypes for precision medicine. To fully exploit their potential, a realistic generative process for EHR data needs to be modelled. We present MixEHR-S to jointly infer specialist-disease topics from EHR data. As the key contribution, we model the specialist assignments and ICD-coded diagnoses as latent topics based on each patient's underlying disease topic mixture in a novel unified supervised hierarchical Bayesian topic model. For efficient inference, we developed a closed-form collapsed variational inference algorithm to learn the model distributions of MixEHR-S. We applied MixEHR-S to two independent large-scale EHR databases in Quebec with three targeted applications: (1) Congenital Heart Disease (CHD) diagnostic prediction among 154,775 patients; (2) Chronic obstructive pulmonary disease (COPD) diagnostic prediction among 73,791 patients; (3) future insulin treatment prediction among 78,712 patients diagnosed with diabetes, as a means to assess disease exacerbation. In all three applications, MixEHR-S yielded clinically meaningful latent topics among the most predictive latent topics and achieved higher target prediction accuracy than existing methods, providing opportunities for prioritizing high-risk patients for healthcare services. MixEHR-S source code and scripts of the experiments are freely available at https://github.com/li-lab-mcgill/mixehrS

2021-07-31

Proceedings of the 12th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics (published)

Bayesian latent multi‐state modeling for nonequidistant longitudinal electronic health records

Yu Luo

David A. Stephens

Aman Verma

David L. Buckeridge

Large amounts of longitudinal health records are now available for dynamic monitoring of the underlying processes governing the observations. However, health status progression is not typically observed directly over time: records are observed only when a subject interacts with the system, yielding irregular and often sparse observations. This suggests that the observed trajectories should be modeled via a latent continuous‐time process, potentially as a function of time‐varying covariates. We develop a continuous‐time hidden Markov model to analyze longitudinal data, accounting for irregular visits and different types of observations. By employing a specific missing data likelihood formulation, we can construct an efficient computational algorithm. We focus on Bayesian inference for the model, which is facilitated by an expectation‐maximization algorithm and Markov chain Monte Carlo methods. Simulation studies demonstrate that these approaches can be implemented efficiently for large data sets in a fully Bayesian setting. We apply this model to a real cohort of patients with chronic obstructive pulmonary disease, where the outcome is the number of drugs taken and health care utilization indicators and patient characteristics serve as covariates.

2020-03-10

Biometrics (published)
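In a continuous-time hidden Markov model with irregular visits, the transition matrix between two consecutive visits depends on the elapsed time: P(Δt) = exp(QΔt) for a generator matrix Q. The sketch below illustrates this idea; it is not the authors' algorithm. It makes the following assumptions:

- a two-state chain, for which exp(QΔt) has a closed form;
- Poisson-distributed drug counts in each state;
- a log-likelihood computed with the scaled forward algorithm.

The rates, means, and visit data are hypothetical. Covariates, the missing-data likelihood formulation, and the EM/MCMC machinery from the abstract are not included.

```python
import math


def transition_matrix(a: float, b: float, dt: float) -> list[list[float]]:
    """Closed-form exp(Q * dt) for the two-state generator Q = [[-a, a], [b, -b]].

    a is the rate of moving from state 0 to state 1, b the rate from state 1 back to 0.
    """
    s = a + b
    decay = math.exp(-s * dt)
    return [
        [b / s + (a / s) * decay, (a / s) * (1.0 - decay)],
        [(b / s) * (1.0 - decay), a / s + (b / s) * decay],
    ]


def poisson_pmf(k: int, mean: float) -> float:
    return math.exp(-mean) * mean**k / math.factorial(k)


def log_likelihood(
    times: list[float], counts: list[int], a: float, b: float,
    means: list[float], initial: list[float],
) -> float:
    """Scaled forward algorithm over visits observed at irregular times."""
    alpha = [initial[i] * poisson_pmf(counts[0], means[i]) for i in range(2)]
    norm = sum(alpha)
    loglik = math.log(norm)
    alpha = [x / norm for x in alpha]
    for t_prev, t_cur, k in zip(times, times[1:], counts[1:]):
        P = transition_matrix(a, b, t_cur - t_prev)  # depends on the gap between visits
        alpha = [
            sum(alpha[i] * P[i][j] for i in range(2)) * poisson_pmf(k, means[j])
            for j in range(2)
        ]
        norm = sum(alpha)
        loglik += math.log(norm)
        alpha = [x / norm for x in alpha]
    return loglik


# Hypothetical visit times (years) and drug counts for one patient.
times = [0.0, 0.7, 2.5]
counts = [1, 3, 2]
print(log_likelihood(times, counts, a=0.3, b=0.2, means=[1.0, 4.0], initial=[0.6, 0.4]))
```

Longer gaps between visits push the transition matrix toward the stationary distribution, so sparse records carry less information about the state path. This is the behaviour the latent continuous-time formulation is meant to capture.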
