Benjamin Fung

Fariborz Haghighat

2022-05-01

Energy and Buildings (published)

Learning Inter-Modal Correspondence and Phenotypes From Multi-Modal Electronic Health Records

Kejing Yin

William K. Cheung

Jonathan Poon

Non-negative tensor factorization has been shown a practical solution to automatically discover phenotypes from the electronic health record… (see more)s (EHR) with minimal human supervision. Such methods generally require an input tensor describing the inter-modal interactions to be pre-established; however, the correspondence between different modalities (e.g., correspondence between medications and diagnoses) can often be missing in practice. Although heuristic methods can be applied to estimate them, they inevitably introduce errors, and leads to sub-optimal phenotype quality. This is particularly important for patients with complex health conditions (e.g., in critical care) as multiple diagnoses and medications are simultaneously present in the records. To alleviate this problem and discover phenotypes from EHR with unobserved inter-modal correspondence, we propose the collective hidden interaction tensor factorization (cHITF) to infer the correspondence between multiple modalities jointly with the phenotype discovery. We assume that the observed matrix for each modality is marginalization of the unobserved inter-modal correspondence, which are reconstructed by maximizing the likelihood of the observed matrices. Extensive experiments conducted on the real-world MIMIC-III dataset demonstrate that cHITF effectively infers clinically meaningful inter-modal correspondence, discovers phenotypes that are more clinically relevant and diverse, and achieves better predictive performance compared with a number of state-of-the-art computational phenotyping models.

2022-01-01

IEEE Transactions on Knowledge and Data Engineering (published)

On the Effectiveness of Interpretable Feedforward Neural Network

Miles Q. Li

Adel Abusitta

Deep learning models have achieved state-of-the-art performance in many classification tasks. However, most of them cannot provide an explan… (see more)ation for their classification results. Machine learning models that are interpretable are usually linear or piecewise linear and yield inferior performance. Non-linear models achieve much better classification performance, but it is usually hard to explain their classification results. As a counter-example, an interpretable feedforward neural network (IFFNN) is proposed to achieve both high classification performance and interpretability for malware detection. If the IFFNN can perform well in a more flexible and general form for other classification tasks while providing meaningful explanations, it may be of great interest to the applied machine learning community. In this paper, we propose a way to generalize the interpretable feedforward neural network to multi-class classification scenarios and any type of feedforward neural networks, and evaluate its classification performance and interpretability on interpretable datasets. We conclude by finding that the generalized IFFNNs achieve comparable classification performance to their normal feedforward neural network counterparts and provide meaningful explanations. Thus, this kind of neural network architecture has great practical use.

2021-11-03

ArXiv (preprint)

The Topic Confusion Task: A Novel Evaluation Scenario for Authorship Attribution

Malik H. Altakrori

Jackie Cheung

2021-11-01

Findings of the Association for Computational Linguistics: EMNLP 2021 (published)

Trade-off Between Accuracy and Fairness of Data-driven Building and Indoor Environment Models: A Comparative Study of Pre-processing Methods

Ying Sun

Fariborz Haghighat

2021-10-01

Energy (published)

Trade-off Between Accuracy and Fairness of Data-driven Building and Indoor Environment Models: A Comparative Study of Pre-processing Methods

Ying Sun

Fariborz Haghighat

2021-10-01

Energy (published)

A Data Mining Analysis of Cross-Regional Study of Apparel Consumption

Osmud Rahman

2021-08-28

Comprehensible Science (published)

A Data Mining Analysis of Cross-Regional Study of Apparel Consumption

Osmud Rahman

2021-06-19

(published)

A Novel Neural Network-Based Malware Severity Classification System

Miles Q. Li

2021-01-01

International Conference on Software and Data Technologies (published)

A Novel Neural Network-Based Malware Severity Classification System

Miles Q. Li

2021-01-01

International Conference on Software and Data Technologies (published)

The Topic Confusion Task: A Novel Scenario for Authorship Attribution

Malik H. Altakrori

Jackie Cheung

Authorship attribution is the problem of identifying the most plausible author of an anonymous text from a set of candidate authors. Researc… (see more)hers have investigated same-topic and cross-topic scenarios of authorship attribution, which differ according to whether unseen topics are used in the testing phase. However, neither scenario allows us to explain whether errors are caused by failure to capture authorship style, by the topic shift or by other factors. Motivated by this, we propose the topic confusion task, where we switch the author-topic conﬁg-uration between training and testing set. This setup allows us to probe errors in the attribution process. We investigate the accuracy and two error measures: one caused by the models’ confusion by the switch because the features capture the topics, and one caused by the features’ inability to capture the writing styles, leading to weaker models. By evaluating different features, we show that stylometric features with part-of-speech tags are less susceptible to topic variations and can increase the accuracy of the attribution process. We further show that combining them with word-level n - grams can outperform the state-of-the-art technique in the cross-topic scenario. Finally, we show that pretrained language models such as BERT and RoBERTa perform poorly on this task, and are outperformed by simple n -gram features.

2021-01-01

arXiv.org (preprint)

dblp.uni-trier.de

Learning Inter-Modal Correspondence and Phenotypes From Multi-Modal Electronic Health Records

Kejing Yin

William K. Cheung

Jonathan Poon

Non-negative tensor factorization has been shown a practical solution to automatically discover phenotypes from the electronic health record… (see more)s (EHR) with minimal human supervision. Such methods generally require an input tensor describing the inter-modal interactions to be pre-established; however, the correspondence between different modalities (e.g., correspondence between medications and diagnoses) can often be missing in practice. Although heuristic methods can be applied to estimate them, they inevitably introduce errors, and leads to sub-optimal phenotype quality. This is particularly important for patients with complex health conditions (e.g., in critical care) as multiple diagnoses and medications are simultaneously present in the records. To alleviate this problem and discover phenotypes from EHR with unobserved inter-modal correspondence, we propose the collective hidden interaction tensor factorization (cHITF) to infer the correspondence between multiple modalities jointly with the phenotype discovery. We assume that the observed matrix for each modality is marginalization of the unobserved inter-modal correspondence, which are reconstructed by maximizing the likelihood of the observed matrices. Extensive experiments conducted on the real-world MIMIC-III dataset demonstrate that cHITF effectively infers clinically meaningful inter-modal correspondence, discovers phenotypes that are more clinically relevant and diverse, and achieves better predictive performance compared with a number of state-of-the-art computational phenotyping models.

2020-11-12

ArXiv (preprint)