Benjamin Fung

Philippe Charland

To combat the increasingly versatile and mutable modern malware, Machine Learning (ML) is now a popular and effective complement to the exis… (voir plus)ting signature-based techniques for malware triage and identification. However, ML is also a readily available tool for adversaries. Recent studies have shown that malware can be modified by deep Reinforcement Learning (RL) techniques to bypass AI-based and signature-based anti-virus systems without altering their original malicious functionalities. These studies only focus on generating evasive samples and assume a static detection system as the enemy.Malware detection and evasion essentially form a two-party cat-and-mouse game. Simulating the real-life scenarios, in this paper we present the first two-player competitive game for evasive malware detection and generation, following the zero-sum Multi-Agent Reinforcement Learning (MARL) paradigm. Our experiments on recent malware show that the produced malware detection agent is more robust against adversarial attacks. Also, the produced malware modification agent is able to generate more evasive samples fooling both AI-based and other anti-malware techniques.

2022-07-27

Computer Science Symposium in Russia (publié)

On the Effectiveness of Interpretable Feedforward Neural Network

Miles Q. Li

Adel Abusitta

Deep learning models have achieved state-of-the-art performance in many classification tasks. However, most of them cannot provide an explan… (voir plus)ation for their classification results. Machine learning models that are interpretable are usually linear or piecewise linear and yield inferior performance. Non-linear models achieve much better classification performance, but it is usually hard to explain their classification results. As a counter-example, an interpretable feedforward neural network (IFFNN) is proposed to achieve both high classification performance and interpretability for malware detection. If the IFFNN can perform well in a more flexible and general form for other classification tasks while providing meaningful explanations, it may be of great interest to the applied machine learning community. In this paper, we propose a way to generalize the interpretable feedforward neural network to multi-class classification scenarios and any type of feedforward neural networks, and evaluate its classification performance and interpretability on interpretable datasets. We conclude by finding that the generalized IFFNNs achieve comparable classification performance to their normal feedforward neural network counterparts and provide meaningful explanations. Thus, this kind of neural network architecture has great practical use.

2022-07-18

2022 International Joint Conference on Neural Networks (IJCNN) (publié)

Interpretable Malware Classification based on Functional Analysis

Miles Q. Li

2022-07-11

Proceedings of the 17th International Conference on Software Technologies (publié)

Incentivized Security-Aware Computation Offloading for Large-Scale Internet of Things Applications

Talal Halabi

Adel Abusitta

Glaucio H.S. Carvalho

2022-07-05

2022 7th International Conference on Smart and Sustainable Technologies (SpliTech) (publié)

DyAdvDefender: An instance-based online machine learning model for perturbation-trial-based black-box adversarial defense

Miles Q. Li

Philippe Charland

2022-07-01

Information Sciences (publié)

JARV1S: Phenotype Clone Search for Rapid Zero-Day Malware Triage and Functional Decomposition for Cyber Threat Intelligence

Christopher Molloy

Philippe Charland

Steven H. H. Ding

Cyber threat intelligence (CTI) has become a critical component of the defense of organizations against the steady surge of cyber attacks. M… (voir plus)alware is one of the most challenging problems for CTI, due to its prevalence, the massive number of variants, and the constantly changing threat actor behaviors. Currently, Malpedia has indexed 2,390 unique malware families, while the AVTEST Institute has recorded more than 166 million new unique malware samples in 2021. There exists a vast number of variants per malware family. Consequently, the signature-based representation of patterns and knowledge of legacy systems can no longer be generalized to detect future malware attacks. Machine learning-based solutions can match more variants. However, as a black-box approach, they lack the explainability and maintainability required by incident response teams.There is thus an urgent need for a data-driven system that can abstract a future-proof, human-friendly, systematic, actionable, and dependable knowledge representation from software artifacts from the past for more effective and insightful malware triage. In this paper, we present the first phenotype-based malware decomposition system for quick malware triage that is effective against malware variants. We define phenotypes as directly observable characteristics such as code fragments, constants, functions, and strings. Malware development rarely starts from scratch, and there are many reused components and code fragments. The target under investigation is decomposed into known phenotypes that are mapped to known malware families, malware behaviors, and Advanced Persistent Threat (APT) groups. The implemented system provides visualizable phenotypes through an interactive tree map, helping the cyber analysts to navigate through the decomposition results. We evaluated our system on 200,000 malware samples, 100,000 benign samples, and a malware family with over 27,284 variants. The results indicate our system is scalable, efficient, and effective against zero-day malware and new variants of known families.

2022-05-31

2022 14th International Conference on Cyber Conflict: Keep Moving! (CyCon) (publié)

The generalizability of pre-processing techniques on the accuracy and fairness of data-driven building models: a case study

Ying Sun

Fariborz Haghighat

2022-05-01

Energy and Buildings (published)

Distinguishing between fake news and satire with transformers

Jwen Fai Low

Farkhund Iqbal

Shih-Chia Huang

2022-01-01

Expert systems with applications (publié)

Learning Inter-Modal Correspondence and Phenotypes From Multi-Modal Electronic Health Records

Kejing Yin

William K. Cheung

Jonathan Poon

Non-negative tensor factorization has been shown a practical solution to automatically discover phenotypes from the electronic health record… (voir plus)s (EHR) with minimal human supervision. Such methods generally require an input tensor describing the inter-modal interactions to be pre-established; however, the correspondence between different modalities (e.g., correspondence between medications and diagnoses) can often be missing in practice. Although heuristic methods can be applied to estimate them, they inevitably introduce errors, and leads to sub-optimal phenotype quality. This is particularly important for patients with complex health conditions (e.g., in critical care) as multiple diagnoses and medications are simultaneously present in the records. To alleviate this problem and discover phenotypes from EHR with unobserved inter-modal correspondence, we propose the collective hidden interaction tensor factorization (cHITF) to infer the correspondence between multiple modalities jointly with the phenotype discovery. We assume that the observed matrix for each modality is marginalization of the unobserved inter-modal correspondence, which are reconstructed by maximizing the likelihood of the observed matrices. Extensive experiments conducted on the real-world MIMIC-III dataset demonstrate that cHITF effectively infers clinically meaningful inter-modal correspondence, discovers phenotypes that are more clinically relevant and diverse, and achieves better predictive performance compared with a number of state-of-the-art computational phenotyping models.

2022-01-01

IEEE Transactions on Knowledge and Data Engineering (publié)

On the Effectiveness of Interpretable Feedforward Neural Network

Miles Q. Li

Adel Abusitta

Deep learning models have achieved state-of-the-art performance in many classification tasks. However, most of them cannot provide an explan… (voir plus)ation for their classification results. Machine learning models that are interpretable are usually linear or piecewise linear and yield inferior performance. Non-linear models achieve much better classification performance, but it is usually hard to explain their classification results. As a counter-example, an interpretable feedforward neural network (IFFNN) is proposed to achieve both high classification performance and interpretability for malware detection. If the IFFNN can perform well in a more flexible and general form for other classification tasks while providing meaningful explanations, it may be of great interest to the applied machine learning community. In this paper, we propose a way to generalize the interpretable feedforward neural network to multi-class classification scenarios and any type of feedforward neural networks, and evaluate its classification performance and interpretability on interpretable datasets. We conclude by finding that the generalized IFFNNs achieve comparable classification performance to their normal feedforward neural network counterparts and provide meaningful explanations. Thus, this kind of neural network architecture has great practical use.

2021-11-03

ArXiv (preprint)

The Topic Confusion Task: A Novel Evaluation Scenario for Authorship Attribution

Malik H. Altakrori

Jackie Cheung

2021-11-01

Findings of the Association for Computational Linguistics: EMNLP 2021 (publié)