Benjamin Fung

Associate Academic Member

Associate Professor, McGill University, School of Information Studies

McGill University

Research Topics

AI for Software Engineering

Applied Machine Learning

Cybersecurity

Data Mining

Deep Learning

Information Retrieval

Misinformation

Privacy

Representation Learning

Website

Google Scholar

Biography

Benjamin Fung is a Canada Research Chair in Data Mining for Cybersecurity, as well as a full professor at the School of Information Studies and associate member of the School of Computer Science, McGill University.

Fung serves as an associate editor of IEEE Transactions on Knowledge and Data Engineering and Sustainable Cities and Society. He received his PhD in computing science from Simon Fraser University in 2007.

Dr. Fung has over 150 refereed publications to his credit and more than 14,000 citations (h-index 57) spanning the fields of data mining, machine learning, privacy, cybersecurity and building engineering. His findings in the fields of data mining for crime investigations and authorship analysis have been reported by the media worldwide.

Publications

A Comprehensive Analysis of Explainable AI for Malware Hunting

Mohd Saqib

Samaneh Mahdavifar

Benjamin Fung

Philippe Charland

In the past decade, the number of malware variants has increased rapidly. Many researchers have proposed to detect malware using intelligent techniques, such as Machine Learning (ML) and Deep Learning (DL), which have high accuracy and precision. These methods, however, suffer from being opaque in the decision-making process. Therefore, we need Artificial Intelligence (AI)-based models to be explainable, interpretable, and transparent to be reliable and trustworthy. In this survey, we reviewed articles related to Explainable AI (XAI) and their application to the significant scope of malware detection. The article encompasses a comprehensive examination of various XAI algorithms employed in malware analysis. Moreover, we have addressed the characteristics, challenges, and requirements in malware analysis that cannot be accommodated by standard XAI methods. We discussed that even though Explainable Malware Detection (EMD) models provide explainability, they make an AI-based model more vulnerable to adversarial attacks. We also propose a framework that assigns a level of explainability to each XAI malware analysis model, based on the security features involved in each method. In summary, the proposed project focuses on combining XAI and malware analysis to apply XAI models for scrutinizing the opaque nature of AI systems and their applications to malware analysis.

2024-07-11

ACM Computing Surveys (published)

doi.org

Survey on Explainable AI: Techniques, challenges and open issues

Adel Abusitta

Miles Q. Li

Benjamin Fung

2024-07-01

Expert Systems with Applications (published)

doi.org

Tracing the Ransomware Bloodline: Investigation and Detection of Drifting Virlock Variants

Salwa Razaulla

Claude Fachkha

Amjad Gawanmeh

Christine Markarian

Benjamin Fung

Chadi Assi

Malware, especially ransomware, has dramatically increased in volume and sophistication in recent years. The growing complexity and destructive potential of ransomware demand effective countermeasures. Despite tremendous efforts by the security community to document these threats, reliance on manual analysis makes it challenging to discern unique malware variants from polymorphic variants. Moreover, the easy accessibility of source code of prominent ransomware families in public domains has led to the rise of numerous variants, complicating manual detection and hindering the identification of phylogenetic relationships. This paper introduces a novel approach that narrows the focus to analyze one such prominent ransomware family, Virlock. Using binary code similarity, we systematically reconstruct the lineage of Virlock, tracing its relationships, evolution, and variants. Employing this technique on a dataset of over 1000 Virlock samples submitted to VirusTotal and VirusShare, our analysis unveils intricate relationships within the Virlock ransomware family, offering valuable insights into the tangled relationships of this ransomware.

2024-06-14

International Conference on Computational Collective Intelligence (published)

doi.org

Better entity matching with transformers through ensembles

Jwen Fai Low

Benjamin Fung

Pulei Xiong

2024-06-01

Knowledge-Based Systems (published)

doi.org

ERS0: Enhancing Military Cybersecurity with AI-Driven SBOM for Firmware Vulnerability Detection and Asset Management

Max Beninger

Philippe Charland

Steven H. H. Ding

Benjamin Fung

Firmware vulnerability detection and asset management through a software bill of material (SBOM) approach is integral to defensive military operations. SBOMs provide a comprehensive list of software components, enabling military organizations to identify vulnerabilities within critical systems, including those controlling various functions in military platforms, as well as in operational technologies and Internet of Things devices. This proactive approach is essential for supply chain security, ensuring that software components are sourced from trusted suppliers and have not been tampered with during production, distribution, or through updates. It is a key element of defense strategies, allowing for rapid assessment, response, and mitigation of vulnerabilities, ultimately safeguarding military capabilities and information from cyber threats. In this paper, we propose ERS0, an SBOM system, driven by artificial intelligence (AI), for detecting firmware vulnerabilities and managing firmware assets. We harness the power of pre-trained large-scale language models to effectively address a wide array of string patterns, extending our coverage to thousands of third-party library patterns. Furthermore, we employ AI-powered code clone search models, enabling a more granular and precise search for vulnerabilities at the binary level, reducing our dependence on string analysis only. Additionally, our AI models extract high-level behavioral functionalities in firmware, such as communication and encryption, allowing us to quantitatively define the behavioral scope of firmware. In preliminary comparative assessments against open-source alternatives, our solution has demonstrated better SBOM coverage, accuracy in vulnerability identification, and a wider array of features.

2024-05-28

2024 16th International Conference on Cyber Conflict: Over the Horizon (CyCon) (published)

doi.org

GAGE: Genetic Algorithm-Based Graph Explainer for Malware Analysis

Mohd Saqib

Benjamin Fung

Philippe Charland

Andrew Walenstein

Malware analysts often prefer reverse engineering using Call Graphs, Control Flow Graphs (CFGs), and Data Flow Graphs (DFGs), which involves the utilization of black-box Deep Learning (DL) models. The proposed research introduces a structured pipeline for reverse engineering-based analysis, offering promising results compared to state-of-the-art methods and providing high-level interpretability for malicious code blocks in subgraphs. We propose the Canonical Executable Graph (CEG) as a new representation of Portable Executable (PE) files, uniquely incorporating syntactical and semantic information into its node embeddings. At the same time, edge features capture structural aspects of PE files. This is the first work to present a PE file representation encompassing syntactical, semantic, and structural characteristics, whereas previous efforts typically focused solely on syntactic or structural properties. Furthermore, recognizing the limitations of existing graph explanation methods within Explainable Artificial Intelligence (XAI) for malware analysis, primarily due to the specificity of malicious files, we introduce Genetic Algorithm-based Graph Explainer (GAGE). GAGE operates on the CEG, striving to identify a precise subgraph relevant to predicted malware families. Through experiments and comparisons, our proposed pipeline exhibits substantial improvements in model robustness scores and discriminative power compared to the previous benchmarks. Furthermore, we have successfully used GAGE in practical applications on real-world data, producing meaningful insights and interpretability. This research offers a robust solution to enhance cybersecurity by delivering a transparent and accurate understanding of malware behaviour. Moreover, the proposed algorithm is specialized in handling graph-based data, effectively dissecting complex content and isolating influential nodes.

2024-05-13

IEEE International Conference on Data Engineering (published)

doi.org

Fairness-aware data-driven-based model predictive controller: A study on thermal energy storage in a residential building

Ying Sun

Fariborz Haghighat

Benjamin Fung

2024-05-01

Journal of Energy Storage (published)

doi.org
