Benjamin Fung

Associate Academic Member

Associate Professor, McGill University, School of Information Studies

McGill University

Research Topics

Applied Machine Learning

Representation Learning

Deep Learning

Cybersecurity

Disinformation

Data Mining

AI for Software Engineering

Information Retrieval

Privacy

Website

Google Scholar

Biography

Benjamin Fung is a Canada Research Chair in Data Mining for Cybersecurity, an associate professor at the School of Information Studies and an associate member of the School of Computer Science at McGill University, an associate editor of IEEE Transactions on Knowledge and Data Engineering, and an associate editor of Elsevier Sustainable Cities and Society (SCS). He received a Ph.D. in computing science from Simon Fraser University in 2007. He has more than 150 refereed publications and more than 14,000 citations (h-index 57) spanning data mining, machine learning, privacy protection, cybersecurity, and building engineering. His data mining work in crime investigation and authorship analysis has been reported by media worldwide.

Publications

A Comprehensive Analysis of Explainable AI for Malware Hunting

Mohd Saqib

Samaneh Mahdavifar

Benjamin Fung

Philippe Charland

In the past decade, the number of malware variants has increased rapidly. Many researchers have proposed to detect malware using intelligent techniques, such as Machine Learning (ML) and Deep Learning (DL), which have high accuracy and precision. These methods, however, suffer from being opaque in the decision-making process. Therefore, we need Artificial Intelligence (AI)-based models to be explainable, interpretable, and transparent to be reliable and trustworthy. In this survey, we reviewed articles related to Explainable AI (XAI) and their application to the significant scope of malware detection. The article encompasses a comprehensive examination of various XAI algorithms employed in malware analysis. Moreover, we have addressed the characteristics, challenges, and requirements in malware analysis that cannot be accommodated by standard XAI methods. We discussed that even though Explainable Malware Detection (EMD) models provide explainability, they make an AI-based model more vulnerable to adversarial attacks. We also propose a framework that assigns a level of explainability to each XAI malware analysis model, based on the security features involved in each method. In summary, the proposed project focuses on combining XAI and malware analysis to apply XAI models for scrutinizing the opaque nature of AI systems and their applications to malware analysis.

2024-07-11

ACM Computing Surveys (published)

doi.org

Survey on Explainable AI: Techniques, challenges and open issues

Adel Abusitta

Miles Q. Li

Benjamin Fung

2024-07-01

Expert Systems with Applications (published)

doi.org

Tracing the Ransomware Bloodline: Investigation and Detection of Drifting Virlock Variants

Salwa Razaulla

Claude Fachkha

Amjad Gawanmeh

Christine Markarian

Benjamin Fung

Chadi Assi

Malware, especially ransomware, has dramatically increased in volume and sophistication in recent years. The growing complexity and destructive potential of ransomware demand effective countermeasures. Despite tremendous efforts by the security community to document these threats, reliance on manual analysis makes it challenging to discern unique malware variants from polymorphic variants. Moreover, the easy accessibility of source code of prominent ransomware families in public domains has led to the rise of numerous variants, complicating manual detection and hindering the identification of phylogenetic relationships. This paper introduces a novel approach that narrows the focus to analyze one such prominent ransomware family, Virlock. Using binary code similarity, we systematically reconstruct the lineage of Virlock, tracing its relationships, evolution, and variants. Employing this technique on a dataset of over 1000 Virlock samples submitted to VirusTotal and VirusShare, our analysis unveils intricate relationships within the Virlock ransomware family, offering valuable insights into the tangled relationships of this ransomware.

2024-06-14

International Conference on Computational Collective Intelligence (published)

doi.org

Better entity matching with transformers through ensembles

Jwen Fai Low

Benjamin Fung

Pulei Xiong

2024-06-01

Knowledge-Based Systems (published)

doi.org

ERS0: Enhancing Military Cybersecurity with AI-Driven SBOM for Firmware Vulnerability Detection and Asset Management

Max Beninger

Philippe Charland

Steven H. H. Ding

Benjamin Fung

Firmware vulnerability detection and asset management through a software bill of material (SBOM) approach is integral to defensive military operations. SBOMs provide a comprehensive list of software components, enabling military organizations to identify vulnerabilities within critical systems, including those controlling various functions in military platforms, as well as in operational technologies and Internet of Things devices. This proactive approach is essential for supply chain security, ensuring that software components are sourced from trusted suppliers and have not been tampered with during production, distribution, or through updates. It is a key element of defense strategies, allowing for rapid assessment, response, and mitigation of vulnerabilities, ultimately safeguarding military capabilities and information from cyber threats. In this paper, we propose ERS0, an SBOM system, driven by artificial intelligence (AI), for detecting firmware vulnerabilities and managing firmware assets. We harness the power of pre-trained large-scale language models to effectively address a wide array of string patterns, extending our coverage to thousands of third-party library patterns. Furthermore, we employ AI-powered code clone search models, enabling a more granular and precise search for vulnerabilities at the binary level, reducing our dependence on string analysis only. Additionally, our AI models extract high-level behavioral functionalities in firmware, such as communication and encryption, allowing us to quantitatively define the behavioral scope of firmware. In preliminary comparative assessments against open-source alternatives, our solution has demonstrated better SBOM coverage, accuracy in vulnerability identification, and a wider array of features.

2024-05-28

2024 16th International Conference on Cyber Conflict: Over the Horizon (CyCon) (published)

doi.org

GAGE: Genetic Algorithm-Based Graph Explainer for Malware Analysis

Mohd Saqib

Benjamin Fung

Philippe Charland

Andrew Walenstein

Malware analysts often prefer reverse engineering using Call Graphs, Control Flow Graphs (CFGs), and Data Flow Graphs (DFGs), which involves the utilization of black-box Deep Learning (DL) models. The proposed research introduces a structured pipeline for reverse engineering-based analysis, offering promising results compared to state-of-the-art methods and providing high-level interpretability for malicious code blocks in subgraphs. We propose the Canonical Executable Graph (CEG) as a new representation of Portable Executable (PE) files, uniquely incorporating syntactical and semantic information into its node embeddings. At the same time, edge features capture structural aspects of PE files. This is the first work to present a PE file representation encompassing syntactical, semantic, and structural characteristics, whereas previous efforts typically focused solely on syntactic or structural properties. Furthermore, recognizing the limitations of existing graph explanation methods within Explainable Artificial Intelligence (XAI) for malware analysis, primarily due to the specificity of malicious files, we introduce Genetic Algorithm-based Graph Explainer (GAGE). GAGE operates on the CEG, striving to identify a precise subgraph relevant to predicted malware families. Through experiments and comparisons, our proposed pipeline exhibits substantial improvements in model robustness scores and discriminative power compared to the previous benchmarks. Furthermore, we have successfully used GAGE in practical applications on real-world data, producing meaningful insights and interpretability. This research offers a robust solution to enhance cybersecurity by delivering a transparent and accurate understanding of malware behaviour. Moreover, the proposed algorithm is specialized in handling graph-based data, effectively dissecting complex content and isolating influential nodes.

2024-05-13

IEEE International Conference on Data Engineering (published)

doi.org

Fairness-aware data-driven-based model predictive controller: A study on thermal energy storage in a residential building

Ying Sun

Fariborz Haghighat

Benjamin Fung

2024-05-01

Journal of Energy Storage (published)

doi.org

Fairness-aware data-driven-based model predictive controller: A study on thermal energy storage in a residential building

Ying Sun