
Benjamin Fung

Associate Academic Member
Associate Professor, McGill University, School of Information Studies
Research Topics
Data Mining

Biography

Benjamin Fung holds a Canada Research Chair in Data Mining for Cybersecurity. He is an associate professor at the School of Information Studies and an associate member of the School of Computer Science at McGill University, as well as an associate editor of IEEE Transactions on Knowledge and Data Engineering and of Elsevier Sustainable Cities and Society (SCS). He received his Ph.D. in computing science from Simon Fraser University in 2007. He has over 150 refereed publications and more than 14,000 citations (h-index 57) spanning data mining, machine learning, privacy protection, cybersecurity, and building engineering. His data mining work in crime investigation and authorship analysis has been covered by media worldwide.

Publications

Diminished social memory and hippocampal correlates of social interactions in chronic social defeat stress susceptibility
Amanda Larosa
Tian Rui Zhang
Alice S. Wong
Cyrus Y. H. Fung
Xiong Ling Yun (Jenny) Long
Tak Pan Wong
A Comprehensive Analysis of Explainable AI for Malware Hunting
Mohd Saqib
Samaneh Mahdavifar
Philippe Charland
In the past decade, the number of malware variants has increased rapidly. Many researchers have proposed to detect malware using intelligent techniques, such as Machine Learning (ML) and Deep Learning (DL), which have high accuracy and precision. These methods, however, suffer from being opaque in the decision-making process. Therefore, we need Artificial Intelligence (AI)-based models to be explainable, interpretable, and transparent to be reliable and trustworthy. In this survey, we reviewed articles related to Explainable AI (XAI) and their application to the significant scope of malware detection. The article encompasses a comprehensive examination of various XAI algorithms employed in malware analysis. Moreover, we have addressed the characteristics, challenges, and requirements in malware analysis that cannot be accommodated by standard XAI methods. We discuss how, even though Explainable Malware Detection (EMD) models provide explainability, they make an AI-based model more vulnerable to adversarial attacks. We also propose a framework that assigns a level of explainability to each XAI malware analysis model, based on the security features involved in each method. In summary, the proposed project focuses on combining XAI and malware analysis to apply XAI models for scrutinizing the opaque nature of AI systems and their applications to malware analysis.
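As a minimal illustration of the kind of post-hoc explanation such EMD models produce, the sketch below performs perturbation-based feature attribution on a toy malware scorer. The classifier, its feature names, and its weights are all invented for this example; they are not taken from the paper.

```python
# Toy perturbation-based explanation: flip each binary input feature
# and measure how much the classifier's score changes. The scorer and
# its weights are hypothetical, for illustration only.

def classify(features):
    # Hypothetical linear malware scorer (weights are made up).
    weights = {"packed": 0.5, "net_io": 0.3, "writes_registry": 0.15, "signed": -0.4}
    return sum(weights[name] * value for name, value in features.items())

def explain(features):
    # Attribution of a feature = score drop when that feature is flipped.
    base = classify(features)
    attributions = {}
    for name in features:
        perturbed = dict(features)
        perturbed[name] = 1 - perturbed[name]  # flip the binary feature
        attributions[name] = base - classify(perturbed)
    return attributions

sample = {"packed": 1, "net_io": 1, "writes_registry": 0, "signed": 0}
print(explain(sample))
```

Real XAI toolkits (e.g., SHAP or LIME) generalize this idea with principled sampling and weighting, but the flip-and-compare loop above is the core intuition.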
Better entity matching with transformers through ensembles
Jwen Fai Low
Pulei Xiong
ERS0: Enhancing Military Cybersecurity with AI-Driven SBOM for Firmware Vulnerability Detection and Asset Management
Max Beninger
Philippe Charland
Steven H. H. Ding
Firmware vulnerability detection and asset management through a software bill of material (SBOM) approach is integral to defensive military operations. SBOMs provide a comprehensive list of software components, enabling military organizations to identify vulnerabilities within critical systems, including those controlling various functions in military platforms, as well as in operational technologies and Internet of Things devices. This proactive approach is essential for supply chain security, ensuring that software components are sourced from trusted suppliers and have not been tampered with during production, distribution, or through updates. It is a key element of defense strategies, allowing for rapid assessment, response, and mitigation of vulnerabilities, ultimately safeguarding military capabilities and information from cyber threats. In this paper, we propose ERS0, an SBOM system, driven by artificial intelligence (AI), for detecting firmware vulnerabilities and managing firmware assets. We harness the power of pre-trained large-scale language models to effectively address a wide array of string patterns, extending our coverage to thousands of third-party library patterns. Furthermore, we employ AI-powered code clone search models, enabling a more granular and precise search for vulnerabilities at the binary level, reducing our dependence on string analysis only. Additionally, our AI models extract high-level behavioral functionalities in firmware, such as communication and encryption, allowing us to quantitatively define the behavioral scope of firmware. In preliminary comparative assessments against open-source alternatives, our solution has demonstrated better SBOM coverage, accuracy in vulnerability identification, and a wider array of features.
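The string-pattern side of such an SBOM pipeline can be sketched as a signature scan over a firmware image. The two library signatures below are invented examples; a real system like the one described would maintain thousands of patterns and complement them with binary-level clone search.

```python
import re

# Hypothetical version-string signatures for third-party libraries.
# A production SBOM tool would maintain a far larger pattern set.
SIGNATURES = {
    "openssl": re.compile(rb"OpenSSL (\d+\.\d+\.\d+[a-z]?)"),
    "zlib": re.compile(rb"zlib version (\d+\.\d+\.\d+)"),
}

def scan_firmware(blob):
    """Return a {library: version} map for every signature found in the blob."""
    found = {}
    for lib, pattern in SIGNATURES.items():
        match = pattern.search(blob)
        if match:
            found[lib] = match.group(1).decode()
    return found

firmware = b"\x00junk\x00OpenSSL 1.0.2k\x00more junk\x00zlib version 1.2.11\x00"
print(scan_firmware(firmware))  # {'openssl': '1.0.2k', 'zlib': '1.2.11'}
```

Once component versions are identified, they can be matched against a vulnerability database (e.g., CVE entries) to flag known-vulnerable components.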
GAGE: Genetic Algorithm-Based Graph Explainer for Malware Analysis
Mohd Saqib
Philippe Charland
Andrew Walenstein
Malware analysts often prefer reverse engineering using Call Graphs, Control Flow Graphs (CFGs), and Data Flow Graphs (DFGs), which involves the utilization of black-box Deep Learning (DL) models. The proposed research introduces a structured pipeline for reverse engineering-based analysis, offering promising results compared to state-of-the-art methods and providing high-level interpretability for malicious code blocks in subgraphs. We propose the Canonical Executable Graph (CEG) as a new representation of Portable Executable (PE) files, uniquely incorporating syntactical and semantic information into its node embeddings. At the same time, edge features capture structural aspects of PE files. This is the first work to present a PE file representation encompassing syntactical, semantic, and structural characteristics, whereas previous efforts typically focused solely on syntactic or structural properties. Furthermore, recognizing the limitations of existing graph explanation methods within Explainable Artificial Intelligence (XAI) for malware analysis, primarily due to the specificity of malicious files, we introduce the Genetic Algorithm-based Graph Explainer (GAGE). GAGE operates on the CEG, striving to identify a precise subgraph relevant to predicted malware families. Through experiments and comparisons, our proposed pipeline exhibits substantial improvements in model robustness scores and discriminative power compared to the previous benchmarks. Furthermore, we have successfully used GAGE in practical applications on real-world data, producing meaningful insights and interpretability. This research offers a robust solution to enhance cybersecurity by delivering a transparent and accurate understanding of malware behaviour. Moreover, the proposed algorithm is specialized in handling graph-based data, effectively dissecting complex content and isolating influential nodes.
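The genetic-algorithm search at the heart of a subgraph explainer can be sketched in miniature. Here each genome is a bit vector selecting nodes of a small graph, and the fitness function (entirely invented for this sketch, standing in for GAGE's actual relevance objective) rewards covering a hypothetical set of "malicious" nodes while penalizing subgraph size.

```python
import random

random.seed(0)

NUM_NODES = 12
MALICIOUS = {2, 5, 7}  # hypothetical ground-truth relevant nodes

def fitness(genome):
    # Invented objective: reward hitting relevant nodes, penalize size.
    selected = {i for i, bit in enumerate(genome) if bit}
    return 2.0 * len(selected & MALICIOUS) - 0.25 * len(selected)

def evolve(pop_size=30, generations=40):
    pop = [[random.randint(0, 1) for _ in range(NUM_NODES)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: pop_size // 2]          # elitist selection
        children = []
        while len(children) < pop_size - len(survivors):
            a, b = random.sample(survivors, 2)
            cut = random.randrange(1, NUM_NODES)  # one-point crossover
            child = a[:cut] + b[cut:]
            if random.random() < 0.2:             # point mutation
                j = random.randrange(NUM_NODES)
                child[j] = 1 - child[j]
            children.append(child)
        pop = survivors + children
    return max(pop, key=fitness)

best = evolve()
print([i for i, bit in enumerate(best) if bit])
```

The real explainer searches over subgraphs of the CEG with a fitness tied to the model's family prediction, but the select/crossover/mutate loop is the same skeleton.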
Fairness-aware data-driven-based model predictive controller: A study on thermal energy storage in a residential building
Ying Sun
Fariborz Haghighat
Dynamic Neural Control Flow Execution: An Agent-Based Deep Equilibrium Approach for Binary Vulnerability Detection
Litao Li
Steven H. H. Ding
Andrew Walenstein
Philippe Charland
Carthago Delenda Est: Co-opetitive Indirect Information Diffusion Model for Influence Operations on Online Social Media
Jwen Fai Low
Farkhund Iqbal
Claude Fachkha
AsmDocGen: Generating Functional Natural Language Descriptions for Assembly Code
Jesia Yuki
Mohammadhossein Amouei
Philippe Charland
Andrew Walenstein
BETAC: Bidirectional Encoder Transformer for Assembly Code Function Name Recovery
Guillaume Breyton
Mohd Saqib
Philippe Charland
Recovering function names from stripped binaries is a crucial and time-consuming task for software reverse engineering, particularly in enhancing network reliability, resilience, and security. This paper tackles the challenge of recovering function names in stripped binaries, a fundamental step in reverse engineering. The absence of syntactic information and the possibility of different code producing identical behavior complicate this task. To overcome these challenges, we introduce a novel model, the Bidirectional Encoder Transformer for Assembly Code (BETAC), leveraging a transformer-based architecture known for effectively processing sequential data. BETAC utilizes self-attention mechanisms and feed-forward networks to discern complex relationships within assembly code for precise function name prediction. We evaluated BETAC against various existing encoder and decoder models in diverse binary datasets, including benign and malicious codes in multiple formats. Our model demonstrated superior performance over previous techniques in certain metrics and showed resilience against code obfuscation.
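The self-attention mechanism underpinning such a transformer encoder can be shown in a few lines. Below is scaled dot-product self-attention over toy 2-d embeddings for a three-instruction assembly snippet; the embeddings are made up, and a real model would add learned Q/K/V projections, multiple heads, and feed-forward layers.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    """Scaled dot-product attention with Q = K = V = the raw embeddings."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        # Each output is a weighted mix of all token embeddings.
        out.append([sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)])
    return out

# Invented embeddings for, say, ["push", "mov", "call"].
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(self_attention(embeddings))
```

Because every token attends to every other token in both directions, the encoder is "bidirectional" in the sense that each instruction's representation reflects its full context, which is what lets a name-recovery head predict a function name from the whole body.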
Multidomain Object Detection Framework Using Feature Domain Knowledge Distillation.
Da-Wei Jaw
Shih-Chia Huang
Zhihui Lu
Sy-Yen Kuo
Object detection techniques have been widely studied, utilized in various works, and have exhibited robust performance on images with sufficient luminance. However, these approaches typically struggle to extract valuable features from low-luminance images, which often exhibit blurriness and a dim appearance, leading to detection failures. To overcome this issue, we introduce an innovative unsupervised feature domain knowledge distillation (KD) framework. The proposed framework enhances the generalization capability of neural networks across both low- and high-luminance domains without incurring additional computational costs during testing. This improvement is made possible through the integration of generative adversarial networks and our proposed unsupervised KD process. Furthermore, we introduce a region-based multiscale discriminator designed to discern feature domain discrepancies at the object level rather than from the global context. This bolsters the joint learning process of object detection and feature domain distillation tasks. Both qualitative and quantitative assessments show that the proposed method, empowered by the region-based multiscale discriminator and the unsupervised feature domain distillation process, can effectively extract beneficial features from low-luminance images, outperforming other state-of-the-art approaches in both low- and sufficient-luminance domains.
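For readers unfamiliar with distillation, the standard building block is a loss that pushes a student network to match a teacher's temperature-softened output distribution. The sketch below shows that classic logit-based KD loss with invented logits; note the paper distills in feature space with an adversarial discriminator, not on logits, so this is only the textbook starting point.

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature > 1 softens the distribution, exposing "dark knowledge".
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) between temperature-softened distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

teacher = [3.0, 1.0, 0.2]   # made-up teacher logits
student = [2.5, 1.2, 0.3]   # made-up student logits
print(kd_loss(teacher, student))
```

The loss is zero when the student exactly matches the teacher and grows as their softened distributions diverge; feature-domain variants apply the same matching idea to intermediate feature maps instead of class probabilities.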
Survey on Explainable AI: Techniques, challenges and open issues
Adel Abusitta
Miles Q. Li