Portrait de Benjamin Fung

Benjamin Fung

Membre académique associé
Professeur agrégé, McGill University, École des sciences de l'information
McGill University University
Sujets de recherche
Apprentissage automatique appliqué
Apprentissage de représentations
Apprentissage profond
Cybersécurité
Désinformation
Exploration des données
IA pour l'ingénierie logicielle
Recherche d'information
Vie privée

Biographie

Benjamin Fung est titulaire d'une chaire de recherche du Canada en exploration de données pour la cybersécurité, professeur agrégé à l’École des sciences de l’information et membre agrégé de l’École d’informatique de l'Université McGill, rédacteur adjoint de IEEE Transactions of Knowledge and Data Engineering et rédacteur adjoint de Elsevier Sustainable Cities and Society (SCS). Il a obtenu un doctorat en informatique de l'Université Simon Fraser en 2007. Il a à son actif plus de 150 publications revues par un comité de lecture, et plus de 14 000 citations (h-index 57) qui couvrent les domaines de l'exploration de données, de l'apprentissage automatique, de la protection de la vie privée, de la cybersécurité et du génie du bâtiment. Ses travaux d'exploration de données dans les enquêtes criminelles et l'analyse de la paternité d’une œuvre ont été recensés par les médias du monde entier.

Publications

Beyond Embeddings: Interpretable Feature Extraction for Binary Code Similarity
Charles E. Gagnon
Steven H. H. Ding
Philippe Charland
Binary code similarity detection is a core task in reverse engineering. It supports malware analysis and vulnerability discovery by identify… (voir plus)ing semantically similar code in different contexts. Modern methods have progressed from manually engineered features to vector representations. Hand-crafted statistics (e.g., operation ratios) are interpretable, but shallow and fail to generalize. Embedding-based methods overcome this by learning robust cross-setting representations, but these representations are opaque vectors that prevent rapid verification. They also face a scalability-accuracy trade-off, since high-dimensional nearest-neighbor search requires approximations that reduce precision. Current approaches thus force a compromise between interpretability, generalizability, and scalability. We bridge these gaps using a language model-based agent to conduct structured reasoning analysis of assembly code and generate features such as input/output types, side effects, notable constants, and algorithmic intent. Unlike hand-crafted features, they are richer and adaptive. Unlike embeddings, they are human-readable, maintainable, and directly searchable with inverted or relational indexes. Without any matching training, our method respectively achieves 42% and 62% for recall@1 in cross-architecture and cross-optimization tasks, comparable to embedding methods with training (39% and 34%). Combined with embeddings, it significantly outperforms the state-of-the-art, demonstrating that accuracy, scalability, and interpretability can coexist.
Responsible AI Day
Ebrahim Bagheri
Faezeh Ensan
Calvin Hillis
Robin Cohen
Sébastien Gambs
Responsible AI Day
Ebrahim Bagheri
Faezeh Ensan
Calvin Hillis
Robin Cohen
Sébastien Gambs
Diminished social memory and hippocampal correlates of social interactions in chronic social defeat stress susceptibility
Amanda Larosa
Tian Rui Zhang
Alice S. Wong
Cyrus Y.H. Fung
Y. H. Fung Cyrus
Xiong Ling Yun (Jenny) Long
Prabhjeet Singh
Tak Pan Wong
CSGraph2Vec: Distributed Graph-Based Representation Learning for Assembly Functions
Wael J. Alhashemi
Adel Abusitta
Claude Fachkha
AugmenToxic: Leveraging Reinforcement Learning to Optimize LLM Instruction Fine-Tuning for Data Augmentation to Enhance Toxicity Detection
Arezo Bodaghi
Ketra A. Schmitt
A Literature Review on Detecting, Verifying, and Mitigating Online Misinformation
Arezo Bodaghi
Ketra A. Schmitt
Pierre Watine
Social media use has transformed communication and made social interaction more accessible. Public microblogs allow people to share and acce… (voir plus)ss news through existing and social-media-created social connections and access to public news sources. These benefits also create opportunities for the spread of false information. False information online can mislead people, decrease the benefits derived from social media, and reduce trust in genuine news. We divide false information into two categories: unintentional false information, also known as misinformation; and intentionally false information, also known as disinformation and fake news. Given the increasing prevalence of misinformation, it is imperative to address its dissemination on social media platforms. This survey focuses on six key aspects related to misinformation: 1) clarify the definition of misinformation to differentiate it from intentional forms of false information; 2) categorize proposed approaches to manage misinformation into three types: detection, verification, and mitigation; 3) review the platforms and languages for which these techniques have been proposed and tested; 4) describe the specific features that are considered in each category; 5) compare public datasets created to address misinformation and categorize into prelabeled content-only datasets and those including users and their connections; and 6) survey fact-checking websites that can be used to verify the accuracy of information. This survey offers a comprehensive and unprecedented review of misinformation, integrating various methodological approaches, datasets, and content-, user-, and network-based approaches, which will undoubtedly benefit future research in this field.
A Comprehensive Analysis of Explainable AI for Malware Hunting
Mohd Saqib
Samaneh Mahdavifar
Philippe Charland
In the past decade, the number of malware variants has increased rapidly. Many researchers have proposed to detect malware using intelligent… (voir plus) techniques, such as Machine Learning (ML) and Deep Learning (DL), which have high accuracy and precision. These methods, however, suffer from being opaque in the decision-making process. Therefore, we need Artificial Intelligence (AI)-based models to be explainable, interpretable, and transparent to be reliable and trustworthy. In this survey, we reviewed articles related to Explainable AI (XAI) and their application to the significant scope of malware detection. The article encompasses a comprehensive examination of various XAI algorithms employed in malware analysis. Moreover, we have addressed the characteristics, challenges, and requirements in malware analysis that cannot be accommodated by standard XAI methods. We discussed that even though Explainable Malware Detection (EMD) models provide explainability, they make an AI-based model more vulnerable to adversarial attacks. We also propose a framework that assigns a level of explainability to each XAI malware analysis model, based on the security features involved in each method. In summary, the proposed project focuses on combining XAI and malware analysis to apply XAI models for scrutinizing the opaque nature of AI systems and their applications to malware analysis.
Survey on Explainable AI: Techniques, challenges and open issues
Adel Abusitta
Miles Q. Li
Survey on Explainable AI: Techniques, challenges and open issues
Adel Abusitta
Miles Q. Li
Survey on Explainable AI: Techniques, challenges and open issues
Adel Abusitta
Miles Q. Li
Survey on Explainable AI: Techniques, challenges and open issues
Adel Abusitta
Miles Q. Li