Benjamin Fung

Associate Academic Member
Associate Professor, McGill University, School of Information Studies

Biography

Benjamin Fung is a Canada Research Chair in Data Mining for Cybersecurity, a full professor at the School of Information Studies, and an associate member of the School of Computer Science at McGill University.

Fung serves as an associate editor of IEEE Transactions on Knowledge and Data Engineering and of Sustainable Cities and Society. He received his PhD in computing science from Simon Fraser University in 2007.

Dr. Fung has over 150 refereed publications to his credit and more than 14,000 citations (h-index 57), spanning the fields of data mining, machine learning, privacy, cybersecurity and building engineering. His findings in data mining for crime investigations and authorship analysis have been reported by media worldwide.

Publications

A Novel Deep Multi-head Attentive Vulnerable Line Detector
Miles Q. Li
Ashita Diwan
Of Stances, Themes, and Anomalies in COVID-19 Mask-Wearing Tweets
Jwen Fai Low
Farkhund Iqbal
COVID-19 is an opportunity to study public acceptance of a “new” healthcare intervention, universal masking, which, unlike vaccination, is mostly alien to the Anglosphere public despite being practiced in ages past. Using a collection of over two million tweets, we studied the ways in which proponents and opponents of masking vied for influence, as well as the themes driving the discourse. Pro-mask tweets encouraging others to mask up dominated Twitter early in the pandemic, though that dominance has since been eroded by anti-mask tweets criticizing others for their masking behavior. Engagement, represented by the counts of likes, retweets, and replies, and controversiality and disagreeableness, represented by ratios of those counts, initially favored pro-mask tweets, with anti-mask tweets slowly gaining ground. Additional analysis raised the possibility that the platform owners suppressed certain parts of the mask-wearing discussion.
The Age of Ransomware: A Survey on the Evolution, Taxonomy, and Research Directions
Salwa Razaulla
Claude Fachkha
Christine Markarian
Amjad Gawanmeh
Wathiq Mansoor
Chadi Assi
The proliferation of ransomware has become a major threat to cybersecurity in recent years, causing significant financial, reputational, and operational damage to individuals and organizations. This paper aims to provide a comprehensive overview of the evolution of ransomware, its taxonomy, and state-of-the-art research contributions. We begin by tracing the origins of ransomware and its evolution over time, highlighting the key milestones and major trends. Next, we propose a taxonomy that categorizes different types of ransomware based on their characteristics and behavior. Subsequently, we review existing research spanning several years on detection, prevention, mitigation, and prediction techniques. Our extensive analysis, based on more than 150 references, reveals that a large share of research, specifically 72.8%, has focused on detecting ransomware, whereas comparatively little emphasis has been placed on predicting it. Additionally, of the studies focused on ransomware detection, a significant portion, 70%, have used Machine Learning methods. This study uncovers a range of shortcomings in research on real-time protection and on identifying zero-day ransomware, as well as two issues specific to Machine Learning models: adversarial machine learning exploitation and concept drift, both of which have been identified as under-researched areas in the field. This survey offers a constructive roadmap for researchers interested in ransomware.
In-Processing Fairness Improvement Methods for Regression Data-Driven Building Models: Achieving Uniform Energy Prediction
Ying Sun
Fariborz Haghighat
A Multifaceted Framework to Evaluate Evasion, Content Preservation, and Misattribution in Authorship Obfuscation Techniques
Malik H. Altakrori
Thomas Scialom
VDGraph2Vec: Vulnerability Detection in Assembly Code using Message Passing Neural Networks
Ashita Diwan
Miles Q. Li
Software vulnerability detection is one of the most challenging tasks faced by reverse engineers. Recently, vulnerability detection has received a lot of attention due to a drastic increase in the volume and complexity of software. Reverse engineering is a time-consuming and labor-intensive process for detecting malware and software vulnerabilities. However, with the advent of deep learning and machine learning, researchers can now automate the identification of potential security breaches in software by developing more intelligent technologies. In this research, we propose VDGraph2Vec, an automated deep learning method that generates representations of assembly code for the task of vulnerability detection. Previous approaches failed to attend to the topological characteristics of assembly code when discovering weaknesses in software. VDGraph2Vec effectively embeds the control flow and semantic information of assembly code using the expressive capabilities of message passing neural networks and the RoBERTa model. Our model learns the important features that help distinguish between vulnerable and non-vulnerable software. We benchmark performance on three of the most common weaknesses and demonstrate that our model identifies vulnerabilities with high accuracy and outperforms current state-of-the-art binary vulnerability detection models.
Towards Adaptive Cybersecurity for Green IoT
Talal Halabi
Martine Bellaiche
The Internet of Things (IoT) paradigm has led to an explosion in the number of IoT devices and an exponential rise in the carbon footprint incurred by overburdened IoT networks and pervasive cloud/edge communications. Hence, there is growing interest in industry and academia in enabling the efficient use of computing infrastructures by optimizing the management of data center and IoT resources (hardware, software, network, and data) and reducing operational costs, in order to slash greenhouse gas emissions and create healthy environments. Cybersecurity has also been considered in such efforts as a contributor to these environmental issues. Nonetheless, most green security approaches focus on designing low-overhead encryption schemes and do not emphasize energy-efficient security from architectural and deployment viewpoints. This paper sheds light on the emerging paradigm of adaptive cybersecurity as one of the research directions to support sustainable computing in green IoT. It presents three potential research directions and their associated methods for designing and deploying adaptive security in green computing and resource-constrained IoT environments to save energy. Such efforts will make the development of data-driven IoT security solutions greener and more environmentally friendly.
The generalizability of pre-processing techniques on the accuracy and fairness of data-driven building models: a case study
Ying Sun
Fariborz Haghighat
H4rm0ny: A Competitive Zero-Sum Two-Player Markov Game for Multi-Agent Learning on Evasive Malware Generation and Detection
Christopher Molloy
Steven H. H. Ding
Philippe Charland
To combat increasingly versatile and mutable modern malware, Machine Learning (ML) is now a popular and effective complement to existing signature-based techniques for malware triage and identification. However, ML is also a readily available tool for adversaries. Recent studies have shown that malware can be modified by deep Reinforcement Learning (RL) techniques to bypass AI-based and signature-based anti-virus systems without altering its original malicious functionality. These studies focus only on generating evasive samples and assume a static detection system as the adversary. Malware detection and evasion essentially form a two-party cat-and-mouse game. To simulate real-life scenarios, in this paper we present the first two-player competitive game for evasive malware detection and generation, following the zero-sum Multi-Agent Reinforcement Learning (MARL) paradigm. Our experiments on recent malware show that the resulting malware detection agent is more robust against adversarial attacks, and that the resulting malware modification agent generates more evasive samples that fool both AI-based and other anti-malware techniques.
On the Effectiveness of Interpretable Feedforward Neural Network
Miles Q. Li
Adel Abusitta
Deep learning models have achieved state-of-the-art performance in many classification tasks. However, most of them cannot provide an explanation for their classification results. Interpretable machine learning models are usually linear or piecewise linear and yield inferior performance. Non-linear models achieve much better classification performance, but their classification results are usually hard to explain. As a counter-example, an interpretable feedforward neural network (IFFNN) has been proposed to achieve both high classification performance and interpretability for malware detection. If the IFFNN can perform well in a more flexible and general form for other classification tasks while providing meaningful explanations, it may be of great interest to the applied machine learning community. In this paper, we propose a way to generalize the interpretable feedforward neural network to multi-class classification scenarios and to any type of feedforward neural network, and we evaluate its classification performance and interpretability on interpretable datasets. We find that the generalized IFFNNs achieve classification performance comparable to their normal feedforward neural network counterparts while providing meaningful explanations. Thus, this kind of neural network architecture has great practical use.
Incentivized Security-Aware Computation Offloading for Large-Scale Internet of Things Applications
Talal Halabi
Adel Abusitta
Glaucio H.S. Carvalho
JARV1S: Phenotype Clone Search for Rapid Zero-Day Malware Triage and Functional Decomposition for Cyber Threat Intelligence
Christopher Molloy
Philippe Charland
Steven H. H. Ding
Cyber threat intelligence (CTI) has become a critical component of the defense of organizations against the steady surge of cyber attacks. Malware is one of the most challenging problems for CTI, due to its prevalence, the massive number of variants, and constantly changing threat actor behaviors. Currently, Malpedia has indexed 2,390 unique malware families, while the AV-TEST Institute recorded more than 166 million new unique malware samples in 2021. There is a vast number of variants per malware family. Consequently, the signature-based representations of patterns and knowledge in legacy systems can no longer be generalized to detect future malware attacks. Machine learning-based solutions can match more variants. However, as black-box approaches, they lack the explainability and maintainability required by incident response teams. There is thus an urgent need for a data-driven system that can abstract a future-proof, human-friendly, systematic, actionable, and dependable knowledge representation from past software artifacts for more effective and insightful malware triage. In this paper, we present the first phenotype-based malware decomposition system for quick malware triage that is effective against malware variants. We define phenotypes as directly observable characteristics such as code fragments, constants, functions, and strings. Malware development rarely starts from scratch, and there are many reused components and code fragments. The target under investigation is decomposed into known phenotypes that are mapped to known malware families, malware behaviors, and Advanced Persistent Threat (APT) groups. The implemented system presents phenotypes visually through an interactive tree map, helping cyber analysts navigate the decomposition results. We evaluated our system on 200,000 malware samples, 100,000 benign samples, and a malware family with over 27,284 variants. The results indicate that our system is scalable, efficient, and effective against zero-day malware and new variants of known families.