Benjamin Fung

Associate Academic Member

Associate Professor, McGill University, School of Information Studies

McGill University

Research Topics

AI for Software Engineering

Applied Machine Learning

Cybersecurity

Data Mining

Deep Learning

Information Retrieval

Misinformation

Privacy

Representation Learning

Website

Google Scholar

Biography

Benjamin Fung is a Canada Research Chair in Data Mining for Cybersecurity, as well as a full professor at the School of Information Studies and associate member of the School of Computer Science, McGill University.

Fung serves as an associate editor of IEEE Transactions on Knowledge and Data Engineering and of Sustainable Cities and Society. He received his PhD in computing science from Simon Fraser University in 2007.

Dr. Fung has over 150 refereed publications to his credit and more than 14,000 citations (h-index 57) spanning the fields of data mining, machine learning, privacy, cybersecurity and building engineering. His findings in the fields of data mining for crime investigations and authorship analysis have been reported by the media worldwide.

Publications

Adaptive Integration of Categorical and Multi-relational Ontologies with EHR Data for Medical Concept Embedding

Chin Wang Cheong

Kejing Yin

William K. Cheung

Benjamin Fung

Jonathan Poon

2023-11-14

ACM Transactions on Intelligent Systems and Technology (published)

doi.org

A Systematic Literature Review of Fashion, Sustainability, and Consumption Using a Mixed Methods Approach

Osmud Rahman

Dingtao Hu

Benjamin Fung

With the growing global awareness of the environmental impact of clothing consumption, there has been a notable surge in the publication of journal articles dedicated to “fashion sustainability” in the past decade, specifically from 2010 to 2020. However, despite this wealth of research, many studies remain disconnected and fragmented due to varying research objectives, focuses, and approaches. Conducting a systematic literature review with a mixed methods research approach can help identify key research themes, trends, and developmental patterns, while also shedding light on the complexity of fashion, sustainability, and consumption. To enhance the literature review and analytical process, the current systematic literature review employed text mining techniques and bibliometric visualization tools, including RAKE, VOSviewer, and CitNetExplorer. The findings revealed an increase in the number of publications focusing on “fashion and sustainability” between 2010 and 2021. Most studies were predominantly conducted in the United States, with a specific focus on female consumers. Moreover, a greater emphasis was placed on non-sustainable cues rather than the sustainable cues. Additionally, a higher number of case studies was undertaken to investigate three fast-fashion companies. To enhance our knowledge and understanding of this subject, this article highlights several valuable contributions and provides recommendations for future research.

2023-08-10

Sustainability (published)

doi.org

FASHION AND SUSTAINABILITY: A SYSTEMATIC LITERATURE REVIEW

Osmud Rahman

Dingtao Hu

Benjamin Fung

2023-07-30

Global Fashion Management Conference (published)

doi.org

Pluvio: Assembly Clone Search for Out-of-domain Architectures and Libraries through Transfer Learning and Conditional Variational Information Bottleneck

Zhiwei Fu

Steven H. H. Ding

Furkan Alaca

Benjamin Fung

Philippe Charland

The practice of code reuse is crucial in software development for a faster and more efficient development lifecycle. In reality, however, code reuse practices lack proper control, resulting in issues such as vulnerability propagation and intellectual property infringements. Assembly clone search, a critical shift-right defence mechanism, has been effective in identifying vulnerable code resulting from reuse in released executables. Recent studies on assembly clone search demonstrate a trend towards using machine learning-based methods to match assembly code variants produced by different toolchains. However, these methods are limited to what they learn from a small number of toolchain variants used in training, rendering them inapplicable to unseen architectures and their corresponding compilation toolchain variants. This paper presents the first study on the problem of assembly clone search with unseen architectures and libraries. We propose incorporating human common knowledge through large-scale pre-trained natural language models, in the form of transfer learning, into current learning-based approaches for assembly clone search. Transfer learning can aid in addressing the limitations of the existing approaches, as it can bring in broader knowledge from human experts in assembly code. We further address the sequence limit issue by proposing a reinforcement learning agent to remove unnecessary and redundant tokens. Coupled with a new Variational Information Bottleneck learning strategy, the proposed system minimizes the reliance on potential indicators of architectures and optimization settings, for a better generalization of unseen architectures. We simulate the unseen architecture clone search scenarios and the experimental results show the effectiveness of the proposed approach against the state-of-the-art solutions.

2023-07-20

ArXiv (preprint)

doi.org

arxiv.org

VulANalyzeR: Explainable Binary Vulnerability Detection with Multi-task Learning and Attentional Graph Convolution

Litao Li

Steven H. H. Ding

Yuan Tian

Benjamin Fung

Philippe Charland

Weihan Ou

Leo Song

Congwei Chen

2023-04-14

ACM Transactions on Privacy and Security (published)

doi.org

Deep learning-enabled anomaly detection for IoT systems

Adel Abusitta

Glaucio H.S. Carvalho

Omar Abdel Wahab

Talal Halabi

Benjamin Fung

Saja Al-Mamoori

2023-04-01

Internet of Things (published)

doi.org

Differentially Private Release of Heterogeneous Network for Managing Healthcare Data

Rashid Hussain Khokhar

Benjamin Fung

Farkhund Iqbal

Khalil Al-Hussaeni

Mohammed Hussain

With the increasing adoption of digital health platforms through mobile apps and online services, people have greater flexibility connecting with medical practitioners, pharmacists, and laboratories and accessing resources to manage their own health-related concerns. Many healthcare institutions are connecting with each other to facilitate the exchange of healthcare data, with the goal of effective healthcare data management. The contents generated over these platforms are often shared with third parties for a variety of purposes. However, sharing healthcare data comes with the potential risk of exposing patients’ sensitive information to privacy threats. In this article, we address the challenge of sharing healthcare data while protecting patients’ privacy. We first model a complex healthcare dataset using a heterogeneous information network that consists of multi-type entities and their relationships. We then propose DiffHetNet, an edge-based differentially private algorithm, to protect the sensitive links of patients from inbound and outbound attacks in the heterogeneous health network. We evaluate the performance of our proposed method in terms of information utility and efficiency on different types of real-life datasets that can be modeled as networks. Experimental results suggest that DiffHetNet generally yields less information loss and is significantly more efficient in terms of runtime in comparison with existing network anonymization methods. Furthermore, DiffHetNet is scalable to large network datasets.

2023-02-28

ACM Transactions on Knowledge Discovery from Data (published)

doi.org

A Literature Review on Detecting, Verifying, and Mitigating Online Misinformation

Arezo Bodaghi

Ketra A. Schmitt

Pierre Watine

Benjamin Fung

Social media use has transformed communication and made social interaction more accessible. Public microblogs allow people to share and access news through existing and social-media-created social connections and access to public news sources. These benefits also create opportunities for the spread of false information. False information online can mislead people, decrease the benefits derived from social media, and reduce trust in genuine news. We divide false information into two categories: unintentional false information, also known as misinformation; and intentionally false information, also known as disinformation and fake news. Given the increasing prevalence of misinformation, it is imperative to address its dissemination on social media platforms. This survey focuses on six key aspects related to misinformation: 1) clarify the definition of misinformation to differentiate it from intentional forms of false information; 2) categorize proposed approaches to manage misinformation into three types: detection, verification, and mitigation; 3) review the platforms and languages for which these techniques have been proposed and tested; 4) describe the specific features that are considered in each category; 5) compare public datasets created to address misinformation and categorize into prelabeled content-only datasets and those including users and their connections; and 6) survey fact-checking websites that can be used to verify the accuracy of information. This survey offers a comprehensive and unprecedented review of misinformation, integrating various methodological approaches, datasets, and content-, user-, and network-based approaches, which will undoubtedly benefit future research in this field.

2023-01-01

IEEE Transactions on Computational Social Systems (published)

doi.org

A Novel Deep Multi-head Attentive Vulnerable Line Detector

Miles Q. Li

Benjamin Fung

Ashita Diwan

2023-01-01

Procedia Computer Science (published)

doi.org

Of Stances, Themes, and Anomalies in COVID-19 Mask-Wearing Tweets

Jwen Fai Low

Benjamin Fung

Farkhund Iqbal

COVID-19 is an opportunity to study public acceptance of a “new” healthcare intervention, universal masking, which unlike vaccination, is mostly alien to the Anglosphere public despite being practiced in ages past. Using a collection of over two million tweets, we studied the ways in which proponents and opponents of masking vied for influence as well as the themes driving the discourse. Pro-mask tweets encouraging others to mask up dominated Twitter early in the pandemic though its continued dominance has been eroded by anti-mask tweets criticizing others for their masking behavior. Engagement, represented by the counts of likes, retweets, and replies, and controversiality and disagreeableness, represented by ratios of the aforementioned counts, favored pro-mask tweets initially but with anti-mask tweets slowly gaining ground. Additional analysis raised the possibility of the platform owners suppressing certain parts of the mask-wearing discussion.

2023-01-01

IEEE Access (published)

doi.org

The Age of Ransomware: A Survey on the Evolution, Taxonomy, and Research Directions

Salwa Razaulla

Claude Fachkha

Christine Markarian

Amjad Gawanmeh

Wathiq Mansoor

Benjamin Fung

Chadi Assi

The proliferation of ransomware has become a significant threat to cybersecurity in recent years, causing significant financial, reputational, and operational damage to individuals and organizations. This paper aims to provide a comprehensive overview of the evolution of ransomware, its taxonomy, and its state-of-the-art research contributions. We begin by tracing the origins of ransomware and its evolution over time, highlighting the key milestones and major trends. Next, we propose a taxonomy of ransomware that categorizes different types of ransomware based on their characteristics and behavior. Subsequently, we review the existing research over several years in regard to detection, prevention, mitigation, and prediction techniques. Our extensive analysis, based on more than 150 references, has revealed that significant research, specifically 72.8%, has focused on detecting ransomware. However, a lack of emphasis has been placed on predicting ransomware. Additionally, of the studies focused on ransomware detection, a significant portion, 70%, have utilized Machine Learning methods. This study uncovers a range of shortcomings in research pertaining to real-time protection and identifying zero-day ransomware, and two issues specific to Machine Learning models. Adversarial machine learning exploitation and concept drift have been identified as under-researched areas in the field. This survey is a constructive roadmap for researchers interested in ransomware research matters.

2023-01-01

IEEE Access (published)

doi.org

In-Processing Fairness Improvement Methods for Regression Data-Driven Building Models: Achieving Uniform Energy Prediction

Ying Sun

Benjamin Fung

Fariborz Haghighat

2022-12-01

Energy and Buildings (published)

doi.org
