Benjamin Fung

Associate Academic Member
Associate Professor, McGill University, School of Information Studies
Research Topics
Data Mining

Biography

Benjamin Fung is a Canada Research Chair in Data Mining for Cybersecurity, as well as a full professor at the School of Information Studies and associate member of the School of Computer Science, McGill University.

Fung serves as an associate editor of IEEE Transactions on Knowledge and Data Engineering and Sustainable Cities and Society. He received his PhD in computing science from Simon Fraser University in 2007.

Dr. Fung has over 150 refereed publications to his credit and more than 14,000 citations (h-index 57) spanning the fields of data mining, machine learning, privacy, cybersecurity and building engineering. His findings in the fields of data mining for crime investigations and authorship analysis have been reported by the media worldwide.

Publications

Dynamic Neural Control Flow Execution: An Agent-Based Deep Equilibrium Approach for Binary Vulnerability Detection
Litao Li
Steven H. H. Ding
Andrew Walenstein
Philippe Charland
Multidomain Object Detection Framework Using Feature Domain Knowledge Distillation
Da-Wei Jaw
Shih-Chia Huang
Zhihui Lu
Sy-Yen Kuo
Object detection techniques have been widely studied, utilized in various works, and have exhibited robust performance on images with sufficient luminance. However, these approaches typically struggle to extract valuable features from low-luminance images, which often exhibit blurriness and dim appearance, leading to detection failures. To overcome this issue, we introduce an innovative unsupervised feature domain knowledge distillation (KD) framework. The proposed framework enhances the generalization capability of neural networks across both low- and high-luminance domains without incurring additional computational costs during testing. This improvement is made possible through the integration of generative adversarial networks and our proposed unsupervised KD process. Furthermore, we introduce a region-based multiscale discriminator designed to discern feature domain discrepancies at the object level rather than from the global context. This bolsters the joint learning process of object detection and feature domain distillation tasks. Both qualitative and quantitative assessments show that the proposed method, empowered by the region-based multiscale discriminator and the unsupervised feature domain distillation process, can effectively extract beneficial features from low-luminance images, outperforming other state-of-the-art approaches in both low- and sufficient-luminance domains.
Survey on Explainable AI: Techniques, challenges and open issues
Adel Abusitta
Miles Q. Li
VulEXplaineR: XAI for Vulnerability Detection on Assembly Code
Samaneh Mahdavifar
Mohd Saqib
Philippe Charland
Andrew Walenstein
Technological Solutions to Online Toxicity: Potential and Pitfalls
Arezo Bodaghi
Ketra A. Schmitt
Social media platforms present a perplexing duality, acting at once as sites to build community and a sense of belonging, while also giving rise to misinformation, facilitating and intensifying disinformation campaigns and perpetuating existing patterns of discrimination from the physical world. The first step platforms take in mitigating the harmful side of social media involves identifying and managing toxic content. Users produce an enormous volume of posts which must be evaluated very quickly. This is an application context that requires machine-learning (ML) tools, but as we detail in this article, ML approaches rely on human annotators, analysts, and moderators. Our review of existing methods and potential improvements indicates that neither humans nor ML can be removed from this process in the near future. However, we see room for improvement in the working conditions of these human workers.
Adaptive Integration of Categorical and Multi-relational Ontologies with EHR Data for Medical Concept Embedding
Chin Wang Cheong
Kejing Yin
William K. Cheung
Jonathan Poon
A Systematic Literature Review of Fashion, Sustainability, and Consumption Using a Mixed Methods Approach
Osmud Rahman
Dingtao Hu
With the growing global awareness of the environmental impact of clothing consumption, there has been a notable surge in the publication of journal articles dedicated to “fashion sustainability” in the past decade, specifically from 2010 to 2020. However, despite this wealth of research, many studies remain disconnected and fragmented due to varying research objectives, focuses, and approaches. Conducting a systematic literature review with a mixed methods research approach can help identify key research themes, trends, and developmental patterns, while also shedding light on the complexity of fashion, sustainability, and consumption. To enhance the literature review and analytical process, the current systematic literature review employed text mining techniques and bibliometric visualization tools, including RAKE, VOSviewer, and CitNetExplorer. The findings revealed an increase in the number of publications focusing on “fashion and sustainability” between 2010 and 2021. Most studies were predominantly conducted in the United States, with a specific focus on female consumers. Moreover, a greater emphasis was placed on non-sustainable cues rather than the sustainable cues. Additionally, a higher number of case studies was undertaken to investigate three fast-fashion companies. To enhance our knowledge and understanding of this subject, this article highlights several valuable contributions and provides recommendations for future research.
FASHION AND SUSTAINABILITY: A SYSTEMATIC LITERATURE REVIEW
Osmud Rahman
Dingtao Hu
Pluvio: Assembly Clone Search for Out-of-domain Architectures and Libraries through Transfer Learning and Conditional Variational Information Bottleneck
Zhiwei Fu
Steven H. H. Ding
Furkan Alaca
Philippe Charland
The practice of code reuse is crucial in software development for a faster and more efficient development lifecycle. In reality, however, code reuse practices lack proper control, resulting in issues such as vulnerability propagation and intellectual property infringements. Assembly clone search, a critical shift-right defence mechanism, has been effective in identifying vulnerable code resulting from reuse in released executables. Recent studies on assembly clone search demonstrate a trend towards using machine learning-based methods to match assembly code variants produced by different toolchains. However, these methods are limited to what they learn from a small number of toolchain variants used in training, rendering them inapplicable to unseen architectures and their corresponding compilation toolchain variants. This paper presents the first study on the problem of assembly clone search with unseen architectures and libraries. We propose incorporating human common knowledge through large-scale pre-trained natural language models, in the form of transfer learning, into current learning-based approaches for assembly clone search. Transfer learning can aid in addressing the limitations of the existing approaches, as it can bring in broader knowledge from human experts in assembly code. We further address the sequence limit issue by proposing a reinforcement learning agent to remove unnecessary and redundant tokens. Coupled with a new Variational Information Bottleneck learning strategy, the proposed system minimizes the reliance on potential indicators of architectures and optimization settings, for a better generalization of unseen architectures. We simulate the unseen architecture clone search scenarios and the experimental results show the effectiveness of the proposed approach against the state-of-the-art solutions.
VulANalyzeR: Explainable Binary Vulnerability Detection with Multi-task Learning and Attentional Graph Convolution
Litao Li
Steven H. H. Ding
Yuan Tian
Philippe Charland
Weihan Ou
Leo Song
Congwei Chen
Deep learning-enabled anomaly detection for IoT systems
Adel Abusitta
Glaucio H.S. Carvalho
Omar Abdel Wahab
Talal Halabi
Saja Al-Mamoori
Differentially Private Release of Heterogeneous Network for Managing Healthcare Data
Rashid Hussain Khokhar
Farkhund Iqbal
Khalil Al-Hussaeni
Mohammed Hussain
With the increasing adoption of digital health platforms through mobile apps and online services, people have greater flexibility connecting with medical practitioners, pharmacists, and laboratories and accessing resources to manage their own health-related concerns. Many healthcare institutions are connecting with each other to facilitate the exchange of healthcare data, with the goal of effective healthcare data management. The contents generated over these platforms are often shared with third parties for a variety of purposes. However, sharing healthcare data comes with the potential risk of exposing patients’ sensitive information to privacy threats. In this article, we address the challenge of sharing healthcare data while protecting patients’ privacy. We first model a complex healthcare dataset using a heterogeneous information network that consists of multi-type entities and their relationships. We then propose DiffHetNet, an edge-based differentially private algorithm, to protect the sensitive links of patients from inbound and outbound attacks in the heterogeneous health network. We evaluate the performance of our proposed method in terms of information utility and efficiency on different types of real-life datasets that can be modeled as networks. Experimental results suggest that DiffHetNet generally yields less information loss and is significantly more efficient in terms of runtime in comparison with existing network anonymization methods. Furthermore, DiffHetNet is scalable to large network datasets.