Portrait de Chengming Hu n'est pas disponible

Chengming Hu

Doctorat - McGill University
Superviseur⋅e principal⋅e

Publications

Less or More From Teacher: Exploiting Trilateral Geometry For Knowledge Distillation
Chengming Hu
Haolun Wu
Xuan Li
Chen Ma
Xi Chen
Jun Yan
Boyu Wang
AdaTeacher: Adaptive Multi-Teacher Weighting for Communication Load Forecasting
Chengming Hu
Ju Wang
Di Wu
Yan Xin
Charlie Zhang
To deal with notorious delays in communication systems, it is crucial to forecast key system characteristics, such as the communication load… (voir plus). Most existing studies aggregate data from multiple edge nodes for improving the forecasting accuracy. However, the bandwidth cost of such data aggregation could be unacceptably high from the perspective of system operators. To achieve both the high forecasting accuracy and bandwidth efficiency, this paper proposes an Adaptive Multi-Teacher Weighting in Teacher-Student Learning approach, namely AdaTeacher, for communication load forecasting of multiple edge nodes. Each edge node trains a local model on its own data. A target node collects multiple models from its neighbor nodes and treats these models as teachers. Then, the target node trains a student model from teachers via Teacher-Student (T-S) learning. Unlike most existing T-S learning approaches that treat teachers evenly, resulting in a limited performance, AdaTeacher introduces a bilevel optimization algorithm to dynamically learn an importance weight for each teacher toward a more effective and accurate T-S learning process. Compared to the state-of-the-art methods, Ada Teacher not only reduces the bandwidth cost by 53.85%, but also improves the load forecasting accuracy by 21.56% and 24.24% on two real-world datasets.
Teacher-Student Architecture for Knowledge Distillation: A Survey
Chengming Hu
Xuan Li
Danyang Liu
Haolun Wu
Xi Chen
Ju Wang
Although Deep neural networks (DNNs) have shown a strong capacity to solve large-scale problems in many areas, such DNNs are hard to be depl… (voir plus)oyed in real-world systems due to their voluminous parameters. To tackle this issue, Teacher-Student architectures were proposed, where simple student networks with a few parameters can achieve comparable performance to deep teacher networks with many parameters. Recently, Teacher-Student architectures have been effectively and widely embraced on various knowledge distillation (KD) objectives, including knowledge compression, knowledge expansion, knowledge adaptation, and knowledge enhancement. With the help of Teacher-Student architectures, current studies are able to achieve multiple distillation objectives through lightweight and generalized student networks. Different from existing KD surveys that primarily focus on knowledge compression, this survey first explores Teacher-Student architectures across multiple distillation objectives. This survey presents an introduction to various knowledge representations and their corresponding optimization objectives. Additionally, we provide a systematic overview of Teacher-Student architectures with representative learning algorithms and effective distillation schemes. This survey also summarizes recent applications of Teacher-Student architectures across multiple purposes, including classification, recognition, generation, ranking, and regression. Lastly, potential research directions in KD are investigated, focusing on architecture design, knowledge quality, and theoretical studies of regression-based learning, respectively. Through this comprehensive survey, industry practitioners and the academic community can gain valuable insights and guidelines for effectively designing, learning, and applying Teacher-Student architectures on various distillation objectives.
Reinforcement Learning-Based Adaptive Feature Boosting for Smart Grid Intrusion Detection
Chengming Hu
Jun Yan
Intrusion detection systems (IDSs) are crucial in the security monitoring for the smart grid with increasing machine-to-machine communicatio… (voir plus)ns and cyber threats thereafter. However, the multi-sourced, correlated, and heterogeneous smart grid data pose significant challenges to the accurate attack detection by IDSs. To improve the attack detection, this paper proposes Reinforcement Learning-based Adaptive Feature Boosting, which aims to leverage a series of AutoEncoders (AEs) to capture critical features from the multi-sourced smart grid data for the classification of normal, fault, and attack events. Multiple AEs are utilized to extract representative features from different feature sets that are automatically generated through a weighted feature sampling process; each AE-extracted feature set is then applied to build a Random Forest (RF) base classifier. In the feature sampling process, Deep Deterministic Policy Gradient (DDPG) is introduced to dynamically determine the feature sampling probability based on the classification accuracy. The critical features that improve the classification accuracy are assigned larger sampling probabilities and increasingly participate in the training of next AE. The presence of critical features is increased in the event classification over the multi-sourced smart grid data. Considering potential different alarms among base classifiers, an ensemble classifier is further built to distinguish normal, fault, and attack events. Our proposed approach is evaluated on the two realistic datasets collected from Hardware-In-the-Loop (HIL) and WUSTIL-IIOT-2021 security testbeds, respectively. The evaluation on the HIL security dataset shows that our proposed approach achieves the classification accuracy with 97.28%, an effective 5.5% increase over the vanilla Adaptive Feature Boosting. Moreover, the proposed approach not only accurately and stably selects critical features on the WUSTIL-IIOT-2021 dataset based on the significant difference of feature sampling probabilities between critical and uncritical features, i.e., the probabilities greater than 0.08 and less than 0.01, but also outperforms the other best-performing approaches with the increasing Matthew Correlation Coefficient (MCC) of 8.03%.