Chengming Hu

Self-Refined Generative Foundation Models for Wireless Traffic Prediction

Hao Zhou

Xi Chen

Jun Yan

Xue Liu

With a broad range of emerging applications in 6G networks, wireless traffic prediction has become a critical component of network managemen… (see more)t. However, the dynamically shifting distribution of wireless traffic in non-stationary 6G networks presents significant challenges to achieving accurate and stable predictions. Motivated by recent advancements in Generative AI (GenAI)-enabled 6G networks, this paper proposes a novel self-refined Large Language Model (LLM) for wireless traffic prediction, namely TrafficLLM, through in-context learning without parameter fine-tuning or model training. The proposed TrafficLLM harnesses the powerful few-shot learning abilities of LLMs to enhance the scalability of traffic prediction in dynamically changing wireless environments. Specifically, our proposed TrafficLLM embraces an LLM to iteratively refine its predictions through a three-step process: traffic prediction, feedback generation, and prediction refinement. Initially, the proposed TrafficLLM conducts traffic predictions using task-specific demonstration prompts. Recognizing that LLMs may generate incorrect predictions on the first attempt, this paper designs feedback demonstration prompts to provide multifaceted and valuable feedback related to these initial predictions. The validation scheme is further incorporated to systematically enhance the accuracy of mathematical calculations during the feedback generation process. Following this comprehensive feedback, our proposed TrafficLLM introduces refinement demonstration prompts, enabling the same LLM to further refine its predictions and thereby enhance prediction performance. Evaluations on two realistic datasets demonstrate that the proposed TrafficLLM outperforms LLM-based in-context learning methods, achieving performance improvements of 23.17% and 17.09%, respectively.

2024-12-31

IEEE Transactions on Vehicular Technology (published)

doi.org

arxiv.org

Generative AI as a Service in 6G Edge-Cloud: Generation Task Offloading by In-context Learning

Hao Zhou

Xue Liu

Zhu Han

Jianzhong (Charlie) Zhang

Generative artificial intelligence (GAI) is a promising technique towards 6G networks, and generative foundation models such as large langua… (see more)ge models (LLMs) have attracted considerable interest from academia and telecom industry. This work considers a novel edge-cloud deployment of foundation models in 6G networks. Specifically, it aims to minimize the service delay of foundation models by radio resource allocation and task offloading, i.e., offloading diverse content generation tasks to proper LLMs at the network edge or cloud. In particular, we first introduce the communication system model, i.e., allocating radio resources and calculating link capacity to support generated content transmission, and then we present the LLM inference model to calculate the delay of content generation. After that, we propose a novel in-context learning method to optimize the task offloading decisions. It utilizes LLM's inference capabilities, and avoids the difficulty of dedicated model training or fine-tuning as in conventional machine learning algorithms. Finally, the simulations demonstrate that the proposed edge-cloud deployment and in-context learning task offloading method can achieve satisfactory generation service quality without dedicated model training or fine-tuning.

2024-12-22

IEEE Wireless Communications Letters (published)

doi.org

arxiv.org

Less or More from Teacher: Exploiting Trilateral Geometry for Knowledge Distillation

Xi Chen

Boyu Wang

Jun Yan

Xue Liu

Knowledge distillation aims to train a compact student network using soft supervision from a larger teacher network and hard supervision fro… (see more)m ground truths. However, determining an optimal knowledge fusion ratio that balances these supervisory signals remains challenging. Prior methods generally resort to a constant or heuristic-based fusion ratio, which often falls short of a proper balance. In this study, we introduce a novel adaptive method for learning a sample-wise knowledge fusion ratio, exploiting both the correctness of teacher and student, as well as how well the student mimics the teacher on each sample. Our method naturally leads to the intra-sample trilateral geometric relations among the student prediction (

2023-12-31

ICLR (published)

doi.org

openreview.net

AdaTeacher: Adaptive Multi-Teacher Weighting for Communication Load Forecasting

Chengming Hu

Ju Wang

Di Wu

Yan Xin

Charlie Zhang

Xue Liu

Gregory Dudek

To deal with notorious delays in communication systems, it is crucial to forecast key system characteristics, such as the communication load… (see more). Most existing studies aggregate data from multiple edge nodes for improving the forecasting accuracy. However, the bandwidth cost of such data aggregation could be unacceptably high from the perspective of system operators. To achieve both the high forecasting accuracy and bandwidth efficiency, this paper proposes an Adaptive Multi-Teacher Weighting in Teacher-Student Learning approach, namely AdaTeacher, for communication load forecasting of multiple edge nodes. Each edge node trains a local model on its own data. A target node collects multiple models from its neighbor nodes and treats these models as teachers. Then, the target node trains a student model from teachers via Teacher-Student (T-S) learning. Unlike most existing T-S learning approaches that treat teachers evenly, resulting in a limited performance, AdaTeacher introduces a bilevel optimization algorithm to dynamically learn an importance weight for each teacher toward a more effective and accurate T-S learning process. Compared to the state-of-the-art methods, Ada Teacher not only reduces the bandwidth cost by 53.85%, but also improves the load forecasting accuracy by 21.56% and 24.24% on two real-world datasets.

2023-12-03

Global Communications Conference (published)

doi.org

Teacher-Student Architecture for Knowledge Distillation: A Survey

Chengming Hu

Xuan Li

Danyang Liu

Haolun Wu

X. T. Chen

Ju Wang

Xue Liu

Although Deep neural networks (DNNs) have shown a strong capacity to solve large-scale problems in many areas, such DNNs are hard to be depl… (see more)oyed in real-world systems due to their voluminous parameters. To tackle this issue, Teacher-Student architectures were proposed, where simple student networks with a few parameters can achieve comparable performance to deep teacher networks with many parameters. Recently, Teacher-Student architectures have been effectively and widely embraced on various knowledge distillation (KD) objectives, including knowledge compression, knowledge expansion, knowledge adaptation, and knowledge enhancement. With the help of Teacher-Student architectures, current studies are able to achieve multiple distillation objectives through lightweight and generalized student networks. Different from existing KD surveys that primarily focus on knowledge compression, this survey first explores Teacher-Student architectures across multiple distillation objectives. This survey presents an introduction to various knowledge representations and their corresponding optimization objectives. Additionally, we provide a systematic overview of Teacher-Student architectures with representative learning algorithms and effective distillation schemes. This survey also summarizes recent applications of Teacher-Student architectures across multiple purposes, including classification, recognition, generation, ranking, and regression. Lastly, potential research directions in KD are investigated, focusing on architecture design, knowledge quality, and theoretical studies of regression-based learning, respectively. Through this comprehensive survey, industry practitioners and the academic community can gain valuable insights and guidelines for effectively designing, learning, and applying Teacher-Student architectures on various distillation objectives.

2023-08-07

ArXiv (preprint)

doi.org

arxiv.org

Reinforcement Learning-Based Adaptive Feature Boosting for Smart Grid Intrusion Detection

Chengming Hu

Jun Yan

Xue Liu

Intrusion detection systems (IDSs) are crucial in the security monitoring for the smart grid with increasing machine-to-machine communicatio… (see more)ns and cyber threats thereafter. However, the multi-sourced, correlated, and heterogeneous smart grid data pose significant challenges to the accurate attack detection by IDSs. To improve the attack detection, this paper proposes Reinforcement Learning-based Adaptive Feature Boosting, which aims to leverage a series of AutoEncoders (AEs) to capture critical features from the multi-sourced smart grid data for the classification of normal, fault, and attack events. Multiple AEs are utilized to extract representative features from different feature sets that are automatically generated through a weighted feature sampling process; each AE-extracted feature set is then applied to build a Random Forest (RF) base classifier. In the feature sampling process, Deep Deterministic Policy Gradient (DDPG) is introduced to dynamically determine the feature sampling probability based on the classification accuracy. The critical features that improve the classification accuracy are assigned larger sampling probabilities and increasingly participate in the training of next AE. The presence of critical features is increased in the event classification over the multi-sourced smart grid data. Considering potential different alarms among base classifiers, an ensemble classifier is further built to distinguish normal, fault, and attack events. Our proposed approach is evaluated on the two realistic datasets collected from Hardware-In-the-Loop (HIL) and WUSTIL-IIOT-2021 security testbeds, respectively. The evaluation on the HIL security dataset shows that our proposed approach achieves the classification accuracy with 97.28%, an effective 5.5% increase over the vanilla Adaptive Feature Boosting. Moreover, the proposed approach not only accurately and stably selects critical features on the WUSTIL-IIOT-2021 dataset based on the significant difference of feature sampling probabilities between critical and uncritical features, i.e., the probabilities greater than 0.08 and less than 0.01, but also outperforms the other best-performing approaches with the increasing Matthew Correlation Coefficient (MCC) of 8.03%.

2023-06-30

IEEE Transactions on Smart Grid (published)

doi.org

Mila Techaide 2026

Venture Scientist Bootcamp

AI Advantage: Productivity in Public Service

Chengming Hu

Publications

Mila Techaide 2026

Venture Scientist Bootcamp

AI Advantage: Productivity in Public Service

Popular keywords:

Chengming Hu

Publications