Publications

Global rewards in multi-agent deep reinforcement learning for autonomous mobility on demand systems
Heiko Hoppe
Tobias Enders
Maximilian Schiffer
We study vehicle dispatching in autonomous mobility on demand (AMoD) systems, where a central operator assigns vehicles to customer requests… (see more) or rejects these with the aim of maximizing its total profit. Recent approaches use multi-agent deep reinforcement learning (MADRL) to realize scalable yet performant algorithms, but train agents based on local rewards, which distorts the reward signal with respect to the system-wide profit, leading to lower performance. We therefore propose a novel global-rewards-based MADRL algorithm for vehicle dispatching in AMoD systems, which resolves so far existing goal conflicts between the trained agents and the operator by assigning rewards to agents leveraging a counterfactual baseline. Our algorithm shows statistically significant improvements across various settings on real-world data compared to state-of-the-art MADRL algorithms with local rewards. We further provide a structural analysis which shows that the utilization of global rewards can improve implicit vehicle balancing and demand forecasting abilities. An extended version of our paper, including an appendix, can be found at https://arxiv.org/abs/2312.08884. Our code is available at https://github.com/tumBAIS/GR-MADRL-AMoD.
MINERS: Multilingual Language Models as Semantic Retrievers
Genta Indra Winata
Ruochen Zhang
Words have been represented in a high-dimensional vector space that encodes their semantic similarities, enabling downstream applications su… (see more)ch as retrieving synonyms, antonyms, and relevant contexts. However, despite recent advances in multilingual language models (LMs), the effectiveness of these models' representations in semantic retrieval contexts has not been comprehensively explored. To fill this gap, this paper introduces the MINERS, a benchmark designed to evaluate the ability of multilingual LMs in semantic retrieval tasks, including bitext mining and classification via retrieval-augmented contexts. We create a comprehensive framework to assess the robustness of LMs in retrieving samples across over 200 diverse languages, including extremely low-resource languages in challenging cross-lingual and code-switching settings. Our results demonstrate that by solely retrieving semantically similar embeddings yields performance competitive with state-of-the-art approaches, without requiring any fine-tuning.
Simulating federated learning for steatosis detection using ultrasound images
Yue Qi
Alexandre Cadrin-Chênevert
Katleen Blanchet
Emmanuel Montagnon
Guy Cloutier
Michael Chassé
An Tang
We aimed to implement four data partitioning strategies evaluated with four federated learning (FL) algorithms and investigate the impact of… (see more) data distribution on FL model performance in detecting steatosis using B-mode US images. A private dataset (153 patients; 1530 images) and a public dataset (55 patient; 550 images) were included in this retrospective study. The datasets contained patients with metabolic dysfunction-associated fatty liver disease (MAFLD) with biopsy-proven steatosis grades and control individuals without steatosis. We employed four data partitioning strategies to simulate FL scenarios and we assessed four FL algorithms. We investigated the impact of class imbalance and the mismatch between the global and local data distributions on the learning outcome. Classification performance was assessed with area under the receiver operating characteristic curve (AUC) on a separate test set. AUCs were 0.93 (95% CI 0.92, 0.94) for source-based partitioning scenario with FedAvg, 0.90 (95% CI 0.89, 0.91) for a centralized model, and 0.83 (95% CI 0.81, 0.85) for a model trained in a single-center scenario. When data was perfectly balanced on the global level and each site had an identical data distribution, the model yielded an AUC of 0.90 (95% CI 0.88, 0.92). When each site contained data exclusively from one single class, irrespective of the global data distribution, the AUC fell in the range of 0.34–0.70. FL applied to B-mode US images provide performance comparable to a centralized model and higher than single-center scenario. Global data imbalance and local data heterogeneity influenced the learning outcome.
Accelerating Digital Twin Calibration with Warm-Start Bayesian Optimization
Amal Feriani
Seowoo Jang
Xue Liu
Digital twins are expected to play an important role in the widespread adaptation of AI-based networking solutions in the real world. The ca… (see more)libration of these virtual replicas is critical to ensure a trustworthy replication of the real environment. This work focuses on the input parameter calibration of radio access network (RAN) simulators using real network performance metrics as supervision signals. Usually, the RAN digital twin is considered a black-box function and each calibration problem is viewed as a standalone search problem. RAN simulators are slow and non-differentiable, often posing as the bottleneck in the execution time for these search problems. In this work, we aim to accelerate the search process by reducing the number of interactions with the simulator by leveraging RAN interactions from previous problems. We present a sequential Bayesian optimization framework that uses information from the past to warm-start the calibration process. Assuming that the network performance exhibits gradual and periodic changes, the stored information can be reused in future calibrations. We test our method across multiple physical sites over one week and show that using the proposed framework, we can obtain better calibration with a smaller number of interactions with the simulator during the search phase.
Adaptive Dynamic Programming for Energy-Efficient Base Station Cell Switching
Yi Tian Xu
M. Jenkin
Xue Liu
Energy saving in wireless networks is growing in importance due to increasing demand for evolving new-gen cellular networks, environmental a… (see more)nd regulatory concerns, and potential energy crises arising from geopolitical tensions. In this work, we propose an approximate dynamic programming (ADP)-based method coupled with online optimization to switch on/off the cells of base stations to reduce network power consumption while maintaining adequate Quality of Service (QoS) metrics. We use a multilayer perceptron (MLP) given each state-action pair to predict the power consumption to approximate the value function in ADP for selecting the action with optimal expected power saved. To save the largest possible power consumption without deteriorating QoS, we include another MLP to predict QoS and a long short-term memory (LSTM) for predicting handovers, incorporated into an online optimization algorithm producing an adaptive QoS threshold for filtering cell switching actions based on the overall QoS history. The performance of the method is evaluated using a practical network simulator with various real-world scenarios with dynamic traffic patterns.
Anomaly Detection for Scalable Task Grouping in Reinforcement Learning-based RAN Optimization
Jimmy Li
Igor Kozlov
Xue Liu
The use of learning-based methods for optimizing cellular radio access networks (RAN) has received increasing attention in recent years. Thi… (see more)s coincides with a rapid increase in the number of cell sites worldwide, driven largely by dramatic growth in cellular network traffic. Training and maintaining learned models that work well across a large number of cell sites has thus become a pertinent problem. This paper proposes a scalable framework for constructing a reinforcement learning policy bank that can perform RAN optimization across a large number of cell sites with varying traffic patterns. Central to our framework is a novel application of anomaly detection techniques to assess the compatibility between sites (tasks) and the policy bank. This allows our framework to intelligently identify when a policy can be reused for a task, and when a new policy needs to be trained and added to the policy bank. Our results show that our approach to compatibility assessment leads to an efficient use of computational resources, by allowing us to construct a performant policy bank without exhaustively training on all tasks, which makes it applicable under real-world constraints.
Ctrl-V: Higher Fidelity Video Generation with Bounding-Box Controlled Object Motion
Ge Ya Luo
Zhi Hao Luo
Christopher Pal
With recent advances in video prediction, controllable video generation has been attracting more attention. Generating high fidelity videos … (see more)according to simple and flexible conditioning is of particular interest. To this end, we propose a controllable video generation model using pixel level renderings of 2D or 3D bounding boxes as conditioning. In addition, we also create a bounding box predictor that, given the initial and ending frames' bounding boxes, can predict up to 15 bounding boxes per frame for all the frames in a 25-frame clip. We perform experiments across 3 well-known AV video datasets: KITTI, Virtual-KITTI 2 and BDD100k.
NPA: Improving Large-scale Graph Neural Networks with Non-parametric Attention.
Wentao Zhang
Guochen Yan
Yu Shen
Yangyu Tao
Yangyu Tao
Bin CUI
Recent works show great interest in designing Graph Neural Networks (GNNs) that scale to large graphs. While previous work focuses on design… (see more)ing advanced sampling techniques for existing GNNs, the design of non-parametric GNNs, an orthogonal direction for scalable performance, has aroused lots of concerns recently. For example, nearly all top solutions in the Open Graph Benchmark leaderboard are non-parametric GNNs. Despite their high predictive performance and scalability, non-parametric GNNs still face two limitations. First, due to the propagation of over-smoothed features, they suffer from severe performance degradation along with the propagation depth. More importantly, they only consider the graph structure and ignore the feature influence during the non-parametric propagation, leading to sub-optimal propagated features. To address these limitations, we present non-parametric attention (NPA), a plug-and-play module that is compatible with non-parametric GNNs, to get scalable and deep GNNs simultaneously. We have deployed NPA in Tencent with the Angel platform, and we further evaluate NPA on both real-world datasets and large-scale industrial datasets. Experimental results on seven homophilic graphs (including the industrial Tencent Video graph) and five heterophilic graphs demonstrate NPA enjoys high performance -- achieves large performance gain over existing non-parametric GNNs, deeper architecture -- improves non-parametric GNNs with large model depth, and high scalability -- can support large-scale graphs with low time costs. Notably, it achieves state-of-the-art performance on the large ogbn-papers100M dataset.
Optimizing Energy Saving for Wireless Networks Via Offline Decision Transformer
Yi Tian Xu
M. Jenkin
Seowoo Jang
Xue Liu
With the global aim of reducing carbon emissions, energy saving for communication systems has gained tremendous attention. Efficient energy-… (see more)saving solutions are not only required to accommodate the fast growth in communication demand but solutions are also challenged by the complex nature of the load dynamics. Recent reinforcement learning (RL)-based methods have shown promising performance for network optimization problems, such as base station energy saving. However, a major limitation of these methods is the requirement of online exploration of potential solutions using a high-fidelity simulator or the need to perform exploration in a real-world environment. We circumvent this issue by proposing an offline reinforcement learning energy saving (ORES) framework that allows us to learn an efficient control policy using previously collected data. We first deploy a behavior energy-saving policy on base stations and generate a set of interaction experiences. Then, using a robust deep offline reinforcement learning algorithm, we learn an energy-saving control policy based on the collected experiences. Results from experiments conducted on a diverse collection of communication scenarios with different behavior policies showcase the effectiveness of the proposed energy-saving algorithms.
PEOPLEx: PEdestrian Opportunistic Positioning LEveraging IMU, UWB, BLE and WiFi
Pierre-Yves Lajoie
Bobak H. Baghi
Sachini Herath
Francois Hogan
Xue Liu
This paper advances the field of pedestrian localization by introducing a unifying framework for opportunistic positioning based on nonlinea… (see more)r factor graph optimization. While many existing approaches assume constant availability of one or multiple sensing signals, our methodology employs IMU-based pedestrian inertial navigation as the backbone for sensor fusion, opportunistically integrating Ultra-Wideband (UWB), Bluetooth Low Energy (BLE), and WiFi signals when they are available in the environment. The proposed PEOPLEx framework is designed to incorporate sensing data as it becomes available, operating without any prior knowledge about the environment (e.g. anchor locations, radio frequency maps, etc.). Our contributions are twofold: 1) we introduce an opportunistic multi-sensor and real-time pedestrian positioning framework fusing the available sensor measurements; 2) we develop novel factors for adaptive scaling and coarse loop closures, significantly improving the precision of indoor positioning. Experimental validation confirms that our approach achieves accurate localization estimates in real indoor scenarios using commercial smartphones.
Probabilistic Mobility Load Balancing for Multi-band 5G and Beyond Networks
Saria Al Lahham
Ekram Hossain
Xue Liu
Promoting Exploration in Memory-Augmented Adam using Critical Momenta
Adaptive gradient-based optimizers, notably Adam, have left their mark in training large-scale deep learning models, offering fast convergen… (see more)ce and robustness to hyperparameter settings. However, they often struggle with generalization, attributed to their tendency to converge to sharp minima in the loss landscape. To address this, we propose a new memory-augmented version of Adam that encourages exploration towards flatter minima by incorporating a buffer of critical momentum terms during training. This buffer prompts the optimizer to overshoot beyond narrow minima, promoting exploration. Through comprehensive analysis in simple settings, we illustrate the efficacy of our approach in increasing exploration and bias towards flatter minima. We empirically demonstrate that it can improve model performance for image classification on ImageNet and CIFAR10/100, language modelling on Penn Treebank, and online learning tasks on TinyImageNet and 5-dataset. Our code is available at https://github.com/chandar-lab/CMOptimizer.