Gregory Dudek

2025-02-01

IEEE Internet of Things Journal (published)

AIoT Smart Home via Autonomous LLM Agents

Dmitriy Rivkin

Francois Hogan

Amal Feriani

Abhisek Konar

Adam Sigal

The common-sense reasoning abilities and vast general knowledge of large language models (LLMs) make them a natural fit for interpreting use… (see more)r requests in a smart home assistant context. LLMs, however, lack specific knowledge about the user and their home, which limits their potential impact. Smart home agent with grounded execution (SAGE), overcomes these and other limitations by using a scheme in which a user request triggers an LLM-controlled sequence of discrete actions. These actions can be used to retrieve information, interact with the user, or manipulate device states. SAGE controls this process through a dynamically constructed tree of LLM prompts, which help it decide which action to take next, whether an action was successful, and when to terminate the process. The SAGE action set augments an LLM’s capabilities to support some of the most critical requirements for a smart home assistant. These include: flexible and scalable user preference management (“Is my team playing tonight?”), access to any smart device’s full functionality without device-specific code via API reading (“Turn down the screen brightness on my dryer”), persistent device state monitoring (“Remind me to throw out the milk when I open the fridge”), natural device references using only a photo of the room (“Turn on the lamp on the dresser”), and more. We introduce a benchmark of 50 new and challenging smart home tasks where SAGE achieves a 76% success rate, significantly outperforming existing LLM-enabled baselines (30% success rate).

2025-02-01

IEEE Internet of Things Journal (published)

Visual-Tactile Inference of 2.5D Object Shape From Marker Texture

Affan Jilani

Francois Hogan

Charlotte Morissette

M. Jenkin

Kaleem Siddiqi

Visual-tactile sensing affords abundant capabilities for contact-rich object manipulation tasks including grasping and placing. Here we intr… (see more)oduce a shape-from-texture inspired contact shape estimation approach for visual-tactile sensors equipped with visually distinct membrane markers. Under a perspective projection camera model, measurements related to the change in marker separation upon contact are used to recover surface shape. Our approach allows for shape sensing in real time, without requiring network training or complex assumptions related to lighting, sensor geometry or marker placement. Experiments show that the surface contact shape recovered is qualitatively and quantitatively consistent with those obtained through the use of photometric stereo, the current state of the art for shape recovery in visual-tactile sensors. Importantly, our approach is applicable to a large family of sensors not equipped with photometric stereo hardware, and also to those with semi-transparent membranes. The recovery of surface shape affords new capabilities to these sensors for robotic applications, such as the estimation of contact and slippage in object manipulation tasks (Hogan etal., 2022) and the use of force matching for kinesthetic teaching using multimodal visual-tactile sensing (Ablett etal., 2024).

2025-02-01

IEEE Robotics and Automation Letters (published)

Visual-Tactile Inference of 2.5D Object Shape From Marker Texture

Affan Jilani

Francois Hogan

Charlotte Morissette

M. Jenkin

Kaleem Siddiqi

Visual-tactile sensing affords abundant capabilities for contact-rich object manipulation tasks including grasping and placing. Here we intr… (see more)oduce a shape-from-texture inspired contact shape estimation approach for visual-tactile sensors equipped with visually distinct membrane markers. Under a perspective projection camera model, measurements related to the change in marker separation upon contact are used to recover surface shape. Our approach allows for shape sensing in real time, without requiring network training or complex assumptions related to lighting, sensor geometry or marker placement. Experiments show that the surface contact shape recovered is qualitatively and quantitatively consistent with those obtained through the use of photometric stereo, the current state of the art for shape recovery in visual-tactile sensors. Importantly, our approach is applicable to a large family of sensors not equipped with photometric stereo hardware, and also to those with semi-transparent membranes. The recovery of surface shape affords new capabilities to these sensors for robotic applications, such as the estimation of contact and slippage in object manipulation tasks (Hogan etal., 2022) and the use of force matching for kinesthetic teaching using multimodal visual-tactile sensing (Ablett etal., 2024).

2025-02-01

IEEE Robotics and Automation Letters (published)

Multimodal and Force-Matched Imitation Learning with a See-Through Visuotactile Sensor

Trevor Ablett

Oliver Limoyo

Adam Sigal

Affan Jilani

Jonathan Kelly

Kaleem Siddiqi

Francois Hogan

Kinesthetic Teaching is a popular approach to collecting expert robotic demonstrations of contact-rich tasks for imitation learning (IL), bu… (see more)t it typically only measures motion, ignoring the force placed on the environment by the robot. Furthermore, contact-rich tasks require accurate sensing of both reaching and touching, which can be difficult to provide with conventional sensing modalities. We address these challenges with a See-Through-your-Skin (STS) visuotactile sensor, using the sensor both (i) as a measurement tool to improve kinesthetic teaching, and (ii) as a policy input in contact-rich door manipulation tasks. An STS sensor can be switched between visual and tactile modes by leveraging a semi-transparent surface and controllable lighting, allowing for both pre-contact visual sensing and during-contact tactile sensing with a single sensor. First, we propose tactile force matching, a methodology that enables a robot to match forces read during kinesthetic teaching using tactile signals. Second, we develop a policy that controls STS mode switching, allowing a policy to learn the appropriate moment to switch an STS from its visual to its tactile mode. Finally, we study multiple observation configurations to compare and contrast the value of visual and tactile data from an STS with visual data from a wrist-mounted eye-in-hand camera. With over 3,000 test episodes from real-world manipulation experiments, we find that the inclusion of force matching raises average policy success rates by 62.5%, STS mode switching by 30.3%, and STS data as a policy input by 42.5%. Our results highlight the utility of see-through tactile sensing for IL, both for data collection to allow force matching, and for policy execution to allow accurate task feedback.

2025-01-01

IEEE Transactions on Robotics (published)

PhotoBot: Reference-Guided Interactive Photography via Natural Language

Oliver Limoyo

Jimmy Li

Dmitriy Rivkin

Jonathan Kelly

We introduce PhotoBot, a framework for fully automated photo acquisition based on an interplay between high-level human language guidance an… (see more)d a robot photographer. We propose to communicate photography suggestions to the user via reference images that are selected from a curated gallery. We leverage a visual language model (VLM) and an object detector to characterize the reference images via textual descriptions and then use a large language model (LLM) to retrieve relevant reference images based on a user's language query through text-based reasoning. To correspond the reference image and the observed scene, we exploit pre-trained features from a vision transformer capable of capturing semantic similarity across marked appearance variations. Using these features, we compute pose adjustments for an RGB-D camera by solving a perspective-n-point (PnP) problem. We demonstrate our approach using a manipulator equipped with a wrist camera. Our user studies show that photos taken by PhotoBot are often more aesthetically pleasing than those taken by users themselves, as measured by human feedback. We also show that PhotoBot can generalize to other reference sources such as paintings.

2024-10-14

2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (published)

Working Backwards: Learning to Place by Picking

Oliver Limoyo

Abhisek Konar

Trevor Ablett

Jonathan Kelly

Francois Hogan

2024-10-14

2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (published)

Accelerating Digital Twin Calibration with Warm-Start Bayesian Optimization

Abhisek Konar

Amal Feriani

Di Wu

Seowoo Jang

Digital twins are expected to play an important role in the widespread adaptation of AI-based networking solutions in the real world. The ca… (see more)libration of these virtual replicas is critical to ensure a trustworthy replication of the real environment. This work focuses on the input parameter calibration of radio access network (RAN) simulators using real network performance metrics as supervision signals. Usually, the RAN digital twin is considered a black-box function and each calibration problem is viewed as a standalone search problem. RAN simulators are slow and non-differentiable, often posing as the bottleneck in the execution time for these search problems. In this work, we aim to accelerate the search process by reducing the number of interactions with the simulator by leveraging RAN interactions from previous problems. We present a sequential Bayesian optimization framework that uses information from the past to warm-start the calibration process. Assuming that the network performance exhibits gradual and periodic changes, the stored information can be reused in future calibrations. We test our method across multiple physical sites over one week and show that using the proposed framework, we can obtain better calibration with a smaller number of interactions with the simulator during the search phase.

2024-06-09

ICC 2024 - IEEE International Conference on Communications (published)

Adaptive Dynamic Programming for Energy-Efficient Base Station Cell Switching

Junliang Luo

Yi Tian Xu

Di Wu

M. Jenkin

Energy saving in wireless networks is growing in importance due to increasing demand for evolving new-gen cellular networks, environmental a… (see more)nd regulatory concerns, and potential energy crises arising from geopolitical tensions. In this work, we propose an approximate dynamic programming (ADP)-based method coupled with online optimization to switch on/off the cells of base stations to reduce network power consumption while maintaining adequate Quality of Service (QoS) metrics. We use a multilayer perceptron (MLP) given each state-action pair to predict the power consumption to approximate the value function in ADP for selecting the action with optimal expected power saved. To save the largest possible power consumption without deteriorating QoS, we include another MLP to predict QoS and a long short-term memory (LSTM) for predicting handovers, incorporated into an online optimization algorithm producing an adaptive QoS threshold for filtering cell switching actions based on the overall QoS history. The performance of the method is evaluated using a practical network simulator with various real-world scenarios with dynamic traffic patterns.

2024-06-09

2024 IEEE International Conference on Communications Workshops (ICC Workshops) (published)

Anomaly Detection for Scalable Task Grouping in Reinforcement Learning-based RAN Optimization

Jimmy Li

Igor Kozlov

Di Wu

The use of learning-based methods for optimizing cellular radio access networks (RAN) has received increasing attention in recent years. Thi… (see more)s coincides with a rapid increase in the number of cell sites worldwide, driven largely by dramatic growth in cellular network traffic. Training and maintaining learned models that work well across a large number of cell sites has thus become a pertinent problem. This paper proposes a scalable framework for constructing a reinforcement learning policy bank that can perform RAN optimization across a large number of cell sites with varying traffic patterns. Central to our framework is a novel application of anomaly detection techniques to assess the compatibility between sites (tasks) and the policy bank. This allows our framework to intelligently identify when a policy can be reused for a task, and when a new policy needs to be trained and added to the policy bank. Our results show that our approach to compatibility assessment leads to an efficient use of computational resources, by allowing us to construct a performant policy bank without exhaustively training on all tasks, which makes it applicable under real-world constraints.

2024-06-09

2024 IEEE International Conference on Communications Workshops (ICC Workshops) (published)

Optimizing Energy Saving for Wireless Networks Via Offline Decision Transformer

Yi Tian Xu

Di Wu

M. Jenkin

Seowoo Jang

With the global aim of reducing carbon emissions, energy saving for communication systems has gained tremendous attention. Efficient energy-… (see more)saving solutions are not only required to accommodate the fast growth in communication demand but solutions are also challenged by the complex nature of the load dynamics. Recent reinforcement learning (RL)-based methods have shown promising performance for network optimization problems, such as base station energy saving. However, a major limitation of these methods is the requirement of online exploration of potential solutions using a high-fidelity simulator or the need to perform exploration in a real-world environment. We circumvent this issue by proposing an offline reinforcement learning energy saving (ORES) framework that allows us to learn an efficient control policy using previously collected data. We first deploy a behavior energy-saving policy on base stations and generate a set of interaction experiences. Then, using a robust deep offline reinforcement learning algorithm, we learn an energy-saving control policy based on the collected experiences. Results from experiments conducted on a diverse collection of communication scenarios with different behavior policies showcase the effectiveness of the proposed energy-saving algorithms.

2024-06-09

ICC 2024 - IEEE International Conference on Communications (published)

PEOPLEx: PEdestrian Opportunistic Positioning LEveraging IMU, UWB, BLE and WiFi

Pierre-Yves Lajoie

Bobak H. Baghi

Sachini Herath

Francois Hogan

This paper advances the field of pedestrian localization by introducing a unifying framework for opportunistic positioning based on nonlinea… (see more)r factor graph optimization. While many existing approaches assume constant availability of one or multiple sensing signals, our methodology employs IMU-based pedestrian inertial navigation as the backbone for sensor fusion, opportunistically integrating Ultra-Wideband (UWB), Bluetooth Low Energy (BLE), and WiFi signals when they are available in the environment. The proposed PEOPLEx framework is designed to incorporate sensing data as it becomes available, operating without any prior knowledge about the environment (e.g. anchor locations, radio frequency maps, etc.). Our contributions are twofold: 1) we introduce an opportunistic multi-sensor and real-time pedestrian positioning framework fusing the available sensor measurements; 2) we develop novel factors for adaptive scaling and coarse loop closures, significantly improving the precision of indoor positioning. Experimental validation confirms that our approach achieves accurate localization estimates in real indoor scenarios using commercial smartphones.

2024-06-09

ICC 2024 - IEEE International Conference on Communications (published)