Portrait de Haechan Mark Bong n'est pas disponible

Haechan Mark Bong

Doctorat - Polytechnique
Superviseur⋅e principal⋅e
Co-supervisor
Sujets de recherche
IA appliquée
Modèles de fondation
Navigation robotique autonome
Optimisation
Réseaux de neurones profonds
Robotique
Vision par ordinateur

Publications

BlabberSeg: Semantic Perception for Reliable Open-Vocabulary UAV Safe Landing
Reliable robot autonomy requires semantic perception that remains both informative and fast enough for closed-loop safety decisions. We pres… (voir plus)ent BlabberSeg, an optimized CLIPSeg-based open-vocabulary segmentation pipeline for UAV emergency landing. The method targets semantic reliability under edge constraints by reusing prompt, positional, and image features and deploying floating-point 16 ONNX (TensorRT) inference. In a DOVESEI-based safe-landing workflow, BlabberSeg reaches 16.78Hz on Jetson Orin AGX (64GB), a 927.41% speed increase over the original CLIPSeg (1.81Hz), with limited degradation in segmentation agreement (2.1% relative area difference) and mIoU (9%). At the task level, safe-landing success is preserved (76/100, matching baseline) while mission time is substantially reduced. These results support semantic open-vocabulary perception as a practical component for reliable autonomous landing.
To Select or not to Select, that is the Question: Distilling Robot Skill Prediction into a Small Ensemble
Simon Roy
Euhid Aman
As robot fleets become more heterogeneous, including humanoids, rovers, quadrupeds, and drones, selecting the right robot for a task becomes… (voir plus) a core systems problem. We study robot skill prediction: mapping a natural-language task description to the physical capabilities required to execute it, such as fly, wheels, legs, surface water, under water and hands. Since labelled data that maps natural-language task descriptions to robot's physical capabilities does not exist, we construct a synthetic task-to-skill dataset using LLM-assisted generation and targeted label auditing. Trained on this data, a ~133M-parameter ensemble of two fine-tuned sentence encoders (mpnet + MiniLM) reaches 83.5% task-to-skill matching on a stratified 200 task dataset, outperforming Kimi K2 (1T MoE) at 72.0%, GPT-OSS-120B at 71.5%, and Llama-4-Scout-17B at 69.0% under the same zero-shot prompt. These results suggest that, for fixed robot skill taxonomies, small specialized models trained on synthetic data can outperform much larger general-purpose LLMs for fleet-level task routing.
Safe Aerial 3D Path Planning for Autonomous UAVs using Magnetic Potential Fields
Safe autonomous Uncrewed Aerial Vehicle (UAV) navigation in urban environments requires real-time path planning that avoids obstacles. MaxCo… (voir plus)nvNet is a potential-field planner that leverages properties of Maxwell's equations to generate a path to the goal without local minima. We extend the 2D MaxConvNet magnetic field planner to 3D, using a convolutional autoencoder to predict obstacle-aware potential fields from LiDAR-derived 101^3 voxel grids. Evaluation across 100 randomized closed-loop trials in two distinct Cosys-AirSim urban environments, a dense night-time cityscape and a suburban district shows a 100% path planning success rate on both maps without retraining. In offline path planning, 3DMaxConvNet produces path lengths comparable to A* on unseen maps while reducing runtime from 0.155--0.17s to 0.087--0.089s, or about 1.7--1.95 times faster than A*. Against RRT*(3k), 3DMaxConvNet achieves similar path quality while reducing planning runtime from 17.2--17.5s to about 0.09s, which is roughly 193--201 times faster than RRT*(3k).
Multi-Robot Decentralized Collaborative SLAM in Planetary Analogue Environments: Dataset, Challenges, and Lessons Learned
Pierre-Yves Lajoie
Karthik Soma
Alice Lemieux-Bourque
Rongge Zhang
Vivek Shankar Varadharajan
Decentralized collaborative simultaneous localization and mapping (C-SLAM) is essential to enable multirobot missions in unknown environment… (voir plus)s without relying on preexisting localization and communication infrastructure. This technology is anticipated to play a key role in the exploration of the Moon, Mars, and other planets. In this article, we share insights and lessons learned from C-SLAM experiments involving three robots operating on a Mars analogue terrain and communicating over an ad hoc network. We examine the impact of limited and intermittent communication on C-SLAM performance, as well as the unique localization challenges posed by planetary-like environments. Additionally, we introduce a novel dataset collected during our experiments, which includes real-time peer-to-peer inter-robot throughput and latency measurements. This dataset aims to support future research on communication-constrained, decentralized multirobot operations.
PEACE: Prompt Engineering Automation for CLIPSeg Enhancement for Safe-Landing Zone Segmentation
Rongge Zhang
Antoine Robillard
Safe landing is essential in robotics applications, from industrial settings to space exploration. As artificial intelligence advances, we h… (voir plus)ave developed PEACE (Prompt Engineering Automation for CLIPSeg Enhancement), a system that automatically generates and refines prompts for identifying landing zones in changing environments. Traditional approaches using fixed prompts for open-vocabulary models struggle with environmental changes and can lead to dangerous outcomes when conditions are not represented in the predefined prompts. PEACE addresses this limitation by dynamically adapting to shifting data distributions. Our key innovation is the dual segmentation of safe and unsafe landing zones, allowing the system to refine the results by removing unsafe areas from potential landing sites. Using only monocular cameras and image segmentation, PEACE can safely guide descent operations from 100 meters to altitudes as low as 20 meters. The testing shows that PEACE significantly outperforms the standard CLIP and CLIPSeg prompting methods, improving the successful identification of safe landing zones from 57% to 92%. We have also demonstrated enhanced performance when replacing CLIPSeg with FastSAM. The complete source code is available as an open-source software 1.
BlabberSeg: Real-Time Embedded Open-Vocabulary Aerial Segmentation
Ricardo de Azambuja
Real-time aerial image segmentation plays an important role in the environmental perception of Uncrewed Aerial Vehicles (UAVs). We introduce… (voir plus) BlabberSeg, an optimized Vision-Language Model built on CLIPSeg for on-board, real-time processing of aerial images by UAVs. BlabberSeg improves the efficiency of CLIPSeg by reusing prompt and model features, reducing computational overhead while achieving real-time open-vocabulary aerial segmentation. We validated BlabberSeg in a safe landing scenario using the Dynamic Open-Vocabulary Enhanced SafE-Landing with Intelligence (DOVESEI) framework, which uses visual servoing and open-vocabulary segmentation. BlabberSeg reduces computational costs significantly, with a speed increase of 927.41% (16.78 Hz) on a NVIDIA Jetson Orin AGX (64GB) compared with the original CLIPSeg (1.81Hz), achieving real-time aerial segmentation with negligible loss in accuracy (2.1% as the ratio of the correctly segmented area with respect to CLIPSeg). BlabberSeg's source code is open and available online.
Active Semantic Mapping and Pose Graph Spectral Analysis for Robot Exploration
PEACE: Prompt Engineering Automation for CLIPSeg Enhancement in Aerial Robotics
Rongge Zhang
Ricardo de Azambuja
From industrial to space robotics, safe landing is an essential component for flight operations. With the growing interest in artificial int… (voir plus)elligence, we direct our attention to learning based safe landing approaches. This paper extends our previous work, DOVESEI, which focused on a reactive UAV system by harnessing the capabilities of open vocabulary image segmentation. Prompt-based safe landing zone segmentation using an open vocabulary based model is no more just an idea, but proven to be feasible by the work of DOVESEI. However, a heuristic selection of words for prompt is not a reliable solution since it cannot take the changing environment into consideration and detrimental consequences can occur if the observed environment is not well represented by the given prompt. Therefore, we introduce PEACE (Prompt Engineering Automation for CLIPSeg Enhancement), powering DOVESEI to automate the prompt generation and engineering to adapt to data distribution shifts. Our system is capable of performing safe landing operations with collision avoidance at altitudes as low as 20 meters using only monocular cameras and image segmentation. We take advantage of DOVESEI's dynamic focus to circumvent abrupt fluctuations in the terrain segmentation between frames in a video stream. PEACE shows promising improvements in prompt generation and engineering for aerial images compared to the standard prompt used for CLIP and CLIPSeg. Combining DOVESEI and PEACE, our system was able improve successful safe landing zone selections by 58.62% compared to using only DOVESEI. All the source code is open source and available online.