
Giovanni Beltrame

Affiliate Member
Full Professor, Polytechnique Montréal, Department of Computer Engineering and Software Engineering
Research Topics
Autonomous Robotics Navigation
Computer Vision
Distributed Systems
Human-Robot Interaction
Online Learning
Reinforcement Learning
Robotics
Swarm Intelligence

Biography

Giovanni Beltrame obtained his PhD in computer engineering from Politecnico di Milano in 2006, after which he worked as a microelectronics engineer at the European Space Agency on a number of projects, from radiation-tolerant systems to computer-aided design.

In 2010, he moved to Montréal, where he is currently a professor in the Department of Computer Engineering and Software Engineering at Polytechnique Montréal.

Beltrame directs the Making Innovative Space Technology (MIST) Lab, where he supervises more than twenty-five students and postdocs. He has completed several projects in collaboration with industry and government agencies in the areas of robotics, disaster response and space exploration. He and his team have participated in several field missions with ESA, the Canadian Space Agency (CSA) and NASA, including BRAILLE, PANGAEA-X and IGLUNA.

His research interests include the modelling and design of embedded systems, AI and robotics, and he has published his findings in top journals and conferences.

Current Students

PhD - Polytechnique Montréal
Collaborating researcher - Polytechnique Montréal
Master's Research - Polytechnique Montréal
PhD - Polytechnique Montréal
PhD - Polytechnique Montréal

Publications

Learning Multi-agent Multi-machine Tending by Mobile Robots
Abdalwhab Abdalwhab
David St-Onge
Robotics can help address the manufacturing industry's growing worker shortage. Machine tending is a task that collaborative robots can tackle and that can also substantially boost productivity. However, existing robotic systems deployed in this sector rely on a fixed single-arm setup, whereas mobile robots offer more flexibility and scalability. In this work, we introduce a multi-agent multi-machine tending learning framework for mobile robots based on Multi-agent Reinforcement Learning (MARL), with a suitable observation and reward design. Moreover, an attention-based encoding mechanism is developed and integrated into the Multi-agent Proximal Policy Optimization (MAPPO) algorithm to boost its performance in machine tending scenarios. Our model (AB-MAPPO) outperformed MAPPO in this new, challenging scenario in terms of task success, safety, and resource utilization. We also provide an extensive ablation study to support our design decisions.
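The attention-based encoding idea can be pictured with a short sketch. The module below is a minimal, hypothetical self-attention encoder over per-entity observations (machines, other robots) feeding a discrete policy head of the kind a MAPPO-style trainer would use; the dimensions, names, and reward design are illustrative assumptions, not the paper's AB-MAPPO architecture.

```python
# Illustrative sketch only: a self-attention encoder over per-entity observation
# tokens, of the kind that could feed a MAPPO actor. All names and dimensions
# here are hypothetical, not taken from the paper.
import torch
import torch.nn as nn

class AttentionObsEncoder(nn.Module):
    def __init__(self, entity_dim: int = 8, embed_dim: int = 64, n_heads: int = 4):
        super().__init__()
        self.embed = nn.Linear(entity_dim, embed_dim)            # per-entity embedding
        self.attn = nn.MultiheadAttention(embed_dim, n_heads, batch_first=True)
        self.out = nn.Linear(embed_dim, embed_dim)

    def forward(self, entities: torch.Tensor) -> torch.Tensor:
        # entities: (batch, n_entities, entity_dim), e.g. machine states + robot poses
        x = self.embed(entities)
        attended, _ = self.attn(x, x, x)                         # self-attention over entities
        pooled = attended.mean(dim=1)                            # permutation-invariant pooling
        return self.out(pooled)                                  # fixed-size feature for the policy head

class PolicyHead(nn.Module):
    def __init__(self, feat_dim: int = 64, n_actions: int = 5):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(feat_dim, 64), nn.Tanh(), nn.Linear(64, n_actions))

    def forward(self, feat: torch.Tensor) -> torch.distributions.Categorical:
        return torch.distributions.Categorical(logits=self.net(feat))

if __name__ == "__main__":
    obs = torch.randn(2, 6, 8)            # 2 robots, 6 observed entities, 8 features each
    feat = AttentionObsEncoder()(obs)
    action = PolicyHead()(feat).sample()  # actions would then be fed to a MAPPO-style trainer
    print(action.shape)
```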
GNN-based Decentralized Perception in Multirobot Systems for Predicting Worker Actions
Ali Imran
David St-Onge
In industrial environments, predicting human actions is essential for ensuring safe and effective collaboration between humans and robots. This paper introduces a perception framework that enables mobile robots to understand and share information about human actions in a decentralized way. The framework first allows each robot to build a spatial graph representing its surroundings, which it then shares with other robots. This shared spatial data is combined with temporal information to track human behavior over time. A swarm-inspired decision-making process is used to ensure all robots agree on a unified interpretation of the human's actions. Results show that adding more robots and incorporating longer time sequences improve prediction accuracy. Additionally, the consensus mechanism increases system resilience, making the multi-robot setup more reliable in dynamic industrial settings.
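A toy sketch of the two ingredients in this abstract, under stated assumptions: each robot builds a spatial graph of nearby entities (nodes are detections, edges link entities within a radius), and robots agree on a predicted worker action, here shown as a plain majority vote rather than the paper's swarm-inspired consensus or GNN layers.

```python
# Minimal sketch with made-up data; the GNN and consensus rule from the paper
# are not reproduced here.
import numpy as np
from collections import Counter

def spatial_graph(positions: np.ndarray, radius: float = 2.0) -> np.ndarray:
    """Adjacency matrix linking entities within `radius` metres of each other."""
    d = np.linalg.norm(positions[:, None, :] - positions[None, :, :], axis=-1)
    adj = (d < radius).astype(float)
    np.fill_diagonal(adj, 0.0)
    return adj

def consensus(predictions: list[str]) -> str:
    """Simple majority vote over per-robot action predictions (illustrative only)."""
    return Counter(predictions).most_common(1)[0][0]

if __name__ == "__main__":
    # three detected entities seen by one robot (x, y in metres)
    positions = np.array([[0.0, 0.0], [1.5, 0.2], [4.0, 4.0]])
    print(spatial_graph(positions))
    # per-robot action predictions for the same worker
    print(consensus(["reaching", "reaching", "idle"]))
```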
A Multi-Robot Exploration Planner for Space Applications
Vivek Shankar Vardharajan
We propose a distributed multi-robot exploration planning method designed for complex, unconstrained environments featuring steep elevation changes. The method employs a two-tiered approach: a local exploration planner that constructs a grid graph to maximize exploration gain, and a global planner that maintains a sparse navigational graph to track visited locations and frontier information. The global graphs are periodically synchronized among robots within communication range to maintain an updated representation of the environment. Our approach integrates localization loop-closure estimates to correct global graph drift. In simulation and field tests, the proposed method achieves 50% lower computational runtime than state-of-the-art methods while demonstrating superior exploration coverage. We evaluate its performance in two simulated subterranean environments and in field experiments at a Mars-analog terrain.
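The two tiers described in the abstract can be illustrated on toy data: a local occupancy grid where frontier cells (unknown cells next to free ones) drive exploration gain, and a sparse global graph that robots merge when in communication range. The grid values, scoring, and merge rule below are placeholders, not the paper's planner.

```python
# Illustrative sketch only; not the planner from the paper.
import numpy as np

FREE, UNKNOWN, OCCUPIED = 0, -1, 1

def frontier_cells(grid: np.ndarray) -> list[tuple[int, int]]:
    """Unknown cells that touch at least one free cell (4-connectivity)."""
    frontiers = []
    rows, cols = grid.shape
    for r in range(rows):
        for c in range(cols):
            if grid[r, c] != UNKNOWN:
                continue
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and grid[nr, nc] == FREE:
                    frontiers.append((r, c))
                    break
    return frontiers

def merge_global_graphs(g1: dict, g2: dict) -> dict:
    """Union of two sparse graphs stored as {node_id: set(neighbour_ids)}."""
    merged = {k: set(v) for k, v in g1.items()}
    for node, nbrs in g2.items():
        merged.setdefault(node, set()).update(nbrs)
    return merged

if __name__ == "__main__":
    grid = np.full((5, 5), UNKNOWN)
    grid[2, :3] = FREE
    grid[1, 1] = OCCUPIED
    print(frontier_cells(grid))                       # exploration gain ~ number of frontier cells seen
    print(merge_global_graphs({0: {1}}, {1: {0, 2}, 2: {1}}))
```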
BlabberSeg: Real-Time Embedded Open-Vocabulary Aerial Segmentation
Haechan Mark Bong
Ricardo de Azambuja
Real-time aerial image segmentation plays an important role in the environmental perception of Uncrewed Aerial Vehicles (UAVs). We introduce BlabberSeg, an optimized Vision-Language Model built on CLIPSeg for on-board, real-time processing of aerial images by UAVs. BlabberSeg improves the efficiency of CLIPSeg by reusing prompt and model features, reducing computational overhead while achieving real-time open-vocabulary aerial segmentation. We validated BlabberSeg in a safe-landing scenario using the Dynamic Open-Vocabulary Enhanced SafE-Landing with Intelligence (DOVESEI) framework, which uses visual servoing and open-vocabulary segmentation. BlabberSeg reduces computational costs significantly, with a speed increase of 927.41% (16.78 Hz) on an NVIDIA Jetson Orin AGX (64 GB) compared with the original CLIPSeg (1.81 Hz), achieving real-time aerial segmentation with negligible loss in accuracy (2.1%, measured as the ratio of correctly segmented area with respect to CLIPSeg). BlabberSeg's source code is open and available online.
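The feature-reuse idea in the abstract can be sketched as follows: text-prompt features are computed once and cached, so only the image branch runs per frame. The encoders below are dummy stand-ins, not CLIPSeg's actual API or BlabberSeg's optimizations.

```python
# Sketch of prompt-feature caching with stand-in encoders; illustrative only.
import numpy as np

def encode_prompts(prompts: list[str]) -> np.ndarray:
    """Stand-in text encoder: one random feature vector per prompt."""
    rng = np.random.default_rng(0)
    return rng.standard_normal((len(prompts), 64))

def encode_image(image: np.ndarray) -> np.ndarray:
    """Stand-in image encoder: flatten and truncate to the shared feature size."""
    return image.reshape(-1)[:64]

class CachedSegmenter:
    def __init__(self, prompts: list[str]):
        # Heavy prompt encoding happens once, at construction time.
        self.prompt_feats = encode_prompts(prompts)

    def segment(self, image: np.ndarray) -> np.ndarray:
        # Per-frame work: encode the image and score it against cached prompt features.
        img_feat = encode_image(image)
        return self.prompt_feats @ img_feat   # one similarity score per prompt

if __name__ == "__main__":
    seg = CachedSegmenter(["grass", "asphalt", "water"])
    for _ in range(3):                        # simulated video frames
        frame = np.random.rand(8, 8)
        print(seg.segment(frame))
```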
Active Semantic Mapping and Pose Graph Spectral Analysis for Robot Exploration
Rongge Zhang
Haechan Mark Bong
Physical Simulation for Multi-agent Multi-machine Tending
Abdalwhab Abdalwhab
David St-Onge
Multi-Objective Risk Assessment Framework for Exploration Planning Using Terrain and Traversability Analysis
Riana Gagnon Souleiman
Vivek Shankar Vardharajan
Frequency-based View Selection in Gaussian Splatting Reconstruction
Monica Li
Pierre-Yves Lajoie
Three-dimensional reconstruction is a fundamental problem in robotics perception. We examine the problem of active view selection to perform 3D Gaussian Splatting reconstructions with as few input images as possible. Although 3D Gaussian Splatting has made significant progress in image rendering and 3D reconstruction, the quality of the reconstruction is strongly impacted by the selection of 2D images and the estimation of camera poses through Structure-from-Motion (SfM) algorithms. Current view-selection methods that rely directly on uncertainties from occlusions, depth ambiguities, or neural network predictions are insufficient to handle the issue and struggle to generalize to new scenes. By ranking potential views in the frequency domain, we can effectively estimate the potential information gain of new viewpoints without ground-truth data. By overcoming current constraints on model architecture and efficacy, our method achieves state-of-the-art results in view selection, demonstrating its potential for efficient image-based 3D reconstruction.
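One possible frequency-domain ranking, as a rough illustration of the idea in this abstract: score each candidate's current rendering by its high-frequency spectral energy and visit the blurriest (least-detailed) view first. The cutoff and scoring rule are assumptions for the sketch; the paper's exact criterion may differ.

```python
# Illustrative frequency-domain view scoring; not the paper's method.
import numpy as np

def high_frequency_energy(image: np.ndarray, cutoff: float = 0.25) -> float:
    """Fraction of spectral energy above a normalized radial frequency cutoff."""
    spectrum = np.abs(np.fft.fftshift(np.fft.fft2(image))) ** 2
    h, w = image.shape
    fy = np.fft.fftshift(np.fft.fftfreq(h))[:, None]
    fx = np.fft.fftshift(np.fft.fftfreq(w))[None, :]
    radius = np.sqrt(fx**2 + fy**2)
    return spectrum[radius > cutoff].sum() / spectrum.sum()

def rank_views(renderings: list[np.ndarray]) -> list[int]:
    """Candidate views sorted so the least-detailed rendering comes first."""
    scores = [high_frequency_energy(img) for img in renderings]
    return sorted(range(len(renderings)), key=lambda i: scores[i])

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    noisy = rng.random((32, 32))                                      # high-frequency content
    smooth = np.outer(np.linspace(0, 1, 32), np.linspace(0, 1, 32))   # low-frequency gradient
    print(rank_views([noisy, smooth]))  # the smooth (blurry) rendering ranks first
```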
Swarming Out of the Lab: Comparing Relative Localization Methods for Collective Behavior
Rafael Gomes Braga
Vivek Shankar Vardharajan
David St-Onge
Active Semantic Mapping and Pose Graph Spectral Analysis for Robot Exploration
Rongge Zhang
Haechan Mark Bong
Exploration in unknown and unstructured environments is a pivotal requirement for robotic applications. A robot's exploration behavior can be inherently affected by the performance of its Simultaneous Localization and Mapping (SLAM) subsystem, although SLAM and exploration are generally studied separately. In this paper, we formulate exploration as an active mapping problem and extend it with semantic information. We introduce a novel active metric-semantic SLAM approach, leveraging recent research advances in information theory and spectral graph theory: we combine semantic mutual information and the connectivity metrics of the underlying pose graph of the SLAM subsystem. We use the resulting utility function to evaluate different trajectories to select the most favorable strategy during exploration. Exploration and SLAM metrics are analyzed in experiments. Running our algorithm on the Habitat dataset, we show that, while maintaining efficiency close to state-of-the-art exploration methods, our approach effectively increases the performance of metric-semantic SLAM with a 21% reduction in average map error and a 9% improvement in average semantic classification accuracy.
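A utility of the general form described in this abstract can be sketched as a weighted sum of a semantic information term and the algebraic connectivity (second-smallest Laplacian eigenvalue) of the candidate pose graph. The weights and the entropy proxy for semantic mutual information below are illustrative assumptions, not the paper's formulation.

```python
# Illustrative trajectory utility combining a semantic term and pose-graph connectivity.
import numpy as np

def algebraic_connectivity(adjacency: np.ndarray) -> float:
    """Fiedler value of the pose graph: higher means a better-connected graph."""
    degree = np.diag(adjacency.sum(axis=1))
    laplacian = degree - adjacency
    eigvals = np.sort(np.linalg.eigvalsh(laplacian))
    return float(eigvals[1])

def semantic_information(class_probs: np.ndarray) -> float:
    """Entropy of predicted semantic classes along a candidate trajectory
    (a simple proxy for expected semantic information gain)."""
    p = class_probs / class_probs.sum()
    return float(-(p * np.log(p + 1e-12)).sum())

def trajectory_utility(adjacency: np.ndarray, class_probs: np.ndarray,
                       alpha: float = 1.0, beta: float = 0.5) -> float:
    return alpha * semantic_information(class_probs) + beta * algebraic_connectivity(adjacency)

if __name__ == "__main__":
    # candidate pose graph with 4 poses: a chain 0-1-2-3 plus one loop-closure edge 3-0
    adj = np.array([[0, 1, 0, 1],
                    [1, 0, 1, 0],
                    [0, 1, 0, 1],
                    [1, 0, 1, 0]], dtype=float)
    probs = np.array([0.4, 0.3, 0.2, 0.1])   # semantic class frequencies seen along the path
    print(trajectory_utility(adj, probs))
```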