
Giovanni Beltrame

Affiliate Member
Full Professor, Polytechnique Montréal, Department of Computer Engineering and Software Engineering
Research Topics
Autonomous Robotics Navigation
Computer Vision
Distributed Systems
Human-Robot Interaction
Online Learning
Reinforcement Learning
Robotics
Swarm Intelligence

Biography

Giovanni Beltrame obtained his PhD in computer engineering from Politecnico di Milano in 2006, after which he worked as a microelectronics engineer at the European Space Agency on a number of projects, from radiation-tolerant systems to computer-aided design.

In 2010, he moved to Montréal, where he is currently a professor at Polytechnique Montréal in the Computer and Software Engineering Department.

Beltrame directs the Making Innovative Space Technology (MIST) Lab, where he has more than twenty-five students and postdocs under his supervision. He has completed several projects in collaboration with industry and government agencies in the areas of robotics, disaster response and space exploration. He and his team have participated in several field missions with ESA, the Canadian Space Agency (CSA) and NASA, including BRAILLE, PANGAEA-X and IGLUNA.

His research interests include the modelling and design of embedded systems, AI and robotics, and he has published his findings in top journals and conferences.

Current Students

PhD - Polytechnique Montréal
Co-supervisor :
Collaborating researcher - Polytechnique Montréal
Co-supervisor :
PhD - Polytechnique Montréal
Co-supervisor :
Master's Research - Université de Montréal
Co-supervisor :
PhD - Polytechnique Montréal
Co-supervisor :

Publications

BlabberSeg: Semantic Perception for Reliable Open-Vocabulary UAV Safe Landing
Reliable robot autonomy requires semantic perception that remains both informative and fast enough for closed-loop safety decisions. We present BlabberSeg, an optimized CLIPSeg-based open-vocabulary segmentation pipeline for UAV emergency landing. The method targets semantic reliability under edge constraints by reusing prompt, positional, and image features and deploying floating-point 16 ONNX (TensorRT) inference. In a DOVESEI-based safe-landing workflow, BlabberSeg reaches 16.78 Hz on Jetson Orin AGX (64 GB), a 927.41% speed increase over the original CLIPSeg (1.81 Hz), with limited degradation in segmentation agreement (2.1% relative area difference) and mIoU (9%). At the task level, safe-landing success is preserved (76/100, matching baseline) while mission time is substantially reduced. These results support semantic open-vocabulary perception as a practical component for reliable autonomous landing.
To Select or not to Select, that is the Question: Distilling Robot Skill Prediction into a Small Ensemble
Simon Roy
Euhid Aman
As robot fleets become more heterogeneous, including humanoids, rovers, quadrupeds, and drones, selecting the right robot for a task becomes a core systems problem. We study robot skill prediction: mapping a natural-language task description to the physical capabilities required to execute it, such as fly, wheels, legs, surface water, under water and hands. Since labelled data that maps natural-language task descriptions to robots' physical capabilities does not exist, we construct a synthetic task-to-skill dataset using LLM-assisted generation and targeted label auditing. Trained on this data, a ~133M-parameter ensemble of two fine-tuned sentence encoders (mpnet + MiniLM) reaches 83.5% task-to-skill matching on a stratified 200-task dataset, outperforming Kimi K2 (1T MoE) at 72.0%, GPT-OSS-120B at 71.5%, and Llama-4-Scout-17B at 69.0% under the same zero-shot prompt. These results suggest that, for fixed robot skill taxonomies, small specialized models trained on synthetic data can outperform much larger general-purpose LLMs for fleet-level task routing.
Sleep Spindle-Locked Targeted Memory Reactivation Enhances Declarative Memory Consolidation
Vaishali Mutreja
Prakriti Gupta
Ovidiu Lungu
Latifa Lazzouni
Ella Gabitov
Habib Benali
Hugo Jourde
Emily BJ Coffey
Jean-Marc Lina
Geneviève Albouy
Bradley King
Arnaud Boutin
Julie Carrier
Julien Doyon
Study Objectives: Sleep spindles are implicated in memory consolidation. Yet direct evidence linking spindle dynamics to declarative memory outcomes remains limited. We thus tested whether targeted memory reactivation (TMR) time-locked to sleep spindles enhances declarative memory, and whether the temporal organization of stimulated spindles (trains versus isolated events) is selectively associated with distinct memory outcomes. Methods: Twenty-eight healthy young adults learned image locations from two categories (animals, clothing) in a grid, each paired with a distinct auditory cue. During overnight NREM sleep, one cue was replayed time-locked to spindles detected in real-time using a closed-loop system (TMR condition); the other served as the non-reactivated control (No-TMR condition). Category-cue assignment was counterbalanced. Post-sleep recall, recognition accuracy, and movement time were assessed. Results: Recall accuracy was significantly higher in the TMR than the No-TMR condition (93.96% vs. 90.61%, p = .024), whereas recognition accuracy (p = .139) and movement time (p = .651) did not differ. Stimulation intensity within spindle trains correlated with the TMR effect on recall (Spearman ρ = .531, p = .004), whereas the proportion of isolated spindle stimulations correlated with the TMR effect on recognition (ρ = .563, p = .002). Cross-associations were not significant. Conclusions: Spindle-locked TMR enhances recall-based declarative memory retention. The selective association between spindle temporal clustering and memory outcomes suggests that train-embedded and isolated spindles support different aspects of memory consolidation, highlighting spindle temporal context as a functionally relevant dimension of sleep-dependent memory processing.
Safe Aerial 3D Path Planning for Autonomous UAVs using Magnetic Potential Fields
Safe autonomous Uncrewed Aerial Vehicle (UAV) navigation in urban environments requires real-time path planning that avoids obstacles. MaxConvNet is a potential-field planner that leverages properties of Maxwell's equations to generate a path to the goal without local minima. We extend the 2D MaxConvNet magnetic field planner to 3D, using a convolutional autoencoder to predict obstacle-aware potential fields from LiDAR-derived 101^3 voxel grids. Evaluation across 100 randomized closed-loop trials in two distinct Cosys-AirSim urban environments, a dense night-time cityscape and a suburban district, shows a 100% path planning success rate on both maps without retraining. In offline path planning, 3DMaxConvNet produces path lengths comparable to A* on unseen maps while reducing runtime from 0.155–0.17 s to 0.087–0.089 s, or about 1.7–1.95 times faster than A*. Against RRT*(3k), 3DMaxConvNet achieves similar path quality while reducing planning runtime from 17.2–17.5 s to about 0.09 s, which is roughly 193–201 times faster than RRT*(3k).
Split over $n$ resource sharing problem: Are fewer capable agents better than many simpler ones?
Karthik Soma
Mohamed S. Talamali
Genki Miyauchi
Heiko Hamann
Roderich Groß
In multi-agent systems, should limited resources be concentrated into a few capable agents or distributed among many simpler ones? This work formulates the split over
Can Vision Foundation Models Navigate? Zero-Shot Real-World Evaluation and Lessons Learned
Visual Navigation Models (VNMs) promise generalizable robot navigation by learning from large-scale visual demonstrations. Despite growing real-world deployment, existing evaluations rely almost exclusively on success rate, whether the robot reaches its goal, which conceals trajectory quality, collision behavior, and robustness to environmental change. We present a real-world evaluation of five state-of-the-art VNMs (GNM, ViNT, NoMaD, NaviBridger, and CrossFormer) across two robot platforms and five environments spanning indoor and outdoor settings. Beyond success rate, we combine path-based metrics with vision-based goal-recognition scores and assess robustness through controlled image perturbations (motion blur, sunflare). Our analysis uncovers three systematic limitations: (a) even architecturally sophisticated diffusion and transformer-based models exhibit frequent collisions, indicating limited geometric understanding; (b) models fail to discriminate between different locations that are perceptually similar even though some semantic differences are present, causing goal prediction errors in repetitive environments; and (c) performance degrades under distribution shift. We will publicly release our evaluation codebase and dataset to facilitate reproducible benchmarking of VNMs.
Scalable Multi-Agent Reinforcement Learning Framework for Multi-Machine Tending
Abdalwhab Abdalwhab
David St-Onge
Robotic manipulators hold significant untapped potential for manufacturing industries, particularly when deployed in multi-robot configurations that can enhance resource utilization, increase throughput, and reduce costs. However, industrial manipulators typically operate in isolated one-robot, one-machine setups, limiting both utilization and scalability. Even mobile robot implementations generally rely on centralized architectures, creating vulnerability to single points of failure and requiring robust communication infrastructure. This paper introduces SMAPPO (Scalable Multi-Agent Proximal Policy Optimization), a scalable input-size invariant multi-agent reinforcement learning model for decentralized multi-robot management in industrial environments. MAPPO (Multi-Agent Proximal Policy Optimization) represents the current state-of-the-art approach. We optimized an existing simulator to handle complex multi-agent reinforcement learning scenarios and designed a new multi-machine tending scenario for evaluation. Our novel observation encoder enables SMAPPO to handle varying numbers of agents, machines, and storage areas with minimal or no retraining. Results demonstrate SMAPPO's superior performance compared to the state-of-the-art MAPPO across multiple conditions: full retraining (up to 61% improvement), curriculum learning (up to 45% increased productivity and up to 49% fewer collisions), zero-shot generalization to significantly different scale scenarios (up to 272% better performance without retraining), and adaptability under extremely low initial training (up to 100% increase in parts delivery).
Sociodynamics of Reinforcement Learning
Reinforcement Learning (RL) has emerged as a core algorithmic paradigm explicitly driving innovation in a growing number of industrial applications, including large language models and quantitative finance. Furthermore, computational neuroscience has long found evidence of natural forms of RL in biological brains. Therefore, it is crucial for the study of social dynamics to develop a scientific understanding of how RL shapes population behaviors. We leverage the framework of Evolutionary Game Theory (EGT) to provide building blocks and insights toward this objective. We propose a methodology that enables simulating large populations of RL agents in simple game theoretic interaction models. More specifically, we derive fast and parallelizable implementations of two fundamental revision protocols from multi-agent RL - Policy Gradient (PG) and Opponent-Learning Awareness (LOLA) - tailored for population simulations of random pairwise interactions in stateless normal-form games. Our methodology enables us to simulate large populations of 200,000 independent co-learning agents, yielding compelling insights into how non-stationarity-aware learners affect social dynamics. In particular, we find that LOLA learners promote cooperation in the Stag Hunt model, delay cooperative outcomes in the Hawk-Dove model, and reduce strategy diversity in the Rock-Paper-Scissors model.
Swarm robotics localization: comparing methods from infrared to foundation models
Ali Imran
Vivek Shankar Vardharajan
Rafael Gomes Braga
David St-Onge
E-RGB-D: Real-Time Event-Based Perception with Structured Light
Seyed Ehsan Marjani Bajestani
Event-based cameras (ECs) have emerged as bio-inspired sensors that report pixel brightness changes asynchronously, offering unmatched speed and efficiency in vision sensing. Despite their high dynamic range, temporal resolution, low power consumption, and computational simplicity, traditional monochrome ECs face limitations in detecting static or slowly moving objects and lack color information essential for certain applications. To address these challenges, we present a novel approach that integrates a Digital Light Processing (DLP) projector, forming Active Structured Light (ASL) for RGB-D sensing. By combining the benefits of ECs and projection-based techniques, our method enables the detection of color and the depth of each pixel separately. Dynamic projection adjustments optimize bandwidth, ensuring selective color data acquisition and yielding colorful point clouds without sacrificing spatial resolution. This integration, facilitated by a commercial TI LightCrafter 4500 projector and a monocular monochrome EC, not only enables frameless RGB-D sensing applications but also achieves remarkable performance milestones. With our approach, we achieved a color detection speed equivalent to 1400 fps and 4 kHz of pixel depth detection, significantly advancing the realm of computer vision across diverse fields from robotics to 3D reconstruction methods. Our code is publicly available: https://github.com/MISTLab/event_based_rgbd_ros
Revisiting the Learning Objectives of Vision-Language Reward Models
Simon Roy
Samuel Barbeau
Christian Desrosiers
Learning generalizable reward functions is a core challenge in embodied intelligence. Recent work leverages contrastive vision-language models (VLMs) to obtain dense, domain-agnostic rewards without human supervision. These methods adapt VLMs into reward models through increasingly complex learning objectives, yet meaningful comparison remains difficult due to differences in training data, architectures, and evaluation settings. In this work, we isolate the impact of the learning objective by evaluating recent VLM-based reward models under a unified framework with identical backbones, finetuning data, and evaluation environments. Using Meta-World tasks, we assess modeling accuracy by measuring consistency with ground truth reward and correlation with expert progress. Remarkably, we show that a simple triplet loss outperforms state-of-the-art methods, suggesting that much of the improvement in recent approaches could be attributed to differences in data and architectures.
Neural Incremental Dynamic Inversion Control of a Multirotor Robotic Airship
Ely Carneiro de Paiva
José Raul Azinheira
Rafael de Angelis Cordeiro
José Reginaldo H. Carvalho
Apolo Marton