Simon Roy

Master's Research - Université de Montréal

Supervisor

Giovanni Beltrame

Co-supervisor

Jana Pavlasek

Research Topics

Autonomous Robotics Navigation

Multimodal Learning

Reinforcement Learning

Robotics

Website

GitHub

Publications

To Select or not to Select, that is the Question: Distilling Robot Skill Prediction into a Small Ensemble

Haechan Mark Bong

Simon Roy

Euhid Aman

Giovanni Beltrame

As robot fleets become more heterogeneous, including humanoids, rovers, quadrupeds, and drones, selecting the right robot for a task becomes a core systems problem. We study robot skill prediction: mapping a natural-language task description to the physical capabilities required to execute it, such as fly, wheels, legs, surface water, under water and hands. Since labelled data that maps natural-language task descriptions to robots' physical capabilities does not exist, we construct a synthetic task-to-skill dataset using LLM-assisted generation and targeted label auditing. Trained on this data, a ~133M-parameter ensemble of two fine-tuned sentence encoders (mpnet + MiniLM) reaches 83.5% task-to-skill matching on a stratified 200-task dataset, outperforming Kimi K2 (1T MoE) at 72.0%, GPT-OSS-120B at 71.5%, and Llama-4-Scout-17B at 69.0% under the same zero-shot prompt. These results suggest that, for fixed robot skill taxonomies, small specialized models trained on synthetic data can outperform much larger general-purpose LLMs for fleet-level task routing.

2026-05-19

arXiv (preprint)

doi.org

arxiv.org

Revisiting the Learning Objectives of Vision-Language Reward Models

Simon Roy

Samuel Barbeau

Giovanni Beltrame

Christian Desrosiers

Nicolas Thome

Learning generalizable reward functions is a core challenge in embodied intelligence. Recent work leverages contrastive vision language models (VLMs) to obtain dense, domain-agnostic rewards without human supervision. These methods adapt VLMs into reward models through increasingly complex learning objectives, yet meaningful comparison remains difficult due to differences in training data, architectures, and evaluation settings. In this work, we isolate the impact of the learning objective by evaluating recent VLM-based reward models under a unified framework with identical backbones, finetuning data, and evaluation environments. Using Meta-World tasks, we assess modeling accuracy by measuring consistency with ground truth reward and correlation with expert progress. Remarkably, we show that a simple triplet loss outperforms state-of-the-art methods, suggesting that much of the improvements in recent approaches could be attributed to differences in data and architectures.

2025-12-19

arXiv (preprint)

doi.org

arxiv.org

AI Policy Fellowship Publications

Mila Ventures Launchpad

AI Policy Compass
