Elham Daneshmand

PhD - McGill University

Supervisor

Hsiu-Chin Lin

Co-supervisor

Glen Berseth

Research Topics

Reinforcement Learning

Publications

SLowRL: Safe Low-Rank Adaptation for Bridging the Sim-to-Real Gap in Legged Locomotion

Elham Daneshmand

Shafeef Omar

Glen Berseth

Majid Khadiv

Hsiu-Chin Lin

A simulator is, at best, a coarse low-fidelity model of the real world the agent eventually has to act in. Closing this residual gap on hardware is a canonical instance of operating in a big world: the real environment exposes contact dynamics, latencies, and disturbances that the agent was never given the capacity (parameters or data) to model during pretraining. Naive on-hardware fine-tuning is risky, since the policy can damage the robot before it improves, and full-parameter updates require prohibitive interaction time. We propose SLowRL, a continual fine-tuning framework that confronts this big-world adaptation problem with two complementary forms of capacity limitation: (i) a rank-1 LoRA adapter applied per layer to both actor and critic, restricting each layer's update to a single direction in its image space (

2026-06-09

rl-conference.cc/RLC/2026/Workshop/RL_in_Big_Worlds (poster)

openreview.net

SLowRL: Safe Low-Rank Adaptation Reinforcement Learning for Locomotion

Elham Daneshmand

Shafeef Omar

Glen Berseth

Majid Khadiv

Hsiu-Chin Lin

Sim-to-real transfer of locomotion policies often leads to performance degradation due to the inevitable sim-to-real gap. Naively fine-tuning these policies directly on hardware is problematic, as it poses risks of mechanical failure and suffers from high sample inefficiency. In this paper, we address the challenge of safely and efficiently fine-tuning reinforcement learning (RL) policies for dynamic locomotion tasks. Specifically, we focus on fine-tuning policies learned in simulation directly on hardware, while explicitly enforcing safety constraints. In doing so, we introduce SLowRL, a framework that combines Low-Rank Adaptation (LoRA) with training-time safety enforcement via a recovery policy. We evaluate our method both in simulation and on a real Unitree Go2 quadruped robot for jump and trot tasks. Experimental results show that our method achieves a

2026-03-16

arXiv (preprint)

doi.org

arxiv.org

