Elham Daneshmand

PhD - McGill University
Supervisor
Co-supervisor
Research Topics
Reinforcement Learning

Publications

SLowRL: Safe Low-Rank Adaptation for Bridging the Sim-to-Real Gap in Legged Locomotion
Shafeef Omar
Majid Khadiv
A simulator is, at best, a coarse low-fidelity model of the real world the agent eventually has to act in. Closing this residual gap on hardware is a canonical instance of operating in a big world: the real environment exposes contact dynamics, latencies, and disturbances that the agent was never given the capacity (parameters or data) to model during pretraining. Naive on-hardware fine-tuning is risky (the policy can damage the robot before it improves), and full-parameter updates require prohibitive interaction time. We propose SLowRL, a continual fine-tuning framework that confronts this big-world adaptation problem with two complementary forms of capacity limitation: (i) a rank-1 LoRA adapter applied per layer to both actor and critic, restricting each layer's update to a single direction in its image space (
SLowRL: Safe Low-Rank Adaptation Reinforcement Learning for Locomotion
Shafeef Omar
Majid Khadiv
Sim-to-real transfer of locomotion policies often leads to performance degradation due to the inevitable sim-to-real gap. Naively fine-tuning these policies directly on hardware is problematic, as it poses risks of mechanical failure and suffers from high sample inefficiency. In this paper, we address the challenge of safely and efficiently fine-tuning reinforcement learning (RL) policies for dynamic locomotion tasks. Specifically, we focus on fine-tuning policies learned in simulation directly on hardware, while explicitly enforcing safety constraints. In doing so, we introduce SLowRL, a framework that combines Low-Rank Adaptation (LoRA) with training-time safety enforcement via a recovery policy. We evaluate our method both in simulation and on a real Unitree Go2 quadruped robot for jump and trot tasks. Experimental results show that our method achieves a