
Sobhan Mohammadpour

Alumni

Publications

Decoupling regularization from the action space
Sobhan Mohammadpour
Regularized reinforcement learning (RL), particularly the entropy-regularized kind, has gained traction in optimal control and inverse RL. While standard unregularized RL methods remain unaffected by changes in the number of actions, we show that such changes can severely impact their regularized counterparts. This paper demonstrates the importance of decoupling the regularizer from the action space: that is, of maintaining a consistent level of regularization regardless of how many actions are involved, to avoid over-regularization. Although the problem can be avoided by introducing a task-specific temperature parameter, doing so is often undesirable and cannot solve the problem when action spaces are state-dependent. In the state-dependent action context, different states with varying action spaces are regularized inconsistently. We introduce two solutions: a static temperature selection approach and a dynamic counterpart, universally applicable where this problem arises. Implementing these changes improves performance on the DeepMind control suite in static and dynamic temperature regimes and a biological design task.
Maximum entropy GFlowNets with soft Q-learning
Generative Flow Networks (GFNs) have emerged as a powerful tool for sampling discrete objects from unnormalized distributions, offering a scalable alternative to Markov Chain Monte Carlo (MCMC) methods. While GFNs draw inspiration from maximum entropy reinforcement learning (RL), the connection between the two has largely been unclear and seemingly applicable only in specific cases. This paper addresses the connection by constructing an appropriate reward function, thereby establishing an exact relationship between GFNs and maximum entropy RL. This construction allows us to introduce maximum entropy GFNs, which, in contrast to GFNs with uniform backward policy, achieve the maximum entropy attainable by GFNs without constraints on the state space.
Arc travel time and path choice model estimation subsumed
Sobhan Mohammadpour
We propose a method for maximum likelihood estimation of path choice model parameters and arc travel time using data of different levels of granularity. Hitherto, these two tasks have been tackled separately under strong assumptions. Using a small example, we illustrate that this can lead to biased results. Results on both real (New York yellow cab) and simulated data show strong performance of our method compared to existing baselines. models and loss functions. It is designed to estimate arc travel time and path choice model parameters simultaneously. We showed that by marginalizing the unobserved variables and using stochastic gradient estimates, we obtain a maximum likelihood estimate even for observations at different levels of granularity. We showed that we can mix different data types when computing the MLE without needing to use a linear combination of losses as