Sobhan Mohammadpour

Decoupling regularization from the action space

Sobhan Mohammadpour

Regularized reinforcement learning (RL), particularly the entropy-regularized kind, has gained traction in optimal control and inverse RL. While standard unregularized RL methods remain unaffected by changes in the number of actions, we show that such changes can severely impact their regularized counterparts. This paper demonstrates the importance of decoupling the regularizer from the action space: that is, maintaining a consistent level of regularization regardless of how many actions are involved, to avoid over-regularization. Although the problem can be avoided by introducing a task-specific temperature parameter, doing so is often undesirable and does not solve the problem when action spaces are state-dependent. In the state-dependent action context, different states with varying action spaces are regularized inconsistently. We introduce two solutions, a static temperature selection approach and a dynamic counterpart, that apply universally wherever this problem arises. Implementing these changes improves performance on the DeepMind Control Suite, in both static and dynamic temperature regimes, and on a biological design task.

2024-01-15

ICLR.cc/2024/Conference (poster)

doi.org

openreview.net
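The abstract does not spell out the static or dynamic temperature rules, so the following is only a minimal Python sketch under a stated assumption. It uses the standard soft (log-sum-exp) state value of entropy-regularized RL together with a hypothetical static rule, `normalized_temperature`, that divides a base temperature by log|A(s)|, the maximum entropy of a policy over |A(s)| actions. With identical Q-values, a fixed temperature tau adds a bonus of tau * log|A(s)| that grows with the number of actions, whereas the normalized temperature keeps the bonus at the base value in every state.

```python
import math

def soft_value(q_values, temperature):
    """Entropy-regularized (soft) state value: tau * log sum_a exp(Q(a) / tau).
    The max is subtracted first for numerical stability."""
    m = max(q_values)
    return m + temperature * math.log(
        sum(math.exp((q - m) / temperature) for q in q_values)
    )

def normalized_temperature(base_temperature, num_actions):
    """Hypothetical static rule (an assumption, not the paper's formula):
    divide by the maximum entropy log|A(s)| so the entropy bonus stays on
    the same scale whatever the number of actions."""
    if num_actions < 2:
        raise ValueError("needs at least two actions; with one action the entropy is always 0")
    return base_temperature / math.log(num_actions)

base_tau = 0.5
for n in (2, 10, 100):
    q = [1.0] * n  # identical Q-values isolate the regularization bonus
    raw = soft_value(q, base_tau) - 1.0
    scaled = soft_value(q, normalized_temperature(base_tau, n)) - 1.0
    print(f"|A|={n:3d}  bonus(fixed tau)={raw:.3f}  bonus(normalized)={scaled:.3f}")
```

Because this rule depends only on |A(s)|, it can be evaluated per state, which is exactly the setting with state-dependent action spaces where a single task-level temperature cannot suit every state.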

Maximum entropy GFlowNets with soft Q-learning

Sobhan Mohammadpour

Emmanuel Bengio

Emma Frejinger

Pierre-Luc Bacon

Generative Flow Networks (GFNs) have emerged as a powerful tool for sampling discrete objects from unnormalized distributions, offering a scalable alternative to Markov chain Monte Carlo (MCMC) methods. While GFNs draw inspiration from maximum entropy reinforcement learning (RL), the connection between the two has remained largely unclear and seemingly limited to specific cases. This paper addresses the connection by constructing an appropriate reward function, thereby establishing an exact relationship between GFNs and maximum entropy RL. This construction allows us to introduce maximum entropy GFNs, which, in contrast to GFNs with a uniform backward policy, achieve the maximum entropy attainable by GFNs without constraints on the state space.

2023-12-31

AISTATS (published)

doi.org

proceedings.mlr.press
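The paper's reward construction is not reproduced here. As an illustration of the special case where the GFN and maximum entropy RL correspondence is well known, the Python sketch below uses a hypothetical tree-shaped state space (`TREE`, `REWARD`) in which every terminal object x is reached by exactly one path. The soft Q-learning backup with zero intermediate reward and terminal reward log R(x) gives V(root) = log sum_x R(x), and the soft-optimal policy pi(s'|s) = exp(V(s') - V(s)) then samples each x with probability R(x)/Z, the GFN sampling target.

```python
import math

# Hypothetical toy tree: internal nodes map to their children;
# leaves are terminal objects with positive rewards R(x).
TREE = {"root": ["a", "b"], "a": ["x1", "x2"], "b": ["x3"]}
REWARD = {"x1": 1.0, "x2": 3.0, "x3": 4.0}

def soft_value(node):
    """Soft Bellman backup with zero intermediate reward and terminal
    reward log R(x): V(s) = log sum_{s'} exp(V(s'))."""
    if node in REWARD:
        return math.log(REWARD[node])
    return math.log(sum(math.exp(soft_value(c)) for c in TREE[node]))

def leaf_probabilities(node="root", prob=1.0, out=None):
    """Follow the soft-optimal policy pi(s'|s) = exp(V(s') - V(s)) and
    accumulate the probability of reaching each leaf."""
    if out is None:
        out = {}
    if node in REWARD:
        out[node] = out.get(node, 0.0) + prob
        return out
    v = soft_value(node)
    for child in TREE[node]:
        leaf_probabilities(child, prob * math.exp(soft_value(child) - v), out)
    return out

probs = leaf_probabilities()
Z = sum(REWARD.values())
for x, p in sorted(probs.items()):
    print(x, round(p, 4), round(REWARD[x] / Z, 4))
```

Here Z = 8, so x1, x2 and x3 are sampled with probabilities 0.125, 0.375 and 0.5. When several paths lead to the same object, as in a general DAG, this plain construction weights each object by its number of paths; the abstract's reward construction targets that general case.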

Arc travel time and path choice model estimation subsumed

Sobhan Mohammadpour

Emma Frejinger

We propose a method for maximum likelihood estimation of path choice model parameters and arc travel time using data of different levels of granularity. Hitherto these two tasks have been tackled separately under strong assumptions. Using a small example, we illustrate that this can lead to biased results. Results on both real (New York yellow cab) and simulated data show strong performance of our method compared to existing baselines.

… models and loss functions. It is designed to estimate arc travel time and path choice model parameters simultaneously. We showed that by marginalizing the unobserved variables and using stochastic gradient estimates, we obtain a maximum likelihood estimate even for observations at different levels of granularity. We showed that we can mix different data types when computing the MLE without needing to use a linear combination of losses as …

2022-10-24

arXiv (preprint)

doi.org

arxiv.org
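The abstract describes obtaining a maximum likelihood estimate from observations of different granularity by marginalizing unobserved variables. The Python sketch below illustrates only that idea and rests on assumptions that are not the paper's model: a hypothetical two-path network (`PATHS`), a multinomial logit path choice model P(p) proportional to exp(-beta * T_p), and Gaussian noise (`SIGMA`) on observed trip times. A "fine" observation records both the path and the trip time; a "coarse" one records only the trip time, so its likelihood sums over the unobserved path. Both kinds enter a single log-likelihood as a plain sum, with no hand-tuned loss weights.

```python
import math

# Hypothetical toy network: two origin-destination paths as lists of arc indices.
PATHS = {"A": [0, 1], "B": [2]}
SIGMA = 1.0  # assumed standard deviation of Gaussian noise on observed trip times

def path_time(theta, path):
    """Path travel time as the sum of its arc travel times theta[a]."""
    return sum(theta[a] for a in PATHS[path])

def log_choice_prob(theta, beta, path):
    """Assumed multinomial logit: P(p) proportional to exp(-beta * T_p)."""
    scores = {p: -beta * path_time(theta, p) for p in PATHS}
    m = max(scores.values())
    log_z = m + math.log(sum(math.exp(s - m) for s in scores.values()))
    return scores[path] - log_z

def log_normal(x, mean, sigma):
    return -0.5 * math.log(2 * math.pi * sigma**2) - (x - mean) ** 2 / (2 * sigma**2)

def loglik_fine(theta, beta, path, t):
    """Path and trip time both observed."""
    return log_choice_prob(theta, beta, path) + log_normal(t, path_time(theta, path), SIGMA)

def loglik_coarse(theta, beta, t):
    """Only the trip time observed: marginalize the unobserved path (log-sum-exp)."""
    terms = [loglik_fine(theta, beta, p, t) for p in PATHS]
    m = max(terms)
    return m + math.log(sum(math.exp(x - m) for x in terms))

def total_loglik(theta, beta, data):
    """Mixed-granularity observations are summed into one log-likelihood."""
    ll = 0.0
    for obs in data:
        if obs[0] == "fine":
            _, path, t = obs
            ll += loglik_fine(theta, beta, path, t)
        else:
            _, t = obs
            ll += loglik_coarse(theta, beta, t)
    return ll

theta = [2.0, 3.0, 4.0]  # hypothetical arc travel times
data = [("fine", "A", 5.2), ("coarse", 4.1), ("coarse", 6.0)]
print(total_loglik(theta, 0.5, data))
```

In practice one would maximize `total_loglik` over the arc times and beta with a gradient-based optimizer; the abstract mentions stochastic gradient estimates, which matter when the set of paths is too large to enumerate as this toy example does.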
