Haque Ishfaq

Offline Multitask Representation Learning for Reinforcement Learning

Thanh Nguyen-Tang

Songtao Feng

Raman Arora

Mengdi Wang

Ming Yin

2024-09-25

NeurIPS.cc/2024/Conference (poster)

More Efficient Randomized Exploration for Reinforcement Learning via Approximate Sampling

Yixin Tan

Yu Yang

Qingfeng Lan

Jianfeng Lu

A. Rupam Mahmood

Pan Xu

2024-05-14

rl-conference.cc/RLC/2024/Conference (published)

Provable and Practical: Efficient Exploration in Reinforcement Learning via Langevin Monte Carlo

Qingfeng Lan

Pan Xu

A. Rupam Mahmood

Animashree Anandkumar

Kamyar Azizzadenesheli

We present a scalable and effective exploration strategy based on Thompson sampling for reinforcement learning (RL). One of the key shortcomings of existing Thompson sampling algorithms is the need to perform a Gaussian approximation of the posterior distribution, which is not a good surrogate in most practical settings. We instead directly sample the Q function from its posterior distribution, by using Langevin Monte Carlo, an efficient type of Markov Chain Monte Carlo (MCMC) method. Our method only needs to perform noisy gradient descent updates to learn the exact posterior distribution of the Q function, which makes our approach easy to deploy in deep RL. We provide a rigorous theoretical analysis for the proposed method and demonstrate that, in the linear Markov decision process (linear MDP) setting, it has a regret bound of

2024-01-16

ICLR.cc/2024/Conference (poster)

Randomized Exploration for Reinforcement Learning with General Value Function Approximation

Qiwen Cui

Viet Huy Nguyen

Alex Ayoub

Zhuoran Yang

Zhaoran Wang

Lin F. Yang

We propose a model-free reinforcement learning algorithm inspired by the popular randomized least squares value iteration (RLSVI) algorithm as well as the optimism principle. Unlike existing upper-confidence-bound (UCB) based approaches, which are often computationally intractable, our algorithm drives exploration by simply perturbing the training data with judiciously chosen i.i.d. scalar noises. To attain optimistic value function estimation without resorting to a UCB-style bonus, we introduce an optimistic reward sampling procedure. When the value functions can be represented by a function class

2021-06-15

ArXiv (preprint)

arxiv.org

Randomized Exploration in Reinforcement Learning with General Value Function Approximation

Qiwen Cui

Viet Bang Nguyen

Alex Ayoub

Zhuoran Yang

Zhaoran Wang

Lin Yang

2021-01-01

International Conference on Machine Learning (published)

proceedings.mlr.press

Randomized Least Squares Policy Optimization

Zhuoran Yang

Andrei-Stefan Lupu

Viet Bang Nguyen

Lewis Liu

Riashat Islam

Zhaoran Wang