
Haque Ishfaq

Collaborating Alumni - McGill University
Research Topics
Online Learning
Reinforcement Learning

Publications

Langevin Soft Actor-Critic: Efficient Exploration Through Uncertainty-Driven Critic Learning
Existing actor-critic algorithms, which are popular for continuous control reinforcement learning (RL) tasks, suffer from poor sample efficiency due to the lack of a principled exploration mechanism. Motivated by the success of Thompson sampling for efficient exploration in RL, we propose a novel model-free RL algorithm, Langevin Soft Actor-Critic (LSAC), which prioritizes enhancing critic learning through uncertainty estimation over policy optimization. LSAC employs three key innovations: approximate Thompson sampling through distributional Langevin Monte Carlo (LMC) based …
More Efficient Randomized Exploration for Reinforcement Learning via Approximate Sampling
Yixin Tan
Yu Yang
Qingfeng Lan
Jianfeng Lu
A. Rupam Mahmood
Pan Xu
Provable and Practical: Efficient Exploration in Reinforcement Learning via Langevin Monte Carlo
Qingfeng Lan
Pan Xu
A. Rupam Mahmood
Anima Anandkumar
Kamyar Azizzadenesheli
We present a scalable and effective exploration strategy based on Thompson sampling for reinforcement learning (RL). One of the key shortcomings of existing Thompson sampling algorithms is the need to perform a Gaussian approximation of the posterior distribution, which is not a good surrogate in most practical settings. We instead directly sample the Q function from its posterior distribution, by using Langevin Monte Carlo, an efficient type of Markov Chain Monte Carlo (MCMC) method. Our method only needs to perform noisy gradient descent updates to learn the exact posterior distribution of the Q function, which makes our approach easy to deploy in deep RL. We provide a rigorous theoretical analysis for the proposed method and demonstrate that, in the linear Markov decision process (linear MDP) setting, it has a regret bound of …
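The "noisy gradient descent" update described above is the Langevin Monte Carlo step. A minimal sketch of that mechanism on a toy linear Q-regression problem follows; the feature dimension, step size, temperature, and synthetic data are all illustrative assumptions, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear Q-function: Q(s, a) = phi(s, a) @ w, fit to synthetic
# TD-style targets.  Dimensions and noise scales are illustrative.
d = 4                                     # feature dimension (assumed)
phi = rng.normal(size=(256, d))           # features of observed (s, a) pairs
w_true = rng.normal(size=d)
targets = phi @ w_true + 0.1 * rng.normal(size=256)

def loss_grad(w, lam=1.0):
    """Gradient of the regularized least-squares loss on the Q parameters."""
    return phi.T @ (phi @ w - targets) + lam * w

# Langevin Monte Carlo: gradient descent plus injected Gaussian noise.
# Its stationary distribution approximates the posterior over the
# Q parameters, so the final iterate acts as a posterior sample.
eta = 1e-4      # step size (assumed)
beta = 1.0      # inverse temperature (assumed)
w = np.zeros(d)
for _ in range(5000):
    noise = rng.normal(size=d)
    w = w - eta * loss_grad(w) + np.sqrt(2.0 * eta / beta) * noise

# Acting greedily w.r.t. this sampled Q realizes approximate
# Thompson sampling, which is what drives exploration.
```

Note the only change from plain gradient descent is the `sqrt(2 * eta / beta) * noise` term, which is why the approach is straightforward to bolt onto deep RL training loops.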
Offline Multitask Representation Learning for Reinforcement Learning
Raman Arora
Songtao Feng
Thanh Nguyen-Tang
Mengdi Wang
Ming Yin
We study offline multitask representation learning in reinforcement learning (RL), where a learner is provided with an offline dataset from different tasks that share a common representation and is asked to learn the shared representation. We theoretically investigate offline multitask low-rank RL, and propose a new algorithm called MORL for offline multitask representation learning. Furthermore, we examine downstream RL in reward-free, offline and online scenarios, where a new task is introduced to the agent that shares the same representation as the upstream offline tasks. Our theoretical results demonstrate the benefits of using the learned representation from the upstream offline tasks instead of directly learning the representation of the low-rank model.
Randomized Exploration for Reinforcement Learning with General Value Function Approximation
Qiwen Cui
Viet Nguyen
Alex Ayoub
Zhuoran Yang
Zhaoran Wang
Lin F. Yang
Randomized Least Squares Policy Optimization
Zhuoran Yang
Viet Bang Nguyen
Lewis Liu
Zhaoran Wang
Policy Optimization (PO) methods with function approximation are one of the most popular classes of Reinforcement Learning (RL) algorithms. However, designing provably efficient policy optimization algorithms remains a challenge. Recent work in this area has focused on incorporating upper confidence bound (UCB)-style bonuses to drive exploration in policy optimization. In this paper, we present Randomized Least Squares Policy Optimization (RLSPO), which is inspired by Thompson sampling. We prove that, in an episodic linear kernel MDP setting, RLSPO achieves Õ(d^{3/2} H^{3/2} √T) worst-case (frequentist) regret, where H is the episode length, T is the total number of steps, and d is the feature dimension. Finally, we evaluate RLSPO empirically and show that it is competitive with existing provably efficient PO algorithms.
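The "randomized least squares" idea behind this family of Thompson-sampling-inspired methods can be sketched on a toy value-regression problem: perturb the regression targets and the prior with Gaussian noise, then solve the least-squares problem, so that the solution is itself a sample from a Gaussian approximate posterior. All names, dimensions, and noise scales below are illustrative assumptions, not the paper's construction:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy regression standing in for least-squares value estimation:
# Phi holds features of visited state-action pairs, y the regression
# targets (e.g. Bellman backups).  Scales are illustrative.
d, n = 4, 128
Phi = rng.normal(size=(n, d))
theta_star = rng.normal(size=d)
sigma, lam = 0.1, 1.0
y = Phi @ theta_star + sigma * rng.normal(size=n)

def sample_value_params():
    """Randomized least squares: perturb targets and prior, then solve.

    The minimizer of the perturbed problem is a sample from a Gaussian
    approximate posterior over the value parameters; acting greedily
    w.r.t. it gives Thompson-sampling-style exploration.
    """
    y_pert = y + sigma * rng.normal(size=n)               # perturbed targets
    theta0 = (sigma / np.sqrt(lam)) * rng.normal(size=d)  # perturbed prior
    A = Phi.T @ Phi + lam * np.eye(d)
    return np.linalg.solve(A, Phi.T @ y_pert + lam * theta0)

theta_sample = sample_value_params()
```

Averaging many such samples recovers the ordinary ridge-regression estimate, while the per-sample spread matches the posterior covariance sigma^2 (Phi^T Phi + lam I)^{-1}, which is exactly what makes the perturbation a cheap substitute for maintaining an explicit posterior.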