
Janarthanan Rajendran

Affiliate Member
Assistant Professor, Dalhousie University
Research Topics
Deep Learning
Natural Language Processing
Reinforcement Learning

Biography

Janarthanan Rajendran is an Assistant Professor and the Sexton Chair in Reinforcement Learning in the Faculty of Computer Science at Dalhousie University. He is also an Affiliate Member of Mila and a Principal Member of the Atlantic Canada Research Consortium. His research interests lie in building AI systems that, through interaction, can learn to be competent in complex, dynamic, and uncertain environments. He is interested in computational methods for building such systems as well as in their practical applications and societal implications. To this end, his current research primarily focuses on deep reinforcement learning. Prior to his current position, he was an IVADO postdoctoral research fellow at Mila and the University of Montréal, working with Prof. Sarath Chandar and Prof. Doina Precup. He completed his Ph.D. at the University of Michigan, Ann Arbor, under the supervision of Prof. Satinder Singh, and his Master’s and undergraduate degrees at the Indian Institute of Technology Madras (IITM) under the supervision of Prof. Balaraman Ravindran and Prof. Kaushik Mitra. Janarthanan’s research in machine learning has been published in top venues such as NeurIPS, ICLR, and ICML.

Publications

Conditionally Optimistic Exploration for Cooperative Deep Multi-Agent Reinforcement Learning
Efficient exploration is critical in cooperative deep Multi-Agent Reinforcement Learning (MARL). In this work, we propose an exploration method that effectively encourages cooperative exploration based on the idea of a sequential action-computation scheme. The high-level intuition is that, to perform optimism-based exploration, agents would explore cooperative strategies if each agent's optimism estimate captures a structured dependency relationship with other agents. Assuming agents compute actions following a sequential order at each environment timestep, we provide a perspective that views MARL as tree search iterations by considering agents as nodes at different depths of the search tree. Inspired by the theoretically justified tree search algorithm UCT (Upper Confidence bounds applied to Trees), we develop a method called Conditionally Optimistic Exploration (COE). COE augments each agent's state-action value estimate with an action-conditioned optimistic bonus derived from the visitation count of the global state and the joint actions of preceding agents. COE is performed during training and disabled at deployment, making it compatible with any value decomposition method for centralized training with decentralized execution. Experiments across various cooperative MARL benchmarks show that COE outperforms current state-of-the-art exploration methods on hard-exploration tasks.
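
To make the action-conditioned bonus concrete, here is a minimal count-based sketch, assuming discrete actions, a fixed agent ordering, and a UCT-style bonus; the class, the coefficient `c`, and the count bookkeeping are illustrative simplifications rather than the paper's implementation.

```python
import math
from collections import defaultdict

class ConditionallyOptimisticBonus:
    def __init__(self, c=1.0):
        self.c = c                      # exploration coefficient (hypothetical)
        self.counts = defaultdict(int)  # visits per (state, action-prefix) node

    def bonus(self, state, preceding_actions, action):
        """Optimism for agent i's `action`, conditioned on agents 1..i-1."""
        parent = (state, tuple(preceding_actions))
        child = (state, tuple(preceding_actions) + (action,))
        n_parent, n_child = self.counts[parent], self.counts[child]
        return self.c * math.sqrt(math.log(n_parent + 1) / (n_child + 1))

    def update(self, state, joint_action):
        # Increment counts along the "tree path" of the sequential scheme,
        # treating agents as nodes at successive depths of a search tree.
        prefix = ()
        self.counts[(state, prefix)] += 1
        for a in joint_action:
            prefix = prefix + (a,)
            self.counts[(state, prefix)] += 1

def act_with_optimism(q_values, state, preceding_actions, opt):
    """Greedy action w.r.t. Q + conditional bonus (training only; at
    deployment the bonus is dropped and plain greedy actions are used)."""
    return max(range(len(q_values)),
               key=lambda a: q_values[a] + opt.bonus(state, preceding_actions, a))

# Toy usage: two agents acting in sequence at one timestep.
opt = ConditionallyOptimisticBonus(c=0.5)
a0 = act_with_optimism([0.1, 0.2], state="s0", preceding_actions=[], opt=opt)
a1 = act_with_optimism([0.0, 0.0], state="s0", preceding_actions=[a0], opt=opt)
opt.update("s0", (a0, a1))
```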
Behavioral Cloning for Crystal Design
Santiago Miret
Mariano Phielipp
A. Chandar
Solid-state materials, which are made up of periodic 3D crystal structures, are particularly useful for a variety of real-world applications… (see more) such as batteries, fuel cells and catalytic materials. Designing solid-state materials, especially in a robust and automated fashion, remains an ongoing challenge. To further the automated design of crystalline materials, we propose a method to learn to design valid crystal structures given a crystal skeleton. By incorporating Euclidean equivariance into a policy network, we portray the problem of designing new crystals as a sequential prediction task suited for imitation learning. At each step, given an incomplete graph of a crystal skeleton, an agent assigns an element to a specific node. We adopt a behavioral cloning strategy to train the policy network on data consisting of curated trajectories generated from known crystals.
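
A minimal behavioral-cloning sketch of the sequential element-assignment view is shown below. The small MLP is a stand-in for the paper's Euclidean-equivariant policy network, and the 64-dimensional features, vocabulary size, and trajectory format are illustrative assumptions.

```python
import torch
import torch.nn as nn

NUM_ELEMENTS = 118  # element vocabulary (illustrative)

# Stand-in policy: partial-crystal features -> logits over elements.
policy = nn.Sequential(nn.Linear(64, 128), nn.ReLU(),
                       nn.Linear(128, NUM_ELEMENTS))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

def bc_step(trajectory):
    """One gradient step on one expert trajectory: a list of
    (partial_crystal_features, expert_element_index) pairs, one pair per
    node-assignment step of a known crystal."""
    feats = torch.stack([s for s, _ in trajectory])     # (T, 64)
    targets = torch.tensor([a for _, a in trajectory])  # (T,)
    loss = loss_fn(policy(feats), targets)              # imitate the expert
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy usage with a random stand-in trajectory of five assignment steps.
traj = [(torch.randn(64), int(torch.randint(NUM_ELEMENTS, (1,))))
        for _ in range(5)]
print(bc_step(traj))
```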
Replay Buffer with Local Forgetting for Adapting to Local Environment Changes in Deep Model-Based Reinforcement Learning
One of the key behavioral characteristics used in neuroscience to determine whether the subject of study -- be it a rodent or a human -- exhibits model-based learning is effective adaptation to local changes in the environment, a particular form of adaptivity that is the focus of this work. In reinforcement learning, however, recent work has shown that modern deep model-based reinforcement-learning (MBRL) methods adapt poorly to local environment changes. An explanation for this mismatch is that MBRL methods are typically designed with sample-efficiency on a single task in mind, and the requirements for effective adaptation are substantially higher, both in terms of the learned world model and the planning routine. One particularly challenging requirement is that the learned world model has to be sufficiently accurate throughout relevant parts of the state-space. This is challenging for deep-learning-based world models due to catastrophic forgetting. And while a replay buffer can mitigate the effects of catastrophic forgetting, the traditional first-in-first-out replay buffer precludes effective adaptation because it maintains stale data. In this work, we show that a conceptually simple variation of this traditional replay buffer is able to overcome this limitation. By removing from the buffer only those samples that lie in the local neighbourhood of newly observed samples, deep world models can be built that maintain their accuracy across the state-space while also adapting effectively to local changes in the reward function. We demonstrate this by applying our replay-buffer variation to a deep version of the classical Dyna method, as well as to recent methods such as PlaNet and DreamerV2, showing that deep model-based methods can also adapt effectively to local changes in the environment.
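
A minimal sketch of the local-forgetting idea follows, assuming vector-valued states and Euclidean distance; the radius, the eviction rule, and the FIFO fallback are illustrative choices, not the exact mechanism used inside the paper's Dyna, PlaNet, and DreamerV2 agents.

```python
import numpy as np

class LocalForgettingReplayBuffer:
    def __init__(self, capacity=10_000, radius=0.5):
        self.capacity = capacity
        self.radius = radius
        self.data = []  # (state, action, reward, next_state) tuples

    def add(self, state, action, reward, next_state):
        state = np.asarray(state, dtype=np.float32)
        # Local forgetting: evict only stored samples whose state lies in the
        # neighbourhood of the new one, so stale data from a locally changed
        # region is removed while coverage elsewhere is preserved.
        self.data = [t for t in self.data
                     if np.linalg.norm(t[0] - state) > self.radius]
        if len(self.data) >= self.capacity:
            self.data.pop(0)  # fall back to FIFO eviction only when full
        self.data.append((state, action, reward, next_state))

    def sample(self, batch_size):
        idx = np.random.choice(len(self.data),
                               size=min(batch_size, len(self.data)),
                               replace=False)
        return [self.data[i] for i in idx]
```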
PatchBlender: A Motion Prior for Video Transformers
Yale Song
R Devon Hjelm
Neel Joshi
A. Chandar
Staged independent learning: Towards decentralized cooperative multi-agent Reinforcement Learning
We empirically show that classic ideas from two-time-scale stochastic approximation (Borkar, 1997) can be combined with sequential iterative best response (SIBR) to solve complex cooperative multi-agent reinforcement learning (MARL) problems. We first present a multi-agent estimation problem as a motivating example in which SIBR converges while parallel iterative best response (PIBR) does not. We then present a general implementation of staged multi-agent RL algorithms based on SIBR and multi-time-scale stochastic approximation, and show that our new methods, which we call Staged Independent Proximal Policy Optimization (SIPPO) and Staged Independent Q-learning (SIQL), outperform state-of-the-art independent learning on almost all the tasks in the EPyMARL benchmark (Papoudakis et al., 2020). This can be seen as a first step towards more decentralized MARL methods based on SIBR and multi-time-scale learning.
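
The staged scheme can be illustrated on a tiny cooperative matrix game: during each stage only one agent updates its values while the other acts greedily with frozen values, instead of both updating in parallel. The payoffs, hyperparameters, and bandit-style update below are illustrative, not the SIPPO/SIQL algorithms themselves.

```python
import numpy as np

payoff = np.array([[2.0, 0.0],   # shared reward for joint action (a0, a1);
                   [0.0, 1.0]])  # coordinating on action 0 is optimal

rng = np.random.default_rng(0)
q = [np.zeros(2), np.zeros(2)]   # one independent value table per agent
alpha, eps = 0.1, 0.1

for stage in range(20):
    learner = stage % 2          # staged: alternate which agent learns
    frozen = 1 - learner
    for _ in range(500):
        a_frozen = int(np.argmax(q[frozen]))   # frozen agent acts greedily
        a_learner = (int(rng.integers(2)) if rng.random() < eps
                     else int(np.argmax(q[learner])))
        joint = (a_learner, a_frozen) if learner == 0 else (a_frozen, a_learner)
        r = payoff[joint]
        q[learner][a_learner] += alpha * (r - q[learner][a_learner])

print("greedy joint action:", (int(np.argmax(q[0])), int(np.argmax(q[1]))))
```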
Towards Evaluating Adaptivity of Model-Based Reinforcement Learning Methods
In recent years, a growing number of deep model-based reinforcement learning (RL) methods have been introduced. The interest in deep model-based RL is not surprising, given its many potential benefits, such as higher sample efficiency and the potential for fast adaptation to changes in the environment. However, we demonstrate, using an improved version of the recently introduced Local Change Adaptation (LoCA) setup, that well-known model-based methods such as PlaNet and DreamerV2 perform poorly in their ability to adapt to local environmental changes. Combined with prior work that made a similar observation about another popular model-based method, MuZero, a trend appears to emerge, suggesting that current deep model-based methods have serious limitations. We dive deeper into the causes of this poor performance by identifying elements that hurt adaptive behavior and linking these to underlying techniques frequently used in deep model-based RL. We empirically validate these insights in the case of linear function approximation by demonstrating that a modified version of linear Dyna achieves effective adaptation to local changes. Furthermore, we provide detailed insights into the challenges of building an adaptive nonlinear model-based method by experimenting with a nonlinear version of Dyna.
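
The flavour of a LoCA-style adaptivity check can be conveyed with a toy tabular Dyna-Q agent on a chain: phase 1 makes the left terminal more rewarding, phase 2 locally changes that terminal's reward while training is roughly confined to its neighbourhood, and an adaptive model-based learner should flip its greedy action far from the change. Everything here is an illustrative stand-in for the paper's setup, not its actual benchmark or code.

```python
import numpy as np

N, LEFT, RIGHT = 11, 0, 10            # chain states 0..10, terminal ends
Q = np.zeros((N, 2))                  # actions: 0 = step left, 1 = step right
model = {}                            # learned deterministic model: (s, a) -> (r, s')
rng = np.random.default_rng(0)
gamma, alpha, eps = 0.95, 0.5, 0.2

def env_step(s, a, r_left):
    s2 = s - 1 if a == 0 else s + 1
    r = r_left if s2 == LEFT else (2.0 if s2 == RIGHT else 0.0)
    return r, s2

def value(s):
    return 0.0 if s in (LEFT, RIGHT) else np.max(Q[s])

def train(start_states, r_left, episodes=300, max_steps=8, plan_steps=30):
    for _ in range(episodes):
        s = int(rng.choice(list(start_states)))
        for _ in range(max_steps):
            if s in (LEFT, RIGHT):
                break
            a = int(rng.integers(2)) if rng.random() < eps else int(np.argmax(Q[s]))
            r, s2 = env_step(s, a, r_left)
            model[(s, a)] = (r, s2)
            Q[s, a] += alpha * (r + gamma * value(s2) - Q[s, a])
            for _ in range(plan_steps):  # Dyna planning backups on the model
                (ps, pa), (pr, ps2) = list(model.items())[rng.integers(len(model))]
                Q[ps, pa] += alpha * (pr + gamma * value(ps2) - Q[ps, pa])
            s = s2

train(start_states=range(1, 10), r_left=4.0)          # phase 1: left pays 4
print("centre after phase 1:", ["left", "right"][int(np.argmax(Q[5]))])
train(start_states=[1, 2], r_left=1.0, max_steps=4)   # phase 2: local change
print("centre after phase 2:", ["left", "right"][int(np.argmax(Q[5]))])
```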