
Ivan Anokhin

PhD - Université de Montréal
Research Topics
Reinforcement Learning

Publications

AIF-GEN: Open-Source Platform and Synthetic Dataset Suite for Reinforcement Learning on Large Language Models
Handling Delay in Real-Time Reinforcement Learning
Rishav
Matthew D Riemer
Stephen Chung
Real-time reinforcement learning (RL) introduces several challenges. First, policies are constrained to a fixed number of actions per second due to hardware limitations. Second, the environment may change while the network is still computing an action, leading to observational delay. The first issue can partly be addressed with pipelining, leading to higher throughput and potentially better policies. However, the second issue remains: if each neuron operates in parallel with an execution time of…
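To make the observational-delay issue concrete, below is a minimal sketch (not code from the paper) of a Gymnasium-style wrapper that delivers observations a fixed number of steps late, so the policy always acts on slightly stale state. The wrapper name and the delay_steps parameter are illustrative assumptions.

```python
# Illustrative sketch, not the paper's implementation: the agent receives the
# observation from `delay_steps` environment steps ago, mimicking the lag
# introduced when the environment keeps evolving while the policy network is
# still computing its action.
from collections import deque

import gymnasium as gym


class DelayedObservationWrapper(gym.Wrapper):
    def __init__(self, env, delay_steps=1):
        super().__init__(env)
        self.delay_steps = delay_steps
        self._buffer = deque(maxlen=delay_steps + 1)

    def reset(self, **kwargs):
        obs, info = self.env.reset(**kwargs)
        # Fill the buffer so the first few returned observations are the reset state.
        self._buffer.clear()
        for _ in range(self.delay_steps + 1):
            self._buffer.append(obs)
        return self._buffer.popleft(), info

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        self._buffer.append(obs)
        # Return the observation from `delay_steps` steps ago, not the current one.
        return self._buffer.popleft(), reward, terminated, truncated, info
```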
Handling Delay in Reinforcement Learning Caused by Parallel Computations of Neurons
Biological neural networks operate in parallel, a feature that sets them apart from artificial neural networks and can significantly enhance inference speed. However, this parallelism introduces challenges: when each neuron operates asynchronously with a fixed execution time, an…
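The delay caused by parallel, asynchronous neuron updates can be illustrated with a small sketch. This is an assumption about the setting described in the abstract, not the paper's code: each layer runs every tick but only sees the activation its predecessor produced on the previous tick, so the output of a depth-L network at tick t reflects the input from L - 1 ticks earlier.

```python
# Illustrative sketch (an assumption, not the paper's code): pipelined layers
# that each use their predecessor's previous-tick output, introducing a
# per-layer delay between input and output.
def pipelined_forward(layers, inputs):
    """layers: list of callables applied in sequence; inputs: per-tick inputs."""
    state = [None] * len(layers)  # latest output produced by each layer
    outputs = []
    for x in inputs:
        new_state = list(state)
        for i, layer in enumerate(layers):
            prev = x if i == 0 else state[i - 1]  # predecessor's previous-tick output
            if prev is not None:
                new_state[i] = layer(prev)
        state = new_state
        outputs.append(state[-1])  # stays None until the pipeline fills
    return outputs


# With two identity layers, the input seen at tick t only reaches the output at tick t + 1.
print(pipelined_forward([lambda v: v, lambda v: v], [0, 1, 2, 3]))
# [None, 0, 1, 2]
```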
Thinker: Learning to Plan and Act
We propose the Thinker algorithm, a novel approach that enables reinforcement learning agents to autonomously interact with and utilize a learned world model. The Thinker algorithm wraps the environment with a world model and introduces new actions designed for interacting with the world model. These model-interaction actions enable agents to perform planning by proposing alternative plans to the world model before selecting a final action to execute in the environment. This approach eliminates the need for handcrafted planning algorithms by enabling the agent to learn how to plan autonomously and allows for easy interpretation of the agent's plan with visualization. We demonstrate the algorithm's effectiveness through experimental results in the game of Sokoban and the Atari 2600 benchmark, where the Thinker algorithm achieves state-of-the-art performance and competitive results, respectively. Visualizations of agents trained with the Thinker algorithm demonstrate that they have learned to plan effectively with the world model to select better actions. Thinker is the first work showing that an RL agent can learn to plan with a learned world model in complex environments.
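The structure described in the abstract, wrapping the real environment together with a learned world model and giving the agent extra model-interaction actions, can be sketched as follows. This is a minimal, hypothetical illustration of that idea, not the authors' implementation; the WorldModel interface, the plan_budget limit, and the imagined flag are all assumptions.

```python
# Hypothetical sketch of a Thinker-style environment augmentation: the agent
# can roll candidate plans forward in a learned world model ("imagined" steps)
# before committing a final action to the real environment.
class ThinkerStyleWrapper:
    def __init__(self, env, world_model, plan_budget=5):
        self.env = env
        self.model = world_model        # learned dynamics model: (state, action) -> (state, reward)
        self.plan_budget = plan_budget  # imagined steps allowed before a real action is forced
        self._imagined_state = None
        self._steps_imagined = 0

    def reset(self):
        state = self.env.reset()
        self._imagined_state = state
        self._steps_imagined = 0
        return state

    def step(self, action, imagined=True):
        # Imagined step: query the world model, leave the real environment untouched.
        if imagined and self._steps_imagined < self.plan_budget:
            self._imagined_state, predicted_reward = self.model.predict(
                self._imagined_state, action
            )
            self._steps_imagined += 1
            return self._imagined_state, predicted_reward, False, {"imagined": True}

        # Real step: commit the action to the environment and reset the imagination
        # to the newly observed real state.
        next_state, reward, done, info = self.env.step(action)
        self._imagined_state = next_state
        self._steps_imagined = 0
        return next_state, reward, done, dict(info, imagined=False)
```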