Portrait de Martin Klissarov n'est pas disponible

Martin Klissarov

Doctorat - McGill University
Superviseur⋅e principal⋅e


Code as Reward: Empowering Reinforcement Learning with VLMs
David Venuto
Mohammad Sami Nur Islam
Martin Klissarov
Sherry Yang
Ankit Anand
Motif: Intrinsic Motivation from Artificial Intelligence Feedback
Martin Klissarov
Pierluca D'Oro
Shagun Sodhani
Roberta Raileanu
Amy Zhang
Mikael Henaff
Exploring rich environments and evaluating one's actions without prior knowledge is immensely challenging. In this paper, we propose Motif, … (voir plus)a general method to interface such prior knowledge from a Large Language Model (LLM) with an agent. Motif is based on the idea of grounding LLMs for decision-making without requiring them to interact with the environment: it elicits preferences from an LLM over pairs of captions to construct an intrinsic reward, which is then used to train agents with reinforcement learning. We evaluate Motif's performance and behavior on the challenging, open-ended and procedurally-generated NetHack game. Surprisingly, by only learning to maximize its intrinsic reward, Motif achieves a higher game score than an algorithm directly trained to maximize the score itself. When combining Motif's intrinsic reward with the environment reward, our method significantly outperforms existing approaches and makes progress on tasks where no advancements have ever been made without demonstrations. Finally, we show that Motif mostly generates intuitive human-aligned behaviors which can be steered easily through prompt modifications, while scaling well with the LLM size and the amount of information given in the prompt.
Options of Interest: Temporal Abstraction with Interest Functions
Martin Klissarov
Maxime Chevalier-Boisvert
Temporal abstraction refers to the ability of an agent to use behaviours of controllers which act for a limited, variable amount of time. Th… (voir plus)e options framework describes such behaviours as consisting of a subset of states in which they can initiate, an internal policy and a stochastic termination condition. However, much of the subsequent work on option discovery has ignored the initiation set, because of difficulty in learning it from data. We provide a generalization of initiation sets suitable for general function approximation, by defining an interest function associated with an option. We derive a gradient-based learning algorithm for interest functions, leading to a new interest-option-critic architecture. We investigate how interest functions can be leveraged to learn interpretable and reusable temporal abstractions. We demonstrate the efficacy of the proposed approach through quantitative and qualitative results, in both discrete and continuous environments.