
Paul Barde

Collaborating researcher - McGill University
Co-supervisor
Research Topics
Reinforcement Learning

Publications

A Model-Based Solution to the Offline Multi-Agent Reinforcement Learning Coordination Problem
Jakob Nicolaus Foerster
Amy Zhang
From Words to Blocks: Building Objects by Grounding Language Models with Reinforcement Learning
Leveraging pre-trained language models to generate action plans for embodied agents is an emerging research direction. However, executing instructions in real or simulated environments necessitates verifying the feasibility of actions and their relevance in achieving a goal. We introduce a novel method that integrates a language model and reinforcement learning for constructing objects in a Minecraft-like environment, based on natural language instructions. Our method generates a set of consistently achievable sub-goals derived from the instructions and subsequently completes the associated sub-tasks using a pre-trained RL policy. We employ the IGLU competition, which is based on a Minecraft-like simulator, as our test environment, and compare our approach to the competition's top-performing solutions. Our approach outperforms existing solutions in terms of both the quality of the language model and the quality of the structures built within the IGLU environment.
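
A minimal sketch of the two-stage pipeline the abstract describes: a language model turns an instruction into achievable sub-goals, and a pre-trained RL policy completes each sub-task. The interfaces below (propose_subgoals, execute_subgoal, and the block tuple format) are illustrative assumptions, not the paper's actual API.

from typing import Dict, List, Tuple

Block = Tuple[int, int, int, int]  # assumed sub-goal format: (x, y, z, color)

def propose_subgoals(instruction: str) -> List[Block]:
    """Stage 1 (stub): a pre-trained language model parses the instruction
    into a sequence of consistently achievable sub-goals (blocks to place)."""
    # e.g. "build a red tower of two blocks" -> two stacked red blocks
    return [(0, 0, 0, 1), (0, 1, 0, 1)]

def execute_subgoal(state: Dict, subgoal: Block) -> Dict:
    """Stage 2 (stub): a pre-trained RL policy completes the sub-task of
    placing one block; here we simply record the placement."""
    state["blocks"].append(subgoal)
    return state

def build_from_instruction(instruction: str) -> Dict:
    state = {"blocks": []}
    for subgoal in propose_subgoals(instruction):
        state = execute_subgoal(state, subgoal)
    return state

print(build_from_instruction("build a red tower of two blocks"))

Decomposing the build into sub-goals lets each stage be validated separately: the language model is judged on the achievability of its plan, and the RL policy on completing individual placements.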
Learning to Guide and to Be Guided in the Architect-Builder Problem
Tristan Karch
Clément Moulin-Frier
Pierre-Yves Oudeyer
We are interested in interactive agents that learn to coordinate, namely, a …
Adversarial Soft Advantage Fitting: Imitation Learning without Policy Optimization
Adversarial imitation learning alternates between learning a discriminator -- which distinguishes the expert's demonstrations from generated ones -- and a generator's policy that produces trajectories to fool this discriminator. This alternating optimization is known to be delicate in practice, since it compounds unstable adversarial training with brittle, sample-inefficient reinforcement learning. We propose to remove the burden of the policy optimization steps by leveraging a novel discriminator formulation. Specifically, our discriminator is explicitly conditioned on two policies: the one from the previous generator iteration and a learnable policy. When optimized, this discriminator directly learns the optimal generator's policy. Consequently, the discriminator update solves the generator's optimization problem for free: learning a policy that imitates the expert requires no additional optimization loop. This formulation effectively halves the implementation and computational burden of adversarial imitation learning algorithms by removing the reinforcement learning phase altogether. We show on a variety of tasks that our simpler approach is competitive with prevalent imitation learning methods.
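
One way to instantiate the structured discriminator the abstract describes, sketched here for discrete actions: write D(s, a) = pi(a|s) / (pi(a|s) + pi_old(a|s)) and fit D with binary cross-entropy, so that the learnable pi becomes the new generator directly. The network sizes, random toy batches, and discrete-action setup below are assumptions for illustration, not the paper's experimental setup.

import torch
import torch.nn as nn

N_STATES, N_ACTIONS = 8, 4

# Learnable policy (becomes the next generator) and a frozen snapshot
# of the previous generator's policy.
policy = nn.Sequential(nn.Linear(N_STATES, 32), nn.Tanh(), nn.Linear(32, N_ACTIONS))
prev_policy = nn.Sequential(nn.Linear(N_STATES, 32), nn.Tanh(), nn.Linear(32, N_ACTIONS))
for p in prev_policy.parameters():
    p.requires_grad_(False)

def log_prob(net, states, actions):
    # log pi(a|s) for discrete actions
    return torch.log_softmax(net(states), dim=-1).gather(1, actions.unsqueeze(1)).squeeze(1)

def discriminator(states, actions):
    # D(s, a) = pi(a|s) / (pi(a|s) + pi_old(a|s)), computed as a sigmoid
    # of the log-ratio for numerical stability.
    return torch.sigmoid(log_prob(policy, states, actions) - log_prob(prev_policy, states, actions))

opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
bce = nn.BCELoss()

# Toy batches standing in for expert demonstrations (label 1) and
# trajectories sampled from the previous policy (label 0).
expert_s, expert_a = torch.randn(64, N_STATES), torch.randint(0, N_ACTIONS, (64,))
gen_s, gen_a = torch.randn(64, N_STATES), torch.randint(0, N_ACTIONS, (64,))

for _ in range(200):
    loss = (bce(discriminator(expert_s, expert_a), torch.ones(64))
            + bce(discriminator(gen_s, gen_a), torch.zeros(64)))
    opt.zero_grad()
    loss.backward()
    opt.step()

# After fitting, `policy` is the new generator: it imitates the expert
# without any policy-gradient (RL) step.

Because the discriminator is built from the policies themselves, minimizing the classification loss moves pi toward the expert's action distribution, which is why the usual reinforcement learning inner loop disappears.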