David Meger

In this paper, we propose a novel model-based multi-agent reinforcement learning approach named Value Decomposition Framework with Disentangled World Model to address the challenge of achieving a common goal of multiple agents interacting in the same environment with reduced sample complexity. Due to scalability and non-stationarity problems posed by multi-agent systems, model-free methods rely on a considerable number of samples for training. In contrast, we use a modularized world model, composed of action-conditioned, action-free, and static branches, to unravel the environment dynamics and produce imagined outcomes based on past experience, without sampling directly from the real environment. We employ variational auto-encoders and variational graph auto-encoders to learn the latent representations for the world model, which is merged with a value-based framework to predict the joint action-value function and optimize the overall training objective. We present experimental results in Easy, Hard, and Super-Hard StarCraft II micro-management challenges to demonstrate that our method achieves high sample efficiency and exhibits superior performance in defeating the enemy armies compared to other baselines.

2023-09-08

ArXiv (preprint)

Efficient Epistemic Uncertainty Estimation in Regression Ensemble Models Using Pairwise-Distance Estimators

Lucas Berry

This work introduces an efficient novel approach for epistemic uncertainty estimation for ensemble models for regression tasks using pairwise-distance estimators (PaiDEs). Utilizing the pairwise-distance between model components, these estimators establish bounds on entropy. We leverage this capability to enhance the performance of Bayesian Active Learning by Disagreement (BALD). Notably, unlike sample-based Monte Carlo estimators, PaiDEs exhibit a remarkable capability to estimate epistemic uncertainty at speeds up to 100 times faster while covering a significantly larger number of inputs at once and demonstrating superior performance in higher dimensions. To validate our approach, we conducted a varied series of regression experiments on commonly used benchmarks: 1D sinusoidal data,

2023-08-25

ArXiv (preprint)

Hypernetworks for Zero-shot Transfer in Reinforcement Learning

Sahand Rezaei-Shoshtari

Charlotte Morissette

Francois Hogan

Gregory Dudek

In this paper, hypernetworks are trained to generate behaviors across a range of unseen task conditions, via a novel TD-based training objective and data from a set of near-optimal RL solutions for training tasks. This work relates to meta RL, contextual RL, and transfer learning, with a particular focus on zero-shot performance at test time, enabled by knowledge of the task parameters (also known as context). Our technical approach is based upon viewing each RL algorithm as a mapping from the MDP specifics to the near-optimal value function and policy, and seeking to approximate it with a hypernetwork that can generate near-optimal value functions and policies, given the parameters of the MDP. We show that, under certain conditions, this mapping can be considered as a supervised learning problem. We empirically evaluate the effectiveness of our method for zero-shot transfer to new reward and transition dynamics on a series of continuous control tasks from DeepMind Control Suite. Our method demonstrates significant improvements over baselines from multitask and meta RL approaches.

2023-06-26

Proceedings of the AAAI Conference on Artificial Intelligence (published)

ANSEL Photobot: A Robot Event Photographer with Semantic Intelligence

Dmitriy Rivkin

Gregory Dudek

Nikhil Kakodkar

Oliver Limoyo

Xue (Steve) Liu

Francois Hogan

Our work examines the way in which large language models can be used for robotic planning and sampling in the context of automated photographic documentation. Specifically, we illustrate how to produce a photo-taking robot with an exceptional level of semantic awareness by leveraging recent advances in general purpose language (LM) and vision-language (VLM) models. Given a high-level description of an event we use an LM to generate a natural-language list of photo descriptions that one would expect a photographer to capture at the event. We then use a VLM to identify the best matches to these descriptions in the robot's video stream. The photo portfolios generated by our method are consistently rated as more appropriate to the event by human evaluators than those generated by existing methods.

2023-06-02

2023 IEEE International Conference on Robotics and Automation (ICRA) (published)

Normalizing Flow Ensembles for Rich Aleatoric and Epistemic Uncertainty Modeling

Lucas Berry

2023-02-02

ArXiv (preprint)

NeurIPS 2022 Competition: Driving SMARTS

Amir Hossein Rasouli

R. Goebel

Matthew E. Taylor

Iuliia Kotseruba

Soheil Alizadeh

Tianpei Yang

Montgomery Alban

Florian Shkurti

Yuzheng Zhuang

Adam Ścibior

Kasra Rezaee

Animesh Garg

Jun Luo

Liam Paull

Weinan Zhang

Xinyu Wang

Xiangshan Chen

2022-11-14

ArXiv (preprint)

Distributional Hamilton-Jacobi-Bellman Equations for Continuous-Time Reinforcement Learning

Harley Wiltzer

Marc Gendron-Bellemare

2022-06-28

Proceedings of the 39th International Conference on Machine Learning (published)

Why Should I Trust You, Bellman? The Bellman Error is a Poor Replacement for Value Error

Scott Fujimoto

Doina Precup

Ofir Nachum

Shixiang Shane Gu

In this work, we study the use of the Bellman equation as a surrogate objective for value prediction accuracy. While the Bellman equation is uniquely solved by the true value function over all state-action pairs, we find that the Bellman error (the difference between both sides of the equation) is a poor proxy for the accuracy of the value function. In particular, we show that (1) due to cancellations from both sides of the Bellman equation, the magnitude of the Bellman error is only weakly related to the distance to the true value function, even when considering all state-action pairs, and (2) in the finite data regime, the Bellman equation can be satisfied exactly by infinitely many suboptimal solutions. This means that the Bellman error can be minimized without improving the accuracy of the value function. We demonstrate these phenomena through a series of propositions, illustrative toy examples, and empirical analysis in standard benchmark domains.
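
The cancellation effect in point (1) can be illustrated with a small sketch. This is not an example taken from the paper: the three-state reward process, the discount factor, and all names below are hypothetical, chosen only to show that adding a constant offset to the true values produces a large value error but a Bellman error that is smaller by a factor of (1 - gamma).

```python
# Hypothetical toy Markov reward process (not from the paper):
# three states visited in a fixed cycle 0 -> 1 -> 2 -> 0.
GAMMA = 0.99
REWARDS = [1.0, 0.0, 2.0]
NEXT_STATE = [1, 2, 0]


def bellman_backup(values):
    """Right-hand side of the Bellman equation: r(s) + gamma * V(s')."""
    return [REWARDS[s] + GAMMA * values[NEXT_STATE[s]] for s in range(len(values))]


def true_values(iterations=5000):
    """Approximate the true value function by repeated Bellman backups."""
    values = [0.0] * len(REWARDS)
    for _ in range(iterations):
        values = bellman_backup(values)
    return values


def max_abs_diff(a, b):
    """Largest absolute difference between two value vectors."""
    return max(abs(x - y) for x, y in zip(a, b))


v_true = true_values()

# Shift every value by the same constant. The offset appears on both sides
# of the Bellman equation and mostly cancels.
OFFSET = 100.0
v_shifted = [v + OFFSET for v in v_true]

value_error = max_abs_diff(v_shifted, v_true)
bellman_error = max_abs_diff(v_shifted, bellman_backup(v_shifted))

print(f"value error:   {value_error:.4f}")
print(f"Bellman error: {bellman_error:.4f}")
```

In this sketch the shifted estimate is wrong by the full offset in every state, while its Bellman error is only (1 - GAMMA) times the offset, so a small Bellman error alone says little about how close the estimate is to the true values.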

2022-06-28

Proceedings of the 39th International Conference on Machine Learning (published)

proceedings.mlr.press

Adaptive Confidence Calibration

Jonathan W. Pearce

2022-05-27

Applied Informatics (published)

IL-flOw: Imitation Learning from Observation using Normalizing Flows

Wei-Di Chang

Juan Higuera

Scott Fujimoto

Gregory Dudek

2022-05-19

ArXiv (preprint)

Continuous MDP Homomorphisms and Homomorphic Policy Gradient

Sahand Rezaei-Shoshtari

Learning Assisted Identification of Scenarios Where Network Optimization Algorithms Under-Perform

Dmitriy Rivkin

X. T. Chen

We present a generative adversarial method that uses deep learning to identify network load traffic conditions in which network optimization algorithms under-perform other known algorithms: the Deep Convolutional Failure Generator (DCFG). The spatial distribution of network load presents challenges for network operators for tasks such as load balancing, in which a network optimizer attempts to maintain high quality communication while at the same time abiding by capacity constraints. Testing a network optimizer for all possible load distributions is challenging if not impossible. We propose a novel method that searches for load situations where a target network optimization method under-performs a baseline; these are key test cases that can be used for future refinement and performance optimization. By modeling a realistic network simulator's quality assessments with a deep network and, in parallel, optimizing a load generation network, our method efficiently searches the high dimensional space of load patterns and reliably finds cases in which a target network optimization method under-performs a baseline by a significant margin.

2021-12-01

Global Communications Conference (published)