Joshua Romoff

CA2: Code-Aware Agent for Automated Game Testing

Vincent Martineau

Automated game testing is important for verifying game functionality, but it remains a costly and time-consuming process. Manual testing often misses edge cases, and current automated methods struggle to provide full code coverage. Prior work has explored reinforcement learning (RL) for game testing, but without leveraging internal code signals such as the call stack. We present Code-Aware Agent (CA2), which uses call stack information to learn effective testing strategies. The agent receives the current function call trace along with the game state and learns to reach specific target functions. We instrument two types of environments, 1) State-based and 2) Image-based, with support for efficient call stack extraction. Through experimental evaluation, we find that CA2 achieves consistent improvement over the non-code-aware baselines, which do not leverage call stack information. Our results show that incorporating code signals like the call stack enables more effective and targeted game testing.

2026-05-12

arXiv (preprint)

doi.org

arxiv.org

Improving Intrinsic Exploration by Creating Stationary Objectives

Roger Creus Castanyer

Joshua Romoff

Glen Berseth

2024-01-15

ICLR.cc/2024/Conference (poster)

doi.org

openreview.net

Minimax Exploiter: A Data Efficient Approach for Competitive Self-Play

Gabriel Robert

Recent advances in Competitive Self-Play (CSP) have achieved, or even surpassed, human-level performance in complex game environments such as Dota 2 and StarCraft II using Distributed Multi-Agent Reinforcement Learning (MARL). One core component of these methods relies on creating a pool of learning agents -- consisting of the Main Agent, past versions of this agent, and Exploiter Agents -- where Exploiter Agents learn counter-strategies to the Main Agents. A key drawback of these approaches is the large computational cost and physical time that is required to train the system, making them impractical to deploy in highly iterative real-life settings such as video game productions. In this paper, we propose the Minimax Exploiter, a game-theoretic approach to exploiting Main Agents that leverages knowledge of its opponents, leading to significant increases in data efficiency. We validate our approach in a diversity of settings, including simple turn-based games, the Arcade Learning Environment, and For Honor, a modern video game. The Minimax Exploiter consistently outperforms strong baselines, demonstrating improved stability and data efficiency, leading to a robust CSP-MARL method that is both flexible and easy to deploy.

2023-12-31

AAMAS (published)

doi.org

arxiv.org

Direct Behavior Specification via Constrained Reinforcement Learning

Christopher Pal

The standard formulation of Reinforcement Learning lacks a practical way of specifying what are admissible and forbidden behaviors. Most often, practitioners go about the task of behavior specification by manually engineering the reward function, a counter-intuitive process that requires several iterations and is prone to reward hacking by the agent. In this work, we argue that constrained RL, which has almost exclusively been used for safe RL, also has the potential to significantly reduce the amount of work spent on reward specification in applied RL projects. To this end, we propose to specify behavioral preferences in the CMDP framework and to use Lagrangian methods to automatically weigh each of these behavioral constraints. Specifically, we investigate how CMDPs can be adapted to solve goal-based tasks while adhering to several constraints simultaneously. We evaluate this framework on a set of continuous control tasks relevant to the application of Reinforcement Learning for NPC design in video games.

2022-06-27

Proceedings of the 39th International Conference on Machine Learning (published)

doi.org

proceedings.mlr.press

Randomized Value Functions via Multiplicative Normalizing Flows

Randomized value functions offer a promising approach towards the challenge of efficient exploration in complex environments with high-dimensional state and action spaces. Unlike traditional point estimate methods, randomized value functions maintain a posterior distribution over action-space values. This prevents the agent's behavior policy from prematurely exploiting early estimates and falling into local optima. In this work, we leverage recent advances in variational Bayesian neural networks and combine these with traditional Deep Q-Networks (DQN) and Deep Deterministic Policy Gradient (DDPG) to achieve randomized value functions for high-dimensional domains. In particular, we augment DQN and DDPG with multiplicative normalizing flows in order to track a rich approximate posterior distribution over the parameters of the value function. This allows the agent to perform approximate Thompson sampling in a computationally efficient manner via stochastic gradient methods. We demonstrate the benefits of our approach through an empirical comparison in high-dimensional environments.

2020-08-05

Conference on Uncertainty in Artificial Intelligence (published)

doi.org

proceedings.mlr.press
