Cyrus Neary

Postdoctorate - Université de Montréal
Supervisor
Co-supervisor
Research Topics
Applied AI
Deep Learning
Dynamical Systems
Formal Methods
Foundation Models
Multi-Agent Systems
Reinforcement Learning
Robotics
Scientific Machine Learning
Trustworthy AI

Publications

RoboArena: Distributed Real-World Evaluation of Generalist Robot Policies
Pranav Atreya
Karl Pertsch
Tony Lee
Moo Jin Kim
Arhan Jain
Cyrus Neary
Edward S. Hu
Kanav Arora
Luca Macesanu
Matthew Leonard
Meedeum Cho
Shivin Dass
Tony Wang
Xingfang Yuan
Abhishek Gupta
Dinesh Jayaraman
Kostas Daniilidis
Roberto Martín-Martín
Youngwoon Lee
Percy Liang
Chelsea Finn
Sergey Levine
Comprehensive, unbiased, and comparable evaluation of modern generalist policies is uniquely challenging: existing approaches for robot benchmarking typically rely on heavy standardization, either by specifying fixed evaluation tasks and environments, or by hosting centralized "robot challenges", and do not readily scale to evaluating generalist policies across a broad range of tasks and environments. In this work, we propose RoboArena, a new approach for scalable evaluation of generalist robot policies in the real world. Instead of standardizing evaluations around fixed tasks, environments, or locations, we propose to crowd-source evaluations across a distributed network of evaluators. Importantly, evaluators can freely choose the tasks and environments they evaluate on, enabling easy scaling of diversity, but they are required to perform double-blind evaluations over pairs of policies. Then, by aggregating preference feedback from pairwise comparisons across diverse tasks and environments, we can derive a ranking of policies. We instantiate our approach across a network of evaluators at seven academic institutions using the DROID robot platform. Through more than 600 pairwise real-robot evaluation episodes across seven generalist policies, we demonstrate that our crowd-sourced approach can more accurately rank the performance of existing generalist policies than conventional, centralized evaluation approaches, while being more scalable, resilient, and trustworthy. We open our evaluation network to the community and hope that it can enable more accessible comparisons of generalist robot policies.
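As a rough illustration of how pairwise preference feedback can be aggregated into a ranking, the sketch below fits a standard Bradley-Terry model with the classic minorization-maximization update. The abstract does not say which aggregation model RoboArena uses, so the choice of Bradley-Terry, the function name `bradley_terry`, the `prior` smoothing term, and the toy policy names are all assumptions made for illustration.

```python
from collections import defaultdict


def bradley_terry(comparisons, n_iters=200, prior=1e-3):
    """Estimate a strength score per policy from (winner, loser) pairs.

    Assumes each pair names two distinct policies. `prior` is a small
    illustrative smoothing term that keeps a policy with zero wins from
    collapsing to a score of exactly 0.
    """
    policies = sorted({p for pair in comparisons for p in pair})
    wins = defaultdict(int)          # total wins per policy
    pair_counts = defaultdict(int)   # number of comparisons per unordered pair
    for winner, loser in comparisons:
        wins[winner] += 1
        pair_counts[frozenset((winner, loser))] += 1

    scores = {p: 1.0 for p in policies}
    for _ in range(n_iters):
        updated = {}
        for p in policies:
            # MM update: wins_p / sum_q n_pq / (score_p + score_q)
            denom = sum(
                pair_counts[frozenset((p, q))] / (scores[p] + scores[q])
                for q in policies
                if q != p and frozenset((p, q)) in pair_counts
            )
            updated[p] = (wins[p] + prior) / denom if denom > 0 else scores[p]
        total = sum(updated.values())
        scores = {p: s / total for p, s in updated.items()}  # normalize to sum 1
    return scores


# Toy data: hypothetical policies "A", "B", "C" with made-up outcomes.
comparisons = (
    [("A", "B")] * 8 + [("B", "A")] * 2
    + [("B", "C")] * 7 + [("C", "B")] * 3
    + [("A", "C")] * 9 + [("C", "A")] * 1
)
scores = bradley_terry(comparisons)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Because the model only needs win/loss outcomes between pairs, comparisons collected on different tasks and in different environments can be pooled into one list, which is the property the crowd-sourced setting relies on.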
Task Robustness via Re-Labelling Vision-Action Robot Data
The recent trend in scaling models for robot learning has resulted in impressive policies that can perform various manipulation tasks and generalize to novel scenarios. However, these policies continue to struggle with following instructions, likely due to the limited linguistic and action sequence diversity in existing robotics datasets. This paper introduces
Zero-Shot Constraint Satisfaction with Forward-Backward Representations
Adriana Hugessen
Cyrus Neary
Traditionally, constrained policy optimization with Reinforcement Learning (RL) requires learning a new policy from scratch for any new environment, goal or cost function, with limited generalization to new tasks and constraints. Given the sample inefficiency of many common deep RL methods, this procedure can be impractical for many real-world scenarios, particularly when constraints or tasks are changing. As an alternative, in the unconstrained setting, various works have sought to pre-train representations from offline datasets to accelerate policy optimization upon specification of a reward. Such methods can permit faster adaptation to new tasks in a given environment, dramatically improving sample efficiency. Recently, zero-shot policy optimization has been explored by leveraging a particular
Scalable Tree Search over Graphs with Learned Action Pruning for Power Grid Control
As real-world infrastructure systems become increasingly complex and large-scale, there is a growing need for learning-based control strategies that can make informed decisions in complex and dynamic environments. However, large-scale problems, such as power grid control, introduce high-dimensional action spaces and necessitate transferability across varying grid topologies. We introduce **H**ierarchical **E**xpert-Guided **R**econfiguration **O**ptimization for **G**raph **T**opologies, **HERO-GT**, a model-based planning approach that combines a pretrained graph neural network (GNN) for topology-aware action pruning with a Monte Carlo Tree Search (MCTS) planner for targeted, structured exploration. More specifically, the high-level GNN predicts a promising subset of actions, which the low-level MCTS agent uses to focus its search and reduce computational overhead while remaining adaptable to unseen graph structures. Furthermore, the MCTS planner leverages a given *default policy* (which may be defined, for example, by heuristics, problem relaxations, or rule-based methods) to bias the search and prioritize actions that are expected to improve performance over the default. We deploy HERO-GT in power grid environments, demonstrating that it not only improves over a strong default policy, but also scales to a realistic operational setting where exhaustive search becomes computationally infeasible.
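The two-level idea in this abstract (prune the action set first, then search only among the survivors, using a default policy as the baseline) can be sketched roughly as follows. This is not HERO-GT itself: a plain score dictionary stands in for the GNN's output, a one-step lookahead stands in for MCTS, and the names `prune_actions`, `plan`, and `simulate`, along with the decision to always keep the default action among the candidates, are assumptions made for illustration.

```python
def prune_actions(action_scores, k, default_action):
    """Keep the k highest-scoring actions, always retaining the default action.

    `action_scores` stands in for the output of a learned pruning model.
    """
    top = sorted(action_scores, key=action_scores.get, reverse=True)[:k]
    if default_action not in top:
        top.append(default_action)
    return top


def plan(state, action_scores, k, default_action, simulate):
    """Pick the best candidate, deviating from the default only if it helps.

    `simulate(state, action)` is a hypothetical model call returning an
    estimated value; a real planner would run a deeper tree search here.
    """
    candidates = prune_actions(action_scores, k, default_action)
    best_action = default_action
    best_value = simulate(state, default_action)
    for action in candidates:
        value = simulate(state, action)
        if value > best_value:  # only accept strict improvement over the best so far
            best_action, best_value = action, value
    return best_action


# Toy example with made-up scores and values.
scores = {"a": 0.9, "b": 0.1, "c": 0.8, "noop": 0.05}
values = {"a": 1.0, "b": 5.0, "c": 2.0, "noop": 1.5}
chosen = plan(None, scores, k=2, default_action="noop",
              simulate=lambda state, action: values[action])
print(chosen)
```

In the toy example, action "b" has the highest simulated value but is never evaluated because the pruning step discards it. That is the trade-off pruning makes: fewer simulations per decision, at the cost of relying on the quality of the pruning model.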