Maxime Gasse

WorkArena: How Capable are Web Agents at Solving Common Knowledge Work Tasks?

Alexandre Drouin

Maxime Gasse

Massimo Caccia

Issam Hadj Laradji

Manuel Del Verme

Tom Marty

Léo Boisvert

Megh Thakkar

Quentin Cappart

David Vazquez

Nicolas Chapados

Alexandre Lacoste

2024-05-01

ICML.cc/2024/Conference (poster)

doi.org

openreview.net

WorkArena: How Capable Are Web Agents at Solving Common Knowledge Work Tasks?

Alexandre Drouin

Maxime Gasse

Massimo Caccia

Issam Hadj Laradji

Manuel Del Verme

Tom Marty

Léo Boisvert

Megh Thakkar

Quentin Cappart

David Vazquez

Nicolas Chapados

Alexandre Lacoste

We study the use of large language model-based agents for interacting with software via web browsers. Unlike prior work, we focus on measuring the agents' ability to perform tasks that span the typical daily work of knowledge workers utilizing enterprise software systems. To this end, we propose WorkArena, a remote-hosted benchmark of 29 tasks based on the widely-used ServiceNow platform. We also introduce BrowserGym, an environment for the design and evaluation of such agents, offering a rich set of actions as well as multimodal observations. Our empirical evaluation reveals that while current agents show promise on WorkArena, there remains a considerable gap towards achieving full task automation. Notably, our analysis uncovers a significant performance disparity between open and closed-source LLMs, highlighting a critical area for future exploration and development in the field.

2024-03-12

ArXiv (preprint)

doi.org

arxiv.org

WorkArena: How Capable are Web Agents at Solving Common Knowledge Work Tasks?

Alexandre Drouin

Maxime Gasse

Massimo Caccia

Issam Hadj Laradji

Manuel Del Verme

Tom Marty

Léo Boisvert

Megh Thakkar

Quentin Cappart

David Vazquez

Nicolas Chapados

Alexandre Lacoste

We study the use of large language model-based agents for interacting with software via web browsers. Unlike prior work, we focus on measuring the agents' ability to perform tasks that span the typical daily work of knowledge workers utilizing enterprise software systems. To this end, we propose WorkArena, a remote-hosted benchmark of 29 tasks based on the widely-used ServiceNow platform. We also introduce BrowserGym, an environment for the design and evaluation of such agents, offering a rich set of actions as well as multimodal observations. Our empirical evaluation reveals that while current agents show promise on WorkArena, there remains a considerable gap towards achieving full task automation. Notably, our analysis uncovers a significant performance disparity between open and closed-source LLMs, highlighting a critical area for future exploration and development in the field.

2024-03-11

ICLR.cc/2024/Workshop/LLMAgents (poster)

doi.org

openreview.net

Pruning Sparse Tensor Neural Networks Enables Deep Learning for 3D Ultrasound Localization Microscopy

Brice Rauby

Paul Xing

Jonathan Porée

Maxime Gasse

Jean Provost

2024-02-14

ArXiv (preprint)

doi.org

arxiv.org

The Unsolved Challenges of LLMs as Generalist Web Agents: A Case Study

Rim Assouel

Tom Marty

Massimo Caccia

Issam Hadj Laradji

Alexandre Drouin

Sai Rajeswar

Hector Palacios

Quentin Cappart

David Vazquez

Nicolas Chapados

Maxime Gasse

Alexandre Lacoste

2023-11-07

NeurIPS.cc/2023/Workshop/FMDM (published)

openreview.net

Using Confounded Data in Latent Model-Based Reinforcement Learning

Maxime Gasse

Damien Grasset

Guillaume Gaudron

Pierre-Yves Oudeyer

2023-08-14

TMLR (accepted)

openreview.net

Lookback for Learning to Branch

Prateek Gupta

Elias Boutros Khalil

Didier Chételat

Maxime Gasse

Yoshua Bengio

Andrea Lodi

M. Pawan Kumar

2022-10-10

TMLR (accepted)

doi.org

openreview.net

The Machine Learning for Combinatorial Optimization Competition (ML4CO): Results and Insights

Maxime Gasse

Simon Bowly

Quentin Cappart

Jonas Charfreitag

Laurent Charlin

Didier Chételat

Antonia Chmiela

Justin Dumouchelle

Ambros Gleixner

Aleksandr Kazachkov

Elias Boutros Khalil

Paweł Lichocki

Andrea Lodi

Miles Lubin

Chris J. Maddison

Christopher Morris

D. Papageorgiou

Augustin Parjadis

Sebastian Pokutta

Antoine Prouvost

Lara Scavuzzo

Giulia Zarpellon

Linxin Yang

Sha Lai

Akang Wang

Xiaodong Luo

Xiang Zhou

Haohan Huang

Sheng Cheng Shao

Yuanming Zhu

Dong Dong Zhang

Tao Manh Quan

Zixuan Cao

Yang Xu

Zhewei Huang

Shuchang Zhou

C. Binbin

He Minggui

Haoren Ren Hao

Zhang Zhiyu

An Zhiwu

Mao Kun

Combinatorial optimization is a well-established area in operations research and computer science. Until recently, its methods have focused on solving problem instances in isolation, ignoring that they often stem from related data distributions in practice. However, recent years have seen a surge of interest in using machine learning as a new approach for solving combinatorial problems, either directly as solvers or by enhancing exact solvers. Based on this context, the ML4CO aims at improving state-of-the-art combinatorial optimization solvers by replacing key heuristic components. The competition featured three challenging tasks: finding the best feasible solution, producing the tightest optimality certificate, and giving an appropriate solver configuration. Three realistic datasets were considered: balanced item placement, workload apportionment, and maritime inventory routing. This last dataset was kept anonymous for the contestants.

2021-01-01

NeurIPS (Competition and Demos) (published)

doi.org

arxiv.org

On generalized surrogate duality in mixed-integer nonlinear programming

Benjamin Müller

Gonzalo Muñoz

Maxime Gasse

Ambros Gleixner

Andrea Lodi

Felipe Serrano

2020-04-14

Integer Programming and Combinatorial Optimization (published)

doi.org

arxiv.org

On the Effectiveness of Two-Step Learning for Latent-Variable Models

Cem (Yusuf) Subakan

Maxime Gasse

Laurent Charlin

Latent-variable generative models offer a principled solution for modeling and sampling from complex probability distributions. Implementing a joint training objective with a complex prior, however, can be a tedious task, as one is typically required to derive and code a specific cost function for each new type of prior distribution. In this work, we propose a general framework for learning latent variable generative models in a two-step fashion. In the first step of the framework, we train an autoencoder, and in the second step we fit a prior model on the resulting latent distribution. This two-step approach offers a convenient alternative to joint training, as it allows for a straightforward combination of existing models without the hassle of deriving new cost functions or coding the joint training objectives. Through a set of experiments, we demonstrate that two-step learning results in performance similar to joint training, and in some cases even results in more accurate modeling.

2020-01-01

2020 IEEE 30th International Workshop on Machine Learning for Signal Processing (MLSP) (published)

doi.org

On generalized surrogate duality in mixed-integer nonlinear programming

Benjamin Müller

Gonzalo Muñoz

Maxime Gasse

Ambros Gleixner

Andrea Lodi

Felipe Serrano

2019-12-01

ArXiv (preprint)

doi.org

arxiv.org

Exact Combinatorial Optimization with Graph Convolutional Neural Networks

Maxime Gasse

Didier Chételat

Nicola Ferroni

Laurent Charlin

Andrea Lodi

Combinatorial optimization problems are typically tackled by the branch-and-bound paradigm. We propose a new graph convolutional neural network model for learning branch-and-bound variable selection policies, which leverages the natural variable-constraint bipartite graph representation of mixed-integer linear programs. We train our model via imitation learning from the strong branching expert rule, and demonstrate on a series of hard problems that our approach produces policies that improve upon state-of-the-art machine-learning methods for branching and generalize to instances significantly larger than seen during training. Moreover, we improve for the first time over expert-designed branching rules implemented in a state-of-the-art solver on large problems. Code for reproducing all the experiments can be found at this https URL.

2019-01-01

NeurIPS (published)

arxiv.org
