Publications

WebLINX: Real-World Website Navigation with Multi-Turn Dialogue
Xing Han Lu
Zdeněk Kasner
We propose the problem of conversational web navigation, where a digital agent controls a web browser and follows user instructions to solve real-world tasks in a multi-turn dialogue fashion. To support this problem, we introduce WebLINX - a large-scale benchmark of 100K interactions across 2300 expert demonstrations of conversational web navigation. Our benchmark covers a broad range of patterns on over 150 real-world websites and can be used to train and evaluate agents in diverse scenarios. Due to the magnitude of information present, Large Language Models (LLMs) cannot process entire web pages in real-time. To solve this bottleneck, we design a retrieval-inspired model that efficiently prunes HTML pages by ranking relevant elements. We use the selected elements, along with screenshots and action history, to assess a variety of models for their ability to replicate human behavior when navigating the web. Our experiments span from small text-only to proprietary multimodal LLMs. We find that smaller finetuned decoders surpass not only the best zero-shot LLMs (including GPT-4V) but also larger finetuned multimodal models that were explicitly pretrained on screenshots. However, all finetuned models struggle to generalize to unseen websites. Our findings highlight the need for large multimodal models that can generalize to novel settings. Our code, data, and models are available for research: https://mcgill-nlp.github.io/weblinx.
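The retrieval-inspired pruning idea above can be illustrated with a toy sketch: score each page element against the user's utterance and keep only the top-k. The token-overlap scorer and the element records below are illustrative assumptions; WebLINX's actual ranker is a trained model, not shown here.

```python
# Toy sketch of retrieval-style pruning for conversational web navigation:
# rank page elements against the user's utterance and keep only the top-k.
# The scoring function is a stand-in, not the paper's trained ranker.

def score(query: str, element_text: str) -> float:
    """Fraction of query tokens that appear in the element's text."""
    q = set(query.lower().split())
    e = set(element_text.lower().split())
    return len(q & e) / len(q) if q else 0.0

def prune_elements(query, elements, k=2):
    """Keep the k elements most relevant to the query."""
    ranked = sorted(elements, key=lambda el: score(query, el["text"]), reverse=True)
    return ranked[:k]

elements = [
    {"id": "nav-home", "text": "Home page navigation"},
    {"id": "btn-search", "text": "Search flights to Montreal"},
    {"id": "footer", "text": "Privacy policy and terms"},
    {"id": "input-dest", "text": "Destination city Montreal"},
]
kept = prune_elements("find flights to Montreal", elements, k=2)
print([el["id"] for el in kept])  # → ['btn-search', 'input-dest']
```

Only the surviving elements would then be passed, with screenshots and action history, to the downstream model.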
WorkArena: How Capable are Web Agents at Solving Common Knowledge Work Tasks?
Massimo Caccia
Issam Hadj Laradji
Manuel Del Verme
Tom Marty
Léo Boisvert
Megh Thakkar
David Vazquez
Alexandre Lacoste
We study the use of large language model-based agents for interacting with software via web browsers. Unlike prior work, we focus on measuring the agents' ability to perform tasks that span the typical daily work of knowledge workers utilizing enterprise software systems. To this end, we propose WorkArena, a remote-hosted benchmark of 29 tasks based on the widely-used ServiceNow platform. We also introduce BrowserGym, an environment for the design and evaluation of such agents, offering a rich set of actions as well as multimodal observations. Our empirical evaluation reveals that while current agents show promise on WorkArena, there remains a considerable gap towards achieving full task automation. Notably, our analysis uncovers a significant performance disparity between open and closed-source LLMs, highlighting a critical area for future exploration and development in the field.
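The observe-act loop that benchmarks of this kind expose can be sketched with a mock environment. All class and method names below are illustrative assumptions, not BrowserGym's actual API; the scripted policy stands in for an LLM agent.

```python
# Minimal mock of the observation/action loop that browser-agent benchmarks
# expose. Names are illustrative, not BrowserGym's real interface.

class MockBrowserEnv:
    """Toy environment: the task is solved once the agent clicks 'submit'."""
    def reset(self):
        self.done = False
        return {"goal": "submit the form", "dom": ["input#name", "button#submit"]}

    def step(self, action):
        reward = 1.0 if action == "click button#submit" else 0.0
        self.done = reward > 0
        obs = {"goal": "submit the form", "dom": ["input#name", "button#submit"]}
        return obs, reward, self.done

def scripted_agent(obs):
    """Stand-in for an LLM policy: click the first button it sees."""
    buttons = [el for el in obs["dom"] if el.startswith("button")]
    return f"click {buttons[0]}" if buttons else "noop"

env = MockBrowserEnv()
obs = env.reset()
total = 0.0
for _ in range(5):
    obs, reward, done = env.step(scripted_agent(obs))
    total += reward
    if done:
        break
print(total)  # → 1.0
```

A real benchmark replaces the mock DOM with live pages and the scripted policy with a language model, but the control flow is the same.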
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
Machel Reid
Nikolay Savinov
Denis Teplyashin
Dmitry Lepikhin
Timothy P. Lillicrap
Jean-Baptiste Alayrac
Radu Soricut
Angeliki Lazaridou
Orhan Firat
Julian Schrittwieser
Ioannis Antonoglou
Rohan Anil
Sebastian Borgeaud
Andrew M. Dai
Katie Millican
Ethan Dyer
Mia Glaese
Thibault Sottiaux
Benjamin Lee
Malcolm Reynolds
Yuanzhong Xu
James L. Molloy
Jilin Chen
Michael Acheson Isard
Paul R. Barham
Tom Hennigan
Ross McIlroy
Melvin Johnson
J. Schalkwyk
Eli Collins
Eliza Rutherford
Erica Moreira
Kareem W. Ayoub
Megha Goel
Clemens Meyer
Gregory Thornton
Zhen Yang
Henryk Michalewski
Zaheer Abbas
Nathan Schucher
Ankesh Anand
Richard Ives
James Keeling
Karel Lenc
Salem Haykal
Siamak Shakeri
Pranav Shyam
Aakanksha Chowdhery
Roman Ring
Stephen Spencer
Eren Sezener
Luke Vilnis
Oscar Chang
Nobuyuki Morioka
George Tucker
Ce Zheng
Oliver Woodman
Nithya Attaluri
Tomas Kocisky
Evgenii Eltyshev
Xi Chen
Timothy Chung
Vittorio Selo
Siddhartha Brahma
Petko Georgiev
Ambrose Slone
Zhenkai Zhu
James Lottes
Siyuan Qiao
Ben Caine
Sebastian Riedel
Alex Tomala
Martin J. Chadwick
J Christopher Love
Peter Choy
Sid Mittal
Neil Houlsby
Yunhao Tang
Matthew Lamm
Libin Bai
Qiao Zhang
Luheng He
Yong Cheng
Peter Conway Humphreys
Yujia Li
Sergey Brin
Albin Cassirer
Ying-Qi Miao
Lukáš Žilka
Taylor Tobin
Kelvin Xu
Lev Proleev
Daniel Sohn
Alberto Magni
Lisa Anne Hendricks
Isabel Gao
Santiago Ontañón
Oskar Bunyan
Nathan Byrd
Abhanshu Sharma
Biao Zhang
Mario Pinto
Rishika Sinha
Harsh Mehta
Dawei Jia
Sergi Caelles
Albert Webson
Alex Morris
Becca Roelofs
Yifan Ding
Robin Strudel
Xuehan Xiong
Marvin Ritter
Mostafa Dehghani
Rahma Chaabouni
Abhijit Karmarkar
Guangda Lai
Fabian Mentzer
Bibo Xu
YaGuang Li
Yujing Zhang
T. Paine
Alex Goldin
Behnam Neyshabur
Kate Baumli
Anselm C. Levskaya
Michael Laskin
Wenhao Jia
Jack W. Rae
Kefan Xiao
Antoine He
Skye Giordano
Lakshman N. Yagati
Jean-Baptiste Lespiau
Paul Natsev
Sanjay Ganapathy
Fangyu Liu
Danilo Martins
Nanxin Chen
Yunhan Xu
Megan Barnes
Rhys May
Arpi Vezer
Junhyuk Oh
Ken Franko
Sophie Bridgers
Ruizhe Zhao
Boxi Wu
Basil Mustafa
Sean Sechrist
Emilio Parisotto
Thanumalayan Sankaranarayana Pillai
Chris Larkin
Chenjie Gu
Christina Sorokin
M. Krikun
Alexey Guseynov
Jessica Landon
Romina Datta
Alexander Pritzel
Phoebe Thacker
Fan Yang
Kevin Hui
A.E. Hauth
Chih-Kuan Yeh
David Barker
Justin Mao-jones
Sophia Austin
Hannah Rachel Sheahan
Parker Schuh
James Svensson
Rohan Jain
Vinay Venkatesh Ramasesh
Anton Briukhov
Da-Woon Chung
Tamara von Glehn
Christina Butterfield
Priya Jhakra
Matt Wiethoff
Justin Frye
Jordan Grimstad
Beer Changpinyo
Charline Le Lan
Anna Bortsova
Yonghui Wu
Paul Voigtlaender
Tara N. Sainath
Charlotte Smith
Will Hawkins
Kris Cao
James Besley
Srivatsan Srinivasan
Mark Omernick
Colin Gaffney
Gabriela Surita
Ryan Burnell
Bogdan Damoc
Junwhan Ahn
Andrew Brock
Mantas Pajarskas
Anastasia Petrushkina
Seb Noury
Lorenzo Blanco
Kevin Swersky
Arun Ahuja
Thi Avrahami
Vedant Misra
Raoul de Liedekerke
Mariko Iinuma
Alex Polozov
Sarah York
George van den Driessche
Paul Michel
Justin Chiu
Rory Blevins
Zach Gleicher
Adria Recasens
Alban Rrustemi
Elena Gribovskaya
Aurko Roy
Wiktor Gworek
Sébastien M. R. Arnold
Lisa Lee
James Lee-Thorp
Marcello Maggioni
Enrique Piqueras
Kartikeya Badola
Sharad Mandyam Vikram
Lucas Gonzalez
Anirudh Baddepudi
Evan Senter
Jacob Devlin
James Qin
Michael Azzam
Maja Trebacz
M. Polacek
Kashyap Krishnakumar
Shuo-yiin Chang
Matthew Tung
Ivo Penchev
Rishabh Joshi
Kate Olszewska
Carrie Muir
Mateo Wirth
Ale Jakse Hartman
Joshua Newlan
Sheleem Kashem
Vijay Bolina
Elahe Dabir
Joost Van Amersfoort
Zafarali Ahmed
James Cobon-Kerr
Aishwarya B Kamath
Arnar Mar Hrafnkelsson
Le Hou
Ian Mackinnon
Alexandre Fréchette
Eric Noland
Xiance Si
Emanuel Taropa
Dong Li
Phil Crone
Anmol Gulati
Sébastien Cevey
Jonas Adler
Ada Ma
David Silver
Simon Tokumine
Richard Powell
Stephan Lee
Samer Hassan
Diana Mincu
Antoine Yang
Nir Levine
Jenny Brennan
Mingqiu Wang
Sarah Hodkinson
Jeffrey Zhao
Josh Lipschultz
Aedan Pope
Michael B. Chang
Cheng Li
Laurent El Shafey
Michela Paganini
Sholto Douglas
Bernd Bohnet
Fabio Pardo
Seth Odoom
Mihaela Rosca
Cicero Nogueira dos Santos
Kedar Soparkar
Arthur Guez
Tom Hudson
Steven Hansen
Chulayuth Asawaroengchai
Ravichandra Addanki
Tianhe Yu
Wojciech Stokowiec
Mina Khan
Justin Gilmer
Jaehoon Lee
Carrie Grimes Bostock
Keran Rong
Jonathan Caton
Pedram Pejman
Filip Pavetic
Geoff Brown
Vivek Sharma
Mario Lučić
Rajkumar Samuel
Josip Djolonga
Amol Mandhane
Lars Lowe Sjosund
Elena Buchatskaya
Elspeth White
Natalie Clay
Jiepu Jiang
Hyeontaek Lim
Ross Hemsley
Jane Labanowski
Nicola De Cao
David Steiner
Sayed Hadi Hashemi
Jacob Austin
Anita Gergely
Tim Blyth
Joe Stanton
Kaushik Shivakumar
Aditya Siddhant
Anders Johan Andreassen
Carlos L. Araya
Nikhil Sethi
Rakesh Shivanna
Steven Hand
Ankur Bapna
A. Khodaei
Antoine Miech
Garrett Tanzer
Andy Swing
Shantanu Thakoor
Zhufeng Pan
Zachary Nado
Stephanie Winkler
Dian Yu
Mohammad Saleh
Lorenzo Maggiore
Iain Barr
Minh Giang
Thais Kagohara
Ivo Danihelka
Amit Marathe
Vladimir Feinberg
Mohamed Elhawaty
Nimesh Ghelani
Dan Horgan
Helen Miller
Lexi Walker
Richard Tanburn
Mukarram Tariq
Disha Shrivastava
Fei Xia
Chung-Cheng Chiu
Zoe C. Ashwood
Khuslen Baatarsukh
Sina Samangooei
Fred Alcober
Axel Stjerngren
Paul Komarek
Katerina Tsihlas
Anudhyan Boral
Ramona Comanescu
Jeremy Chen
Ruibo Liu
Dawn Bloxwich
Charlie Chen
Yanhua Sun
Fangxiaoyu Feng
Matthew Mauger
Xerxes Dotiwalla
Vincent Hellendoorn
Michael Sharman
Ivy Zheng
Krishna S Haridasan
Gabriel Barth-Maron
Craig Swanson
Dominika Rogozińska
Alek Andreev
Paul Kishan Rubenstein
Ruoxin Sang
Dan Hurt
Gamaleldin Elsayed
Renshen Wang
Dave Lacey
Anastasija Ilić
Yao Zhao
Lora Aroyo
Chimezie Iwuanyanwu
Vitaly Nikolaev
Balaji Lakshminarayanan
Sadegh Jazayeri
Raphael Lopez Kaufman
Mani Varadarajan
Chetan Tekur
Doug Fritz
Misha Khalman
David Reitter
Kingshuk Dasgupta
Shourya Sarcar
Tina Ornduff
Javier Snaider
Fantine Huot
Johnson Jia
Rupert Kemp
Nejc Trdin
Anitha Vijayakumar
Lucy Kim
Christof Angermueller
Li Lao
Tianqi Liu
Haibin Zhang
David Engel
Somer Greene
Anais White
Jessica Austin
Lilly Taylor
Shereen Ashraf
Dangyi Liu
Maria Georgaki
Irene Cai
Yana Kulizhskaya
Sonam Goenka
Brennan Saeta
Kiran N. Vodrahalli
Christian Frank
D. Cesare
Brona Robenek
Harry Richardson
Mahmoud Alnahlawi
Christopher Yew
Priya Ponnapalli
Marco Tagliasacchi
Alex Korchemniy
Yelin Kim
Dinghua Li
Bill Rosgen
Kyle Levin
Jeremy Wiesner
Praseem Banzal
Praveen Srinivasan
Hongkun Yu
Çağlar Ünlü
David Reid
Zora Tung
Daniel Finchelstein
Ravin Kumar
Andre Elisseeff
Jin Huang
Ming Zhang
Rui Zhu
Ricardo Aguilar
Mai Giménez
Jiawei Xia
Olivier Dousse
W. Gierke
S. Yeganeh
Damion Yates
Komal Jalan
Lu Li
Eri Latorre-Chimoto
Duc Dung Nguyen
Ken Durden
Praveen Kallakuri
Yaxin Liu
Matthew Johnson
Tomy Tsai
Alice Talbert
Jasmine Liu
Alexander Neitz
Chen Elkind
Marco Selvi
Mimi Jasarevic
Livio Baldini Soares
Albert Cui
Pidong Wang
Alek Wenjiao Wang
Xinyu Ye
Krystal Kallarackal
Lucia Loher
Hoi Lam
Josef Broder
D. Holtmann-Rice
Nina Martin
Bramandia Ramadhana
Daniel Toyama
Mrinal Shukla
Sujoy Basu
Abhi Mohan
Nicholas Fernando
Generative Models for Decision Making
Bogdan Mazoure
Lisa Lee
Roberta Raileanu
Yilun Du
Walter Talbott
Katherine Metcalf
Alexander T Toshev
Generative Artificial Intelligence (AI) has made significant advancements in recent years, particularly with the development of large language and diffusion models. These generative models have demonstrated impressive capabilities in various tasks, such as text generation and image and audio synthesis. Concurrently, Reinforcement Learning (RL) has made significant strides in solving complex sequential decision-making problems with the help of external knowledge sources. However, there remains untapped potential in combining generative models with RL algorithms to tackle real-world challenges, particularly to improve sample efficiency of tabula rasa training by introducing priors from related domains such as visual question-answering, image captioning and image generation. This workshop aims to bring together researchers and practitioners from the fields of generative AI and reinforcement learning to explore the latest advances, methodologies, and applications. By fostering collaborations between these two domains, we intend to unlock new opportunities for addressing complex problems that lie at the intersection of both fields.
Global AI Cultures
Rida Qadri
Arjun Subramonian
Sunipa Dev
Georgina Emma Born
Mary L. Gray
Jessica Quaye
Rachel Bergmann
Integrating Generative and Experimental Platforms for Biomolecular Design
Cheng-Hao Liu
Jarrid Rector-Brooks
Jason Yim
Soojung Yang
Sidney Lisanza
Francesca-Zhoufan Li
Pranam Chatterjee
Tommi Jaakkola
Regina Barzilay
David Baker
Frances H. Arnold
Tackling Climate Change with Machine Learning: Fostering the Maturity of ML Applications for Climate Change
Shiva Madadkhani
Olivia Mendivil Ramos
Millie Chapman
Jesse Dunietz
Arthur Ouaknine
Globally Stable Neural Imitation Policies
Amin Abyaneh
Mariana Sosa Guzmán
Machine learning and information theory concepts towards an AI Mathematician
Nikolay Malkin
The current state-of-the-art in artificial intelligence is impressive, especially in terms of mastery of language, but not so much in terms of mathematical reasoning. What could be missing? Can we learn something useful about that gap from how the brains of mathematicians go about their craft? This essay builds on the idea that current deep learning mostly succeeds at system 1 abilities -- which correspond to our intuition and habitual behaviors -- but still lacks something important regarding system 2 abilities -- which include reasoning and robust uncertainty estimation. It takes an information-theoretical posture to ask questions about what constitutes an interesting mathematical statement, which could guide future work in crafting an AI mathematician. The focus is not on proving a given theorem but on discovering new and interesting conjectures. The central hypothesis is that a desirable body of theorems better summarizes the set of all provable statements, for example by having a small description length while at the same time being close (in terms of number of derivation steps) to many provable statements.
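One possible way to formalize the central hypothesis above is as a trade-off between compression and coverage; the notation below is an assumed sketch, not taken from the essay itself.

```latex
% A theorem set T is preferred if it has small description length \ell(T)
% while staying close, in derivation steps, to the provable statements S:
\[
  T^\star \;=\; \arg\min_{T}\; \Big[\, \ell(T) \;+\; \lambda\, \mathbb{E}_{s \sim S}\, d(s, T) \,\Big]
\]
% where d(s, T) is the minimal number of derivation steps from theorems
% in T to statement s, and \lambda controls the trade-off.
```

Under this reading, an "interesting" conjecture is one whose addition to T reduces the expected derivation distance more than it increases the description length.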
Personalized Negative Reservoir for Incremental Learning in Recommender Systems
Antonios Valkanas
Yuening Wang
Yingxue Zhang
Stop Regressing: Training Value Functions via Classification for Scalable Deep RL
Jesse Farebrother
Jordi Orbay
Quan Ho Vuong
Adrien Ali Taiga
Yevgen Chebotar
Ted Xiao
A. Irpan
Sergey Levine
Aleksandra Faust
Aviral Kumar
Rishabh Agarwal
Value functions are a central component of deep reinforcement learning (RL). These functions, parameterized by neural networks, are trained using a mean squared error regression objective to match bootstrapped target values. However, scaling value-based RL methods that use regression to large networks, such as high-capacity Transformers, has proven challenging. This difficulty is in stark contrast to supervised learning: by leveraging a cross-entropy classification loss, supervised methods have scaled reliably to massive networks. Observing this discrepancy, in this paper, we investigate whether the scalability of deep RL can also be improved simply by using classification in place of regression for training value functions. We demonstrate that value functions trained with categorical cross-entropy significantly improve performance and scalability in a variety of domains. These include: single-task RL on Atari 2600 games with SoftMoEs, multi-task RL on Atari with large-scale ResNets, robotic manipulation with Q-transformers, playing Chess without search, and a language-agent Wordle task with high-capacity Transformers, achieving state-of-the-art results on these domains. Through careful analysis, we show that the benefits of categorical cross-entropy primarily stem from its ability to mitigate issues inherent to value-based RL, such as noisy targets and non-stationarity. Overall, we argue that a simple shift to training value functions with categorical cross-entropy can yield substantial improvements in the scalability of deep RL at little-to-no cost.
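A common way to cast value regression as classification is the "two-hot" encoding: a scalar target is spread over the two nearest bins of a fixed support, and the network is trained with cross-entropy against this distribution. The sketch below is a minimal illustration of that encoding; the bin layout and ranges are assumptions, not the paper's exact configuration.

```python
# Two-hot encoding of a scalar value onto a discrete support. The resulting
# distribution is a valid cross-entropy target whose mean recovers the value.

def two_hot(value, support):
    """Encode a scalar as a distribution over a sorted, evenly spaced support."""
    lo, hi = support[0], support[-1]
    value = max(lo, min(hi, value))          # clip into the support range
    step = support[1] - support[0]
    idx = int((value - lo) // step)          # lower neighbouring bin
    idx = min(idx, len(support) - 2)
    upper_w = (value - support[idx]) / step  # weight on the upper bin
    probs = [0.0] * len(support)
    probs[idx] = 1.0 - upper_w
    probs[idx + 1] = upper_w
    return probs

support = [0.0, 1.0, 2.0, 3.0]
probs = two_hot(1.25, support)
mean = sum(p * s for p, s in zip(probs, support))
print(probs, mean)  # → [0.0, 0.75, 0.25, 0.0] 1.25
```

Training then minimizes cross-entropy between the network's categorical output and this target distribution, instead of mean squared error on the scalar.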
Efficient Causal Graph Discovery Using Large Language Models
Thomas Jiralerspong
Xiaoyin Chen
Yash More
Vedant Shah