Publications

WebLINX: Real-World Website Navigation with Multi-Turn Dialogue
Xing Han Lu
Zdeněk Kasner
We propose the problem of conversational web navigation, where a digital agent controls a web browser and follows user instructions to solve real-world tasks in a multi-turn dialogue fashion. To support this problem, we introduce WebLINX - a large-scale benchmark of 100K interactions across 2300 expert demonstrations of conversational web navigation. Our benchmark covers a broad range of patterns on over 150 real-world websites and can be used to train and evaluate agents in diverse scenarios. Due to the magnitude of information present, Large Language Models (LLMs) cannot process entire web pages in real-time. To solve this bottleneck, we design a retrieval-inspired model that efficiently prunes HTML pages by ranking relevant elements. We use the selected elements, along with screenshots and action history, to assess a variety of models for their ability to replicate human behavior when navigating the web. Our experiments span from small text-only to proprietary multimodal LLMs. We find that smaller finetuned decoders surpass not only the best zero-shot LLMs (including GPT-4V) but also larger finetuned multimodal models that were explicitly pretrained on screenshots. However, all finetuned models struggle to generalize to unseen websites. Our findings highlight the need for large multimodal models that can generalize to novel settings. Our code, data, and models are available for research: https://mcgill-nlp.github.io/weblinx.
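The retrieval-inspired pruning idea above can be illustrated with a toy sketch: score each page element against the user's utterance and keep only the top-k. The token-overlap scorer and the element records below are illustrative assumptions; WebLINX's actual ranker is a trained model, not shown here.

```python
# Toy sketch of retrieval-style pruning for conversational web navigation:
# rank page elements against the user's utterance and keep only the top-k.
# The scoring function is a stand-in, not the paper's trained ranker.

def score(query: str, element_text: str) -> float:
    """Fraction of query tokens that appear in the element's text."""
    q = set(query.lower().split())
    e = set(element_text.lower().split())
    return len(q & e) / len(q) if q else 0.0

def prune_elements(query, elements, k=2):
    """Keep the k elements most relevant to the query."""
    ranked = sorted(elements, key=lambda el: score(query, el["text"]), reverse=True)
    return ranked[:k]

elements = [
    {"id": "nav-home", "text": "Home page navigation"},
    {"id": "btn-search", "text": "Search flights to Montreal"},
    {"id": "footer", "text": "Privacy policy and terms"},
    {"id": "input-dest", "text": "Destination city Montreal"},
]
kept = prune_elements("find flights to Montreal", elements, k=2)
print([el["id"] for el in kept])  # → ['btn-search', 'input-dest']
```

Only the surviving elements would then be passed, with screenshots and action history, to the downstream model.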
WorkArena: How Capable are Web Agents at Solving Common Knowledge Work Tasks?
Massimo Caccia
Issam Hadj Laradji
Manuel Del Verme
Tom Marty
Léo Boisvert
Megh Thakkar
David Vazquez
Alexandre Lacoste
We study the use of large language model-based agents for interacting with software via web browsers. Unlike prior work, we focus on measuring the agents' ability to perform tasks that span the typical daily work of knowledge workers utilizing enterprise software systems. To this end, we propose WorkArena, a remote-hosted benchmark of 29 tasks based on the widely-used ServiceNow platform. We also introduce BrowserGym, an environment for the design and evaluation of such agents, offering a rich set of actions as well as multimodal observations. Our empirical evaluation reveals that while current agents show promise on WorkArena, there remains a considerable gap towards achieving full task automation. Notably, our analysis uncovers a significant performance disparity between open and closed-source LLMs, highlighting a critical area for future exploration and development in the field.
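The observe-act loop that benchmarks of this kind expose can be sketched with a mock environment. All class and method names below are illustrative assumptions, not BrowserGym's actual API; the scripted policy stands in for an LLM agent.

```python
# Minimal mock of the observation/action loop that browser-agent benchmarks
# expose. Names are illustrative, not BrowserGym's real interface.

class MockBrowserEnv:
    """Toy environment: the task is solved once the agent clicks 'submit'."""
    def reset(self):
        self.done = False
        return {"goal": "submit the form", "dom": ["input#name", "button#submit"]}

    def step(self, action):
        reward = 1.0 if action == "click button#submit" else 0.0
        self.done = reward > 0
        obs = {"goal": "submit the form", "dom": ["input#name", "button#submit"]}
        return obs, reward, self.done

def scripted_agent(obs):
    """Stand-in for an LLM policy: click the first button it sees."""
    buttons = [el for el in obs["dom"] if el.startswith("button")]
    return f"click {buttons[0]}" if buttons else "noop"

env = MockBrowserEnv()
obs = env.reset()
total = 0.0
for _ in range(5):
    obs, reward, done = env.step(scripted_agent(obs))
    total += reward
    if done:
        break
print(total)  # → 1.0
```

A real benchmark replaces the mock DOM with live pages and the scripted policy with a language model, but the control flow is the same.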
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
Machel Reid
Nikolay Savinov
Denis Teplyashin
Dmitry Lepikhin
Timothy P. Lillicrap
Jean-Baptiste Alayrac
Radu Soricut
Angeliki Lazaridou
Orhan Firat
Julian Schrittwieser
Ioannis Antonoglou
Rohan Anil
Sebastian Borgeaud
Andrew M. Dai
Katie Millican
Ethan Dyer
Mia Glaese
Thibault Sottiaux
Benjamin Lee
Malcolm Reynolds
Yuanzhong Xu
James L. Molloy
Jilin Chen
Michael Acheson Isard
Paul R. Barham
Tom Hennigan
Ross McIlroy
Melvin Johnson
J. Schalkwyk
Eli Collins
Eliza Rutherford
Erica Moreira
Kareem W. Ayoub
Megha Goel
Clemens Meyer
Gregory Thornton
Zhen Yang
Henryk Michalewski
Zaheer Abbas
Nathan Schucher
Ankesh Anand
Richard Ives
James Keeling
Karel Lenc
Salem Haykal
Siamak Shakeri
Pranav Shyam
Aakanksha Chowdhery
Roman Ring
Stephen Spencer
Eren Sezener
Luke Vilnis
Oscar Chang
Nobuyuki Morioka
George Tucker
Ce Zheng
Oliver Woodman
Nithya Attaluri
Tomas Kocisky
Evgenii Eltyshev
Xi Chen
Timothy Chung
Vittorio Selo
Siddhartha Brahma
Petko Georgiev
Ambrose Slone
Zhenkai Zhu
James Lottes
Siyuan Qiao
Ben Caine
Sebastian Riedel
Alex Tomala
Martin J. Chadwick
J Christopher Love
Peter Choy
Sid Mittal
Neil Houlsby
Yunhao Tang
Matthew Lamm
Libin Bai
Qiao Zhang
Luheng He
Yong Cheng
Peter Conway Humphreys
Yujia Li
Sergey Brin
Albin Cassirer
Ying-Qi Miao
Lukáš Žilka
Taylor Tobin
Kelvin Xu
Lev Proleev
Daniel Sohn
Alberto Magni
Lisa Anne Hendricks
Isabel Gao
Santiago Ontañón
Oskar Bunyan
Nathan Byrd
Abhanshu Sharma
Biao Zhang
Mario Pinto
Rishika Sinha
Harsh Mehta
Dawei Jia
Sergi Caelles
Albert Webson
Alex Morris
Becca Roelofs
Yifan Ding
Robin Strudel
Xuehan Xiong
Marvin Ritter
Mostafa Dehghani
Rahma Chaabouni
Abhijit Karmarkar
Guangda Lai
Fabian Mentzer
Bibo Xu
YaGuang Li
Yujing Zhang
T. Paine
Alex Goldin
Behnam Neyshabur
Kate Baumli
Anselm C. Levskaya
Michael Laskin
Wenhao Jia
Jack W. Rae
Kefan Xiao
Antoine He
Skye Giordano
Lakshman N. Yagati
Jean-Baptiste Lespiau
Paul Natsev
Sanjay Ganapathy
Fangyu Liu
Danilo Martins
Nanxin Chen
Yunhan Xu
Megan Barnes
Rhys May
Arpi Vezer
Junhyuk Oh
Ken Franko
Sophie Bridgers
Ruizhe Zhao
Boxi Wu
Basil Mustafa
Sean Sechrist
Emilio Parisotto
Thanumalayan Sankaranarayana Pillai
Chris Larkin
Chenjie Gu
Christina Sorokin
M. Krikun
Alexey Guseynov
Jessica Landon
Romina Datta
Alexander Pritzel
Phoebe Thacker
Fan Yang
Kevin Hui
A.E. Hauth
Chih-Kuan Yeh
David Barker
Justin Mao-jones
Sophia Austin
Hannah Rachel Sheahan
Parker Schuh
James Svensson
Rohan Jain
Vinay Venkatesh Ramasesh
Anton Briukhov
Da-Woon Chung
Tamara von Glehn
Christina Butterfield
Priya Jhakra
Matt Wiethoff
Justin Frye
Jordan Grimstad
Beer Changpinyo
Charline Le Lan
Anna Bortsova
Yonghui Wu
Paul Voigtlaender
Tara N. Sainath
Charlotte Smith
Will Hawkins
Kris Cao
James Besley
Srivatsan Srinivasan
Mark Omernick
Colin Gaffney
Gabriela Surita
Ryan Burnell
Bogdan Damoc
Junwhan Ahn
Andrew Brock
Mantas Pajarskas
Anastasia Petrushkina
Seb Noury
Lorenzo Blanco
Kevin Swersky
Arun Ahuja
Thi Avrahami
Vedant Misra
Raoul de Liedekerke
Mariko Iinuma
Alex Polozov
Sarah York
George van den Driessche
Paul Michel
Justin Chiu
Rory Blevins
Zach Gleicher
Adria Recasens
Alban Rrustemi
Elena Gribovskaya
Aurko Roy
Wiktor Gworek
Sébastien M. R. Arnold
Lisa Lee
James Lee-Thorp
Marcello Maggioni
Enrique Piqueras
Kartikeya Badola
Sharad Mandyam Vikram
Lucas Gonzalez
Anirudh Baddepudi
Evan Senter
Jacob Devlin
James Qin
Michael Azzam
Maja Trebacz
M. Polacek
Kashyap Krishnakumar
Shuo-yiin Chang
Matthew Tung
Ivo Penchev
Rishabh Joshi
Kate Olszewska
Carrie Muir
Mateo Wirth
Ale Jakse Hartman
Joshua Newlan
Sheleem Kashem
Vijay Bolina
Elahe Dabir
Joost Van Amersfoort
Zafarali Ahmed
James Cobon-Kerr
Aishwarya B Kamath
Arnar Mar Hrafnkelsson
Le Hou
Ian Mackinnon
Alexandre Fréchette
Eric Noland
Xiance Si
Emanuel Taropa
Dong Li
Phil Crone
Anmol Gulati
Sébastien Cevey
Jonas Adler
Ada Ma
David Silver
Simon Tokumine
Richard Powell
Stephan Lee
Samer Hassan
Diana Mincu
Antoine Yang
Nir Levine
Jenny Brennan
Mingqiu Wang
Sarah Hodkinson
Jeffrey Zhao
Josh Lipschultz
Aedan Pope
Michael B. Chang
Cheng Li
Laurent El Shafey
Michela Paganini
Sholto Douglas
Bernd Bohnet
Fabio Pardo
Seth Odoom
Mihaela Rosca
Cicero Nogueira dos Santos
Kedar Soparkar
Arthur Guez
Tom Hudson
Steven Hansen
Chulayuth Asawaroengchai
Ravichandra Addanki
Tianhe Yu
Wojciech Stokowiec
Mina Khan
Justin Gilmer
Jaehoon Lee
Carrie Grimes Bostock
Keran Rong
Jonathan Caton
Pedram Pejman
Filip Pavetic
Geoff Brown
Vivek Sharma
Mario Lučić
Rajkumar Samuel
Josip Djolonga
Amol Mandhane
Lars Lowe Sjosund
Elena Buchatskaya
Elspeth White
Natalie Clay
Jiepu Jiang
Hyeontaek Lim
Ross Hemsley
Jane Labanowski
Nicola De Cao
David Steiner
Sayed Hadi Hashemi
Jacob Austin
Anita Gergely
Tim Blyth
Joe Stanton
Kaushik Shivakumar
Aditya Siddhant
Anders Johan Andreassen
Carlos L. Araya
Nikhil Sethi
Rakesh Shivanna
Steven Hand
Ankur Bapna
A. Khodaei
Antoine Miech
Garrett Tanzer
Andy Swing
Shantanu Thakoor
Zhufeng Pan
Zachary Nado
Stephanie Winkler
Dian Yu
Mohammad Saleh
Lorenzo Maggiore
Iain Barr
Minh Giang
Thais Kagohara
Ivo Danihelka
Amit Marathe
Vladimir Feinberg
Mohamed Elhawaty
Nimesh Ghelani
Dan Horgan
Helen Miller
Lexi Walker
Richard Tanburn
Mukarram Tariq
Disha Shrivastava
Fei Xia
Chung-Cheng Chiu
Zoe C. Ashwood
Khuslen Baatarsukh
Sina Samangooei
Fred Alcober
Axel Stjerngren
Paul Komarek
Katerina Tsihlas
Anudhyan Boral
Ramona Comanescu
Jeremy Chen
Ruibo Liu
Dawn Bloxwich
Charlie Chen
Yanhua Sun
Fangxiaoyu Feng
Matthew Mauger
Xerxes Dotiwalla
Vincent Hellendoorn
Michael Sharman
Ivy Zheng
Krishna S Haridasan
Gabriel Barth-Maron
Craig Swanson
Dominika Rogozińska
Alek Andreev
Paul Kishan Rubenstein
Ruoxin Sang
Dan Hurt
Gamaleldin Elsayed
Renshen Wang
Dave Lacey
Anastasija Ilić
Yao Zhao
Lora Aroyo
Chimezie Iwuanyanwu
Vitaly Nikolaev
Balaji Lakshminarayanan
Sadegh Jazayeri
Raphael Lopez Kaufman
Mani Varadarajan
Chetan Tekur
Doug Fritz
Misha Khalman
David Reitter
Kingshuk Dasgupta
Shourya Sarcar
Tina Ornduff
Javier Snaider
Fantine Huot
Johnson Jia
Rupert Kemp
Nejc Trdin
Anitha Vijayakumar
Lucy Kim
Christof Angermueller
Li Lao
Tianqi Liu
Haibin Zhang
David Engel
Somer Greene
Anais White
Jessica Austin
Lilly Taylor
Shereen Ashraf
Dangyi Liu
Maria Georgaki
Irene Cai
Yana Kulizhskaya
Sonam Goenka
Brennan Saeta
Kiran N. Vodrahalli
Christian Frank
D. Cesare
Brona Robenek
Harry Richardson
Mahmoud Alnahlawi
Christopher Yew
Priya Ponnapalli
Marco Tagliasacchi
Alex Korchemniy
Yelin Kim
Dinghua Li
Bill Rosgen
Kyle Levin
Jeremy Wiesner
Praseem Banzal
Praveen Srinivasan
Hongkun Yu
Çağlar Ünlü
David Reid
Zora Tung
Daniel Finchelstein
Ravin Kumar
Andre Elisseeff
Jin Huang
Ming Zhang
Rui Zhu
Ricardo Aguilar
Mai Giménez
Jiawei Xia
Olivier Dousse
W. Gierke
S. Yeganeh
Damion Yates
Komal Jalan
Lu Li
Eri Latorre-Chimoto
Duc Dung Nguyen
Ken Durden
Praveen Kallakuri
Yaxin Liu
Matthew Johnson
Tomy Tsai
Alice Talbert
Jasmine Liu
Alexander Neitz
Chen Elkind
Marco Selvi
Mimi Jasarevic
Livio Baldini Soares
Albert Cui
Pidong Wang
Alek Wenjiao Wang
Xinyu Ye
Krystal Kallarackal
Lucia Loher
Hoi Lam
Josef Broder
D. Holtmann-Rice
Nina Martin
Bramandia Ramadhana
Daniel Toyama
Mrinal Shukla
Sujoy Basu
Abhi Mohan
Nicholas Fernando
Generative Models for Decision Making
Bogdan Mazoure
Lisa Lee
Roberta Raileanu
Yilun Du
Walter Talbott
Katherine Metcalf
Alexander T Toshev
Generative Artificial Intelligence (AI) has made significant advancements in recent years, particularly with the development of large language and diffusion models. These generative models have demonstrated impressive capabilities in various tasks, such as text generation and image and audio synthesis. Concurrently, Reinforcement Learning (RL) has made significant strides in solving complex sequential decision-making problems with the help of external knowledge sources. However, there remains untapped potential in combining generative models with RL algorithms to tackle real-world challenges, particularly to improve sample efficiency of tabula rasa training by introducing priors from related domains such as visual question-answering, image captioning and image generation. This workshop aims to bring together researchers and practitioners from the fields of generative AI and reinforcement learning to explore the latest advances, methodologies, and applications. By fostering collaborations between these two domains, we intend to unlock new opportunities for addressing complex problems that lie at the intersection of both fields.
Global AI Cultures
Rida Qadri
Arjun Subramonian
Sunipa Dev
Georgina Emma Born
Mary L. Gray
Jessica Quaye
Rachel Bergmann
Integrating Generative and Experimental Platforms for Biomolecular Design
Cheng-Hao Liu
Jarrid Rector-Brooks
Jason Yim
Soojung Yang
Sidney Lisanza
Francesca-Zhoufan Li
Pranam Chatterjee
Tommi Jaakkola
Regina Barzilay
David Baker
Frances H. Arnold
Tackling Climate Change with Machine Learning: Fostering the Maturity of ML Applications for Climate Change
Shiva Madadkhani
Olivia Mendivil Ramos
Millie Chapman
Jesse Dunietz
Arthur Ouaknine
Globally Stable Neural Imitation Policies
Amin Abyaneh
Mariana Sosa Guzmán
Machine learning and information theory concepts towards an AI Mathematician
Nikolay Malkin
The current state-of-the-art in artificial intelligence is impressive, especially in terms of mastery of language, but not so much in terms of mathematical reasoning. What could be missing? Can we learn something useful about that gap from how the brains of mathematicians go about their craft? This essay builds on the idea that current deep learning mostly succeeds at system 1 abilities -- which correspond to our intuition and habitual behaviors -- but still lacks something important regarding system 2 abilities -- which include reasoning and robust uncertainty estimation. It takes an information-theoretical posture to ask questions about what constitutes an interesting mathematical statement, which could guide future work in crafting an AI mathematician. The focus is not on proving a given theorem but on discovering new and interesting conjectures. The central hypothesis is that a desirable body of theorems better summarizes the set of all provable statements, for example by having a small description length while at the same time being close (in terms of number of derivation steps) to many provable statements.
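One possible way to formalize the central hypothesis above is as a trade-off between compression and coverage; the notation below is an assumed sketch, not taken from the essay itself.

```latex
% A theorem set T is preferred if it has small description length \ell(T)
% while staying close, in derivation steps, to the provable statements S:
\[
  T^\star \;=\; \arg\min_{T}\; \Big[\, \ell(T) \;+\; \lambda\, \mathbb{E}_{s \sim S}\, d(s, T) \,\Big]
\]
% where d(s, T) is the minimal number of derivation steps from theorems
% in T to statement s, and \lambda controls the trade-off.
```

Under this reading, an "interesting" conjecture is one whose addition to T reduces the expected derivation distance more than it increases the description length.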
Personalized Negative Reservoir for Incremental Learning in Recommender Systems
Antonios Valkanas
Yuening Wang
Yingxue Zhang
Stop Regressing: Training Value Functions via Classification for Scalable Deep RL
Jesse Farebrother
Jordi Orbay
Quan Ho Vuong
Adrien Ali Taiga
Yevgen Chebotar
Ted Xiao
A. Irpan
Sergey Levine
Aleksandra Faust
Aviral Kumar
Rishabh Agarwal
Value functions are a central component of deep reinforcement learning (RL). These functions, parameterized by neural networks, are trained using a mean squared error regression objective to match bootstrapped target values. However, scaling value-based RL methods that use regression to large networks, such as high-capacity Transformers, has proven challenging. This difficulty is in stark contrast to supervised learning: by leveraging a cross-entropy classification loss, supervised methods have scaled reliably to massive networks. Observing this discrepancy, in this paper, we investigate whether the scalability of deep RL can also be improved simply by using classification in place of regression for training value functions. We demonstrate that value functions trained with categorical cross-entropy significantly improve performance and scalability in a variety of domains. These include: single-task RL on Atari 2600 games with SoftMoEs, multi-task RL on Atari with large-scale ResNets, robotic manipulation with Q-transformers, playing Chess without search, and a language-agent Wordle task with high-capacity Transformers, achieving state-of-the-art results on these domains. Through careful analysis, we show that the benefits of categorical cross-entropy primarily stem from its ability to mitigate issues inherent to value-based RL, such as noisy targets and non-stationarity. Overall, we argue that a simple shift to training value functions with categorical cross-entropy can yield substantial improvements in the scalability of deep RL at little-to-no cost.
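A common way to cast value regression as classification is the "two-hot" encoding: a scalar target is spread over the two nearest bins of a fixed support, and the network is trained with cross-entropy against this distribution. The sketch below is a minimal illustration of that encoding; the bin layout and ranges are assumptions, not the paper's exact configuration.

```python
# Two-hot encoding of a scalar value onto a discrete support. The resulting
# distribution is a valid cross-entropy target whose mean recovers the value.

def two_hot(value, support):
    """Encode a scalar as a distribution over a sorted, evenly spaced support."""
    lo, hi = support[0], support[-1]
    value = max(lo, min(hi, value))          # clip into the support range
    step = support[1] - support[0]
    idx = int((value - lo) // step)          # lower neighbouring bin
    idx = min(idx, len(support) - 2)
    upper_w = (value - support[idx]) / step  # weight on the upper bin
    probs = [0.0] * len(support)
    probs[idx] = 1.0 - upper_w
    probs[idx + 1] = upper_w
    return probs

support = [0.0, 1.0, 2.0, 3.0]
probs = two_hot(1.25, support)
mean = sum(p * s for p, s in zip(probs, support))
print(probs, mean)  # → [0.0, 0.75, 0.25, 0.0] 1.25
```

Training then minimizes cross-entropy between the network's categorical output and this target distribution, instead of mean squared error on the scalar.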
Efficient Causal Graph Discovery Using Large Language Models
Thomas Jiralerspong
Xiaoyin Chen
Yash More
Vedant Shah