Ankesh Anand

Zaheer Abbas

Azade Nova

John D Co-Reyes

Eric Chu

Feryal Behbahani

Aleksandra Faust

Hugo Larochelle

Large language models (LLMs) excel at few-shot in-context learning (ICL) -- learning from a few examples provided in context at inference, without any weight updates. Newly expanded context windows allow us to investigate ICL with hundreds or thousands of examples – the many-shot regime. Going from few-shot to many-shot, we observe significant performance gains across a wide variety of generative and discriminative tasks. While promising, many-shot ICL can be bottlenecked by the available amount of human-generated outputs. To mitigate this limitation, we explore two new settings: (1) "Reinforced ICL" that uses model-generated chain-of-thought rationales in place of human rationales, and (2) "Unsupervised ICL" where we remove rationales from the prompt altogether, and prompt the model only with domain-specific inputs. We find that both Reinforced and Unsupervised ICL can be quite effective in the many-shot regime, particularly on complex reasoning tasks. We demonstrate that, unlike few-shot learning, many-shot learning is effective at overriding pretraining biases, can learn high-dimensional functions with numerical inputs, and performs comparably to supervised fine-tuning. Finally, we reveal the limitations of next-token prediction loss as an indicator of downstream ICL performance.

2024-09-24

NeurIPS.cc/2024/Conference (spotlight)

Gemini: A Family of Highly Capable Multimodal Models

Gemini Team Google Rohan Anil

Sebastian Borgeaud

Yonghui Wu

Jean-Baptiste Alayrac

Jiahui Yu

Radu Soricut

J. Schalkwyk

Andrew M. Dai

Anja Hauth

Katie Millican

David Silver

Slav Petrov

Melvin Johnson

Ioannis Antonoglou

Julian Schrittwieser

Amelia Glaese

Jilin Chen

Emily Pitler

Timothy P Lillicrap

Angeliki Lazaridou

Orhan Firat

James L. Molloy

Michael Acheson Isard

Paul R. Barham

Tom Hennigan

Benjamin Lee

Fabio Viola

Malcolm Reynolds

Yuanzhong Xu

Ryan Doherty

Eli Collins

Clemens Meyer

Eliza Rutherford

Erica Moreira

Kareem W. Ayoub

Megha Goel

George Tucker

Enrique Piqueras

M. Krikun

Iain Barr

Nikolay Savinov

Ivo Danihelka

Becca Roelofs

Anais White

Anders Johan Andreassen

Tamara von Glehn

Lakshman Yagati

Mehran Kazemi

Lucas Gonzalez

Misha Khalman

Jakub Sygnowski

Alexandre Fréchette

Charlotte Smith

Laura Culp

Lev Proleev

Yi Luan

X. T. Chen

James Lottes

Nathan Schucher

Federico Lebron

Alban Rrustemi

Natalie Clay

Phil Crone

Tomas Kocisky

Jeffrey Zhao

Bartek Perz

Dian Yu

Heidi Howard

Adam E. Bloniarz

Jack W. Rae

Han Lu

Laurent Sifre

Marcello Maggioni

Fred Alcober

Dan Garrette

Megan Barnes

Shantanu Thakoor

Jacob Austin

Gabriel Barth-Maron

William Wong

Rishabh Joshi

Rahma Chaabouni

Deeni Fatiha

Arun Ahuja

Ruibo Liu

Yunxuan Li

Sarah Cogan

Jeremy Chen

Chao Jia

Chenjie Gu

Qiao Zhang

Jordan Grimstad

Ale Jakse Hartman

Martin J. Chadwick

Gaurav Singh Tomar

Xavier Garcia

Evan Senter

Emanuel Taropa

Thanumalayan Sankaranarayana Pillai

Jacob Devlin

Michael Laskin

Diego de Las Casas

Dasha Valter

Connie Tao

Lorenzo Blanco

Adrià Puigdomènech Badia

David Reitter

Mianna Chen

Jenny Brennan

Clara E. Rivera

Sergey Brin

Shariq Iqbal

Gabriela Surita

Jane Labanowski

Abhishek Rao

Stephanie Winkler

Emilio Parisotto

Yiming Gu

Kate Olszewska

Yujing Zhang

Ravichandra Addanki

Antoine Miech

Annie Louis

Laurent El Shafey

Denis Teplyashin

Geoff Brown

Elliot Catt

Nithya Attaluri

Jan Balaguer

Jackie Xiang

Pidong Wang

Zoe Ashwood

Anton Briukhov

Alex Webson

Sanjay Ganapathy

Smit Sanghavi

Ajay Kannan

Ming-Wei Chang

Axel Stjerngren

Josip Djolonga

Yuting Sun

Ankur Bapna

Matthew Aitchison

Pedram Pejman

Henryk Michalewski

Tianhe Yu

Cindy Wang

J Christopher Love

Junwhan Ahn

Dawn Bloxwich

Kehang Han

Peter Conway Humphreys

Thibault Sellam

James Bradbury

Varun Godbole

Sina Samangooei

Bogdan Damoc

Alex Kaskasoli

Sébastien M. R. Arnold

Vijay Vasudevan

Shubham Agrawal

Jason Riesa

Dmitry Lepikhin

Richard Tanburn

Srivatsan Srinivasan

Hyeontaek Lim

Sarah Hodkinson

Pranav Shyam

Johan Ferret

Steven Hand

Ankush Garg

T. Paine

Jian Li

Yujia Li

Minh Giang

Alexander Neitz

Zaheer Abbas

Sarah York

Machel Reid

Elizabeth Cole

Aakanksha Chowdhery

Dipanjan Das

Dominika Rogozińska

Vitaly Nikolaev

Pablo G. Sprechmann

Zachary Nado

Lukáš Žilka

Flavien Prost

Luheng He

Marianne Monteiro

Gaurav Mishra

Christopher A. Welty

Joshua Newlan

Dawei Jia

Miltiadis Allamanis

Clara Huiyi Hu

Raoul de Liedekerke

Justin Gilmer

Carl Saroufim

Shruti Rijhwani

Shaobo Hou

Disha Shrivastava

Anirudh Baddepudi

Alex Goldin

Adnan Ozturel

Albin Cassirer

Yunhan Xu

Daniel Sohn

Devendra Singh Sachan

Reinald Kim Amplayo

Craig Swanson

Dessie Petrova

Shashi Narayan

Arthur Guez

Siddhartha Brahma

Jessica Landon

Miteyan Patel

Ruizhe Zhao

Kevin Villela

Luyu Wang

Wenhao Jia

Matthew Rahtz

Mai Giménez

Legg Yeung

Hanzhao Lin

James Keeling

Petko Georgiev

Diana Mincu

Boxi Wu

Salem Haykal

Rachel Saputro

Kiran N. Vodrahalli

James Qin

Zeynep Cankara

Abhanshu Sharma

Nicholas Fernando

Will Hawkins

Behnam Neyshabur

Solomon Kim

Adrian Hutter

Priyanka Agrawal

Alex Castro-Ros

George van den Driessche

Tao Wang

Fan Yang

Shuo-yiin Chang

Paul Komarek

Ross McIlroy

Mario Lučić

Guodong Zhang

Wael Farhan

Michael Sharman

Paul Natsev

Paul Michel

Yong Cheng

Yamini Bansal

Siyuan Qiao

Kris Cao

Siamak Shakeri

Christina Butterfield

Justin Chung

Paul Kishan Rubenstein

Shivani Agrawal

Arthur Mensch

Kedar Soparkar

Karel Lenc

Timothy Chung

Aedan Pope

Lorenzo Maggiore

Jackie Kay

Priya Jhakra

Shibo Wang

Joshua Maynez

Mary Phuong

Taylor Tobin

Andrea Tacchetti

Maja Trebacz

Kevin Robinson

Yash Katariya

Sebastian Riedel

Paige Bailey

Kefan Xiao

Nimesh Ghelani

Lora Aroyo

Ambrose Slone

Neil Houlsby

Xuehan Xiong

Zhen Yang

Elena Gribovskaya

Jonas Adler

Mateo Wirth

Lisa Lee

Music Li

Thais Kagohara

Jay Pavagadhi

Sophie Bridgers

Anna Bortsova

Sanjay Ghemawat

Zafarali Ahmed

Tianqi Liu

Richard Powell

Vijay Bolina

Mariko Iinuma

Polina Zablotskaia

James Besley

Da-Woon Chung

Timothy Dozat

Ramona Comanescu

Xiance Si

Jeremy Greer

Guolong Su

M. Polacek

Raphael Lopez Kaufman

Simon Tokumine

Hexiang Hu

Elena Buchatskaya

Yingjie Miao

Mohamed Elhawaty

Aditya Siddhant

Nenad Tomasev

Jinwei Xing

Christina Greer

Helen Miller

Shereen Ashraf

Aurko Roy

Zizhao Zhang

Ada Ma

Angelos Filos

Milos Besta

Rory Blevins

Ted Klimenko

Chih-Kuan Yeh

Soravit Changpinyo

Jiaqi Mu

Oscar Chang

Mantas Pajarskas

Carrie Muir

Vered Cohen

Charline Le Lan

Krishna S Haridasan

Amit Marathe

Steven Stenberg Hansen

Sholto Douglas

Rajkumar Samuel

Mingqiu Wang

Sophia Austin

Chang Lan

Jiepu Jiang

Justin Chiu

Jaime Alonso Lorenzo

Lars Lowe Sjosund

Sébastien Cevey

Zach Gleicher

Thi Avrahami

Anudhyan Boral

Hansa Srinivasan

Vittorio Selo

Rhys May

Konstantinos Aisopos

Léonard Hussenot

Livio Baldini Soares

Kate Baumli

Michael B. Chang

Adria Recasens

Benjamin Caine

Alexander Pritzel

Filip Pavetic

Fabio Pardo

Anita Gergely

Justin Frye

Vinay Venkatesh Ramasesh

Dan Horgan

Kartikeya Badola

Nora Kassner

Subhrajit Roy

Ethan Dyer

Víctor Campos

Alex Tomala

Yunhao Tang

Dalia El Badawy

Elspeth White

Basil Mustafa

Oran Lang

Abhishek Jindal

Sharad Mandyam Vikram

Zhitao Gong

Sergi Caelles

Ross Hemsley

Gregory Thornton

Fangxiaoyu Feng

Wojciech Stokowiec

Ce Zheng

Phoebe Thacker

Çağlar Ünlü

Zhishuai Zhang

Mohammad Saleh

James Svensson

Maxwell L. Bileschi

Piyush Patil

Roman Ring

Katerina Tsihlas

Arpi Vezer

Marco Selvi

Toby Shevlane

Mikel Rodriguez

Tom Kwiatkowski

Samira Daruki

Keran Rong

Allan Dafoe

Nicholas Fitzgerald

Keren Gu-Lemberg

Mina Khan

Lisa Anne Hendricks

Marie Pellat

Vladimir Feinberg

James Cobon-Kerr

Tara N. Sainath

Maribeth Rauh

Sayed Hadi Hashemi

Richard Ives

Yana Hasson

YaGuang Li

Eric Noland

Yuan Cao

Nathan Byrd

Le Hou

Qingze Wang

Thibault Sottiaux

Michela Paganini

Jean-Baptiste Lespiau

Alexandre Moufarek

Samer Hassan

Kaushik Shivakumar

Joost Van Amersfoort

Amol Mandhane

Pratik M. Joshi

Anirudh Goyal

Matthew Tung

Andy Brock

Hannah Rachel Sheahan

Vedant Misra

Cheng Li

Nemanja Rakićević

Mostafa Dehghani

Fangyu Liu

Sid Mittal

Junhyuk Oh

Seb Noury

Eren Sezener

Fantine Huot

Matthew Lamm

Nicola De Cao

Charlie Chen

Gamaleldin Elsayed

Ed Huai-hsin Chi

Mahdis Mahdieh

Ian F. Tenney

Nan Hua

Ivan Petrychenko

Patrick Kane

Dylan Scandinaro

Rishub Jain

Jonathan Uesato

Romina Datta

Adam Sadovsky

Oskar Bunyan

Dominik Rabiej

Shimu Wu

John Zhang

Gautam Vasudevan

Edouard Leurent

Mahmoud Alnahlawi

Ionut-Razvan Georgescu

Nan Wei

Ivy Zheng

Betty Chan

Pam G Rabinovitch

Piotr Stańczyk

Ye Zhang

David Steiner

Subhajit Naskar

Michael Azzam

Matthew Johnson

Adam Paszke

Chung-Cheng Chiu

Jaume Sanchez Elias

Afroz Mohiuddin

Faizan Muhammad

Jin Miao

Andrew Lee

Nino Vieillard

Sahitya Potluri

Jane Park

Elnaz Davoodi

Jiageng Zhang

Jeff Stanway

Drew Garmon

Abhijit Karmarkar

Zhe Dong

2023-12-18

ArXiv (preprint)

Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models

Avi Singh

John D Co-Reyes

Rishabh Agarwal

Piyush Patil

Xavier Garcia

Peter J. Liu

James Harrison

Jaehoon Lee

Kelvin Xu

Aaron T Parisi

Abhishek Kumar

A. Alemi

Alex Rizkowsky

Azade Nova

Ben Adlam

Bernd Bohnet

Hanie Sedghi

Gamaleldin Fathy Elsayed

Igor Mordatch

Isabelle Simpson

Izzeddin Gur

Jasper Snoek

Jeffrey Pennington

Jiri Hron

Kathleen Kenealy

Kevin Swersky

Kshiteej Mahajan

Laura Culp

Lechao Xiao

Maxwell Bileschi

Noah Constant

Roman Novak

Rosanne Liu

Tris Brian Warkentin

Yundi Qian

Ethan Dyer

Behnam Neyshabur

Jascha Sohl-Dickstein

Yamini Bansal

Noah Fiedel

Fine-tuning language models (LMs) on human-generated data remains a prevalent practice. However, the performance of such models is often limited by the quantity and diversity of high-quality human data. In this paper, we explore whether we can go beyond human data on tasks where we have access to scalar feedback, for example, on math problems where one can verify correctness. To do so, we investigate a simple self-training method based on expectation-maximization, which we call ReST

2023-11-30

arXiv (published)

Pretraining Reward-Free Representations for Data-Efficient Reinforcement Learning

Philip Bachman

2021-03-08

International Conference on Learning Representations (unknown)

DATA-EFFICIENT REINFORCEMENT LEARNING

R Devon Hjelm

Philip Bachman

Aaron Courville

Data efficiency poses a major challenge for deep reinforcement learning. We approach this issue from the perspective of self-supervised representation learning, leveraging reward-free exploratory data to pretrain encoder networks. We employ a novel combination of latent dynamics modelling and goal-reaching objectives, which exploit the inherent structure of data in reinforcement learning. We demonstrate that our method scales well with network capacity and pretraining data. When evaluated on the Atari 100k data-efficiency benchmark, our approach significantly outperforms previous methods combining unsupervised pretraining with task-specific finetuning, and approaches human-level performance.

2020-12-31

(published)

www.semanticscholar.org

Data-Efficient Reinforcement Learning with Self-Predictive Representations

Max Schwarzer

Rishab Goel

R Devon Hjelm

Aaron Courville

Philip Bachman

While deep reinforcement learning excels at solving tasks where large amounts of data can be collected through virtually unlimited interaction with the environment, learning from limited interaction remains a key challenge. We posit that an agent can learn more efficiently if we augment reward maximization with self-supervised objectives based on structure in its visual input and sequential interaction with the environment. Our method, Self-Predictive Representations (SPR), trains an agent to predict its own latent state representations multiple steps into the future. We compute target representations for future states using an encoder which is an exponential moving average of the agent's parameters and we make predictions using a learned transition model. On its own, this future prediction objective outperforms prior methods for sample-efficient deep RL from pixels. We further improve performance by adding data augmentation to the future prediction loss, which forces the agent's representations to be consistent across multiple views of an observation. Our full self-supervised objective, which combines future prediction and data augmentation, achieves a median human-normalized score of 0.415 on Atari in a setting limited to 100k steps of environment interaction, which represents a 55% relative improvement over the previous state-of-the-art. Notably, even in this limited data regime, SPR exceeds expert human scores on 7 out of 26 games. The code associated with this work is available at https://github.com/mila-iqia/spr

2020-12-31

ICLR (published)

Pretraining Representations for Data-Efficient Reinforcement Learning

Philip Bachman

Data efficiency is a key challenge for deep reinforcement learning. We address this problem by using unlabeled data to pretrain an encoder which is then finetuned on a small amount of task-specific data. To encourage learning representations which capture diverse aspects of the underlying MDP, we employ a combination of latent dynamics modelling and unsupervised goal-conditioned RL. When limited to 100k steps of interaction on Atari games (equivalent to two hours of human experience), our approach significantly surpasses prior work combining offline representation pretraining with task-specific finetuning, and compares favourably with other pretraining methods that require orders of magnitude more data. Our approach shows particular promise when combined with larger models as well as more diverse, task-aligned observational data -- approaching human-level performance and data-efficiency on Atari in our best setting. We provide code associated with this work at https://github.com/mila-iqia/SGI.

2020-12-31

Advances in Neural Information Processing Systems 34 (NeurIPS 2021) (published)

Unsupervised State Representation Learning in Atari

Evan Racah

R Devon Hjelm

State representation learning, or the ability to capture latent generative factors of an environment, is crucial for building intelligent agents that can perform a wide variety of tasks. Learning such representations without supervision from rewards is a challenging open problem. We introduce a method that learns state representations by maximizing mutual information across spatially and temporally distinct features of a neural encoder of the observations. We also introduce a new benchmark based on Atari 2600 games where we evaluate representations based on how well they capture the ground truth state variables. We believe this new framework for evaluating representation learning models will be crucial for future representation learning research. Finally, we compare our technique with other state-of-the-art generative and contrastive representation learning methods. The code associated with this work is available at this https URL

2018-12-31

Advances in Neural Information Processing Systems 32 (NeurIPS 2019) (published)

Blindfold Baselines for Embodied QA

We explore blindfold (question-only) baselines for Embodied Question Answering. The EmbodiedQA task requires an agent to answer a question by intelligently navigating in a simulated environment, gathering necessary visual information only through first-person vision before finally answering. Consequently, a blindfold baseline which ignores the environment and visual information is a degenerate solution, yet we show through our experiments on the EQAv1 dataset that a simple question-only baseline achieves state-of-the-art results on the EmbodiedQA task in all cases except when the agent is spawned extremely close to the object.

2018-11-11

ArXiv (preprint)

Home: A Household Multimodal Environment

Simon Brodeur

Luca Celotti

Jean Rouat

We introduce HoME: a Household Multimodal Environment for artificial agents to learn from vision, audio, semantics, physics, and interaction with objects and other agents, all within a realistic context. HoME integrates over 45,000 diverse 3D house layouts based on the SUNCG dataset, a scale which may facilitate learning, generalization, and transfer. HoME is an open-source, OpenAI Gym-compatible platform extensible to tasks in reinforcement learning, language grounding, sound-based navigation, robotics, multi-agent learning, and more. We hope HoME better enables artificial agents to learn as humans do: in an interactive, multimodal, and richly contextualized setting.

2017-12-31

ICLR (Workshop) (published)