Diganta Misra

Challenging Common Assumptions about Catastrophic Forgetting and Knowledge Accumulation

Timothee LESORT

Oleksiy Ostapenko

Pau Rodriguez

Diganta Misra

Md Rifat Arefin

Laurent Charlin

Irina Rish

2023-11-20

Proceedings of The 2nd Conference on Lifelong Learning Agents (publié)

proceedings.mlr.press

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

Aarohi Srivastava

Abhinav Rastogi

Abhishek Rao

Abu Awal Md Shoeb

Abubakar Abid

Adam Fisch

Adam R. Brown

Adam Santoro

Aditya Gupta

Adrià Garriga-Alonso

Agnieszka Kluska

Aitor Lewkowycz

Akshat Agarwal

Alethea Power

Alex Ray

Alex Warstadt

Alexander W. Kocurek

Ali Safaya

Ali Tazarv

Alice Xiang … (voir 432 de plus)

Alicia Parrish

Allen Nie

Aman Hussain

Amanda Askell

Amanda Dsouza

Ambrose Slone

Ameet Rahane

Anantharaman S. Iyer

Anders Johan Andreassen

Andrea Madotto

Andrea Santilli

Andreas Stuhlmüller

Andrew M. Dai

Andrew La

Andrew Lampinen

Andy Zou

Angela Jiang

Angelica Chen

Anh Vuong

Animesh Gupta

Anna Gottardi

Antonio Norelli

Anu Venkatesh

Arash Gholamidavoodi

Arfa Tabassum

Arul Menezes

Arun Kirubarajan

Asher Mullokandov

Ashish Sabharwal

Austin Herrick

Avia Efrat

Aykut Erdem

Ayla Karakaş

B. Ryan Roberts

Bao Sheng Loe

Barret Zoph

Bartłomiej Bojanowski

Batuhan Özyurt

Behnam Hedayatnia

Behnam Neyshabur

Benjamin Inden

Benno Stein

Berk Ekmekci

Bill Yuchen Lin

Blake Howald

Bryan Orinion

Cameron Diao

Cameron Dour

Catherine Stinson

Cedrick Argueta

Cesar Ferri

Chandan Singh

Charles Rathkopf

Chenlin Meng

Chitta Baral

Chiyu Wu

Chris Callison-Burch

Christopher Waites

Christian Voigt

Christopher D Manning

Christopher Potts

Cindy Ramirez

Clara E. Rivera

Clemencia Siro

Colin Raffel

Courtney Ashcraft

Cristina Garbacea

Damien Sileo

Dan Garrette

Dan Hendrycks

Dan Kilman

Dan Roth

C. Daniel Freeman

Daniel Khashabi

Daniel Levy

Daniel Moseguí González

Danielle Perszyk

Danny Hernandez

Danqi Chen

Daphne Ippolito

Dar Gilboa

David Dohan

David Drakard

David Jurgens

Debajyoti Datta

Deep Ganguli

Denis Emelin

Denis Kleyko

Deniz Yuret

Derek Chen

Derek Tam

Dieuwke Hupkes

Diganta Misra

Dilyar Buzan

Dimitri Coelho Mollo

Diyi Yang

Dong-Ho Lee

Dylan Schrader

Ekaterina Shutova

Ekin Dogus Cubuk

Elad Segal

Eleanor Hagerman

Elizabeth Barnes

Elizabeth Donoway

Ellie Pavlick

Emanuele Rodolà

Emma Lam

Eric Chu

Eric Tang

Erkut Erdem

Ernie Chang

Ethan A Chi

Ethan Dyer

Ethan Jerzak

Ethan Kim

Eunice Engefu Manyasi

Evgenii Zheltonozhskii

Fanyue Xia

Fatemeh Siar

Fernando Martínez-Plumed

Francesca Happé

Francois Chollet

Frieda Rong

Gaurav Mishra

Genta Indra Winata

Gerard de Melo

Germán Kruszewski

Giambattista Parascandolo

Giorgio Mariani

Gloria Xinyue Wang

Gonzalo Jaimovitch-Lopez

Gregor Betz

Guy Gur-Ari

Hana Galijasevic

Hannah Kim

Hannah Rashkin

Hannaneh Hajishirzi

Harsh Mehta

Hayden Bogar

Henry Francis Anthony Shevlin

Hinrich Schuetze

Hiromu Yakura

Hongming Zhang

Hugh Mee Wong

Ian Ng

Isaac Noble

Jaap Jumelet

Jack Geissinger

Jackson Kernion

Jacob Hilton

Jaehoon Lee

Jaime Fernández Fisac

James B Simon

James Koppel

James Zheng

James Zou

Jan Kocon

Jana Thompson

Janelle Wingfield

Jared Kaplan

Jarema Radom

Jascha Sohl-Dickstein

Jason Phang

Jason Wei

Jason Yosinski

Jekaterina Novikova

Jelle Bosscher

Jennifer Marsh

Jeremy Kim

Jeroen Taal

Jesse Engel

Jesujoba Oluwadara Alabi

Jiacheng Xu

Jiaming Song

Jillian Tang

Joan Waweru

John Burden

John Miller

John U. Balis

Jonathan Batchelder

Jonathan Berant

Jörg Frohberg

Jos Rozen

Jose Hernandez-Orallo

Joseph Boudeman

Joseph Guerr

Joseph Jones

Joshua B. Tenenbaum

Joshua S. Rule

Joyce Chua

Joyce Hui Ping Chua

Kamil Kanclerz

Karen Livescu

Karl Krauth

Karthik Gopalakrishnan

Katerina Ignatyeva

Katja Markert

Kaustubh Dhole

Kevin Gimpel

Kevin Omondi

Kory Wallace Mathewson

Kristen Chiafullo

Ksenia Shkaruta

Kumar Shridhar

Kyle McDonell

Kyle Richardson

Laria Reynolds

Leo Gao

Li Zhang

Liam Dugan

Lianhui Qin

Lidia Contreras-Ochando

Louis-Philippe Morency

Luca Moschella

Lucas Lam

Lucy Noble

Ludwig Schmidt

Luheng He

Luis Oliveros-Colón

Luke Metz

Lütfi Kerem Senel

Maarten Bosma

Maarten Sap

Maartje Ter Hoeve

Maheen Farooqi

Manaal Faruqui

Mantas Mazeika

Marco Baturan

Marco Marelli

Marco Maru

Maria Jose Ramirez-Quintana

Marie Tolkiehn

Mario Giulianelli

Martha Lewis

Martin Potthast

Matthew L Leavitt

Matthias Hagen

Mátyás Schubert

Medina Orduna Baitemirova

Melody Arnaud

Melvin McElrath

Michael Andrew Yee

Michael Cohen

Michael Gu

Michael Ivanitskiy

Michael Starritt

Michael Strube

Michał Swędrowski

Michele Bevilacqua

Michihiro Yasunaga

Mihir Kale

Mike Cain

Mimee Xu

Mirac Suzgun

Mitch Walker

Mo Tiwari

Mohit Bansal

Moin Aminnaseri

Mor Geva

Mozhdeh Gheini

Mukund Varma T

Nanyun Peng

Nathan Andrew Chi

Nayeon Lee

Neta Gur-Ari Krakover

Nicholas Cameron

Nicholas Roberts

Nick Doiron

Nicole Martinez

Nikita Nangia

Niklas Deckers

Niklas Muennighoff

Nitish Shirish Keskar

Niveditha S. Iyer

Noah Constant

Noah Fiedel

Nuan Wen

Oliver Zhang

Omar Agha

Omar Elbaghdadi

Omer Levy

Owain Evans

Pablo Antonio Moreno Casares

Parth Doshi

Pascale Fung

Paul Pu Liang

Paul Vicol

Pegah Alipoormolabashi

Peiyuan Liao

Percy Liang

Peter W Chang

Peter Eckersley

Phu Mon Htut

Pinyu Hwang

Pi-Bei Hwang

Piotr Miłkowski

Piyush Patil

Pouya Pezeshkpour

Priti Oli

Qiaozhu Mei

Qing Lyu

Qinlang Chen

Rabin Banjade

Rachel Etta Rudolph

Raefer Gabriel

Rahel Habacker

Ramon Risco

Raphaël Millière

Rhythm Garg

Richard Barnes

Rif A. Saurous

Riku Arakawa

Robbe Raymaekers

Robert Frank

Rohan Sikand

Roman Novak

Roman Sitelew

Ronan Le Bras

Rosanne Liu

Rowan Jacobs

Rui Zhang

Russ Salakhutdinov

Ryan Andrew Chi

Seungjae Ryan Lee

Ryan Stovall

Ryan Teehan

Rylan Yang

Sahib Singh

Saif Mohammad

Sajant Anand

Sam Dillavou

Sam Shleifer

Sam Wiseman

Samuel Gruetter

Samuel R. Bowman

Samuel Stern Schoenholz

Sanghyun Han

Sanjeev Kwatra

Sarah A. Rous

Sarik Ghazarian

Sayan Ghosh

Sean Casey

Sebastian Bischoff

Sebastian Gehrmann

Sebastian Schuster

Sepideh Sadeghi

Shadi Hamdan

Sharon Zhou

Shashank Srivastava

Sherry Shi

Shikhar Singh

Shima Asaadi

Shixiang Shane Gu

Shubh Pachchigar

Shubham Toshniwal

Shyam Upadhyay

Shyamolima Shammie Debnath

Siamak Shakeri

Simon Thormeyer

Simone Melzi

Siva Reddy

Sneha Priscilla Makini

Soo-Hwan Lee

Spencer Torene

Sriharsha Hatwar

Stanislas Dehaene

Stefan Divic

Stefano Ermon

Stella Biderman

Stephanie Lin

Stephen Prasad

Steven Piantadosi

Stuart Shieber

Summer Misherghi

Svetlana Kiritchenko

Swaroop Mishra

Tal Linzen

Tal Schuster

Tao Li

Tao Yu

Tariq Ali

Tatsunori Hashimoto

Te-Lin Wu

Théo Desbordes

Theodore Rothschild

Thomas Phan

Tianle Wang

Tiberius Nkinyili

Timo Schick

Timofei Kornev

Titus Tunduny

Tobias Gerstenberg

Trenton Chang

Trishala Neeraj

Tushar Khot

Tyler Shultz

Uri Shaham

Vedant Misra

Vera Demberg

Victoria Nyamai

Vikas Raunak

Vinay Venkatesh Ramasesh

vinay uday prabhu

Vishakh Padmakumar

Vivek Srikumar

William Fedus

William Saunders

William Zhang

Wout Vossen

Xiang Ren

Xiaoyu Tong

Xinran Zhao

Xinyi Wu

Xudong Shen

Yadollah Yaghoobzadeh

Yair Lakretz

Yangqiu Song

Yasaman Bahri

Yejin Choi

Yichi Yang

Yiding Hao

Yifu Chen

Yonatan Belinkov

Yu Hou

Yufang Hou

Yuntao Bai

Zachary Seid

Zhuoye Zhao

Zijian Wang

Zijie J. Wang

Zirui Wang

Ziyi Wu

Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially … (voir plus)transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-future capabilities and limitations of language models. To address this challenge, we introduce the Beyond the Imitation Game benchmark (BIG- bench). BIG-bench currently consists of 204 tasks, contributed by 450 authors across 132 institutions. Task topics are diverse, drawing problems from linguistics, childhood develop- ment, math, common-sense reasoning, biology, physics, social bias, software development, and beyond. BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models. We evaluate the behavior of OpenAI's GPT models, Google- internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench, across model sizes spanning millions to hundreds of billions of parameters. In addition, a team of human expert raters performed all tasks in order to provide a strong baseline. Findings include: model performance and calibration both improve with scale, but are poor in absolute terms (and when compared with rater performance); performance is remarkably similar across model classes, though with benefits from sparsity; tasks that improve gradually and predictably commonly involve a large knowledge or memorization component, whereas tasks that exhibit "breakthrough" behavior at a critical scale often involve multiple steps or components, or brittle metrics; social bias typically increases with scale in settings with ambiguous context, but this can be improved with prompting.

2023-05-11

TMLR (accepté)

openreview.net

APP: Anytime Progressive Pruning

Diganta Misra

Bharat Runwal

Tianlong Chen

Zhangyang Wang

Irina Rish

With the latest advances in deep learning, several methods have been investigated for optimal learning settings in scenarios where the data … (voir plus)stream is continuous over time. However, training sparse networks in such settings has often been overlooked. In this paper, we explore the problem of training a neural network with a target sparsity in a particular case of online learning: the anytime learning at macroscale paradigm (ALMA). We propose a novel way of progressive pruning, referred to as \textit{Anytime Progressive Pruning} (APP); the proposed approach significantly outperforms the baseline dense and Anytime OSP models across multiple architectures and datasets under short, moderate, and long-sequence training. Our method, for example, shows an improvement in accuracy of

2022-11-17

ACML.org/2022/Workshop/CLL (publié)

doi.org

openreview.net

Challenging Common Assumptions about Catastrophic Forgetting

Timothee LESORT

Oleksiy Ostapenko

Pau Rodriguez

Md Rifat Arefin

Diganta Misra

Laurent Charlin

Irina Rish

Building learning agents that can progressively learn and accumulate knowledge is the core goal of the continual learning (CL) research fiel… (voir plus)d. Unfortunately, training a model on new data usually compromises the performance on past data. In the CL literature, this effect is referred to as catastrophic forgetting (CF). CF has been largely studied, and a plethora of methods have been proposed to address it on short sequences of non-overlapping tasks. In such setups, CF always leads to a quick and significant drop in performance in past tasks. Nevertheless, despite CF, recent work showed that SGD training on linear models accumulates knowledge in a CL regression setup. This phenomenon becomes especially visible when tasks reoccur. We might then wonder if DNNs trained with SGD or any standard gradient-based optimization accumulate knowledge in such a way. Such phenomena would have interesting consequences for applying DNNs to real continual scenarios. Indeed, standard gradient-based optimization methods are significantly less computationally expensive than existing CL algorithms. In this paper, we study the progressive knowledge accumulation (KA) in DNNs trained with gradient-based algorithms in long sequences of tasks with data re-occurrence. We propose a new framework, SCoLe (Scaling Continual Learning), to investigate KA and discover that catastrophic forgetting has a limited effect on DNNs trained with SGD. When trained on long sequences with data sparsely re-occurring, the overall accuracy improves, which might be counter-intuitive given the CF phenomenon. We empirically investigate KA in DNNs under various data occurrence frequencies and propose simple and scalable strategies to increase knowledge accumulation in DNNs.

2022-07-10

ArXiv (prépublication)

openreview.net

Scaling the Number of Tasks in Continual Learning

Timothee LESORT

Oleksiy Ostapenko

Diganta Misra

Md Rifat Arefin

Pau Rodriguez

Laurent Charlin

Irina Rish

2022-01-01

arXiv.org (prépublication)

doi.org

La recherche en IA au service du monde réel

Boussole des politiques en IA

Vie étudiante et ressources

Diganta Misra

Publications

La recherche en IA au service du monde réel

Boussole des politiques en IA

Vie étudiante et ressources

Mots-clés populaires:

Diganta Misra

Publications