Siva Reddy

aristides.milios@mila.quebec

Arkil Patel

PhD - McGill University

Principal supervisor :

arkil.patel@mila.quebec

PhD - McGill University

benno.krojer@mila.quebec

gaurav.kamath@mila.quebec

Gaurav Kamath

PhD - McGill University

Karolina Ewa Stańczak

Postdoctorate - McGill University

karolina.stanczak@mila.quebec

PhD - McGill University

laurestine.bradford@mila.quebec

Marius Mosbach

Postdoctorate - McGill University

marius.mosbach@mila.quebec

nicholas.meade@mila.quebec

Nicholas Meade

PhD - McGill University

Github

parishad.behnamghader@mila.quebec

Parishad BehnamGhader

Master's Research - McGill University

Collaborating researcher

spandana.gella@mila.quebec

vaibhav.adlakha@mila.quebec

Vaibhav Adlakha

PhD - McGill University

Xing Han Lu

PhD - McGill University

PhD - None

zdenek.kasner@mila.quebec

Zichao Li

PhD - McGill University

Co-supervisor :

Jackie Cheung

zichao.li@mila.quebec

Publications

StarCoder: may the source be with you!

Raymond Li

Loubna Ben allal

Yangtian Zi

Niklas Muennighoff

Denis Kocetkov

Chenghao Mou

Marc Marone

Christopher Akiki

Jia LI

Jenny Chim

Qian Liu

Evgenii Zheltonozhskii

Terry Yue Zhuo

Thomas Wang

Olivier Dehaene

Mishig Davaadorj

Joel Lamy-Poirier

Joao Monteiro

Oleh Shliazhko

Nicolas Gontier … (see 49 more)

Nicholas Meade

Armel Zebaze

Ming-Ho Yee

Logesh Kumar Umapathi

Jian Zhu

Ben Lipkin

Muhtasham Oblokulov

Zhiruo Wang

Rudra Murthy

Jason T Stillerman

Siva Sankalp Patel

Dmitry Abulkhanov

Marco Zocca

Manan Dey

Zhihan Zhang

N. Fahmy

Urvashi Bhattacharyya

Wenhao Yu

Swayam Singh

Sasha Luccioni

Paulo Villegas

Jan Ebert

M. Kunakov

Fedor Zhdanov

Manuel Romero

Tony Lee

Nadav Timor

Jennifer Ding

Claire S Schlesinger

Hailey Schoelkopf

Jana Ebert

Tri Dao

Mayank Mishra

Alex Gu

Jennifer Robinson

Sean Hughes

Carolyn Jane Anderson

Brendan Dolan-Gavitt

Danish Contractor

Daniel Fried

Yacine Jernite

Carlos Muñoz Ferrandis

Sean M. Hughes

Thomas Wolf

Arjun Guha

Leandro Von Werra

Harm de Vries

The BigCode community, an open-scientific collaboration working on the responsible development of Large Language Models for Code (Code LLMs)… (see more), introduces StarCoder and StarCoderBase: 15.5B parameter models with 8K context length, infilling capabilities and fast large-batch inference enabled by multi-query attention. StarCoderBase is trained on 1 trillion tokens sourced from The Stack, a large collection of permissively licensed GitHub repositories with inspection tools and an opt-out process. We fine-tuned StarCoderBase on 35B Python tokens, resulting in the creation of StarCoder. We perform the most comprehensive evaluation of Code LLMs to date and show that StarCoderBase outperforms every open Code LLM that supports multiple programming languages and matches or outperforms the OpenAI code-cushman-001 model. Furthermore, StarCoder outperforms every model that is fine-tuned on Python and still retains its performance on other programming languages. We take several important steps towards a safe open-access model release, including an improved PII redaction pipeline and a novel attribution tracing tool, and make the StarCoder models publicly available under a more commercially viable version of the Open Responsible AI Model license.

2023-12-17

TMLR (accepted)

Can Retriever-Augmented Language Models Reason? The Blame Game Between the Retriever and the Language Model

Parishad BehnamGhader

Santiago Miret

Augmenting pretrained language models with retrievers to select the supporting documents has shown promise in effectively solving common NLP… (see more) problems, including language modeling and question answering, in an interpretable way. In this paper, we first study the strengths and weaknesses of different retriever-augmented language models (REALM,

2023-12-01

Findings of the Association for Computational Linguistics: EMNLP 2023 (published)

Evaluating In-Context Learning of Libraries for Code Generation

Arkil Patel

Pradeep Dasigi

2023-11-16

ArXiv (preprint)

arxiv.org

Using In-Context Learning to Improve Dialogue Safety

Nicholas Meade

Spandana Gella

Devamanyu Hazarika

Prakhar Gupta

Di Jin

Yang Liu

Dilek Hakkani-Tur

2023-10-07

EMNLP/2023/Conference (published)

Are Diffusion Models Vision-And-Language Reasoners?

Benno Krojer

Elinor Poole-Dayan

Vikram Voleti

Chris Pal

Text-conditioned image generation models have recently shown immense qualitative success using denoising diffusion processes. However, unlik… (see more)e discriminative vision-and-language models, it is a non-trivial task to subject these diffusion-based generative models to automatic fine-grained quantitative evaluation of high-level phenomena such as compositionality. Towards this goal, we perform two innovations. First, we transform diffusion-based models (in our case, Stable Diffusion) for any image-text matching (ITM) task using a novel method called DiffusionITM. Second, we introduce the Generative-Discriminative Evaluation Benchmark (GDBench) benchmark with 7 complex vision-and-language tasks, bias evaluation and detailed analysis. We find that Stable Diffusion + DiffusionITM is competitive on many tasks and outperforms CLIP on compositional tasks like like CLEVR and Winoground. We further boost its compositional performance with a transfer setup by fine-tuning on MS-COCO while retaining generative capabilities. We also measure the stereotypical bias in diffusion models, and find that Stable Diffusion 2.1 is, for the most part, less biased than Stable Diffusion 1.5. Overall, our results point in an exciting direction bringing discriminative and generative model evaluation closer. We will release code and benchmark setup soon.

The Impact of Positional Encoding on Length Generalization in Transformers

Amirhossein Kazemnejad

Inkit Padhi

Karthikeyan Natesan

K. Ramamurthy

Payel Das

Length generalization, the ability to generalize from small training context sizes to larger ones, is a critical challenge in the developmen… (see more)t of Transformer-based language models. Positional encoding (PE) has been identified as a major factor influencing length generalization, but the exact impact of different PE schemes on extrapolation in downstream tasks remains unclear. In this paper, we conduct a systematic empirical study comparing the length generalization performance of decoder-only Transformers with five different position encoding approaches including Absolute Position Embedding (APE), T5's Relative PE, ALiBi, and Rotary, in addition to Transformers without positional encoding (NoPE). Our evaluation encompasses a battery of reasoning and mathematical tasks. Our findings reveal that the most commonly used positional encoding methods, such as ALiBi, Rotary, and APE, are not well suited for length generalization in downstream tasks. More importantly, NoPE outperforms other explicit positional encoding methods while requiring no additional computation. We theoretically demonstrate that NoPE can represent both absolute and relative PEs, but when trained with SGD, it mostly resembles T5's relative PE attention patterns. Finally, we find that scratchpad is not always helpful to solve length generalization and its format highly impacts the model's performance. Overall, our work suggests that explicit position embeddings are not essential for decoder-only Transformers to generalize well to longer sequences.

In-Context Learning for Text Classification with Many Labels

Aristides Milios

2023-09-19

ArXiv (preprint)

arxiv.org

Evaluating Correctness and Faithfulness of Instruction-Following Models for Question Answering

Vaibhav Adlakha

Parishad BehnamGhader

Xing Han Lu

Nicholas Meade

Retriever-augmented instruction-following models are attractive alternatives to fine-tuned approaches for information-seeking tasks such as … (see more)question answering (QA). By simply prepending retrieved documents in its input along with an instruction, these models can be adapted to various information domains and tasks without additional fine-tuning. While the model responses tend to be natural and fluent, the additional verbosity makes traditional QA evaluation metrics such as exact match (EM) and F1 unreliable for accurately quantifying model performance. In this work, we investigate the performance of instruction-following models across three information-seeking QA tasks. We use both automatic and human evaluation to evaluate these models along two dimensions: 1) how well they satisfy the user's information need (correctness), and 2) whether they produce a response based on the provided knowledge (faithfulness). Guided by human evaluation and analysis, we highlight the shortcomings of traditional metrics for both correctness and faithfulness. We then propose simple token-overlap based and model-based metrics that reflect the true performance of these models. Our analysis reveals that instruction-following models are competitive, and sometimes even outperform fine-tuned models for correctness. However, these models struggle to stick to the provided knowledge and often hallucinate in their responses. We hope our work encourages a more holistic evaluation of instruction-following models for QA. Our code and data is available at https://github.com/McGill-NLP/instruct-qa

2023-07-31

ArXiv (preprint)

arxiv.org

ROSA: Random Orthogonal Subspace Adaptation

Marawan Gamal

Guillaume Rabusseau

Aristides Milios

2023-06-20

ICML.cc/2023/Workshop/ES-FoMO (poster)

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

Aarohi Srivastava

Abhinav Rastogi

Abhishek Rao

Abu Awal Md Shoeb

Abubakar Abid

Adam Fisch

Adam R. Brown

Adam Santoro

Aditya Gupta

Adrià Garriga-Alonso

Agnieszka Kluska

Aitor Lewkowycz

Akshat Agarwal

Alethea Power

Alex Ray

Alex Warstadt

Alexander W. Kocurek

Ali Safaya

Ali Tazarv

Alice Xiang … (see 432 more)

Alicia Parrish

Allen Nie

Aman Hussain

Amanda Askell

Amanda Dsouza

Ambrose Slone

Ameet Rahane

Anantharaman S. Iyer

Anders Johan Andreassen

Andrea Madotto

Andrea Santilli

Andreas Stuhlmüller

Andrew M. Dai

Andrew La

Andrew Lampinen

Andy Zou

Angela Jiang

Angelica Chen

Anh Vuong

Animesh Gupta

Anna Gottardi

Antonio Norelli

Anu Venkatesh

Arash Gholamidavoodi

Arfa Tabassum

Arul Menezes

Arun Kirubarajan

Asher Mullokandov

Ashish Sabharwal

Austin Herrick

Avia Efrat

Aykut Erdem

Ayla Karakaş

B. Ryan Roberts

Bao Sheng Loe

Barret Zoph

Bartłomiej Bojanowski

Batuhan Özyurt

Behnam Hedayatnia

Behnam Neyshabur

Benjamin Inden

Benno Stein

Berk Ekmekci

Bill Yuchen Lin

Blake Howald

Bryan Orinion

Cameron Diao

Cameron Dour

Catherine Stinson

Cedrick Argueta

Cesar Ferri

Chandan Singh

Charles Rathkopf

Chenlin Meng

Chitta Baral

Chiyu Wu

Chris Callison-Burch

Christopher Waites

Christian Voigt

Christopher D Manning

Christopher Potts

Cindy Ramirez

Clara E. Rivera

Clemencia Siro

Colin Raffel

Courtney Ashcraft

Cristina Garbacea

Damien Sileo

Dan Garrette

Dan Hendrycks

Dan Kilman

Dan Roth

C. Daniel Freeman

Daniel Khashabi

Daniel Levy

Daniel Moseguí González

Danielle Perszyk

Danny Hernandez

Danqi Chen

Daphne Ippolito

Dar Gilboa

David Dohan

David Drakard

David Jurgens

Debajyoti Datta

Deep Ganguli

Denis Emelin

Denis Kleyko

Deniz Yuret

Derek Chen

Derek Tam

Dieuwke Hupkes

Diganta Misra

Dilyar Buzan

Dimitri Coelho Mollo

Diyi Yang

Dong-Ho Lee

Dylan Schrader

Ekaterina Shutova

Ekin Dogus Cubuk

Elad Segal

Eleanor Hagerman

Elizabeth Barnes

Elizabeth Donoway

Ellie Pavlick

Emanuele Rodolà

Emma Lam

Eric Chu

Eric Tang

Erkut Erdem

Ernie Chang

Ethan A Chi

Ethan Dyer

Ethan Jerzak

Ethan Kim

Eunice Engefu Manyasi

Evgenii Zheltonozhskii

Fanyue Xia

Fatemeh Siar

Fernando Martínez-Plumed

Francesca Happé

Francois Chollet

Frieda Rong

Gaurav Mishra

Genta Indra Winata

Gerard de Melo

Germán Kruszewski

Giambattista Parascandolo

Giorgio Mariani

Gloria Xinyue Wang

Gonzalo Jaimovitch-Lopez

Gregor Betz

Guy Gur-Ari

Hana Galijasevic

Hannah Kim

Hannah Rashkin

Hannaneh Hajishirzi

Harsh Mehta

Hayden Bogar

Henry Francis Anthony Shevlin

Hinrich Schuetze

Hiromu Yakura

Hongming Zhang

Hugh Mee Wong

Ian Ng

Isaac Noble

Jaap Jumelet

Jack Geissinger

Jackson Kernion

Jacob Hilton

Jaehoon Lee

Jaime Fernández Fisac

James B Simon

James Koppel

James Zheng

James Zou

Jan Kocon

Jana Thompson

Janelle Wingfield

Jared Kaplan

Jarema Radom

Jascha Sohl-Dickstein

Jason Phang

Jason Wei

Jason Yosinski

Jekaterina Novikova

Jelle Bosscher

Jennifer Marsh

Jeremy Kim

Jeroen Taal

Jesse Engel

Jesujoba Oluwadara Alabi

Jiacheng Xu

Jiaming Song

Jillian Tang

Joan Waweru

John Burden

John Miller

John U. Balis

Jonathan Batchelder

Jonathan Berant

Jörg Frohberg

Jos Rozen

Jose Hernandez-Orallo

Joseph Boudeman

Joseph Guerr

Joseph Jones

Joshua B. Tenenbaum

Joshua S. Rule

Joyce Chua

Joyce Hui Ping Chua

Kamil Kanclerz

Karen Livescu

Karl Krauth

Karthik Gopalakrishnan

Katerina Ignatyeva

Katja Markert

Kaustubh Dhole

Kevin Gimpel

Kevin Omondi

Kory Wallace Mathewson

Kristen Chiafullo

Ksenia Shkaruta

Kumar Shridhar

Kyle McDonell

Kyle Richardson

Laria Reynolds

Leo Gao

Li Zhang

Liam Dugan

Lianhui Qin

Lidia Contreras-Ochando

Louis-Philippe Morency

Luca Moschella

Lucas Lam

Lucy Noble

Ludwig Schmidt

Luheng He

Luis Oliveros-Colón

Luke Metz

Lütfi Kerem Senel

Maarten Bosma

Maarten Sap

Maartje Ter Hoeve

Maheen Farooqi

Manaal Faruqui

Mantas Mazeika

Marco Baturan

Marco Marelli

Marco Maru

Maria Jose Ramirez-Quintana

Marie Tolkiehn

Mario Giulianelli

Martha Lewis

Martin Potthast

Matthew L Leavitt

Matthias Hagen

Mátyás Schubert

Medina Orduna Baitemirova

Melody Arnaud

Melvin McElrath

Michael Andrew Yee

Michael Cohen

Michael Gu

Michael Ivanitskiy

Michael Starritt

Michael Strube

Michał Swędrowski

Michele Bevilacqua

Michihiro Yasunaga

Mihir Kale

Mike Cain

Mimee Xu

Mirac Suzgun

Mitch Walker

Mo Tiwari

Mohit Bansal

Moin Aminnaseri

Mor Geva

Mozhdeh Gheini

Mukund Varma T

Nanyun Peng

Nathan Andrew Chi

Nayeon Lee

Neta Gur-Ari Krakover

Nicholas Cameron

Nicholas Roberts

Nick Doiron

Nicole Martinez

Nikita Nangia

Niklas Deckers

Niklas Muennighoff

Nitish Shirish Keskar

Niveditha S. Iyer

Noah Constant

Noah Fiedel

Nuan Wen

Oliver Zhang

Omar Agha

Omar Elbaghdadi

Omer Levy

Owain Evans

Pablo Antonio Moreno Casares

Parth Doshi

Pascale Fung

Paul Pu Liang

Paul Vicol

Pegah Alipoormolabashi

Peiyuan Liao

Percy Liang

Peter W Chang

Peter Eckersley

Phu Mon Htut

Pinyu Hwang

Pi-Bei Hwang

Piotr Miłkowski

Piyush Patil

Pouya Pezeshkpour

Priti Oli

Qiaozhu Mei

Qing Lyu

Qinlang Chen

Rabin Banjade

Rachel Etta Rudolph

Raefer Gabriel

Rahel Habacker

Ramon Risco

Raphaël Millière

Rhythm Garg

Richard Barnes

Rif A. Saurous

Riku Arakawa

Robbe Raymaekers

Robert Frank

Rohan Sikand

Roman Novak

Roman Sitelew

Ronan Le Bras

Rosanne Liu

Rowan Jacobs

Rui Zhang

Russ Salakhutdinov

Ryan Andrew Chi

Seungjae Ryan Lee

Ryan Stovall

Ryan Teehan

Rylan Yang

Sahib Singh

Saif Mohammad

Sajant Anand

Sam Dillavou

Sam Shleifer

Sam Wiseman

Samuel Gruetter

Samuel R. Bowman

Samuel Stern Schoenholz

Sanghyun Han

Sanjeev Kwatra

Sarah A. Rous

Sarik Ghazarian

Sayan Ghosh

Sean Casey

Sebastian Bischoff

Sebastian Gehrmann

Sebastian Schuster

Sepideh Sadeghi

Shadi Hamdan

Sharon Zhou

Shashank Srivastava

Sherry Shi

Shikhar Singh

Shima Asaadi

Shixiang Shane Gu

Shubh Pachchigar

Shubham Toshniwal

Shyam Upadhyay

Shyamolima Shammie Debnath

Siamak Shakeri

Simon Thormeyer

Simone Melzi

Sneha Priscilla Makini

Soo-Hwan Lee

Spencer Torene

Sriharsha Hatwar

Stanislas Dehaene

Stefan Divic

Stefano Ermon

Stella Biderman

Stephanie Lin

Stephen Prasad

Steven Piantadosi

Stuart Shieber

Summer Misherghi

Svetlana Kiritchenko

Swaroop Mishra

Tal Linzen

Tal Schuster

Tao Li

Tao Yu

Tariq Ali

Tatsunori Hashimoto

Te-Lin Wu

Théo Desbordes

Theodore Rothschild

Thomas Phan

Tianle Wang

Tiberius Nkinyili

Timo Schick

Timofei Kornev

Titus Tunduny

Tobias Gerstenberg

Trenton Chang

Trishala Neeraj

Tushar Khot

Tyler Shultz

Uri Shaham

Vedant Misra

Vera Demberg

Victoria Nyamai

Vikas Raunak

Vinay Venkatesh Ramasesh

vinay uday prabhu

Vishakh Padmakumar

Vivek Srikumar

William Fedus

William Saunders

William Zhang

Wout Vossen

Xiang Ren

Xiaoyu Tong

Xinran Zhao

Xinyi Wu

Xudong Shen

Yadollah Yaghoobzadeh

Yair Lakretz

Yangqiu Song

Yasaman Bahri

Yejin Choi

Yichi Yang

Yiding Hao

Yifu Chen

Yonatan Belinkov

Yu Hou

Yufang Hou

Yuntao Bai

Zachary Seid

Zhuoye Zhao

Zijian Wang

Zijie J. Wang

Zirui Wang

Ziyi Wu

Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially … (see more)transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-future capabilities and limitations of language models. To address this challenge, we introduce the Beyond the Imitation Game benchmark (BIG- bench). BIG-bench currently consists of 204 tasks, contributed by 450 authors across 132 institutions. Task topics are diverse, drawing problems from linguistics, childhood develop- ment, math, common-sense reasoning, biology, physics, social bias, software development, and beyond. BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models. We evaluate the behavior of OpenAI's GPT models, Google- internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench, across model sizes spanning millions to hundreds of billions of parameters. In addition, a team of human expert raters performed all tasks in order to provide a strong baseline. Findings include: model performance and calibration both improve with scale, but are poor in absolute terms (and when compared with rater performance); performance is remarkably similar across model classes, though with benefits from sparsity; tasks that improve gradually and predictably commonly involve a large knowledge or memorization component, whereas tasks that exhibit "breakthrough" behavior at a critical scale often involve multiple steps or components, or brittle metrics; social bias typically increases with scale in settings with ambiguous context, but this can be improved with prompting.

2023-05-11

TMLR (accepted)

Combining Parameter-efficient Modules for Task-level Generalisation

Edoardo Ponti

Alessandro Sordoni

Yoshua Bengio

2023-05-01

Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics (published)

Evaluating Dependencies in Fact Editing for Language Models: Specificity and Implication Awareness

Zichao Li

Ines Arous

Jackie Cheung

The potential of using a large language model (LLM) as a knowledge base (KB) has sparked significant interest. To maintain the knowledge acq… (see more)uired by LLMs, we need to ensure that the editing of learned facts respects internal logical constraints, which are known as dependency of knowledge. Existing work on editing LLMs has partially addressed the issue of dependency, when the editing of a fact should apply to its lexical variations without disrupting irrelevant ones. However, they neglect the dependency between a fact and its logical implications. We propose an evaluation protocol with an accompanying question-answering dataset, StandUp, that provides a comprehensive assessment of the editing process considering the above notions of dependency. Our protocol involves setting up a controlled environment in which we edit facts and monitor their impact on LLMs, along with their implications based on If-Then rules. Extensive experiments on StandUp show that existing knowledge editing methods are sensitive to the surface form of knowledge, and that they have limited performance in inferring the implications of edited facts.

2023-01-01

EMNLP (Findings) (published)