
Raymond Li

Alumni

Publications

Evaluating Numeracy of Language Models as a Natural Language Inference Task.
Rahmad Mahendra
Damiano Spina
Lawrence Cavedon
Karin Verspoor
While recent advancements in large language models (LLMs) have enhanced their capabilities to solve mathematical problems, other aspects of numeracy remain underexplored. In this paper, we propose a benchmark to evaluate the ability of language models to perform basic numeracy tasks. We frame numeracy as a Natural Language Inference (NLI) task to assess the models’ ability to understand both numbers and language contexts. We evaluate 49 language models (LMs), including fine-tuned LMs on NLI datasets, instruction-tuned LLMs, and specialized math-LLMs. Our findings reveal three main insights: (1) LLMs only clearly outperform smaller LMs in arithmetic tasks, indicating that mathematical reasoning cannot be generalized to other numeracy skills such as number comparison and normalization; (2) while most language models achieve fair to good accuracy for NLI entailment cases, they still struggle to predict contradiction and neutral cases; and (3) the robustness of language models’ numeracy capabilities needs improvement, particularly in understanding the semantics and pragmatics of numbers in linguistic contexts.
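The "numeracy as NLI" framing described in the abstract can be illustrated with a small sketch. This is an illustrative toy, not the paper's actual data, labels, or models: it builds a premise/hypothesis pair that tests number comparison and derives the gold label with a simple rule.

```python
# Toy sketch of framing a numeracy check as an NLI instance: a premise states
# a quantity, a hypothesis makes a comparative claim, and the gold label is
# derived by actually comparing the numbers. Illustrative only.
from dataclasses import dataclass


@dataclass
class NLIExample:
    premise: str
    hypothesis: str
    label: str  # "entailment", "contradiction", or "neutral"


def comparison_example(premise_value: float, hypothesis_value: float,
                       relation: str) -> NLIExample:
    """Build an NLI pair testing whether a model can compare two numbers.

    relation: the claim the hypothesis makes ("more than" or "less than").
    """
    premise = f"The package weighs {premise_value} kg."
    hypothesis = f"The package weighs {relation} {hypothesis_value} kg."
    if relation == "more than":
        label = "entailment" if premise_value > hypothesis_value else "contradiction"
    elif relation == "less than":
        label = "entailment" if premise_value < hypothesis_value else "contradiction"
    else:
        label = "neutral"
    return NLIExample(premise, hypothesis, label)


# 2.5 kg is more than 2.0 kg, so the hypothesis is entailed.
print(comparison_example(2.5, 2.0, "more than").label)  # entailment
```

A model that solves such pairs must ground the comparison in the numbers themselves, which is exactly the skill the abstract reports LLMs struggling to generalize beyond arithmetic.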
StarCoder 2 and The Stack v2: The Next Generation
Anton Lozhkov
Loubna Ben Allal
Federico Cassano
Joel Lamy-Poirier
Nouamane Tazi
Ao Tang
Dmytro Pykhtar
Jiawei Liu
Yuxiang Wei
Tianyang Liu
Max Tian
Denis Kocetkov
Arthur Zucker
Younes Belkada
Zijian Wang
Qian Liu
Dmitry Abulkhanov
Indraneil Paul
Zhuang Li … (see 46 more)
Wen-Ding Li
Megan L. Risdal
Jia Li
Jian Zhu
Terry Yue Zhuo
Evgenii Zheltonozhskii
Nii Osae Osae Dade
Wenhao Yu
Lucas Krauss
Naman Jain
Yixuan Su
Xuanli He
Edoardo Abati
Yekun Chai
Niklas Muennighoff
Xiangru Tang
Muhtasham Oblokulov
Christopher Akiki
Marc Marone
Chenghao Mou
Mayank Mishra
Alex Gu
Binyuan Hui
Tri Dao
Armel Zebaze
Olivier Dehaene
Nicolas Patry
Canwen Xu
Julian McAuley
Han Hu
Torsten Scholak
Sébastien Paquet
Jennifer Robinson
Carolyn Jane Anderson
Md. Mostofa Ali Patwary
Nima Tajbakhsh
Yacine Jernite
Carlos Muñoz Ferrandis
Lingming Zhang
Sean Hughes
Thomas Wolf
Arjun Guha
Leandro von Werra
The BigCode project, an open-scientific collaboration focused on the responsible development of Large Language Models for Code (Code LLMs), introduces StarCoder2. In partnership with Software Heritage (SWH), we build The Stack v2 on top of the digital commons of their source code archive. Alongside the SWH repositories spanning 619 programming languages, we carefully select other high-quality data sources, such as GitHub pull requests, Kaggle notebooks, and code documentation. This results in a training set that is 4x larger than the first StarCoder dataset. We train StarCoder2 models with 3B, 7B, and 15B parameters on 3.3 to 4.3 trillion tokens and thoroughly evaluate them on a comprehensive set of Code LLM benchmarks. We find that our small model, StarCoder2-3B, outperforms other Code LLMs of similar size on most benchmarks, and also outperforms StarCoderBase-15B. Our large model, StarCoder2-15B, significantly outperforms other models of comparable size. In addition, it matches or outperforms CodeLlama-34B, a model more than twice its size. Although DeepSeekCoder-33B is the best-performing model at code completion for high-resource languages, we find that StarCoder2-15B outperforms it on math and code reasoning benchmarks, as well as several low-resource languages. We make the model weights available under an OpenRAIL license and ensure full transparency regarding the training data by releasing the SoftWare Heritage persistent IDentifiers (SWHIDs) of the source code data.
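The SWHIDs mentioned at the end of the abstract are intrinsic identifiers: for a content object, the core identifier is a SHA-1 computed with the Git blob convention (a `blob <length>\0` header prepended to the bytes). The sketch below shows that computation from scratch; it is a minimal illustration, not the official `swh.model` library.

```python
# Minimal sketch: computing the core SWHID (version 1, content object) for
# raw bytes. Content SWHIDs follow Git's blob-hashing convention:
# SHA-1("blob <length>\0" + data). Illustrative, not the official tooling.
import hashlib


def content_swhid(data: bytes) -> str:
    """Return the core SWHID for a raw file's bytes."""
    header = f"blob {len(data)}\0".encode()
    digest = hashlib.sha1(header + data).hexdigest()
    return f"swh:1:cnt:{digest}"


# The empty file hashes to Git's well-known empty-blob digest.
print(content_swhid(b""))  # swh:1:cnt:e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Because the identifier is derived from the bytes alone, anyone can recompute it and verify that a released training file matches the archived source, which is what makes the released SWHID list an auditable record of the training data.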
StarCoder 2 and The Stack v2: The Next Generation
Anton Lozhkov
Loubna Ben allal
Federico Cassano
Joel Lamy-Poirier
Nouamane Tazi
Ao Tang
Dmytro Pykhtar
Jiawei Liu
Yuxiang Wei
Tianyang Liu
Max Tian
Denis Kocetkov
Arthur Zucker
Younes Belkada
Zijian Wang
Qian Liu
Dmitry Abulkhanov
Indraneil Paul
Zhuang Li … (see 46 more)
Wen-Ding Li
Megan L. Risdal
Jia LI
Jian Zhu
Terry Yue Zhuo
Evgenii Zheltonozhskii
Nii Osae Osae Dade
Wenhao Yu
Lucas Krauss
Naman Jain
Yixuan Su
Xuanli He
Edoardo Abati
Yekun Chai
Niklas Muennighoff
Xiangru Tang
Muhtasham Oblokulov
Christopher Akiki
Marc Marone
Chenghao Mou
Mayank Mishra
Alex Gu
Binyuan Hui
Tri Dao
Armel Zebaze
Olivier Dehaene
Nicolas Patry
Canwen Xu
Julian McAuley
Han Hu
Torsten Scholak
Sebastien Paquet
Jennifer Robinson
Carolyn Jane Anderson
Mostofa Ali Patwary
Nima Tajbakhsh
Yacine Jernite
Carlos Muñoz Ferrandis
Lingming Zhang
Sean Hughes
Thomas Wolf
Arjun Guha
Leandro Von Werra
The BigCode project, an open-scientific collaboration focused on the responsible development of Large Language Models for Code (Code LLMs), … (see more)introduces StarCoder2. In partnership with Software Heritage (SWH), we build The Stack v2 on top of the digital commons of their source code archive. Alongside the SWH repositories spanning 619 programming languages, we carefully select other high-quality data sources, such as GitHub pull requests, Kaggle notebooks, and code documentation. This results in a training set that is 4x larger than the first StarCoder dataset. We train StarCoder2 models with 3B, 7B, and 15B parameters on 3.3 to 4.3 trillion tokens and thoroughly evaluate them on a comprehensive set of Code LLM benchmarks. We find that our small model, StarCoder2-3B, outperforms other Code LLMs of similar size on most benchmarks, and also outperforms StarCoderBase-15B. Our large model, StarCoder2- 15B, significantly outperforms other models of comparable size. In addition, it matches or outperforms CodeLlama-34B, a model more than twice its size. Although DeepSeekCoder- 33B is the best-performing model at code completion for high-resource languages, we find that StarCoder2-15B outperforms it on math and code reasoning benchmarks, as well as several low-resource languages. We make the model weights available under an OpenRAIL license and ensure full transparency regarding the training data by releasing the SoftWare Heritage persistent IDentifiers (SWHIDs) of the source code data.
StarCoder 2 and The Stack v2: The Next Generation
Anton Lozhkov
Loubna Ben allal
Federico Cassano
Joel Lamy-Poirier
Nouamane Tazi
Ao Tang
Dmytro Pykhtar
Jiawei Liu
Yuxiang Wei
Tianyang Liu
Max Tian
Denis Kocetkov
Arthur Zucker
Younes Belkada
Zijian Wang
Qian Liu
Dmitry Abulkhanov
Indraneil Paul
Zhuang Li … (see 46 more)
Wen-Ding Li
Megan L. Risdal
Jia LI
Jian Zhu
Terry Yue Zhuo
Evgenii Zheltonozhskii
Nii Osae Osae Dade
Wenhao Yu
Lucas Krauss
Naman Jain
Yixuan Su
Xuanli He
Edoardo Abati
Yekun Chai
Niklas Muennighoff
Xiangru Tang
Muhtasham Oblokulov
Christopher Akiki
Marc Marone
Chenghao Mou
Mayank Mishra
Alex Gu
Binyuan Hui
Tri Dao
Armel Zebaze
Olivier Dehaene
Nicolas Patry
Canwen Xu
Julian McAuley
Han Hu
Torsten Scholak
Sebastien Paquet
Jennifer Robinson
Carolyn Jane Anderson
Md. Mostofa Ali Patwary
Nima Tajbakhsh
Yacine Jernite
Carlos Muñoz Ferrandis
Lingming Zhang
Sean Hughes
Thomas Wolf
Arjun Guha
Leandro Von Werra
The BigCode project, an open-scientific collaboration focused on the responsible development of Large Language Models for Code (Code LLMs), … (see more)introduces StarCoder2. In partnership with Software Heritage (SWH), we build The Stack v2 on top of the digital commons of their source code archive. Alongside the SWH repositories spanning 619 programming languages, we carefully select other high-quality data sources, such as GitHub pull requests, Kaggle notebooks, and code documentation. This results in a training set that is 4x larger than the first StarCoder dataset. We train StarCoder2 models with 3B, 7B, and 15B parameters on 3.3 to 4.3 trillion tokens and thoroughly evaluate them on a comprehensive set of Code LLM benchmarks. We find that our small model, StarCoder2-3B, outperforms other Code LLMs of similar size on most benchmarks, and also outperforms StarCoderBase-15B. Our large model, StarCoder2- 15B, significantly outperforms other models of comparable size. In addition, it matches or outperforms CodeLlama-34B, a model more than twice its size. Although DeepSeekCoder- 33B is the best-performing model at code completion for high-resource languages, we find that StarCoder2-15B outperforms it on math and code reasoning benchmarks, as well as several low-resource languages. We make the model weights available under an OpenRAIL license and ensure full transparency regarding the training data by releasing the SoftWare Heritage persistent IDentifiers (SWHIDs) of the source code data.
StarCoder 2 and The Stack v2: The Next Generation
Anton Lozhkov
Loubna Ben allal
Federico Cassano
Joel Lamy-Poirier
Nouamane Tazi
Ao Tang
Dmytro Pykhtar
Jiawei Liu
Yuxiang Wei
Tianyang Liu
Max Tian
Denis Kocetkov
Arthur Zucker
Younes Belkada
Zijian Wang
Qian Liu
Dmitry Abulkhanov
Indraneil Paul
Zhuang Li … (see 46 more)
Wen-Ding Li
Megan L. Risdal
Jia LI
Jian Zhu
Terry Yue Zhuo
Evgenii Zheltonozhskii
Nii Osae Osae Dade
Wenhao Yu
Lucas Krauss
Naman Jain
Yixuan Su
Xuanli He
Edoardo Abati
Yekun Chai
Niklas Muennighoff
Xiangru Tang
Muhtasham Oblokulov
Christopher Akiki
Marc Marone
Chenghao Mou
Mayank Mishra
Alex Gu
Binyuan Hui
Tri Dao
Armel Zebaze
Olivier Dehaene
Nicolas Patry
Canwen Xu
Julian McAuley
Han Hu
Torsten Scholak
Sebastien Paquet
Jennifer Robinson
Carolyn Jane Anderson
Md. Mostofa Ali Patwary
Nima Tajbakhsh
Yacine Jernite
Carlos Muñoz Ferrandis
Lingming Zhang
Sean Hughes
Thomas Wolf
Arjun Guha
Leandro Von Werra
The BigCode project, an open-scientific collaboration focused on the responsible development of Large Language Models for Code (Code LLMs), … (see more)introduces StarCoder2. In partnership with Software Heritage (SWH), we build The Stack v2 on top of the digital commons of their source code archive. Alongside the SWH repositories spanning 619 programming languages, we carefully select other high-quality data sources, such as GitHub pull requests, Kaggle notebooks, and code documentation. This results in a training set that is 4x larger than the first StarCoder dataset. We train StarCoder2 models with 3B, 7B, and 15B parameters on 3.3 to 4.3 trillion tokens and thoroughly evaluate them on a comprehensive set of Code LLM benchmarks. We find that our small model, StarCoder2-3B, outperforms other Code LLMs of similar size on most benchmarks, and also outperforms StarCoderBase-15B. Our large model, StarCoder2- 15B, significantly outperforms other models of comparable size. In addition, it matches or outperforms CodeLlama-34B, a model more than twice its size. Although DeepSeekCoder- 33B is the best-performing model at code completion for high-resource languages, we find that StarCoder2-15B outperforms it on math and code reasoning benchmarks, as well as several low-resource languages. We make the model weights available under an OpenRAIL license and ensure full transparency regarding the training data by releasing the SoftWare Heritage persistent IDentifiers (SWHIDs) of the source code data.
StarCoder 2 and The Stack v2: The Next Generation
Anton Lozhkov
Loubna Ben allal
Federico Cassano
Joel Lamy-Poirier
Nouamane Tazi
Ao Tang
Dmytro Pykhtar
Jiawei Liu
Yuxiang Wei
Tianyang Liu
Max Tian
Denis Kocetkov
Arthur Zucker
Younes Belkada
Zijian Wang
Qian Liu
Dmitry Abulkhanov
Indraneil Paul
Zhuang Li … (see 46 more)
Wen-Ding Li
Megan L. Risdal
Jia LI
Jian Zhu
Terry Yue Zhuo
Evgenii Zheltonozhskii
Nii Osae Osae Dade
Wenhao Yu
Lucas Krauss
Naman Jain
Yixuan Su
Xuanli He
Edoardo Abati
Yekun Chai
Niklas Muennighoff
Xiangru Tang
Muhtasham Oblokulov
Christopher Akiki
Marc Marone
Chenghao Mou
Mayank Mishra
Alex Gu
Binyuan Hui
Tri Dao
Armel Zebaze
Olivier Dehaene
Nicolas Patry
Canwen Xu
Julian McAuley
Han Hu
Torsten Scholak
Sebastien Paquet
Jennifer Robinson
Carolyn Jane Anderson
Md. Mostofa Ali Patwary
Nima Tajbakhsh
Yacine Jernite
Carlos Muñoz Ferrandis
Lingming Zhang
Sean Hughes
Thomas Wolf
Arjun Guha
Leandro Von Werra
The BigCode project, an open-scientific collaboration focused on the responsible development of Large Language Models for Code (Code LLMs), … (see more)introduces StarCoder2. In partnership with Software Heritage (SWH), we build The Stack v2 on top of the digital commons of their source code archive. Alongside the SWH repositories spanning 619 programming languages, we carefully select other high-quality data sources, such as GitHub pull requests, Kaggle notebooks, and code documentation. This results in a training set that is 4x larger than the first StarCoder dataset. We train StarCoder2 models with 3B, 7B, and 15B parameters on 3.3 to 4.3 trillion tokens and thoroughly evaluate them on a comprehensive set of Code LLM benchmarks. We find that our small model, StarCoder2-3B, outperforms other Code LLMs of similar size on most benchmarks, and also outperforms StarCoderBase-15B. Our large model, StarCoder2- 15B, significantly outperforms other models of comparable size. In addition, it matches or outperforms CodeLlama-34B, a model more than twice its size. Although DeepSeekCoder- 33B is the best-performing model at code completion for high-resource languages, we find that StarCoder2-15B outperforms it on math and code reasoning benchmarks, as well as several low-resource languages. We make the model weights available under an OpenRAIL license and ensure full transparency regarding the training data by releasing the SoftWare Heritage persistent IDentifiers (SWHIDs) of the source code data.
StarCoder 2 and The Stack v2: The Next Generation
Anton Lozhkov
Loubna Ben allal
Federico Cassano
Joel Lamy-Poirier
Nouamane Tazi
Ao Tang
Dmytro Pykhtar
Jiawei Liu
Yuxiang Wei
Tianyang Liu
Max Tian
Denis Kocetkov
Arthur Zucker
Younes Belkada
Zijian Wang
Qian Liu
Dmitry Abulkhanov
Indraneil Paul
Zhuang Li … (see 46 more)
Wen-Ding Li
Megan L. Risdal
Jia LI
Jian Zhu
Terry Yue Zhuo
Evgenii Zheltonozhskii
Nii Osae Osae Dade
Wenhao Yu
Lucas Krauss
Naman Jain
Yixuan Su
Xuanli He
Edoardo Abati
Yekun Chai
Niklas Muennighoff
Xiangru Tang
Muhtasham Oblokulov
Christopher Akiki
Marc Marone
Chenghao Mou
Mayank Mishra
Alex Gu
Binyuan Hui
Tri Dao
Armel Zebaze
Olivier Dehaene
Nicolas Patry
Canwen Xu
Julian McAuley
Han Hu
Torsten Scholak
Sebastien Paquet
Jennifer Robinson
Carolyn Jane Anderson
Md. Mostofa Ali Patwary
Nima Tajbakhsh
Yacine Jernite
Carlos Muñoz Ferrandis
Lingming Zhang
Sean Hughes
Thomas Wolf
Arjun Guha
Leandro Von Werra
The BigCode project, an open-scientific collaboration focused on the responsible development of Large Language Models for Code (Code LLMs), … (see more)introduces StarCoder2. In partnership with Software Heritage (SWH), we build The Stack v2 on top of the digital commons of their source code archive. Alongside the SWH repositories spanning 619 programming languages, we carefully select other high-quality data sources, such as GitHub pull requests, Kaggle notebooks, and code documentation. This results in a training set that is 4x larger than the first StarCoder dataset. We train StarCoder2 models with 3B, 7B, and 15B parameters on 3.3 to 4.3 trillion tokens and thoroughly evaluate them on a comprehensive set of Code LLM benchmarks. We find that our small model, StarCoder2-3B, outperforms other Code LLMs of similar size on most benchmarks, and also outperforms StarCoderBase-15B. Our large model, StarCoder2- 15B, significantly outperforms other models of comparable size. In addition, it matches or outperforms CodeLlama-34B, a model more than twice its size. Although DeepSeekCoder- 33B is the best-performing model at code completion for high-resource languages, we find that StarCoder2-15B outperforms it on math and code reasoning benchmarks, as well as several low-resource languages. We make the model weights available under an OpenRAIL license and ensure full transparency regarding the training data by releasing the SoftWare Heritage persistent IDentifiers (SWHIDs) of the source code data.
StarCoder 2 and The Stack v2: The Next Generation
Anton Lozhkov
Loubna Ben allal
Federico Cassano
Joel Lamy-Poirier
Nouamane Tazi
Ao Tang
Dmytro Pykhtar
Jiawei Liu
Yuxiang Wei
Tianyang Liu
Max Tian
Denis Kocetkov
Arthur Zucker
Younes Belkada
Zijian Wang
Qian Liu
Dmitry Abulkhanov
Indraneil Paul
Zhuang Li … (see 46 more)
Wen-Ding Li
Megan L. Risdal
Jia LI
Jian Zhu
Terry Yue Zhuo
Evgenii Zheltonozhskii
Nii Osae Osae Dade
Wenhao Yu
Lucas Krauss
Naman Jain
Yixuan Su
Xuanli He
Edoardo Abati
Yekun Chai
Niklas Muennighoff
Xiangru Tang
Muhtasham Oblokulov
Christopher Akiki
Marc Marone
Chenghao Mou
Mayank Mishra
Alex Gu
Binyuan Hui
Tri Dao
Armel Zebaze
Olivier Dehaene
Nicolas Patry
Canwen Xu
Julian McAuley
Han Hu
Torsten Scholak
Sebastien Paquet
Jennifer Robinson
Carolyn Jane Anderson
Md. Mostofa Ali Patwary
Nima Tajbakhsh
Yacine Jernite
Carlos Muñoz Ferrandis
Lingming Zhang
Sean Hughes
Thomas Wolf
Arjun Guha
Leandro Von Werra
The BigCode project, an open-scientific collaboration focused on the responsible development of Large Language Models for Code (Code LLMs), … (see more)introduces StarCoder2. In partnership with Software Heritage (SWH), we build The Stack v2 on top of the digital commons of their source code archive. Alongside the SWH repositories spanning 619 programming languages, we carefully select other high-quality data sources, such as GitHub pull requests, Kaggle notebooks, and code documentation. This results in a training set that is 4x larger than the first StarCoder dataset. We train StarCoder2 models with 3B, 7B, and 15B parameters on 3.3 to 4.3 trillion tokens and thoroughly evaluate them on a comprehensive set of Code LLM benchmarks. We find that our small model, StarCoder2-3B, outperforms other Code LLMs of similar size on most benchmarks, and also outperforms StarCoderBase-15B. Our large model, StarCoder2- 15B, significantly outperforms other models of comparable size. In addition, it matches or outperforms CodeLlama-34B, a model more than twice its size. Although DeepSeekCoder- 33B is the best-performing model at code completion for high-resource languages, we find that StarCoder2-15B outperforms it on math and code reasoning benchmarks, as well as several low-resource languages. We make the model weights available under an OpenRAIL license and ensure full transparency regarding the training data by releasing the SoftWare Heritage persistent IDentifiers (SWHIDs) of the source code data.
StarCoder: may the source be with you!
Loubna Ben allal
Yangtian Zi
Niklas Muennighoff
Denis Kocetkov
Chenghao Mou
Marc Marone
Christopher Akiki
Jia LI
Jenny Chim
Qian Liu
Evgenii Zheltonozhskii
Terry Yue Zhuo
Thomas Wang
Olivier Dehaene
Mishig Davaadorj
Joel Lamy-Poirier
Joao Monteiro
Oleh Shliazhko
Nicolas Gontier … (49 more authors)
Armel Zebaze
Ming-Ho Yee
Logesh Kumar Umapathi
Jian Zhu
Ben Lipkin
Muhtasham Oblokulov
Zhiruo Wang
Rudra Murthy
Jason T Stillerman
Siva Sankalp Patel
Dmitry Abulkhanov
Marco Zocca
Zhihan Zhang
N. Fahmy
Urvashi Bhattacharyya
Wenhao Yu
Swayam Singh
Paulo Villegas
M. Kunakov
Jan Ebert
Fedor Zhdanov
Manuel Romero
Tony Lee
Nadav Timor
Jennifer Ding
Claire S Schlesinger
Hailey Schoelkopf
Jana Ebert
Tri Dao
Mayank Mishra
Alex Gu
Jennifer Robinson
Sean Hughes
Carolyn Jane Anderson
Brendan Dolan-Gavitt
Danish Contractor
Daniel Fried
Yacine Jernite
Carlos Muñoz Ferrandis
Sean M. Hughes
Thomas Wolf
Arjun Guha
Leandro Von Werra
The BigCode community, an open scientific collaboration working on the responsible development of Large Language Models for Code (Code LLMs), introduces StarCoder and StarCoderBase: 15.5B-parameter models with 8K context length, infilling capabilities, and fast large-batch inference enabled by multi-query attention. StarCoderBase is trained on 1 trillion tokens sourced from The Stack, a large collection of permissively licensed GitHub repositories with inspection tools and an opt-out process. We fine-tune StarCoderBase on 35B Python tokens to create StarCoder. We perform the most comprehensive evaluation of Code LLMs to date and show that StarCoderBase outperforms every open Code LLM that supports multiple programming languages and matches or outperforms the OpenAI code-cushman-001 model. Furthermore, StarCoder outperforms every model that is fine-tuned on Python and still retains its performance on other programming languages. We take several important steps towards a safe open-access model release, including an improved PII redaction pipeline and a novel attribution-tracing tool, and make the StarCoder models publicly available under a more commercially viable version of the Open Responsible AI Model license.
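The multi-query attention mentioned in the abstract speeds up large-batch inference by letting all query heads share a single key/value head, shrinking the KV cache by a factor of the head count. A toy NumPy sketch of the idea (illustrative only, not the StarCoder implementation; all shapes and weights here are made up):

```python
import numpy as np

def multi_query_attention(x, wq, wk, wv, n_heads):
    """Toy multi-query attention over a sequence x of shape (T, d).

    wq holds one query projection per head; wk and wv project to a
    single shared key/value head, so the KV cache is n_heads x smaller
    than in standard multi-head attention.
    """
    T, d = x.shape
    hd = d // n_heads
    k = x @ wk                       # (T, hd): one shared key head
    v = x @ wv                       # (T, hd): one shared value head
    outs = []
    for h in range(n_heads):
        q = x @ wq[h]                # (T, hd): per-head queries
        scores = q @ k.T / np.sqrt(hd)
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)   # softmax over keys
        outs.append(w @ v)
    return np.concatenate(outs, axis=-1)     # (T, d)

rng = np.random.default_rng(0)
T, d, H = 4, 8, 2
x = rng.normal(size=(T, d))
wq = rng.normal(size=(H, d, d // H))
wk = rng.normal(size=(d, d // H))
wv = rng.normal(size=(d, d // H))
print(multi_query_attention(x, wq, wk, wv, H).shape)  # (4, 8)
```

At inference time only `k` and `v` need to be cached per token, which is what makes batched decoding with long contexts cheap relative to one KV head per query head.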