Crowdkeeping in Last-mile Delivery
Xin Wang
Okan Arslan
In order to improve the efficiency of the last-mile delivery system when customers are possibly absent for deliveries, we propose the idea of employing the crowd to work as keepers and to provide storage services for their neighbors. Crowd keepers have extra flexibility, more availability, and lower costs than fixed storage options such as automated lockers, and this leads to a more efficient and a more profitable system for last-mile deliveries. We present a bilevel program that jointly determines the assignment, routing, and pricing decisions while considering customer preferences, keeper behaviors, and platform operations. We develop an equivalent single-level program, a mixed-integer linear program with subtour elimination constraints, that can be solved to optimality using a row generation algorithm. To improve the efficiency of the solution procedure, we further derive exact best response sets for both customers and keepers, and approximate optimal travel times using linear regression. We present a numerical study using a real-world data set from Amazon. The fixed-storage and no-storage systems are used as benchmarks to assess the performance of the crowdkeeping system. The results show that the crowdkeeping delivery system has the potential to generate higher profits because of its ability to consolidate deliveries and to eliminate failed deliveries. Funding: Funding provided by the Natural Sciences and Engineering Research Council of Canada [Grants 2022-04979 and 2022-05261], the Canada Research Chair program [Grant CRC-2018-00105], and the China Scholarship Council [Grant 202006190051] is gratefully acknowledged. Supplemental Material: The online appendix is available at https://doi.org/10.1287/trsc.2022.0323.
Disentangling the Causes of Plasticity Loss in Neural Networks
Clare Lyle
Zeyu Zheng
Hado van Hasselt
James Martens
Will Dabney
Underpinning the past decades of work on the design, initialization, and optimization of neural networks is a seemingly innocuous assumption: that the network is trained on a stationary data distribution. In settings where this assumption is violated, e.g., deep reinforcement learning, learning algorithms become unstable and brittle with respect to hyperparameters and even random seeds. One factor driving this instability is the loss of plasticity, meaning that updating the network's predictions in response to new information becomes more difficult as training progresses. While many recent works provide analyses and partial solutions to this phenomenon, a fundamental question remains unanswered: to what extent do known mechanisms of plasticity loss overlap, and how can mitigation strategies be combined to best maintain the trainability of a network? This paper addresses these questions, showing that loss of plasticity can be decomposed into multiple independent mechanisms and that, while intervening on any single mechanism is insufficient to avoid the loss of plasticity in all cases, intervening on multiple mechanisms in conjunction results in highly robust learning algorithms. We show that a combination of layer normalization and weight decay is highly effective at maintaining plasticity in a variety of synthetic nonstationary learning tasks, and further demonstrate its effectiveness on naturally arising nonstationarities, including reinforcement learning in the Arcade Learning Environment.
Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
Soham De
Samuel L. Smith
Anushan Fernando
Aleksandar Botev
George Cristian-Muraru
Albert Gu
Ruba Haroun
Leonard Berrada
Yutian Chen
Srivatsan Srinivasan
Guillaume Desjardins
Arnaud Doucet
David Mark Budden
Yee Whye Teh
Nando de Freitas
Caglar Gulcehre
Recurrent neural networks (RNNs) have fast inference and scale efficiently on long sequences, but they are difficult to train and hard to scale. We propose Hawk, an RNN with gated linear recurrences, and Griffin, a hybrid model that mixes gated linear recurrences with local attention. Hawk exceeds the reported performance of Mamba on downstream tasks, while Griffin matches the performance of Llama-2 despite being trained on over 6 times fewer tokens. We also show that Griffin can extrapolate on sequences significantly longer than those seen during training. Our models match the hardware efficiency of Transformers during training, and during inference they have lower latency and significantly higher throughput. We scale Griffin up to 14B parameters, and explain how to shard our models for efficient distributed training.
StarCoder 2 and The Stack v2: The Next Generation
Anton Lozhkov
Raymond Li
Loubna Ben allal
Federico Cassano
Joel Lamy-Poirier
Nouamane Tazi
Ao Tang
Dmytro Pykhtar
Jiawei Liu
Yuxiang Wei
Tianyang Liu
Max Tian
Denis Kocetkov
Arthur Zucker
Younes Belkada
Zijian Wang
Qian Liu
Dmitry Abulkhanov
Indraneil Paul
Zhuang Li
Wen-Ding Li
Megan L. Risdal
Jia LI
Jian Zhu
Terry Yue Zhuo
Evgenii Zheltonozhskii
Nii Osae Osae Dade
Wenhao Yu
Lucas Krauss
Naman Jain
Yixuan Su
Xuanli He
Manan Dey
Edoardo Abati
Yekun Chai
Niklas Muennighoff
Xiangru Tang
Muhtasham Oblokulov
Christopher Akiki
Marc Marone
Chenghao Mou
Mayank Mishra
Alex Gu
Binyuan Hui
Tri Dao
Armel Zebaze
Olivier Dehaene
Nicolas Patry
Canwen Xu
Julian McAuley
Han Hu
Torsten Scholak
Sebastien Paquet
Jennifer Robinson
Carolyn Jane Anderson
Mostofa Ali Patwary
Nima Tajbakhsh
Yacine Jernite
Carlos Muñoz Ferrandis
Lingming Zhang
Sean Hughes
Thomas Wolf
Arjun Guha
Leandro Von Werra
Harm de Vries
The BigCode project, an open-scientific collaboration focused on the responsible development of Large Language Models for Code (Code LLMs), introduces StarCoder2. In partnership with Software Heritage (SWH), we build The Stack v2 on top of the digital commons of their source code archive. Alongside the SWH repositories spanning 619 programming languages, we carefully select other high-quality data sources, such as GitHub pull requests, Kaggle notebooks, and code documentation. This results in a training set that is 4x larger than the first StarCoder dataset. We train StarCoder2 models with 3B, 7B, and 15B parameters on 3.3 to 4.3 trillion tokens and thoroughly evaluate them on a comprehensive set of Code LLM benchmarks. We find that our small model, StarCoder2-3B, outperforms other Code LLMs of similar size on most benchmarks, and also outperforms StarCoderBase-15B. Our large model, StarCoder2-15B, significantly outperforms other models of comparable size. In addition, it matches or outperforms CodeLlama-34B, a model more than twice its size. Although DeepSeekCoder-33B is the best-performing model at code completion for high-resource languages, we find that StarCoder2-15B outperforms it on math and code reasoning benchmarks, as well as several low-resource languages. We make the model weights available under an OpenRAIL license and ensure full transparency regarding the training data by releasing the SoftWare Heritage persistent IDentifiers (SWHIDs) of the source code data.