
Nicolas Chapados

Associate Industry Member
Adjunct Professor, Polytechnique Montréal, Department of Applied Mathematics
Vice-President, Research, ServiceNow Research
Research Topics
Deep Learning

Biography

Nicolas Chapados is Vice-President of Research at ServiceNow Inc. He holds an engineering degree from McGill University and a PhD in computer science from Université de Montréal. In 2001, while still writing his thesis, Chapados and his advisor Yoshua Bengio co-founded ApSTAT Technologies, a machine learning technology transfer firm that applies cutting-edge academic research to areas such as insurance risk evaluation, supply chain planning, business forecasting, biotechnology and hedge fund management. He went on to co-found several spin-off companies: Imagia, which focuses on AI analysis of medical images to detect and quantify cancer early; Element AI, which was acquired by ServiceNow in January 2021; and Chapados Couture Capital, a quantitative asset manager. His research interests include time series modelling, natural language processing and optimal decision-making. He holds the Chartered Financial Analyst (CFA) designation.

Publications

StarCoder 2 and The Stack v2: The Next Generation
Anton Lozhkov
Raymond Li
Loubna Ben Allal
Federico Cassano
Joel Lamy-Poirier
Nouamane Tazi
Ao Tang
Dmytro Pykhtar
Jiawei Liu
Yuxiang Wei
Tianyang Liu
Max Tian
Denis Kocetkov
Arthur Zucker
Younes Belkada
Zijian Wang
Qian Liu
Dmitry Abulkhanov
Indraneil Paul
Zhuang Li
Wen-Ding Li
Megan L. Risdal
Jia LI
Jian Zhu
Terry Yue Zhuo
Evgenii Zheltonozhskii
Nii Osae Osae Dade
Wenhao Yu
Lucas Krauss
Naman Jain
Yixuan Su
Xuanli He
Manan Dey
Edoardo Abati
Yekun Chai
Niklas Muennighoff
Xiangru Tang
Muhtasham Oblokulov
Christopher Akiki
Marc Marone
Chenghao Mou
Mayank Mishra
Alex Gu
Binyuan Hui
Tri Dao
Armel Zebaze
Olivier Dehaene
Nicolas Patry
Canwen Xu
Julian McAuley
Han Hu
Torsten Scholak
Sebastien Paquet
Jennifer Robinson
Carolyn Jane Anderson
Md. Mostofa Ali Patwary
Nima Tajbakhsh
Yacine Jernite
Carlos Muñoz Ferrandis
Lingming Zhang
Sean Hughes
Thomas Wolf
Arjun Guha
Leandro Von Werra
Harm de Vries
The BigCode project, an open-scientific collaboration focused on the responsible development of Large Language Models for Code (Code LLMs), introduces StarCoder2. In partnership with Software Heritage (SWH), we build The Stack v2 on top of the digital commons of their source code archive. Alongside the SWH repositories spanning 619 programming languages, we carefully select other high-quality data sources, such as GitHub pull requests, Kaggle notebooks, and code documentation. This results in a training set that is 4x larger than the first StarCoder dataset. We train StarCoder2 models with 3B, 7B, and 15B parameters on 3.3 to 4.3 trillion tokens and thoroughly evaluate them on a comprehensive set of Code LLM benchmarks. We find that our small model, StarCoder2-3B, outperforms other Code LLMs of similar size on most benchmarks, and also outperforms StarCoderBase-15B. Our large model, StarCoder2-15B, significantly outperforms other models of comparable size. In addition, it matches or outperforms CodeLlama-34B, a model more than twice its size. Although DeepSeekCoder-33B is the best-performing model at code completion for high-resource languages, we find that StarCoder2-15B outperforms it on math and code reasoning benchmarks, as well as several low-resource languages. We make the model weights available under an OpenRAIL license and ensure full transparency regarding the training data by releasing the SoftWare Heritage persistent IDentifiers (SWHIDs) of the source code data.
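As an illustration of how the released checkpoints can be used, below is a minimal code-completion sketch with the Hugging Face transformers library; the model identifier bigcode/starcoder2-3b and the generation settings are illustrative assumptions, not details taken from the paper.

# Minimal sketch: greedy code completion with a StarCoder2 checkpoint.
# The model id and generation settings below are illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "bigcode/starcoder2-3b"  # assumed checkpoint name on the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "def fibonacci(n):"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))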
TACTiS-2: Better, Faster, Simpler Attentional Copulas for Multivariate Time Series
Arjun Ashok
Étienne Marcotte
Valentina Zantedeschi
We introduce a new model for multivariate probabilistic time series prediction, designed to flexibly address a range of tasks including forecasting, interpolation, and their combinations. Building on copula theory, we propose a simplified objective for the recently-introduced transformer-based attentional copulas (TACTiS), wherein the number of distributional parameters now scales linearly with the number of variables instead of factorially. The new objective requires the introduction of a training curriculum, which goes hand-in-hand with necessary changes to the original architecture. We show that the resulting model has significantly better training dynamics and achieves state-of-the-art performance across diverse real-world forecasting tasks, while maintaining the flexibility of prior work, such as seamless handling of unaligned and unevenly-sampled time series. Code is made available at https://github.com/ServiceNow/TACTiS.
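For context on the copula construction the abstract relies on, Sklar's theorem (standard copula theory, stated here for reference rather than taken from the paper) factors a joint density into univariate marginals and a copula density:

p(x_1, \dots, x_d) = c\big(F_1(x_1), \dots, F_d(x_d)\big) \prod_{i=1}^{d} p_i(x_i)

TACTiS-style models parameterize this decomposition with a transformer; the abstract's key claim is that the simplified TACTiS-2 objective makes the number of distributional parameters grow linearly with the number of variables rather than factorially.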
XC-Cache: Cross-Attending to Cached Context for Efficient LLM Inference
João Monteiro
Étienne Marcotte
Pierre-Andre Noel
Valentina Zantedeschi
David Vazquez
Perouz Taslakian
In-context learning (ICL) approaches typically leverage prompting to condition decoder-only language model generation on reference information. Just-in-time processing of a context is inefficient due to the quadratic cost of self-attention operations, and caching is desirable. However, caching transformer states can easily require almost as much space as the model parameters. When the right context isn't known in advance, caching ICL can be challenging. This work addresses these limitations by introducing models that, inspired by the encoder-decoder architecture, use cross-attention to condition generation on reference text without the prompt. More precisely, we leverage pre-trained decoder-only models and only train a small number of added layers. We use Question-Answering (QA) as a testbed to evaluate the ability of our models to perform conditional generation and observe that they outperform ICL, are comparable to fine-tuned prompted LLMs, and drastically reduce the space footprint relative to standard KV caching by two orders of magnitude.
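As a rough sketch of the architectural idea described above (not the authors' implementation), the snippet below adds a small trainable cross-attention block on top of a frozen decoder's hidden states so that generation can attend to pre-encoded, cached context; the module names, dimensions, and single-block structure are assumptions for illustration.

import torch
import torch.nn as nn

class CrossAttendToCache(nn.Module):
    # Illustrative adapter: a trainable cross-attention block letting frozen
    # decoder hidden states attend to cached context representations.
    def __init__(self, d_model: int = 1024, n_heads: int = 8):
        super().__init__()
        self.xattn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, decoder_hidden, cached_context):
        # decoder_hidden: (batch, tgt_len, d_model) from the frozen decoder
        # cached_context: (batch, ctx_len, d_model) encoded once and stored
        attended, _ = self.xattn(decoder_hidden, cached_context, cached_context)
        return self.norm(decoder_hidden + attended)  # residual connection

# Only the adapter's parameters would be trained; the base decoder stays frozen.
adapter = CrossAttendToCache()
hidden = torch.randn(2, 16, 1024)    # decoder states for the current prompt
context = torch.randn(2, 128, 1024)  # cached reference-text representations
out = adapter(hidden, context)       # shape: (2, 16, 1024)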
Capture the Flag: Uncovering Data Insights with Large Language Models
Issam Hadj Laradji
Perouz Taslakian
Sai Rajeswar
Valentina Zantedeschi
Alexandre Lacoste
David Vazquez
The extraction of a small number of relevant insights from vast amounts of data is a crucial component of data-driven decision-making. However, accomplishing this task requires considerable technical skills, domain expertise, and human labor. This study explores the potential of using Large Language Models (LLMs) to automate the discovery of insights in data, leveraging recent advances in reasoning and code generation techniques. We propose a new evaluation methodology based on a "capture the flag" principle, measuring the ability of such models to recognize meaningful and pertinent information (flags) in a dataset. We further propose two proof-of-concept agents, with different inner workings, and compare their ability to capture such flags in a real-world sales dataset. While the work reported here is preliminary, our results are sufficiently interesting to mandate future exploration by the community.
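To make the "capture the flag" evaluation idea concrete, here is a hedged sketch of how recovered insights might be scored against planted flags; the keyword-based matcher and data structures are illustrative assumptions, not the paper's protocol.

def flag_recovery_rate(agent_insights, planted_flags):
    # A flag counts as recovered if all of its keywords appear in some insight.
    # This naive matcher is an illustrative stand-in for a real matching rule
    # (e.g. human or LLM-based judging).
    def matches(flag, insight):
        return all(word in insight.lower() for word in flag.lower().split())
    recovered = sum(
        any(matches(flag, insight) for insight in agent_insights)
        for flag in planted_flags
    )
    return recovered / len(planted_flags)

insights = ["emea sales dropped sharply in q3", "average deal size is stable"]
flags = ["EMEA sales dropped q3", "SMB churn doubled"]
print(flag_recovery_rate(insights, flags))  # 0.5: only the first flag is recovered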
The Unsolved Challenges of LLMs as Generalist Web Agents: A Case Study
Rim Assouel
Tom Marty
Massimo Caccia
Issam Hadj Laradji
Sai Rajeswar
Hector Palacios
David Vazquez
Alexandre Lacoste