Nicolas Chapados

XC-Cache: Cross-Attending to Cached Context for Efficient LLM Inference

Jo˜ao Monteiro

Étienne Marcotte

Pierre-Andre Noel

Valentina Zantedeschi

In-context learning (ICL) approaches typically leverage prompting to condition decoder-only language model generation on reference informati… (see more)on. Just-in-time processing of a context is inefficient due to the quadratic cost of self-attention operations, and caching is desirable. However, caching transformer states can easily require almost as much space as the model parameters. When the right context isn't known in advance, caching ICL can be challenging. This work addresses these limitations by introducing models that, inspired by the encoder-decoder architecture, use cross-attention to condition generation on reference text without the prompt. More precisely, we leverage pre-trained decoder-only models and only train a small number of added layers. We use Question-Answering (QA) as a testbed to evaluate the ability of our models to perform conditional generation and observe that they outperform ICL, are comparable to fine-tuned prompted LLMs, and drastically reduce the space footprint relative to standard KV caching by two orders of magnitude.

2024-01-01

EMNLP (Findings) (published)

Capture the Flag: Uncovering Data Insights with Large Language Models

Issam Hadj Laradji

Perouz Taslakian

Sai Rajeswar

Valentina Zantedeschi

Alexandre Lacoste

The extraction of a small number of relevant insights from vast amounts of data is a crucial component of data-driven decision-making. Howev… (see more)er, accomplishing this task requires considerable technical skills, domain expertise, and human labor. This study explores the potential of using Large Language Models (LLMs) to automate the discovery of insights in data, leveraging recent advances in reasoning and code generation techniques. We propose a new evaluation methodology based on a"capture the flag"principle, measuring the ability of such models to recognize meaningful and pertinent information (flags) in a dataset. We further propose two proof-of-concept agents, with different inner workings, and compare their ability to capture such flags in a real-world sales dataset. While the work reported here is preliminary, our results are sufficiently interesting to mandate future exploration by the community.

2023-12-21

ArXiv (preprint)

Capture the Flag: Uncovering Data Insights with Large Language Models

Issam Hadj Laradji

Perouz Taslakian

Sai Rajeswar

Valentina Zantedeschi

Alexandre Lacoste

The extraction of a small number of relevant insights from vast amounts of data is a crucial component of data-driven decision-making. Howev… (see more)er, accomplishing this task requires considerable technical skills, domain expertise, and human labor. This study explores the potential of using Large Language Models (LLMs) to automate the discovery of insights in data, leveraging recent advances in reasoning and code generation techniques. We propose a new evaluation methodology based on a"capture the flag"principle, measuring the ability of such models to recognize meaningful and pertinent information (flags) in a dataset. We further propose two proof-of-concept agents, with different inner workings, and compare their ability to capture such flags in a real-world sales dataset. While the work reported here is preliminary, our results are sufficiently interesting to mandate future exploration by the community.

2023-12-21

ArXiv (preprint)

Capture the Flag: Uncovering Data Insights with Large Language Models

Issam Hadj Laradji

Perouz Taslakian

Sai Rajeswar

Valentina Zantedeschi

Alexandre Lacoste

2023-11-07

NeurIPS.cc/2023/Workshop/FMDM (published)

The Unsolved Challenges of LLMs as Generalist Web Agents: A Case Study

Rim Assouel

Tom Marty

Massimo Caccia

Issam Hadj Laradji

Sai Rajeswar

Hector Palacios

Quentin Cappart

David Vázquez

Maxime Gasse

Alexandre Lacoste

2023-11-07

NeurIPS.cc/2023/Workshop/FMDM (published)

Lag-Llama: Towards Foundation Models for Time Series Forecasting

Kashif Rasul

Andrew Robert Williams

Arian Khorasani

George Adamopoulos

Rishika Bhagwatkar

Marin Biloš

Hena Ghonia

Nadhir Hassen

Anderson Schneider

Sahil Garg

Yuriy Nevmyvaka

Irina Rish

Aiming to build foundation models for time-series forecasting and study their scaling behavior, we present here our work-in-progress on Lag-… (see more)Llama, a general-purpose univariate probabilistic time-series forecasting model trained on a large collection of time-series data. The model shows good zero-shot prediction capabilities on unseen "out-of-distribution" time-series datasets, outperforming supervised baselines. We use smoothly broken power-laws to fit and predict model scaling behavior. The open source code is made available at https://github.com/kashif/pytorch-transformer-ts.

2023-11-01

NeurIPS.cc/2023/Workshop/R0-FoMo (poster)

Lag-Llama: Towards Foundation Models for Probabilistic Time Series Forecasting

Kashif Rasul

Andrew Robert Williams

Arian Khorasani

George Adamopoulos

Rishika Bhagwatkar

Marin Bilovs

Hena Ghonia

Nadhir Hassen

Anderson Schneider

Sahil Garg

Yuriy Nevmyvaka

Irina Rish

2023-10-12

ArXiv (preprint)

Lag-Llama: Towards Foundation Models for Probabilistic Time Series Forecasting

Kashif Rasul

Andrew Robert Williams

Arian Khorasani

George Adamopoulos

Rishika Bhagwatkar

Marin Bilovs

Hena Ghonia

N. Hassen

Anderson Schneider

Sahil Garg

Yuriy Nevmyvaka

Irina Rish

Over the past years, foundation models have caused a paradigm shift in machine learning due to their unprecedented capabilities for zero-sho… (see more)t and few-shot generalization. However, despite the success of foundation models in modalities such as natural language processing and computer vision, the development of foundation models for time series forecasting has lagged behind. We present Lag-Llama, a general-purpose foundation model for univariate probabilistic time series forecasting based on a decoder-only transformer architecture that uses lags as covariates. Lag-Llama is pretrained on a large corpus of diverse time series data from several domains, and demonstrates strong zero-shot generalization capabilities compared to a wide range of forecasting models on downstream datasets across domains. Moreover, when fine-tuned on relatively small fractions of such previously unseen datasets, Lag-Llama achieves state-of-the-art performance, outperforming prior deep learning approaches, emerging as the best general-purpose model on average. Lag-Llama serves as a strong contender to the current state-of-art in time series forecasting and paves the way for future advancements in foundation models tailored to time series data.

2023-10-12

ArXiv (preprint)

TACTiS-2: Better, Faster, Simpler Attentional Copulas for Multivariate Time Series

Étienne Marcotte

Valentina Zantedeschi

We introduce a new model for multivariate probabilistic time series prediction, designed to flexibly address a range of tasks including fore… (see more)casting, interpolation, and their combinations. Building on copula theory, we propose a simplified objective for the recently-introduced transformer-based attentional copulas (TACTiS), wherein the number of distributional parameters now scales linearly with the number of variables instead of factorially. The new objective requires the introduction of a training curriculum, which goes hand-in-hand with necessary changes to the original architecture. We show that the resulting model has significantly better training dynamics and achieves state-of-the-art performance across diverse real-world forecasting tasks, while maintaining the flexibility of prior work, such as seamless handling of unaligned and unevenly-sampled time series. Code is made available at https://github.com/ServiceNow/TACTiS.

2023-10-02

ArXiv (preprint)

Regions of Reliability in the Evaluation of Multivariate Probabilistic Forecasts

Étienne Marcotte

Valentina Zantedeschi

Multivariate probabilistic time series forecasts are commonly evaluated via proper scoring rules, i.e., functions that are minimal in expect… (see more)ation for the ground-truth distribution. However, this property is not sufficient to guarantee good discrimination in the non-asymptotic regime. In this paper, we provide the first systematic finite-sample study of proper scoring rules for time-series forecasting evaluation. Through a power analysis, we identify the"region of reliability"of a scoring rule, i.e., the set of practical conditions where it can be relied on to identify forecasting errors. We carry out our analysis on a comprehensive synthetic benchmark, specifically designed to test several key discrepancies between ground-truth and forecast distributions, and we gauge the generalizability of our findings to real-world tasks with an application to an electricity production problem. Our results reveal critical shortcomings in the evaluation of multivariate probabilistic forecasts as commonly performed in the literature.

2023-07-03

Proceedings of the 40th International Conference on Machine Learning (published)

Dynamic Routing and Wavelength Assignment with Reinforcement Learning

Peyman Kafaei

Quentin Cappart

Hamed Pouya

Louis-Martin Rousseau

With the rapid developments in communication systems, and considering their dynamic nature, all-optical networks are becoming increasingly c… (see more)omplex. This study proposes a novel method based on deep reinforcement learning for the routing and wavelength assignment problem in all-optical wavelength-decision-multiplexing networks. We consider dynamic incoming requests, in which their arrival and holding times are not known in advance. The objective is to devise a strategy that minimizes the number of rejected packages due to the lack of resources in the long term. We use graph neural networks to capture crucial latent information from the graph-structured input to develop the optimal strategy. The proposed deep reinforcement learning algorithm selects a route and a wavelength simultaneously for each incoming traffic connection as they arrive. The results demonstrate that the learned agent outperforms the methods used in practice and can be generalized on network topologies that did not participate in training.

2023-06-08

INFORMS Journal on Optimization (published)