Publications

XC-Cache: Cross-Attending to Cached Context for Efficient LLM Inference

João Monteiro

Étienne Marcotte

Pierre-Andre Noel

Valentina Zantedeschi

In-context learning (ICL) approaches typically leverage prompting to condition decoder-only language model generation on reference information. Just-in-time processing of a context is inefficient due to the quadratic cost of self-attention operations, and caching is desirable. However, caching transformer states can easily require almost as much space as the model parameters. When the right context isn't known in advance, caching ICL can be challenging. This work addresses these limitations by introducing models that, inspired by the encoder-decoder architecture, use cross-attention to condition generation on reference text without the prompt. More precisely, we leverage pre-trained decoder-only models and only train a small number of added layers. We use Question-Answering (QA) as a testbed to evaluate the ability of our models to perform conditional generation and observe that they outperform ICL, are comparable to fine-tuned prompted LLMs, and drastically reduce the space footprint relative to standard KV caching by two orders of magnitude.

2024-01-01

EMNLP (Findings) (published)

doi.org

arxiv.org

Zero-shot Logical Query Reasoning on any Knowledge Graph

Mikhail Galkin

Jincheng Zhou

Bruno Ribeiro

Jian Tang

Zhaocheng Zhu

Complex logical query answering (CLQA) in knowledge graphs (KGs) goes beyond simple KG completion and aims at answering compositional queries comprised of multiple projections and logical operations. Existing CLQA methods that learn parameters bound to certain entity or relation vocabularies can only be applied to the graph they are trained on, which requires substantial training time before being deployed on a new graph. Here we present UltraQuery, an inductive reasoning model that can zero-shot answer logical queries on any KG. The core idea of UltraQuery is to derive both projections and logical operations as vocabulary-independent functions which generalize to new entities and relations in any KG. With the projection operation initialized from a pre-trained inductive KG reasoning model, UltraQuery can solve CLQA on any KG even if it is only finetuned on a single dataset. Experimenting on 23 datasets, UltraQuery in the zero-shot inference mode shows competitive or better query answering performance than best available baselines and sets a new state of the art on 14 of them.

2024-01-01

NeurIPS (published)

doi.org

arxiv.org

Penalties and Rewards for Fair Learning in Paired Kidney Exchange Programs

Margarida Carvalho

Alison Caulfield

Yi Lin

Adrian Vetta

A kidney exchange program, also called a kidney paired donation program, can be viewed as a repeated, dynamic trading and allocation mechanism. This suggests that a dynamic algorithm for transplant exchange selection may have superior performance in comparison to the repeated use of a static algorithm. We confirm this hypothesis using a full scale simulation of the Canadian Kidney Paired Donation Program: learning algorithms, that attempt to learn optimal patient-donor weights in advance via dynamic simulations, do lead to improved outcomes. Specifically, our learning algorithms, designed with the objective of fairness (that is, equity in terms of transplant accessibility across cPRA groups), also lead to an increased number of transplants and shorter average waiting times. Indeed, our highest performing learning algorithm improves egalitarian fairness by 10% whilst also increasing the number of transplants by 6% and decreasing waiting times by 24%. However, our main result is much more surprising. We find that the most critical factor in determining the performance of a kidney exchange program is not the judicious assignment of positive weights (rewards) to patient-donor pairs. Rather, the key factor in increasing the number of transplants, decreasing waiting times and improving group fairness is the judicious assignment of a negative weight (penalty) to the small number of non-directed donors in the kidney exchange program.

2023-12-31

Web and Internet Economics (published)

doi.org

arxiv.org

Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback

Stephen Casper

Xander Davies

Claudia Shi

Thomas Krendl Gilbert

Jérémy Scheurer

Javier Rando

Rachel Freedman

Tomasz Korbak

David Lindner

Pedro Freire

Tony Tong Wang

Samuel Marks

Charbel-Raphael Segerie

Micah Carroll

Andi Peng

Phillip Christoffersen

Mehul Damani

Stewart Slocum

Usman Anwar

Anand Siththaranjan

Max Nadeau

Eric J Michaud

Jacob Pfau

Dmitrii Krasheninnikov

Xin Chen

Lauro Langosco

Peter Hase

Erdem Biyik

Anca Dragan

David Scott Krueger

Dorsa Sadigh

Dylan Hadfield-Menell

2023-12-30

TMLR (accepted)

doi.org

openreview.net

Latent Idiom Recognition for a Minimalist Functional Array Language Using Equality Saturation

Jonathan Van der Cruysse

Christophe Dubach

Accelerating programs is typically done by recognizing code idioms matching high-performance libraries or hardware interfaces. However, recognizing such idioms automatically is challenging. The idiom recognition machinery is difficult to write and requires expert knowledge. In addition, slight variations in the input program might hide the idiom and defeat the recognizer. This paper advocates for the use of a minimalist functional array language supporting a small, but expressive, set of operators. The minimalist design leads to a tiny set of rewrite rules, which encode the language semantics. Crucially, the same minimalist language is also used to encode idioms. This removes the need for hand-crafted analysis passes, or for having to learn a complex domain-specific language to define the idioms. Coupled with equality saturation, this approach is able to match the core functions from the BLAS and PyTorch libraries on a set of computational kernels. Compared to reference C kernel implementations, the approach produces a geometric mean speedup of 1.46× for C programs using BLAS, when generating such programs from the high-level minimalist language.

2023-12-29

ArXiv (preprint)

doi.org

arxiv.org

Performance reserves in brain-imaging-based phenotype prediction

Marc-Andre Schulz

Danilo Bzdok

Stefan Haufe

John-Dylan Haynes

Kerstin Ritter

Machine learning studies have shown that various phenotypes can be predicted from structural and functional brain images. However, in most such studies, prediction performance ranged from moderate to disappointing. It is unclear whether prediction performance will substantially improve with larger sample sizes or whether insufficient predictive information in brain images impedes further progress. Here, we systematically assess the effect of sample size on prediction performance using sample sizes far beyond what is possible in common neuroimaging studies. We project 3-9 fold improvements in prediction performance for behavioral and mental health phenotypes when moving from one thousand to one million samples. Moreover, we find that moving from single imaging modalities to multimodal input data can lead to further improvements in prediction performance, often on par with doubling the sample size. Our analyses reveal considerable performance reserves for neuroimaging-based phenotype prediction. Machine learning models may benefit much more from extremely large neuroimaging datasets than currently believed.

2023-12-29

Cell reports (published)

doi.org

Use of Artificial Intelligence in the Identification and Management of Frailty: A Scoping Review Protocol

Sathya Karunananthan

Arya Rahgozar

Ramtin Hakimjavadi

Hui Yan

Kunal A Dalsania

Howard Bergman

Bishwajit Ghose

Jim LaPlante

Tess McCutcheon

Daniel I McIsaac

Samira Abbasgholizadeh-Rahimi

Nadia Sourial

Manpreet Thandi

Sabrina T Wong

Clare Liddy

2023-12-28

BMJ Open (published)

doi.org

Behavioural pseudometrics for continuous-time diffusions

Linan Chen

Florence Clerc

Prakash Panangaden

2023-12-27

ArXiv (preprint)

doi.org

arxiv.org

Device-Free Human State Estimation using UWB Multi-Static Radios

Saria Al Lahham

Bobak H. Baghi

Pierre-Yves Lajoie

Amal Feriani

Sachini Herath

Steve Liu

Gregory Dudek

We present a human state estimation framework that allows us to estimate the location, and even the activities, of people in an indoor environment without the requirement that they carry a specific device with them. To achieve this "device free" localization, we use a small number of low-cost Ultra-Wide Band (UWB) sensors distributed across the environment of interest. To achieve high quality estimation from the UWB signals merely reflected off people in the environment, we exploit a deep network that can learn to make inferences. The hardware setup consists of commercial off-the-shelf (COTS) single antenna UWB modules for sensing, paired with Raspberry Pi units for computational processing and data transfer. We make use of the channel impulse response (CIR) measurements from the UWB sensors to estimate the human state, comprised of location and activity, in a given area. Additionally, we can also estimate the number of humans that occupy this region of interest. In our approach, first, we pre-process the CIR data, which involves meticulous aggregation of measurements and extraction of key statistics. Afterwards, we leverage a convolutional deep neural network to map the CIRs into precise location estimates with sub-30 cm accuracy. Similarly, we achieve accurate human activity recognition and occupancy counting results. We show that we can quickly fine-tune our model for new out-of-distribution users, a process that requires only a few minutes of data and a few epochs of training. Our results show that UWB is a promising solution for adaptable smart-home localization and activity recognition problems.

2023-12-26

ArXiv (preprint)

doi.org

arxiv.org

Fairness-Aware Structured Pruning in Transformers

Abdelrahman Zayed

Goncalo Mordido

Samira Shabanian

Ioana Baldini

Sarath Chandar

The increasing size of large language models (LLMs) has introduced challenges in their training and inference. Removing model components is perceived as a solution to tackle the large model sizes; however, existing pruning methods solely focus on performance, without considering an essential aspect for the responsible use of LLMs: model fairness. It is crucial to address the fairness of LLMs towards diverse groups, such as women, Black people, LGBTQ+, Jewish communities, among others, as they are being deployed and available to a wide audience. In this work, first, we investigate how attention heads impact fairness and performance in pre-trained transformer-based language models. We then propose a novel method to prune the attention heads that negatively impact fairness while retaining the heads critical for performance, i.e. language modeling capabilities. Our approach is practical in terms of time and resources, as it does not require fine-tuning the final pruned, and fairer, model. Our findings demonstrate a reduction in gender bias by 19%, 19.5%, 39.5%, 34.7%, 23%, and 8% for DistilGPT-2, GPT-2, GPT-Neo of two different sizes, GPT-J, and Llama 2 models, respectively, in comparison to the biased model, with only a slight decrease in performance. WARNING: This work uses language that is offensive in nature.

2023-12-24

ArXiv (preprint)

doi.org

arxiv.org

Neural manifolds and learning regimes in neural-interface tasks

Alexandre Payeur

Amy L. Orsborn

Guillaume Lajoie

2023-12-23

bioRxiv (preprint)

doi.org