Publications

scSemiProfiler: Advancing Large-scale Single-cell Studies through Semi-profiling with Deep Generative Models and Active Learning

Jingtao Wang

Gregory Fonseca

Jun Ding

2024-07-16

Nature Communications (published)

doi.org

Group Membership Bias

Ali Vardasbi

Maarten de Rijke

Fernando Diaz

Mostafa Dehghani

2024-07-11

Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (published)

doi.org

arxiv.org

Simple and Scalable Strategies to Continually Pre-train Large Language Models

Adam Ibrahim

Benjamin Thérien

Kshitij Gupta

Mats Leon Richter

Quentin Gregory Anthony

Timothee LESORT

Eugene Belilovsky

Irina Rish

2024-07-08

TMLR (accepted)

doi.org

openreview.net

LORD: Low Rank Decomposition Of Monolingual Code LLMs For One-Shot Compression

Ayush Kaushal

Tejas Vaidhya

Irina Rish

Low Rank Decomposition of matrix - splitting a large matrix into a product of two smaller matrix offers a means for compression that reduces… (see more) the parameters of a model without sparsification, and hence delivering more speedup on modern hardware. Moreover, unlike quantization, the compressed linear layers remain fully differentiable and all the parameters trainable, while being able to leverage the existing highly efficient kernels over floating point matrices. We study the potential to compress Large Language Models (LLMs) for monolingual Code generation via Low Rank Decomposition (LoRD) and observe that ranks for the linear layers in these models can be reduced by upto 39.58% with less than 1% increase in perplexity. We then use Low Rank Decomposition (LoRD) to compress StarCoder 16B to 13.2B parameter with no drop and to 12.3B with minimal drop in HumanEval Pass@1 score, in less than 10 minutes on a single A100. The compressed models speeds up inference by up to 22.35% with just a single line of change in code over huggingface's implementation with pytorch backend. Low Rank Decomposition (LoRD) models remain compatible with state of the art near-lossless quantization method such as SpQR, which allows leveraging further compression gains of quantization. Lastly, QLoRA over Low Rank Decomposition (LoRD) model further reduces memory requirements by as much as 21.2% over vanilla QLoRA while offering similar gains from parameter efficient fine tuning. Our work shows Low Rank Decomposition (LoRD) as a promising new paradigm for LLM compression.

2024-07-03

ICML.cc/2024/Workshop/FM-Wild (poster)

doi.org

openreview.net

Automatic Segmentation of the Spinal Cord Nerve Rootlets

Jan Valošek

Theo Mathieu

Raphaëlle Schlienger

Olivia S. Kowalczyk

Julien Cohen-Adad

Precise identification of spinal nerve rootlets is relevant to delineate spinal levels for the study of functional activity in the spinal co… (see more)rd. The goal of this study was to develop an automatic method for the semantic segmentation of spinal nerve rootlets from T2-weighted magnetic resonance imaging (MRI) scans. Images from two open-access MRI datasets were used to train a 3D multi-class convolutional neural network using an active learning approach to segment C2-C8 dorsal nerve rootlets. Each output class corresponds to a spinal level. The method was tested on 3T T2-weighted images from datasets unseen during training to assess inter-site, inter-session, and inter-resolution variability. The test Dice score was 0.67 +- 0.16 (mean +- standard deviation across rootlets levels), suggesting a good performance. The method also demonstrated low inter-vendor and inter-site variability (coefficient of variation= 1.41 %), as well as low inter-session variability (coefficient of variation= 1.30 %) indicating stable predictions across different MRI

2024-07-02

Imaging Neuroscience (published)

doi.org

arxiv.org

A Bayesian Non-Stationary Heteroskedastic Time Series Model for Multivariate Critical Care Data

Zayd Omar

David A. Stephens

Alexandra M. Schmidt

David Buckeridge

2024-07-02

Statistics in Medicine (published)

doi.org

arxiv.org

Temperature-dependent Spike-ACE2 interaction of Omicron subvariants is associated with viral transmission

Mehdi Benlarbi

Shilei Ding

Étienne Bélanger

Alexandra Tauzin

Raphael Poujol

Halima Medjahed

Omar El Ferri

Yuxia Bo

Catherine Bourassa

Julie Hussin

Judith Fafard

Marzena Pazgier

Inès Levade

Cameron Abrams

Marceline Côté

Andrés Finzi

The continued evolution of SARS-CoV-2 requires persistent monitoring of its subvariants. Omicron subvariants are responsible for the vast ma… (see more)jority of SARS-CoV-2 infections worldwide, with XBB and BA.2.86 sublineages representing more than 90% of circulating strains as of January 2024. In this study, we characterized the functional properties of Spike glycoproteins from BA.2.75, CH.1.1, DV.7.1, BA.4/5, BQ.1.1, XBB, XBB.1, XBB.1.16, XBB.1.5, FD.1.1, EG.5.1, HK.3 BA.2.86 and JN.1. We tested their capacity to evade plasma-mediated recognition and neutralization, ACE2 binding, their susceptibility to cold inactivation, Spike processing, as well as the impact of temperature on Spike-ACE2 interaction. We found that compared to the early wild-type (D614G) strain, most Omicron subvariants Spike glycoproteins evolved to escape recognition and neutralization by plasma from individuals who received a fifth dose of bivalent (BA.1 or BA.4/5) mRNA vaccine and improve ACE2 binding, particularly at low temperatures. Moreover, BA.2.86 had the best affinity for ACE2 at all temperatures tested. We found that Omicron subvariants Spike processing is associated with their susceptibility to cold inactivation. Intriguingly, we found that Spike-ACE2 binding at low temperature was significantly associated with growth rates of Omicron subvariants in humans. Overall, we report that Spikes from newly emerged Omicron subvariants are relatively more stable and resistant to plasma-mediated neutralization, present improved affinity for ACE2 which is associated, particularly at low temperatures, with their growth rates.

2024-07-02

mBio (published)

doi.org

Accelerated Benders Decomposition and Local Branching for Dynamic Maximum Covering Location Problems

Steven Lamontagne

Margarida Carvalho

Ribal Atallah

The maximum covering location problem (MCLP) is a key problem in facility location, with many applications and variants. One such variant is… (see more) the dynamic (or multi-period) MCLP, which considers the installation of facilities across multiple time periods. To the best of our knowledge, no exact solution method has been proposed to tackle large-scale instances of this problem. To that end, in this work, we expand upon the current state-of-the-art branch-and-Benders-cut solution method in the static case, by exploring several acceleration techniques. Additionally, we propose a specialised local branching scheme, that uses a novel distance metric in its definition of subproblems and features a new method for efficient and exact solving of the subproblems. These methods are then compared through extensive computational experiments, highlighting the strengths of the proposed methodologies.

2024-07-01

Computers & Operations Research (published)

doi.org

arxiv.org

Imagining a Future of Designing with AI: Dynamic Grounding, Constructive Negotiation, and Sustainable Motivation

Priyan Vaithilingam

Ian Arawjo

Elena L. Glassman

2024-07-01

Designing Interactive Systems Conference (published)

doi.org

arxiv.org

A logistics provider’s profit maximization facility location problem with random utility maximizing followers

David Pinzon Ulloa

Emma Frejinger

Bernard Gendron

2024-07-01

Computers & Operations Research (published)

doi.org

arxiv.org

One-shot Learning for MIPs with SOS1 Constraints

Charly Robinson La Rocca

Jean-François Cordeau

Emma Frejinger

2024-07-01

Operations Research Forum (published)

doi.org

arxiv.org

Many-Shot In-Context Learning

Rishabh Agarwal

Avi Singh

Lei M Zhang

Bernd Bohnet

Luis Rosias

Stephanie C.Y. Chan

Ankesh Anand

Zaheer Abbas

Biao Zhang

Azade Nova

John D. Co-Reyes

Eric Chu

Feryal M. P. Behbahani

Aleksandra Faust

Hugo Larochelle

Large language models (LLMs) excel at few-shot in-context learning (ICL) -- learning from a few examples provided in context at inference, w… (see more)ithout any weight updates. Newly expanded context windows allow us to investigate ICL with hundreds or thousands of examples -- the many-shot regime. Going from few-shot to many-shot, we observe significant performance gains across a wide variety of generative and discriminative tasks. While promising, many-shot ICL can be bottlenecked by the available amount of human-generated examples. To mitigate this limitation, we explore two new settings: Reinforced and Unsupervised ICL. Reinforced ICL uses model-generated chain-of-thought rationales in place of human examples. Unsupervised ICL removes rationales from the prompt altogether, and prompts the model only with domain-specific questions. We find that both Reinforced and Unsupervised ICL can be quite effective in the many-shot regime, particularly on complex reasoning tasks. Finally, we demonstrate that, unlike few-shot learning, many-shot learning is effective at overriding pretraining biases and can learn high-dimensional functions with numerical inputs. Our analysis also reveals the limitations of next-token prediction loss as an indicator of downstream ICL performance.

2024-06-18

ICML.cc/2024/Workshop/ICL (poster)

doi.org

openreview.net

AI Research Driven by Real-World Problems

AI Policy Compass

Student Life and Resources

Publications

AI Research Driven by Real-World Problems

AI Policy Compass

Student Life and Resources

Popular keywords:

Publications