Publications

TriLM vs FloatLM: Ternary LLMs are more Performant than Quantized FP16 LLMs

Ayush Kaushal

Tejas Vaidhya

Tejas Pandey

Aaryan Bhagat

Ternary LLMs offer significantly better performance for their size (measured in bits) than the models trained and deployed in FP16/BF16. Given the widespread usage of quantization before deployment and advancements in Post Training Quantization of LLMs, a pivotal question arises: do ternary LLMs indeed provide any discernible benefits? To address this, we first build an open family of pre-trained ternary Large Language Models (TriLM). Additionally, we include their counterparts pre-trained in FP16 (FloatLM) and quantized versions of FloatLM (QuantLM), with sizes spanning almost two orders of magnitude, from 99M to 3.9B parameters. We demonstrate that TriLMs with 3B+ parameters start to offer competitive performance compared to FloatLMs with the same parameter count, while providing significantly better performance for their size. Specifically, TriLM 3.9B, with fewer bits than FloatLM 830M, ranks between FloatLM 2.4B and FloatLM 3.9B when averaged across 6 popular commonsense and reasoning benchmarks. TriLMs also outperform quantized models, with TriLM 3.9B surpassing the larger QuantLM-3bit 3.9B. Furthermore, across knowledge-based benchmarks, TriLM remains superior for its size but lags for its parameter count: TriLM 3.9B falls halfway between FloatLM 1.5B and 2.4B, close to QuantLM-4bit 2.4B. To advance research on Ternary LMs, we open source over 500 checkpoints across the model families.

2024-07-03

ICML.cc/2024/Workshop/FM-Wild (poster)

openreview.net

VFA: Vision Frequency Analysis of Foundation Models and Human

Mohammad Javad Darvishi Bayazi

Md Rifat Arefin

Jocelyn Faubert

Irina Rish

Machine learning models often struggle with distribution shifts in real-world scenarios, whereas humans exhibit robust adaptation. Models that better align with human perception may achieve higher out-of-distribution generalization. In this study, we investigate how various characteristics of large-scale computer vision models influence their alignment with human capabilities and robustness. Our findings indicate that increasing model and data size, along with incorporating rich semantic information and multiple modalities, significantly enhances models' alignment with human perception and their overall robustness. Our empirical analysis demonstrates a strong correlation between out-of-distribution accuracy and human alignment.

2024-07-03

ICML.cc/2024/Workshop/FM-Wild (poster)

openreview.net

Automatic Segmentation of the Spinal Cord Nerve Rootlets

Jan Valošek

Theo Mathieu

Raphaëlle Schlienger

Olivia S. Kowalczyk

Julien Cohen-Adad

Precise identification of spinal nerve rootlets is relevant to delineate spinal levels for the study of functional activity in the spinal cord. The goal of this study was to develop an automatic method for the semantic segmentation of spinal nerve rootlets from T2-weighted magnetic resonance imaging (MRI) scans. Images from two open-access MRI datasets were used to train a 3D multi-class convolutional neural network using an active learning approach to segment C2-C8 dorsal nerve rootlets. Each output class corresponds to a spinal level. The method was tested on 3T T2-weighted images from datasets unseen during training to assess inter-site, inter-session, and inter-resolution variability. The test Dice score was 0.67 ± 0.16 (mean ± standard deviation across rootlet levels), suggesting good performance. The method also demonstrated low inter-vendor and inter-site variability (coefficient of variation = 1.41%), as well as low inter-session variability (coefficient of variation = 1.30%), indicating stable predictions across different MRI

2024-07-02

Imaging Neuroscience (published)

doi.org

arxiv.org

A Bayesian Non-Stationary Heteroskedastic Time Series Model for Multivariate Critical Care Data

Zayd Omar

David A. Stephens

Alexandra M. Schmidt

David Buckeridge

2024-07-02

Statistics in Medicine (published)

doi.org

arxiv.org

Temperature-dependent Spike-ACE2 interaction of Omicron subvariants is associated with viral transmission

Mehdi Benlarbi

Shilei Ding

Étienne Bélanger

Alexandra Tauzin

Raphael Poujol

Halima Medjahed

Omar El Ferri

Yuxia Bo

Catherine Bourassa

Julie Hussin

Judith Fafard

Marzena Pazgier

Inès Levade

Cameron Abrams

Marceline Côté

Andrés Finzi

The continued evolution of SARS-CoV-2 requires persistent monitoring of its subvariants. Omicron subvariants are responsible for the vast majority of SARS-CoV-2 infections worldwide, with XBB and BA.2.86 sublineages representing more than 90% of circulating strains as of January 2024. In this study, we characterized the functional properties of Spike glycoproteins from BA.2.75, CH.1.1, DV.7.1, BA.4/5, BQ.1.1, XBB, XBB.1, XBB.1.16, XBB.1.5, FD.1.1, EG.5.1, HK.3, BA.2.86 and JN.1. We tested their capacity to evade plasma-mediated recognition and neutralization, their ACE2 binding, their susceptibility to cold inactivation, their Spike processing, as well as the impact of temperature on Spike-ACE2 interaction. We found that, compared to the early wild-type (D614G) strain, the Spike glycoproteins of most Omicron subvariants evolved to escape recognition and neutralization by plasma from individuals who received a fifth dose of bivalent (BA.1 or BA.4/5) mRNA vaccine and to improve ACE2 binding, particularly at low temperatures. Moreover, BA.2.86 had the best affinity for ACE2 at all temperatures tested. We found that the Spike processing of Omicron subvariants is associated with their susceptibility to cold inactivation. Intriguingly, we found that Spike-ACE2 binding at low temperature was significantly associated with growth rates of Omicron subvariants in humans. Overall, we report that Spikes from newly emerged Omicron subvariants are relatively more stable and resistant to plasma-mediated neutralization, and present improved affinity for ACE2, which is associated, particularly at low temperatures, with their growth rates.

2024-07-02

mBio (published)

doi.org

Towards More Realistic Extraction Attacks: An Adversarial Perspective

Yash More

Prakhar Ganesh

Golnoosh Farnadi

2024-07-02

ArXiv (preprint)

doi.org

arxiv.org

169Yb-based high dose rate intensity modulated brachytherapy for focal treatment of prostate cancer

Maude Robitaille

Cynthia Ménard

Gabriel Famulari

Dominic Béliveau-Nadeau

Shirin A. Enger

2024-07-01

Brachytherapy (published)

doi.org

Accelerated Benders Decomposition and Local Branching for Dynamic Maximum Covering Location Problems

Steven Lamontagne

Margarida Carvalho

Ribal Atallah

The maximum covering location problem (MCLP) is a key problem in facility location, with many applications and variants. One such variant is the dynamic (or multi-period) MCLP, which considers the installation of facilities across multiple time periods. To the best of our knowledge, no exact solution method has been proposed to tackle large-scale instances of this problem. To that end, in this work, we expand upon the current state-of-the-art branch-and-Benders-cut solution method in the static case, by exploring several acceleration techniques. Additionally, we propose a specialised local branching scheme that uses a novel distance metric in its definition of subproblems and features a new method for efficient and exact solving of the subproblems. These methods are then compared through extensive computational experiments, highlighting the strengths of the proposed methodologies.

2024-07-01

Computers & Operations Research (published)

doi.org

arxiv.org

Expressivity of Neural Networks with Random Weights and Learned Biases

Ezekiel Williams

Avery Hee-Woon Ryoo

Thomas Jiralerspong

Alexandre Payeur

Matt Perich

Luca Mazzucato

Guillaume Lajoie

2024-07-01

ArXiv (preprint)

doi.org

arxiv.org

Imagining a Future of Designing with AI: Dynamic Grounding, Constructive Negotiation, and Sustainable Motivation

Priyan Vaithilingam

Ian Arawjo

Elena L. Glassman

2024-07-01

Designing Interactive Systems Conference (published)

doi.org

arxiv.org

Do LLMs Meet the Needs of Software Tutorial Writers? Opportunities and Design Implications

Avinash Bhat

Disha Shrivastava

Jin Guo

Creating software tutorials involves developing accurate code examples and explanatory text that engages and informs the reader. Large Language Models (LLMs) demonstrate a strong capacity to generate both text and code, but their potential to assist tutorial writing is unknown. By interviewing and observing seven experienced writers using OpenAI playground as an exploration environment, we uncover design opportunities for leveraging LLMs in software tutorial writing. Our findings reveal background research, resource creation, and maintaining quality standards as critical areas where LLMs could significantly assist writers. We observe how tutorial writers generated tutorial content while exploring LLMs’ capabilities, formulating prompts, verifying LLM outputs, and reflecting on interaction goals and strategies. Our observations highlight that the unpredictability of LLM outputs and unintuitive interface design contributed to skepticism about LLMs’ utility. Informed by these results, we contribute recommendations for designing LLM-based tutorial writing tools to mitigate usability challenges and harness LLMs’ full potential.

2024-07-01

Conference on Designing Interactive Systems (published)

doi.org

A logistics provider’s profit maximization facility location problem with random utility maximizing followers

David Pinzon Ulloa

Emma Frejinger

Bernard Gendron

2024-07-01

Computers & Operations Research (published)

doi.org

arxiv.org
