Publications

Design smells in multi-language systems and bug-proneness: a survival analysis

Mouna Abidi

Md Saidur Rahman

Moses Openja

Foutse Khomh

2024-07-03

Empirical Software Engineering (published)

doi.org

On Generalization for Generative Flow Networks

Anas Krichel

Nikolay Malkin

Salem Lahlou

Yoshua Bengio

Generative Flow Networks (GFlowNets) have emerged as an innovative learning paradigm designed to address the challenge of sampling from an unnormalized probability distribution, called the reward function. This framework learns a policy on a constructed graph, which enables sampling from an approximation of the target probability distribution through successive steps of sampling from the learned policy. To achieve this, GFlowNets can be trained with various objectives, each of which can lead to the model's ultimate goal. The aspirational strength of GFlowNets lies in their potential to discern intricate patterns within the reward function and their capacity to generalize effectively to novel, unseen parts of the reward function. This paper attempts to formalize generalization in the context of GFlowNets, to link generalization with stability, and also to design experiments that assess the capacity of these models to uncover unseen parts of the reward function. The experiments focus on length generalization, meaning generalization to states that can be constructed only by longer trajectories than those seen in training.

2024-07-03

ArXiv (preprint)

doi.org

arxiv.org

Learning Action and Reasoning-Centric Image Editing from Videos and Simulations

Benno Krojer

Dheeraj Vattikonda

Luis Lara

Varun Jampani

Eva Portelance

Chris Pal

Siva Reddy

An image editing model should be able to perform diverse edits, ranging from object replacement, changing attributes or style, to performing actions or movement, which require many forms of reasoning. Current general instruction-guided editing models have significant shortcomings with action and reasoning-centric edits. Object, attribute or stylistic changes can be learned from visually static datasets. On the other hand, high-quality data for action and reasoning-centric edits is scarce and has to come from entirely different sources that cover e.g. physical dynamics, temporality and spatial reasoning. To this end, we meticulously curate the AURORA Dataset (Action-Reasoning-Object-Attribute), a collection of high-quality training data, human-annotated and curated from videos and simulation engines. We focus on a key aspect of quality training data: triplets (source image, prompt, target image) contain a single meaningful visual change described by the prompt, i.e., truly minimal changes between source and target images. To demonstrate the value of our dataset, we evaluate an AURORA-finetuned model on a new expert-curated benchmark (AURORA-Bench) covering 8 diverse editing tasks. Our model significantly outperforms previous editing models as judged by human raters. For automatic evaluations, we find important flaws in previous metrics and caution against their use for semantically hard editing tasks. Instead, we propose a new automatic metric that focuses on discriminative understanding. We hope that our efforts, namely (1) curating a quality training dataset and an evaluation benchmark, (2) developing critical evaluations, and (3) releasing a state-of-the-art model, will fuel further progress on general image editing.

2024-07-03

ArXiv (preprint)

doi.org

arxiv.org

LORD: Low Rank Decomposition Of Monolingual Code LLMs For One-Shot Compression

Ayush Kaushal

Tejas Vaidhya

Irina Rish

Low Rank Decomposition of a matrix, that is, splitting a large matrix into a product of two smaller matrices, offers a means for compression that reduces the parameters of a model without sparsification, and hence delivers more speedup on modern hardware. Moreover, unlike quantization, the compressed linear layers remain fully differentiable and all the parameters trainable, while being able to leverage the existing highly efficient kernels over floating point matrices. We study the potential to compress Large Language Models (LLMs) for monolingual Code generation via Low Rank Decomposition (LoRD) and observe that ranks for the linear layers in these models can be reduced by up to 39.58% with less than 1% increase in perplexity. We then use Low Rank Decomposition (LoRD) to compress StarCoder 16B to 13.2B parameters with no drop and to 12.3B with minimal drop in HumanEval Pass@1 score, in less than 10 minutes on a single A100. The compressed models speed up inference by up to 22.35% with just a single line of change in code over huggingface's implementation with pytorch backend. Low Rank Decomposition (LoRD) models remain compatible with state-of-the-art near-lossless quantization methods such as SpQR, which allows leveraging further compression gains of quantization. Lastly, QLoRA over a Low Rank Decomposition (LoRD) model further reduces memory requirements by as much as 21.2% over vanilla QLoRA while offering similar gains from parameter-efficient fine-tuning. Our work shows Low Rank Decomposition (LoRD) as a promising new paradigm for LLM compression.

2024-07-03

ICML.cc/2024/Workshop/FM-Wild (poster)

doi.org

openreview.net
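
The parameter saving that the LoRD abstract above attributes to splitting one large matrix into a product of two smaller ones can be illustrated with a short counting sketch. This is not the authors' code: the function names and the 4096 x 4096 layer shape are hypothetical, chosen only to show when a rank-r factorization holds fewer parameters than the dense matrix.

```python
def dense_params(d_out: int, d_in: int) -> int:
    """Parameter count of a dense d_out x d_in weight matrix (bias ignored)."""
    return d_out * d_in


def low_rank_params(d_out: int, d_in: int, rank: int) -> int:
    """Parameter count after factoring W (d_out x d_in) as A @ B,
    with A of shape (d_out, rank) and B of shape (rank, d_in)."""
    return rank * (d_out + d_in)


def break_even_rank(d_out: int, d_in: int) -> int:
    """Largest rank at which the factored form is strictly smaller than the dense one."""
    # We need rank * (d_out + d_in) < d_out * d_in.
    return (d_out * d_in - 1) // (d_out + d_in)


if __name__ == "__main__":
    d_out, d_in = 4096, 4096  # hypothetical layer shape, not taken from the paper
    for rank in (512, 1024, 2048):
        saved = 1 - low_rank_params(d_out, d_in, rank) / dense_params(d_out, d_in)
        print(f"rank={rank}: {saved:.1%} fewer parameters")
    print("break-even rank:", break_even_rank(d_out, d_in))
```

For a square layer the factored form only saves parameters when the rank is below half the layer width, so a decomposition pays off only for layers whose weights are well approximated at such ranks.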

Model Breadcrumbs: Scalable Upcycling of Finetuned Foundation Models via Sparse Task Vectors Merging

MohammadReza Davari

Eugene Belilovsky

2024-07-03

ICML.cc/2024/Workshop/FM-Wild (poster)

openreview.net

Predicting teachers’ research reading: A machine learning approach

Mehrdad Yousefpoori-Naeim

Surina He

Ying Cui

Maria Cutumisu

2024-07-03

International Review of Education (published)

doi.org

Regional and Temporal Patterns of Partisan Polarization during the COVID-19 Pandemic in the United States and Canada

Zachary Yang

Anne Imouza

Maximilian Puelma Touzel

Cécile Amadoro

Gabrielle Desrosiers-Brisebois

Kellin Pelrine

Sacha Lévy

Jean-François Godbout

Reihaneh Rabbany

Public health measures were among the most polarizing topics debated online during the COVID-19 pandemic. Much of the discussion surrounded specific events, such as when and which particular interventions came into practice. In this work, we develop and apply an approach to measure subnational and event-driven variation of partisan polarization and explore how these dynamics varied both across and within countries. We apply our measure to a dataset of over 50 million tweets posted during late 2020, a salient period of polarizing discourse in the early phase of the pandemic. In particular, we examine regional variations in both the United States and Canada, focusing on three specific health interventions: lockdowns, masks, and vaccines. We find that more politically conservative regions had higher levels of partisan polarization in both countries, especially in the US, where a strong negative correlation exists between regional vaccination rates and degree of polarization in vaccine-related discussions. We then analyze the timing, context, and profile of spikes in polarization, linking them to specific events discussed on social media across different regions in both countries. These spikes typically last only a few days, suggesting that online discussions reflect and could even drive changes in public opinion, which in the context of pandemic response impacts public health outcomes across different regions and over time.

2024-07-03

ArXiv (preprint)

doi.org

arxiv.org

Reinforcement Learning for Sequence Design Leveraging Protein Language Models

Jithendaraa Subramanian

Shiva Kanth Sujit

Niloy Irtisam

Umong Sain

Derek Nowrouzezahrai

Samira Ebrahimi Kahou

Riashat Islam

2024-07-03

ArXiv (preprint)

doi.org

arxiv.org

The Effect of Data Corruption on Multimodal Long Form Responses

Daniel Z Kaplan

Alexis Roger

Mohamed Osman

Irina Rish

Despite significant progress, Vision-Language Models (VLMs) still struggle with hallucinations, especially in long-form responses. Existing strategies have had limited success in specific cases, and long-form generation remains problematic. In this work we attempt to establish the link between the data used to train the model and the hallucinations in the model's output. To this end, we examine hallucinations through data corruption. We develop a method to corrupt training data and then train models with this data to see the effect on performance. We show that corrupting only a small portion of the long-form training data significantly impairs the performance of the model on long-form tasks, while leaving simpler tasks like visual question-answering and multiple choice relatively intact. All training code and models are released for reproducibility and future research.

2024-07-03

ICML.cc/2024/Workshop/FM-Wild (poster)

openreview.net

TriLM vs FloatLM: Ternary LLMs are more Performant than Quantized FP16 LLMs

Ayush Kaushal

Tejas Vaidhya

Tejas Pandey

Aaryan Bhagat

Irina Rish

Ternary LLMs offer significantly better performance for their size (measured in bits) than the models trained and deployed in FP16/BF16. Given the widespread usage of quantization before deployment and advancements in Post Training Quantization of LLMs, a pivotal question arises: do ternary LLMs indeed provide any discernible benefits? To address this, we first build an open family of pre-trained ternary Large Language Models (TriLM). Additionally, we include their counterparts pre-trained in FP16 (FloatLM) and quantized versions of FloatLM (QuantLM) with parameters across almost two orders of magnitude, from 99M to 3.9B parameters. We demonstrate that TriLMs with 3B+ parameters start to offer competitive performance compared to FloatLMs with the same parameter count, while providing significantly better performance for their size. Specifically, TriLM 3.9B, with fewer bits than FloatLM 830M, ranks between FloatLM 2.4B and FloatLM 3.9B when averaged across 6 popular commonsense and reasoning benchmarks. TriLMs also outperform quantized models, with TriLM 3.9B surpassing the larger QuantLM-3bit 3.9B. Furthermore, across knowledge-based benchmarks, TriLM maintains a superiority for its size, but lags for its parameter count. TriLM 3.9B falls halfway between FloatLM 1.5B and 2.4B, close to QuantLM-4bit 2.4B. To advance research on Ternary LMs, we open source over 500 checkpoints across the model families.

2024-07-03

ICML.cc/2024/Workshop/FM-Wild (poster)

openreview.net
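
The size comparison in the TriLM abstract above (a 3.9B-parameter ternary model holding fewer bits than an 830M-parameter FP16 model) can be sanity-checked with a rough estimate. This sketch assumes every weight is stored at the information-theoretic cost of a ternary value, log2(3) bits, or at 16 bits for FP16; a real checkpoint may keep some tensors in higher precision, so the numbers are only indicative. The function name is hypothetical.

```python
import math

# Assumption: ideal packing of a value in {-1, 0, 1}, about 1.58 bits per weight.
BITS_PER_TERNARY_WEIGHT = math.log2(3)
BITS_PER_FP16_WEIGHT = 16


def model_bits(num_params: float, bits_per_weight: float) -> float:
    """Approximate model size in bits, treating all weights uniformly."""
    return num_params * bits_per_weight


if __name__ == "__main__":
    trilm_bits = model_bits(3.9e9, BITS_PER_TERNARY_WEIGHT)
    floatlm_bits = model_bits(830e6, BITS_PER_FP16_WEIGHT)
    print(f"TriLM 3.9B   ~ {trilm_bits / 8e9:.2f} GB")
    print(f"FloatLM 830M ~ {floatlm_bits / 8e9:.2f} GB")
    print("ternary model is smaller:", trilm_bits < floatlm_bits)
```

Under these assumptions the ternary model is smaller despite having several times more parameters, which is the sense in which the abstract measures performance per bit and not per parameter.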

VFA: Vision Frequency Analysis of Foundation Models and Human

Mohammad Javad Darvishi Bayazi

Md Rifat Arefin

Jocelyn Faubert

Irina Rish

2024-07-03

ICML.cc/2024/Workshop/FM-Wild (poster)

doi.org

openreview.net
