Publications
Evaluating In-Context Learning of Libraries for Code Generation
Contemporary Large Language Models (LLMs) exhibit a high degree of code generation and comprehension capability. A particularly promising area is their ability to interpret code modules from unfamiliar libraries for solving user-instructed tasks. Recent work has shown that large proprietary LLMs can learn novel library usage in-context from demonstrations. These results raise several open questions: whether demonstrations of library usage are required, whether smaller (and more open) models also possess such capabilities, etc. In this work, we take a broader approach by systematically evaluating a diverse array of LLMs across three scenarios reflecting varying levels of domain specialization to understand their abilities and limitations in generating code based on libraries defined in-context. Our results show that even smaller open-source LLMs like Llama-2 and StarCoder demonstrate an adept understanding of novel code libraries based on specifications presented in-context. Our findings further reveal that LLMs exhibit a surprisingly high proficiency in learning novel library modules even when provided with just natural language descriptions or raw code implementations of the functions, which are often cheaper to obtain than demonstrations. Overall, our results pave the way for harnessing LLMs in more adaptable and dynamic coding environments.
2024-01-01
North American Chapter of the Association for Computational Linguistics (published)
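To make the in-context setup concrete, here is a minimal sketch of how a novel library's specification might be placed in a model's context; the `geoplot` module, its functions, and the `query_llm` placeholder are all hypothetical, not the paper's exact protocol.

```python
# Sketch: teach an LLM a novel library purely from an in-context description.
# The library, its functions, and the model call below are illustrative assumptions.

LIBRARY_SPEC = """\
Library: geoplot (hypothetical)
- load_region(name: str) -> Region: load a named geographic region.
- choropleth(region: Region, values: dict[str, float]) -> Figure:
    draw a choropleth map of per-district values.
"""

TASK = "Plot unemployment rates per district of 'Bavaria' as a choropleth."

def build_prompt(spec: str, task: str) -> str:
    """Compose a prompt that presents the unfamiliar library's documentation,
    then asks for code that uses only that library."""
    return (
        "You are given the documentation of an unfamiliar library.\n\n"
        f"{spec}\n"
        f"Using only this library, write Python code to: {task}\n"
    )

def query_llm(prompt: str) -> str:
    raise NotImplementedError("plug in any chat/completion API here")

if __name__ == "__main__":
    print(build_prompt(LIBRARY_SPEC, TASK))
```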
Object detection models are commonly used for people counting (and localization) in many applications but require a dataset with costly bounding box annotations for training. Given the importance of privacy in people counting, these models rely more and more on infrared images, making the task even harder. In this paper, we explore how weaker levels of supervision affect the performance of deep person counting architectures for image classification and point-level localization. Our experiments indicate that counting people using a convolutional neural network with image-level annotation achieves a level of accuracy that is competitive with YOLO detectors and point-level localization models yet provides a higher frame rate and a similar number of model parameters. Our code is available at: https://github.com/tortueTortue/IRPeopleCounting.
2024-01-01
2024 IEEE/CVF Winter Conference on Applications of Computer Vision Workshops (WACVW) (published)
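The weak-supervision idea can be illustrated with a short PyTorch sketch: treat counting as image classification, where class k means "k people in the frame", so only an image-level count is needed. The backbone, single-channel IR input, and count cap below are assumptions; the linked repository contains the authors' actual code.

```python
# Illustrative sketch (assumptions: ResNet-18 backbone, 1-channel IR input,
# at most MAX_COUNT people per image). Only image-level counts are required,
# no bounding boxes or point annotations.
import torch
import torch.nn as nn
from torchvision.models import resnet18

MAX_COUNT = 20  # assumed upper bound on people per image

model = resnet18(weights=None)
model.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)  # IR is single-channel
model.fc = nn.Linear(model.fc.in_features, MAX_COUNT + 1)  # classes 0..MAX_COUNT

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def train_step(images: torch.Tensor, counts: torch.Tensor) -> float:
    """One training step from image-level labels: `counts` holds the true
    number of people in each image."""
    optimizer.zero_grad()
    logits = model(images)                    # shape (B, MAX_COUNT + 1)
    loss = criterion(logits, counts.clamp(max=MAX_COUNT))
    loss.backward()
    optimizer.step()
    return loss.item()
```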
Large Language Models (LLMs) have shown significant promise in various tasks, including identifying the political beliefs of English-speaking social media users from their posts. However, assessing LLMs for this task in non-English languages remains unexplored. In this work, we ask to what extent LLMs can predict the political ideologies of users in Persian social media. To answer this question, we first acknowledge that political parties are not well-defined among Persian users, and therefore, we simplify the task to hyperpartisan ideology detection. We create a new benchmark and show the potential and limitations of both open-source and commercial LLMs in classifying the hyperpartisan ideologies of users. We compare these models with smaller fine-tuned models, both on the Persian language (ParsBERT) and translated data (RoBERTa), showing that they considerably outperform generative LLMs in this task. We further demonstrate that the performance of the generative LLMs degrades when classifying users based on their tweets instead of their bios and even when tweets are added as additional information, whereas the smaller fine-tuned models are robust and achieve similar performance for all classes. This study is a first step toward political ideology detection in Persian Twitter, with implications for future research to understand the dynamics of ideologies in Persian social media.
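A hedged sketch of the fine-tuned-encoder baseline described above: a binary sequence classifier over user bios or tweets. The checkpoint name is an assumption (ParsBERT is published under the HooshvareLab organization on the Hugging Face Hub), and the two labels stand for the two hyperpartisan poles.

```python
# Sketch of the ParsBERT-style baseline: fine-tunable binary classifier.
# MODEL_ID is an assumed checkpoint name; verify it on the Hugging Face Hub.
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_ID = "HooshvareLab/bert-fa-base-uncased"  # assumed ParsBERT checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID, num_labels=2)

# Classify a user's bio text (placeholder string); fine-tuning on the
# benchmark would precede this step in practice.
batch = tokenizer(["<user bio text>"], truncation=True, padding=True, return_tensors="pt")
predicted = model(**batch).logits.argmax(dim=-1)  # 0 or 1: hyperpartisan pole
```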
An Exact Method for (Constrained) Assortment Optimization Problems with Product Costs
The most widely used LLMs like GPT-4 and Llama 2 are trained on large amounts of data, mostly in English, but are still able to deal with non-English languages. This English bias leads to lower performance in other languages, especially low-resource ones. This paper studies the linguistic quality of LLMs in two non-English high-resource languages: Dutch and French, with a focus on the influence of English. We first construct a comparable corpus of text generated by humans versus LLMs (GPT-4, Zephyr, and GEITje) in the news domain. We proceed to annotate linguistic issues in the LLM-generated texts, obtaining high inter-annotator agreement, and analyse these annotated issues. We find a substantial influence of English for all models under all conditions: on average, 16% of all annotations of linguistic errors or peculiarities had a clear link to English. Fine-tuning an LLM on a target language (GEITje is fine-tuned on Dutch) reduces the number of linguistic issues and probably also the influence of English. We further find that using a more elaborate prompt leads to linguistically better results than a concise prompt. Finally, increasing the temperature for one of the models leads to lower linguistic quality but does not alter the influence of English.
The increasing scale of Transformer models has led to an increase in their pre-training computational requirements. While quantization has proven to be effective after pre-training and during fine-tuning, applying quantization in Transformers during pre-training has remained largely unexplored at scale for language modeling. This study aims to explore the impact of quantization for efficient pre-training of Transformers, with a focus on linear layer components. By systematically applying straightforward linear quantization to weights, activations, gradients, and optimizer states, we assess its effects on model efficiency, stability, and performance during training. By offering a comprehensive recipe of effective quantization strategies to be applied during the pre-training of Transformers, we promote high training efficiency from scratch while retaining language modeling ability. Code is available at https://github.com/chandar-lab/EfficientLLMs.
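A minimal sketch of the "straightforward linear quantization" the abstract refers to, shown here as symmetric per-tensor fake quantization; the same transform can in principle be applied to weights, activations, gradients, or optimizer states. The bit-width and the symmetric scheme are assumptions; see the linked repository for the authors' full recipe.

```python
# Symmetric linear (fake) quantization of a tensor: round to a signed
# num_bits grid, then dequantize back to float. Per-tensor scale assumed.
import torch

def linear_quantize(x: torch.Tensor, num_bits: int = 8) -> torch.Tensor:
    """Quantize x to a signed num_bits grid and return dequantized values."""
    qmax = 2 ** (num_bits - 1) - 1                 # e.g. 127 for 8 bits
    scale = x.abs().max().clamp(min=1e-8) / qmax   # one scale for the tensor
    q = torch.clamp(torch.round(x / scale), -qmax, qmax)
    return q * scale                               # "fake-quant" output

x = torch.randn(4, 4)
print((x - linear_quantize(x)).abs().max())        # quantization error
```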
Language model capabilities predictably improve from scaling the model’s size and training data. Motivated by this, increasingly large language models have been trained, yielding an array of impressive capabilities. Yet these models suffer from adversarial prompts such as “jailbreaks” that hijack models to perform undesired behavior, posing a significant risk of misuse. Prior work has found that computer vision models become more robust with model and data scaling, raising the question: does language model robustness also improve with scale? We study this question empirically, finding that larger models respond substantially more effectively to adversarial training, but there is little to no benefit from model scale in the absence of defenses.
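For readers unfamiliar with the defense being scaled, here is the classic adversarial-training recipe (FGSM) in a generic form; note this is the textbook version for continuous inputs, not the paper's discrete prompt-space attacks on LLMs, and the model, epsilon, and optimizer are placeholders.

```python
# Generic FGSM adversarial-training step for a differentiable classifier.
# Epsilon and the model are illustrative assumptions, not the paper's setup.
import torch
import torch.nn as nn

def adversarial_step(model: nn.Module, x: torch.Tensor, y: torch.Tensor,
                     optimizer: torch.optim.Optimizer, epsilon: float = 0.03) -> float:
    loss_fn = nn.CrossEntropyLoss()

    # 1) Craft a worst-case perturbation from the gradient w.r.t. the input.
    x_adv = x.clone().detach().requires_grad_(True)
    loss_fn(model(x_adv), y).backward()
    x_adv = (x_adv + epsilon * x_adv.grad.sign()).detach()

    # 2) Train the model on the perturbed input.
    optimizer.zero_grad()
    loss = loss_fn(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```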