Cesare Spinoso-Di Piano

Identifying and Analyzing Performance-Critical Tokens in Large Language Models

Heyan Huang

Sanxing Chen

Marc-Antoine Rondeau

Yang Gao

Jackie Chi Kit Cheung

In-context learning (ICL) has emerged as an effective solution for few-shot learning with large language models (LLMs). However, how LLMs leverage demonstrations to specify a task and learn a corresponding computational function through ICL is underexplored. Drawing from the way humans learn from content-label mappings in demonstrations, we categorize the tokens in an ICL prompt into content, stopword, and template tokens. Our goal is to identify the types of tokens whose representations directly influence LLM's performance, a property we refer to as being performance-critical. By ablating representations from the attention of the test example, we find that the representations of informative content tokens have less influence on performance compared to template and stopword tokens, which contrasts with the human attention to informative words. We give evidence that the representations of performance-critical tokens aggregate information from the content tokens. Moreover, we demonstrate experimentally that lexical meaning, repetition, and structural cues are the main distinguishing characteristics of these tokens. Our work sheds light on how LLMs learn to perform tasks from demonstrations and deepens our understanding of the roles different types of tokens play in LLMs.

2026-03-13

AAAI Conference on Artificial Intelligence (published)

doi.org

arxiv.org

Qualitative Code Suggestion: A Human-Centric Approach To Qualitative Coding

Cesare Spinoso-Di Piano

Samira Abbasgholizadeh Rahimi

Jackie Chi Kit Cheung

Qualitative coding is a content analysis method in which researchers read through a text corpus and assign descriptive labels or qualitative codes to passages. It is an arduous and manual process which human-computer interaction (HCI) studies have shown could greatly benefit from NLP techniques to assist qualitative coders. Yet, previous attempts at leveraging language technologies have set up qualitative coding as a fully automatable classification problem. In this work, we take a more assistive approach by defining the task of qualitative code suggestion (QCS) in which a ranked list of previously assigned qualitative codes is suggested from an identified passage. In addition to being user-motivated, QCS integrates previously ignored properties of qualitative coding such as the sequence in which passages are annotated, the importance of rare codes and the differences in annotation styles between coders. We investigate the QCS task by releasing the first publicly available qualitative coding dataset, CVDQuoding, consisting of interviews conducted with women at risk of cardiovascular disease. In addition, we conduct a human evaluation which shows that our systems consistently make relevant code suggestions.

2023-11-30

Findings of the Association for Computational Linguistics: EMNLP 2023 (published)

doi.org

openreview.net

McGill BabyLM Shared Task Submission: The Effects of Data Formatting and Structural Biases

Ziling Cheng

Rahul Aralikatte

Ian Porada

Cesare Spinoso-Di Piano

Jackie CK Cheung

In this study, we describe our submission to the 2023 BabyLM shared-task's strict-small track. Our findings demonstrate the feasibility of training high-performing models within the constraints of limited data, computational resources, and time. We provide evidence that the formatting of input can significantly impact downstream performance. Furthermore, the induction of structural biases into the models through the use of part-of-speech trees yields modest benefits. Our most successful model achieves 79% on the BLiMP evaluations and 72% on the SuperGLUE evaluations.

2022-12-31

BabyLM Challenge @ Conference on Computational Natural Language Learning (published)

doi.org
