Cesare Spinoso-Di Piano

PhD - McGill
Principal supervisor
Research topics
Natural language processing

Publications

Identifying and Analyzing Performance-Critical Tokens in Large Language Models
Heyan Huang
Sanxing Chen
Marc-Antoine Rondeau
Yang Gao
Jackie Chi Kit Cheung
In-context learning (ICL) has emerged as an effective solution for few-shot learning with large language models (LLMs). However, how LLMs leverage demonstrations to specify a task and learn a corresponding computational function through ICL is underexplored. Drawing from the way humans learn from content-label mappings in demonstrations, we categorize the tokens in an ICL prompt into content, stopword, and template tokens. Our goal is to identify the types of tokens whose representations directly influence LLM's performance, a property we refer to as being performance-critical. By ablating representations from the attention of the test example, we find that the representations of informative content tokens have less influence on performance compared to template and stopword tokens, which contrasts with the human attention to informative words. We give evidence that the representations of performance-critical tokens aggregate information from the content tokens. Moreover, we demonstrate experimentally that lexical meaning, repetition, and structural cues are the main distinguishing characteristics of these tokens. Our work sheds light on how LLMs learn to perform tasks from demonstrations and deepens our understanding of the roles different types of tokens play in LLMs.
Qualitative Code Suggestion: A Human-Centric Approach To Qualitative Coding
Samira Abbasgholizadeh Rahimi
Jackie Chi Kit Cheung
Qualitative coding is a content analysis method in which researchers read through a text corpus and assign descriptive labels or qualitative codes to passages. It is an arduous and manual process which human-computer interaction (HCI) studies have shown could greatly benefit from NLP techniques to assist qualitative coders. Yet, previous attempts at leveraging language technologies have set up qualitative coding as a fully automatable classification problem. In this work, we take a more assistive approach by defining the task of qualitative code suggestion (QCS) in which a ranked list of previously assigned qualitative codes is suggested from an identified passage. In addition to being user-motivated, QCS integrates previously ignored properties of qualitative coding such as the sequence in which passages are annotated, the importance of rare codes and the differences in annotation styles between coders. We investigate the QCS task by releasing the first publicly available qualitative coding dataset, CVDQuoding, consisting of interviews conducted with women at risk of cardiovascular disease. In addition, we conduct a human evaluation which shows that our systems consistently make relevant code suggestions.