
Ian Arawjo

Associate Academic Member
Assistant Professor, Université de Montréal, Department of Computer Science and Operations Research

Biography

Ian Arawjo is an assistant professor in the Department of Computer Science and Operations Research (DIRO) at the Université de Montréal. He holds a PhD in Information Science from Cornell University, completed under the supervision of Professor Tapan Parikh.

His dissertation examined the intersection of computer programming and culture, exploring programming as a social and cultural practice. He has experience applying a wide range of human-computer interaction (HCI) methods, from ethnographic fieldwork and archival research to building novel systems (used by thousands of people) and running usability studies. He currently works on projects at the intersection of programming, AI, and HCI, particularly on how new AI capabilities can help us reimagine the practice of programming.

He also works on evaluating large language models (LLMs) through high-visibility open-source projects such as ChainForge. Papers on which he was first author have won awards at major HCI conferences, including the Conference on Human Factors in Computing Systems (CHI), the Conference on Computer-Supported Cooperative Work and Social Computing (CSCW), and the Symposium on User Interface Software and Technology (UIST).

Current Students

Research Master's - UdeM
PhD - UdeM
Principal supervisor:

Publications

Reporting and Reviewing LLM-Integrated Systems in HCI: Challenges and Considerations
What should HCI scholars consider when reporting and reviewing papers that involve LLM-integrated systems? We interview 18 authors of LLM-integrated system papers on their authoring and reviewing experiences. We find that norms of trust-building between authors and reviewers appear to be eroded by the uncertainty of LLM behavior and hyperbolic rhetoric surrounding AI. Authors perceive that reviewers apply uniquely skeptical and inconsistent standards towards papers that report LLM-integrated systems, and mitigate mistrust by adding technical evaluations, justifying usage, and de-emphasizing LLM presence. Authors' views challenge blanket directives to report all prompts and use open models, arguing that prompt reporting is context-dependent and justifying proprietary model usage despite ethical concerns. Finally, some tensions in peer review appear to stem from clashes between the norms and values of HCI and ML/NLP communities, particularly around what constitutes a contribution and an appropriate level of technical rigor. Based on our findings and additional feedback from six expert HCI researchers, we present a set of guidelines and considerations for authors, reviewers, and HCI communities around reporting and reviewing papers that involve LLM-integrated systems.
How Notations Evolve: A Historical Analysis with Implications for Supporting User-Defined Abstractions
J.D. Zamfirescu-Pereira
Elena L. Glassman
Semantic Commit: Helping Users Update Intent Specifications for AI Memory at Scale
Priyan Vaithilingam
Daniel Lee
Elena L. Glassman
Dynamic Abstractions: Building the Next Generation of Cognitive Tools and Interfaces
Sangho Suh
Hai Dang
Ryan Yen
Josh M. Pollock
Rubaiat Habib Kazi
Hariharan Subramonyam
Jingyi Li
Nazmus Saquib
Arvind Satyanarayan
ChainBuddy: An AI Agent System for Generating LLM Pipelines
As large language models (LLMs) advance, their potential applications have grown significantly. However, it remains difficult to evaluate LLM behavior on user-specific tasks and craft effective pipelines to do so. Many users struggle with where to start, often referred to as the "blank page" problem. ChainBuddy, an AI assistant for generating evaluative LLM pipelines built into the ChainForge platform, aims to tackle this issue. ChainBuddy offers a straightforward and user-friendly way to plan and evaluate LLM behavior, making the process less daunting and more accessible across a wide range of possible tasks and use cases. We report a within-subjects user study comparing ChainBuddy to the baseline interface. We find that when using AI assistance, participants reported a less demanding workload and felt more confident setting up evaluation pipelines of LLM behavior. We derive insights for the future of interfaces that assist users in the open-ended evaluation of AI.
ChainBuddy: An AI-assisted Agent System for Generating LLM Pipelines
Imagining a Future of Designing with AI: Dynamic Grounding, Constructive Negotiation, and Sustainable Motivation
Priyan Vaithilingam
Elena L. Glassman
An AI-Resilient Text Rendering Technique for Reading and Skimming Documents
Ziwei Gu
Kenneth Li
Jonathan K. Kummerfeld
Elena L. Glassman
ChainForge: A Visual Toolkit for Prompt Engineering and LLM Hypothesis Testing
Chelse Swoopes
Priyan Vaithilingam
Martin Wattenberg
Elena L. Glassman
Evaluating outputs of large language models (LLMs) is challenging, requiring making -- and making sense of -- many responses. Yet tools that go beyond basic prompting tend to require knowledge of programming APIs, focus on narrow domains, or are closed-source. We present ChainForge, an open-source visual toolkit for prompt engineering and on-demand hypothesis testing of text generation LLMs. ChainForge provides a graphical interface for comparison of responses across models and prompt variations. Our system was designed to support three tasks: model selection, prompt template design, and hypothesis testing (e.g., auditing). We released ChainForge early in its development and iterated on its design with academics and online users. Through in-lab and interview studies, we find that a range of people could use ChainForge to investigate hypotheses that matter to them, including in real-world settings. We identify three modes of prompt engineering and LLM hypothesis testing: opportunistic exploration, limited evaluation, and iterative refinement.
Schrödinger's Update: User Perceptions of Uncertainties in Proprietary Large Language Model Updates
Zilin Ma
Yiyang Mei
Krzysztof Z. Gajos