
Ian Arawjo

Associate Academic Member
Assistant Professor, Université de Montréal, Department of Computer Science and Operations Research

Biography

Ian Arawjo is an assistant professor in the Department of Computer Science and Operations Research (DIRO) at Université de Montréal. He holds a PhD in information science from Cornell University, where he was advised by Tapan Parikh. His dissertation spanned the intersection of computer programming and culture, investigating programming as a social and cultural practice. Arawjo has experience applying a range of human-computer interaction (HCI) methods, from ethnographic fieldwork and archival research to developing novel systems (used by thousands of people) and running usability studies.

Currently, he works on projects at the intersection of programming, AI and HCI, including how new AI capabilities can help us reimagine the practice of programming. He also works on large language model (LLM) evaluation, through high-visibility open-source projects such as ChainForge. His first-authored papers have won awards at top HCI conferences, including the Conference on Human Factors in Computing Systems (CHI), the Computer-Supported Cooperative Work and Social Computing Conference (CSCW) and the User Interface Software and Technology Symposium (UIST).

Current Students

PhD - Université de Montréal
PhD - Université de Montréal

Publications

Reporting and Reviewing LLM-Integrated Systems in HCI: Challenges and Considerations
What should HCI scholars consider when reporting and reviewing papers that involve LLM-integrated systems? We interview 18 authors of LLM-integrated system papers on their authoring and reviewing experiences. We find that norms of trust-building between authors and reviewers appear to be eroded by the uncertainty of LLM behavior and hyperbolic rhetoric surrounding AI. Authors perceive that reviewers apply uniquely skeptical and inconsistent standards towards papers that report LLM-integrated systems, and mitigate mistrust by adding technical evaluations, justifying usage, and de-emphasizing LLM presence. Authors' views challenge blanket directives to report all prompts and use open models, arguing that prompt reporting is context-dependent and justifying proprietary model usage despite ethical concerns. Finally, some tensions in peer review appear to stem from clashes between the norms and values of HCI and ML/NLP communities, particularly around what constitutes a contribution and an appropriate level of technical rigor. Based on our findings and additional feedback from six expert HCI researchers, we present a set of guidelines and considerations for authors, reviewers, and HCI communities around reporting and reviewing papers that involve LLM-integrated systems.
How Notations Evolve: A Historical Analysis with Implications for Supporting User-Defined Abstractions
J.D. Zamfirescu-Pereira
Elena L. Glassman
Damien Masson
Democratizing Game Modding with GenAI: A Case Study of StarCharM, a Stardew Valley Character Maker
Hamid Zand Miralvand
Mohammad Ronagh Nikghalb
Mohammad Darandeh
Abidullah Khan
Jinghui Cheng
Game modding offers unique and personalized gaming experiences, but the technical complexity of creating mods often limits participation to skilled users. We envision a future where every player can create personalized mods for their games. To explore this space, we designed StarCharM, a GenAI-based non-player character (NPC) creator for Stardew Valley. Our tool enables players to iteratively create new NPC mods, requiring minimal user input while allowing for fine-grained adjustments through user control. We conducted a user study with ten Stardew Valley players who had varied mod usage experiences to understand the impacts of StarCharM and provide insights into how GenAI tools may reshape modding, particularly in NPC creation. Participants expressed excitement about bringing their character ideas to life, although they noted challenges in generating rich content to fulfill complex visions. While they believed GenAI tools like StarCharM can foster a more diverse modding community, some voiced concerns about diminished originality and community engagement that may come with such technology. Our findings provide implications and guidelines for the future of GenAI-powered modding tools and co-creative modding practices.
Semantic Commit: Helping Users Update Intent Specifications for AI Memory at Scale
Priyan Vaithilingam
Daniel Lee
Elena L. Glassman
Dynamic Abstractions: Building the Next Generation of Cognitive Tools and Interfaces
Sangho Suh
Hai Dang
Ryan Yen
Josh M. Pollock
Rubaiat Habib Kazi
Hariharan Subramonyam
Jingyi Li
Nazmus Saquib
Arvind Satyanarayan
ChainBuddy: An AI Agent System for Generating LLM Pipelines
Imagining a Future of Designing with AI: Dynamic Grounding, Constructive Negotiation, and Sustainable Motivation
Priyan Vaithilingam
Elena L. Glassman
An AI-Resilient Text Rendering Technique for Reading and Skimming Documents
Ziwei Gu
Kenneth Li
Jonathan K. Kummerfeld
Elena L. Glassman
ChainForge: A Visual Toolkit for Prompt Engineering and LLM Hypothesis Testing
Chelse Swoopes
Priyan Vaithilingam
Martin Wattenberg
Elena L. Glassman
Evaluating outputs of large language models (LLMs) is challenging, requiring making -- and making sense of -- many responses. Yet tools that go beyond basic prompting tend to require knowledge of programming APIs, focus on narrow domains, or are closed-source. We present ChainForge, an open-source visual toolkit for prompt engineering and on-demand hypothesis testing of text generation LLMs. ChainForge provides a graphical interface for comparison of responses across models and prompt variations. Our system was designed to support three tasks: model selection, prompt template design, and hypothesis testing (e.g., auditing). We released ChainForge early in its development and iterated on its design with academics and online users. Through in-lab and interview studies, we find that a range of people could use ChainForge to investigate hypotheses that matter to them, including in real-world settings. We identify three modes of prompt engineering and LLM hypothesis testing: opportunistic exploration, limited evaluation, and iterative refinement.
Schrödinger's Update: User Perceptions of Uncertainties in Proprietary Large Language Model Updates
Zilin Ma
Yiyang Mei
Krzysztof Z. Gajos
Who Validates the Validators? Aligning LLM-Assisted Evaluation of LLM Outputs with Human Preferences
Shreya Shankar
J.D. Zamfirescu-Pereira
Bjorn Hartmann
Aditya G Parameswaran
Antagonistic AI
Alice Cai
Elena L. Glassman