Portrait of Ian Arawjo

Ian Arawjo

Associate Academic Member
Assistant Professor, Université de Montréal, Department of Computer Science and Operations Research

Biography

Ian Arawjo is an assistant professor in the Department of Computer Science and Operations Research (DIRO) at the Université de Montréal. He holds a PhD in Information Science from Cornell University, completed under the supervision of Professor Tapan Parikh.

His dissertation examined the intersection of computer programming and culture, exploring programming as a social and cultural practice. He has experience applying a wide range of human-computer interaction (HCI) methods, from ethnographic fieldwork and archival research to building novel systems (used by thousands of people) and running usability studies. He currently works on projects at the intersection of programming, AI, and HCI, including how new AI capabilities can help us reimagine the practice of programming.

He also works on the evaluation of large language models (LLMs) through high-visibility open-source projects such as ChainForge. Papers on which he was first author have won awards at major HCI conferences, including the Conference on Human Factors in Computing Systems (CHI), the Conference on Computer-Supported Cooperative Work and Social Computing (CSCW), and the Symposium on User Interface Software and Technology (UIST).

Current Students

Research Master's - UdeM
Professional Master's - UdeM

Publications

Who Validates the Validators? Aligning LLM-Assisted Evaluation of LLM Outputs with Human Preferences
Shreya Shankar
J.D. Zamfirescu-Pereira
Bjorn Hartmann
Aditya G Parameswaran
Due to the cumbersome nature of human evaluation and limitations of code-based evaluation, Large Language Models (LLMs) are increasingly being used to assist humans in evaluating LLM outputs. Yet LLM-generated evaluators simply inherit all the problems of the LLMs they evaluate, requiring further human validation. We present a mixed-initiative approach to "validate the validators" -- aligning LLM-generated evaluation functions (be it prompts or code) with human requirements. Our interface, EvalGen, provides automated assistance to users in generating evaluation criteria and implementing assertions. While generating candidate implementations (Python functions, LLM grader prompts), EvalGen asks humans to grade a subset of LLM outputs; this feedback is used to select implementations that better align with user grades. A qualitative study finds overall support for EvalGen but underscores the subjectivity and iterative process of alignment. In particular, we identify a phenomenon we dub criteria drift: users need criteria to grade outputs, but grading outputs helps users define criteria. What is more, some criteria appear dependent on the specific LLM outputs observed (rather than independent criteria that can be defined a priori), raising serious questions for approaches that assume the independence of evaluation from observation of model outputs. We present our interface and implementation details, a comparison of our algorithm with a baseline approach, and implications for the design of future LLM evaluation assistants.
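The selection step described in the abstract can be illustrated with a minimal sketch. This is not EvalGen's actual implementation: the candidate assertion functions, example outputs, and human grades below are hypothetical, and the sketch only shows the general idea of keeping the candidate evaluator whose verdicts agree most often with human pass/fail labels.

```python
# Minimal sketch: pick the candidate evaluator that best agrees with human grades.
# Hypothetical example only; candidate functions, outputs, and grades are made up.

def is_concise(output: str) -> bool:
    return len(output.split()) <= 50

def mentions_source(output: str) -> bool:
    return "source:" in output.lower()

candidates = {"is_concise": is_concise, "mentions_source": mentions_source}

# A few LLM outputs alongside human pass/fail grades (True = pass).
graded_outputs = [
    ("Short answer. Source: docs.", True),
    ("A very long rambling answer without any citation ... " * 10, False),
    ("Concise but uncited answer.", False),
]

def alignment(evaluator) -> float:
    """Fraction of graded outputs where the evaluator's verdict matches the human grade."""
    agree = sum(evaluator(text) == grade for text, grade in graded_outputs)
    return agree / len(graded_outputs)

# Keep the candidate implementation that best matches the human grades.
best_name = max(candidates, key=lambda name: alignment(candidates[name]))
print(best_name, alignment(candidates[best_name]))
```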
Antagonistic AI
Alice Cai
Elena L. Glassman
The vast majority of discourse around AI development assumes that subservient, "moral" models aligned with "human values" are universally beneficial -- in short, that good AI is sycophantic AI. We explore the shadow of the sycophantic paradigm, a design space we term antagonistic AI: AI systems that are disagreeable, rude, interrupting, confrontational, challenging, etc. -- embedding opposite behaviors or values. Far from being "bad" or "immoral," we consider whether antagonistic AI systems may sometimes have benefits to users, such as forcing users to confront their assumptions, build resilience, or develop healthier relational boundaries. Drawing from formative explorations and a speculative design workshop where participants designed fictional AI technologies that employ antagonism, we lay out a design space for antagonistic AI, articulating potential benefits, design techniques, and methods of embedding antagonistic elements into user experience. Finally, we discuss the many ethical challenges of this space and identify three dimensions for the responsible design of antagonistic AI -- consent, context, and framing.
ChainBuddy: An AI Agent System for Generating LLM Pipelines
Jingyue Zhang
ChainBuddy: An AI-assisted Agent System for Helping Users Set up LLM Pipelines
Jingyue Zhang
ChainForge: An open-source visual programming environment for prompt engineering
Priyan Vaithilingam
Martin Wattenberg
Elena Glassman
Notational Programming for Notebook Environments: A Case Study with Quantum Circuits
Anthony DeArmas
Michael Roberts
Shrutarshi Basu
Tapan Parikh
We articulate a vision for computer programming that includes pen-based computing, a paradigm we term notational programming. Notational programming blurs contexts: certain typewritten variables can be referenced in handwritten notation and vice-versa. To illustrate this paradigm, we developed an extension, Notate, to computational notebooks which allows users to open drawing canvases within lines of code. As a case study, we explore quantum programming and designed a notation, Qaw, that extends quantum circuit notation with abstraction features, such as variable-sized wire bundles and recursion. Results from a usability study with novices suggest that users find our core interaction of implicit cross-context references intuitive, but suggest further improvements to debugging infrastructure, interface design, and recognition rates. Throughout, we discuss questions raised by the notational paradigm, including a shift from ‘recognition’ of notations to ‘reconfiguration’ of practices and values around programming, and from ‘sketching’ to writing and drawing, or what we call ‘notating.’
To Write Code: The Cultural Fabrication of Programming Notation and Practice
Writing and its means have become detached. Unlike written and drawn practices developed prior to the 20th century, notation for programming computers developed in concert and conflict with discretizing infrastructure such as the shift-key typewriter and data processing pipelines. In this paper, I recall the emergence of high-level notation for representing computation. I show how the earliest inventors of programming notations borrowed from various written cultural practices, some of which came into conflict with the constraints of digitizing machines, most prominently the typewriter. As such, I trace how practices of "writing code" were fabricated along social, cultural, and material lines at the time of their emergence. By juxtaposing early visions with the modern status quo, I question long-standing terminology, dichotomies, and epistemological tendencies in the field of computer programming. Finally, I argue that translation work is a fundamental property of the practice of writing code by advancing an intercultural lens on programming practice rooted in history.