Portrait of Jin Guo

Jin Guo

Associate Academic Member
Assistant Professor, McGill University, School of Computer Science
Research Topics
Human-AI interaction
Human-Centered AI
Human-Computer Interaction (HCI)
Privacy
Responsible AI

Biography

Jin L.C. Guo is an assistant professor at the School of Computer Science, McGill University.

She is interested in using AI techniques to solve software engineering problems. Her recent research focuses on mining domain knowledge from software traceability data and using such knowledge to facilitate automated SE tasks, such as trace retrieval and project Q&A.

Guo completed her PhD at the University of Notre Dame. Prior to that, she worked on image processing and computer vision in Fuji Xerox’s research lab.

Current Students

PhD - McGill University
Master's Research - McGill University
Co-supervisor :
Postdoctorate - McGill University
Co-supervisor :
PhD - McGill University
Principal supervisor :
Master's Research - McGill University
Co-supervisor :
Master's Research - McGill University
Co-supervisor :
Master's Research - McGill University
Master's Research - McGill University

Publications

Subtle Bugs Everywhere: Generating Documentation for Data Wrangling Code
Chenyang Yang
Shurui Zhou
Christian Kästner
Data scientists reportedly spend a significant amount of their time in their daily routines on data wrangling, i.e. cleaning data and extrac… (see more)ting features. However, data wrangling code is often repetitive and error-prone to write. Moreover, it is easy to introduce subtle bugs when reusing and adopting existing code, which results in reduced model quality. To support data scientists with data wrangling, we present a technique to generate documentation for data wrangling code. We use (1) program synthesis techniques to automatically summarize data transformations and (2) test case selection techniques to purposefully select representative examples from the data based on execution information collected with tailored dynamic program analysis. We demonstrate that a JupyterLab extension with our technique can provide on-demand documentation for many cells in popular notebooks and find in a user study that users with our plugin are faster and more effective at finding realistic bugs in data wrangling code.
Generating GitHub Repository Descriptions: A Comparison of Manual and Automated Approaches
Jazlyn Hellman
Eunbee Jang
Christoph Treude
Chenzhun Huang
Given the vast number of repositories hosted on GitHub, project discovery and retrieval have become increasingly important for GitHub users.… (see more) Repository descriptions serve as one of the first points of contact for users who are accessing a repository. However, repository owners often fail to provide a high-quality description; instead, they use vague terms, the purpose of the repository is poorly explained, or the description is omitted entirely. In this work, we examine the current practice of writing GitHub repository descriptions. Our investigation leads to the proposal of the LSP (Language, Software technology, and Purpose) template to formulate good descriptions for GitHub repositories that are clear, concise, and informative. To understand the extent to which current automated techniques can support generating repository descriptions, we compare the performance of state-of-the-art text summarization methods on this task. Finally, our user study with GitHub users reveals that automated summarization can adequately be used for default description generation for GitHub repositories, while the descriptions which follow the LSP template offer the most effective instrument for communicating with GitHub users.
DoMoBOT: An AI-Empowered Bot for Automated and Interactive Domain Modelling
Rijul Saini
Gunter Mussbacher
Jörg Kienzle
Domain modelling transforms informal requirements written in natural language in the form of problem descriptions into concise and analyzabl… (see more)e domain models. As the manual construction of these domain models is often time-consuming, error-prone, and labor-intensive, several approaches already exist to automate domain modelling. However, the current approaches suffer from lower accuracy of extracted domain models and the lack of support for system-modeller interactions. To better assist modellers, we introduce DoMoBOT, a web-based Domain Modelling BOT. Our proposed bot combines artificial intelligence techniques such as natural language processing and machine learning to extract domain models with higher accuracy. More importantly, our bot incorporates a set of features to bring synergy between automated model extraction and bot-modeller interactions. During these interactions, the bot presents multiple possible solutions to a modeller for modelling scenarios present in a given problem description. The bot further enables modellers to switch to a particular solution and updates the other parts of the domain model proactively. In this tool demo paper, we demonstrate how the implementation and architecture of DoMoBOT support the paradigm of automated and interactive domain modelling for assisting modellers.
Automated Traceability for Domain Modelling Decisions Empowered by Artificial Intelligence
Rijul Saini
Gunter Mussbacher
Jörg Kienzle
Domain modelling abstracts real-world entities and their relationships in the form of class diagrams for a given domain problem space. Model… (see more)lers often perform domain modelling to reduce the gap between understanding the problem description which expresses requirements in natural language and the concise interpretation of these requirements. However, the manual practice of domain modelling is both time-consuming and error-prone. These issues are further aggravated when problem descriptions are long, which makes it hard to trace modelling decisions from domain models to problem descriptions or vice-versa leading to completeness and conciseness issues. Automated support for tracing domain modelling decisions in both directions is thus advantageous. In this paper, we propose an automated approach that uses artificial intelligence techniques to extract domain models along with their trace links. We present a traceability information model to enable traceability of modelling decisions in both directions and provide its proof-of-concept in the form of a tool. The evaluation on a set of unseen problem descriptions shows that our approach is promising with an overall median F2 score of 82.04%. We conduct an exploratory user study to assess the benefits and limitations of our approach and present the lessons learned from this study.
DoMoBOT: A Modelling Bot for Automated and Traceable Domain Modelling
Rijul Saini
Gunter Mussbacher
Jörg Kienzle
In the initial phases of the software development cycle, domain modelling is typically performed to transform informal requirements expresse… (see more)d in natural language into concise and analyzable domain models. These models capture the key concepts of an application domain and their relationships in the form of class diagrams. Building domain models manually is often a time-consuming and labor-intensive task. The current approaches which aim to extract domain models automatically, are inadequate in providing insights into the modelling decisions taken by extractor systems. This inhibits modellers to quickly confirm the completeness and conciseness of extracted domain models. To address these challenges, we present DoMoBOT, a domain modelling bot that uses a traceability knowledge graph to enable traceability of modelling decisions from extracted domain model elements to requirements and vice-versa. In this tool demo paper, we showcase how the implementation and architecture of DoMoBOT facilitate modellers to extract domain models and gain insights into the modelling decisions taken by our bot.
Facilitating Asynchronous Participatory Design of Open Source Software: Bringing End Users into the Loop
Jazlyn Hellman
Jinghui Cheng
Science-Software Linkage: The Challenges of Traceability between Scientific Knowledge and Software Artifacts
Hideaki Hata
Raula Gaikovina Kula
Christoph Treude
Although computer science papers are often accompanied by software artifacts, connecting research papers to their software artifacts and vic… (see more)e versa is not always trivial. First of all, there is a lack of well-accepted standards for how such links should be provided. Furthermore, the provided links, if any, often become outdated: they are affected by link rot when pre-prints are removed, when repositories are migrated, or when papers and repositories evolve independently. In this paper, we summarize the state of the practice of linking research papers and associated source code, highlighting the recent efforts towards creating and maintaining such links. We also report on the results of several empirical studies focusing on the relationship between scientific papers and associated software artifacts, and we outline challenges related to traceability and opportunities for overcoming these challenges.
Issue Link Label Recovery and Prediction for Open Source Software
Alexander Nicholson
Guo Jin L.C.
Modern open source software development heavily relies on the issue tracking systems to manage their feature requests, bug reports, tasks, a… (see more)nd other similar artifacts. Together, those “issues” form a complex network with links to each other. The heterogeneous character of issues inherently results in varied link types and therefore poses a great challenge for users to create and maintain the label of the link manually. The goal of most existing automated issue link construction techniques ceases with only examining the existence of links between issues. In this work, we focus on the next important question of whether we can assess the type of issue link automatically through a data-driven method. We analyze the links between issues and their labels used the issue tracking system for 66 open source projects. Using three projects, we demonstrate promising results when using supervised machine learning classification for the task of link label recovery with careful model selection and tuning, achieving F1 scores of between 0.56-0.70 for the three studied projects. Further, the performance of our method for future link label prediction is convincing when there is sufficient historical data. Our work signifies the first step in systematically manage and maintain issue links faced in practice.
DoMoBOT: a bot for automated and interactive domain modelling
Rijul Saini
Gunter Mussbacher
Jörg Kienzle
Domain modelling transforms domain problem descriptions written in natural language (NL) into analyzable and concise domain models (class di… (see more)agrams) during requirements analysis or the early stages of design in software development. Since the practice of domain modelling requires time in addition to modelling skills and experience, several approaches have been proposed to automate or semi-automate the construction of domain models from problem descriptions expressed in NL. Despite the existing work on domain model extraction, some significant challenges remain unaddressed: (i) the extracted domain models are not accurate enough to be used directly or with minor modifications in software development, (ii) existing approaches do not facilitate the tracing of the rationale behind the modelling decisions taken by the model extractor, and (iii) existing approaches do not provide interactive interfaces to update the extracted domain models. Therefore, in this paper, we introduce a domain modelling bot called DoMoBOT, explain its architecture, and implement it in the form of a web-based prototype tool. The bot automatically extracts a domain model from a problem description written in NL with an accuracy higher than existing approaches. Furthermore, the bot enables modellers to update a part of the extracted domain model and in response the bot re-configures the other parts of the domain model pro-actively. To improve the accuracy of extracted domain models, we combine the techniques of Natural Language Processing and Machine Learning. Finally, we evaluate the accuracy of the extracted domain models.
Traceability Network Analysis: A Case Study of Links in Issue Tracking Systems
Alexander Nicholson
Deeksha M. Arya
Traceability links between software artifacts serve as an invaluable resource for reasoning about software products and their development pr… (see more)ocess. Most conventional methods for capturing traceability are based on pair-wise artifact relations such as trace matrices or navigable links between two directly related artifacts. However, this limited view of trace links ignores the propagating effect of artifact connections as well as the trace link properties at a project level. In this work, we propose the use of network structures to provide another perspective from which reasoning on a collective of trace events is possible. We explore various network analysis techniques in the issue tracking system of sixty-six open source projects. Our observation reveals two salient properties of the traceability network, i.e. scale free and triadic closure. These properties provide a strong indication of the applicability of network analysis tools and can be used to identify and examine important "hub" issues. As a stepping stone, these properties can further support project status analysis and link type prediction. As a proof-of-concept, we demonstrate the effectiveness of applying the triadic closure property to link type prediction.
A Neural Network Based Approach to Domain Modelling Relationships and Patterns Recognition
Rijul Saini
Gunter Mussbacher
Jörg Kienzle
Model-Driven Software Engineering advocates the use of models and their transformations across different stages of software engineering to b… (see more)etter understand and analyze systems under development. Domain modelling is used during requirements analysis or the early stages of design to transform informal requirements written in natural language to domain models which are analyzable and more concise. Since domain modelling is time-consuming and requires modelling skills and experience, many approaches have been proposed to extract domain concepts and relationships automatically using extraction rules. However, relationships and patterns are often hidden in the sentences of a problem description. Automatic recognition of relationships or patterns in those cases requires context information and external knowledge of participating domain concepts, which goes beyond what is possible with extraction rules. In this paper, we draw on recent work on domain model extraction and envision a novel technique where sentence boundaries are customized and clusters of sentences are created for domain concepts. The technique further exploits a BiLSTM neural network model to identify relationships and patterns among domain concepts. We also present a classification strategy for relationships and patterns and use it to instantiate our technique. Preliminary results indicate that this novel idea is promising and warrants further research.
Information correspondence between types of documentation for APIs
Deeksha M. Arya
Martin P. Robillard