Jin Guo

Associate Academic Member
Assistant Professor, McGill University, School of Computer Science

Biography

Jin L.C. Guo is an assistant professor at the School of Computer Science, McGill University.

She is interested in using AI techniques to solve software engineering problems. Her recent research focuses on mining domain knowledge from software traceability data and using such knowledge to facilitate automated SE tasks, such as trace retrieval and project Q&A.

Guo completed her PhD at the University of Notre Dame. Prior to that, she worked on image processing and computer vision in Fuji Xerox’s research lab.

Publications

Automated Traceability for Domain Modelling Decisions Empowered by Artificial Intelligence
Rijul Saini
Gunter Mussbacher
Jörg Kienzle
Domain modelling abstracts real-world entities and their relationships in the form of class diagrams for a given domain problem space. Modellers often perform domain modelling to reduce the gap between understanding the problem description, which expresses requirements in natural language, and the concise interpretation of these requirements. However, the manual practice of domain modelling is both time-consuming and error-prone. These issues are further aggravated when problem descriptions are long, which makes it hard to trace modelling decisions from domain models to problem descriptions or vice versa, leading to completeness and conciseness issues. Automated support for tracing domain modelling decisions in both directions is thus advantageous. In this paper, we propose an automated approach that uses artificial intelligence techniques to extract domain models along with their trace links. We present a traceability information model to enable traceability of modelling decisions in both directions and provide its proof-of-concept in the form of a tool. The evaluation on a set of unseen problem descriptions shows that our approach is promising with an overall median F2 score of 82.04%. We conduct an exploratory user study to assess the benefits and limitations of our approach and present the lessons learned from this study.
DoMoBOT: A Modelling Bot for Automated and Traceable Domain Modelling
Rijul Saini
Gunter Mussbacher
Jörg Kienzle
In the initial phases of the software development cycle, domain modelling is typically performed to transform informal requirements expressed in natural language into concise and analyzable domain models. These models capture the key concepts of an application domain and their relationships in the form of class diagrams. Building domain models manually is often a time-consuming and labor-intensive task. Current approaches that aim to extract domain models automatically provide inadequate insight into the modelling decisions taken by extractor systems. This prevents modellers from quickly confirming the completeness and conciseness of extracted domain models. To address these challenges, we present DoMoBOT, a domain modelling bot that uses a traceability knowledge graph to enable traceability of modelling decisions from extracted domain model elements to requirements and vice versa. In this tool demo paper, we showcase how the implementation and architecture of DoMoBOT help modellers extract domain models and gain insights into the modelling decisions taken by our bot.
Facilitating Asynchronous Participatory Design of Open Source Software: Bringing End Users into the Loop
Jazlyn Hellman
Jinghui Cheng
Science-Software Linkage: The Challenges of Traceability between Scientific Knowledge and Software Artifacts
Hideaki Hata
Raula Gaikovina Kula
Christoph Treude
Although computer science papers are often accompanied by software artifacts, connecting research papers to their software artifacts and vice versa is not always trivial. First of all, there is a lack of well-accepted standards for how such links should be provided. Furthermore, the provided links, if any, often become outdated: they are affected by link rot when pre-prints are removed, when repositories are migrated, or when papers and repositories evolve independently. In this paper, we summarize the state of the practice of linking research papers and associated source code, highlighting the recent efforts towards creating and maintaining such links. We also report on the results of several empirical studies focusing on the relationship between scientific papers and associated software artifacts, and we outline challenges related to traceability and opportunities for overcoming these challenges.
Issue Link Label Recovery and Prediction for Open Source Software
Alexander Nicholson
Jin L.C. Guo
Modern open source software development relies heavily on issue tracking systems to manage feature requests, bug reports, tasks, and other similar artifacts. Together, those “issues” form a complex network with links to each other. The heterogeneous character of issues inherently results in varied link types and therefore makes it very challenging for users to create and maintain link labels manually. Most existing automated issue link construction techniques stop at examining whether links exist between issues. In this work, we focus on the next important question of whether we can assess the type of issue link automatically through a data-driven method. We analyze the links between issues and their labels used in the issue tracking systems of 66 open source projects. Using three projects, we demonstrate promising results when using supervised machine learning classification for the task of link label recovery with careful model selection and tuning, achieving F1 scores between 0.56 and 0.70 for the three studied projects. Further, the performance of our method for future link label prediction is convincing when there is sufficient historical data. Our work is a first step toward systematically managing and maintaining issue links in practice.
DoMoBOT: a bot for automated and interactive domain modelling
Rijul Saini
Gunter Mussbacher
Jörg Kienzle
Domain modelling transforms domain problem descriptions written in natural language (NL) into analyzable and concise domain models (class diagrams) during requirements analysis or the early stages of design in software development. Since the practice of domain modelling requires time in addition to modelling skills and experience, several approaches have been proposed to automate or semi-automate the construction of domain models from problem descriptions expressed in NL. Despite the existing work on domain model extraction, some significant challenges remain unaddressed: (i) the extracted domain models are not accurate enough to be used directly or with minor modifications in software development, (ii) existing approaches do not facilitate the tracing of the rationale behind the modelling decisions taken by the model extractor, and (iii) existing approaches do not provide interactive interfaces to update the extracted domain models. Therefore, in this paper, we introduce a domain modelling bot called DoMoBOT, explain its architecture, and implement it in the form of a web-based prototype tool. The bot automatically extracts a domain model from a problem description written in NL with an accuracy higher than existing approaches. Furthermore, the bot enables modellers to update a part of the extracted domain model and in response the bot re-configures the other parts of the domain model pro-actively. To improve the accuracy of extracted domain models, we combine the techniques of Natural Language Processing and Machine Learning. Finally, we evaluate the accuracy of the extracted domain models.
Traceability Network Analysis: A Case Study of Links in Issue Tracking Systems
Alexander Nicholson
Deeksha M. Arya
Traceability links between software artifacts serve as an invaluable resource for reasoning about software products and their development process. Most conventional methods for capturing traceability are based on pair-wise artifact relations such as trace matrices or navigable links between two directly related artifacts. However, this limited view of trace links ignores the propagating effect of artifact connections as well as the trace link properties at a project level. In this work, we propose the use of network structures to provide another perspective from which reasoning on a collective of trace events is possible. We explore various network analysis techniques in the issue tracking systems of sixty-six open source projects. Our observations reveal two salient properties of the traceability network, i.e., scale free and triadic closure. These properties provide a strong indication of the applicability of network analysis tools and can be used to identify and examine important "hub" issues. As a stepping stone, these properties can further support project status analysis and link type prediction. As a proof-of-concept, we demonstrate the effectiveness of applying the triadic closure property to link type prediction.
A Neural Network Based Approach to Domain Modelling Relationships and Patterns Recognition
Rijul Saini
Gunter Mussbacher
Jörg Kienzle
Model-Driven Software Engineering advocates the use of models and their transformations across different stages of software engineering to better understand and analyze systems under development. Domain modelling is used during requirements analysis or the early stages of design to transform informal requirements written in natural language to domain models which are analyzable and more concise. Since domain modelling is time-consuming and requires modelling skills and experience, many approaches have been proposed to extract domain concepts and relationships automatically using extraction rules. However, relationships and patterns are often hidden in the sentences of a problem description. Automatic recognition of relationships or patterns in those cases requires context information and external knowledge of participating domain concepts, which goes beyond what is possible with extraction rules. In this paper, we draw on recent work on domain model extraction and envision a novel technique where sentence boundaries are customized and clusters of sentences are created for domain concepts. The technique further exploits a BiLSTM neural network model to identify relationships and patterns among domain concepts. We also present a classification strategy for relationships and patterns and use it to instantiate our technique. Preliminary results indicate that this novel idea is promising and warrants further research.
Information correspondence between types of documentation for APIs
Deeksha M. Arya
Martin P. Robillard
Historical Issue Data of Projects on Jira
A. Nicholson
Deeksha M. Arya
Material for IEEE Software paper "How Do Open Source Software Contributors Perceive and Address Usability?"
Wenting Wang
Jinghui Cheng
Software Engineering Event Modeling using Relative Time in Temporal Knowledge Graphs
Kian Ahrabian
Danny Tarlow
Hehuimin Cheng
We present a multi-relational temporal Knowledge Graph based on the daily interactions between artifacts in GitHub, one of the largest social coding platforms. Such representation enables posing many user-activity and project management questions as link prediction and time queries over the knowledge graph. In particular, we introduce two new datasets for i) interpolated time-conditioned link prediction and ii) extrapolated time-conditioned link/time prediction queries, each with distinguished properties. Our experiments on these datasets highlight the potential of adapting knowledge graphs to answer broad software engineering questions. They also reveal the unsatisfactory performance of existing temporal models on extrapolated queries and time prediction queries in general. To overcome these shortcomings, we introduce an extension to current temporal models using relative temporal information with regard to past events.