Jin Guo

Associate Academic Member
Assistant Professor, McGill University, School of Computer Science
Research Topics
Human-AI interaction
Human-Centered AI
Human-Computer Interaction (HCI)
Privacy
Responsible AI

Biography

Jin L.C. Guo is an assistant professor at the School of Computer Science, McGill University.

She is interested in using AI techniques to solve software engineering problems. Her recent research focuses on mining domain knowledge from software traceability data and using such knowledge to facilitate automated SE tasks, such as trace retrieval and project Q&A.

Guo completed her PhD at the University of Notre Dame. Prior to that, she worked on image processing and computer vision in Fuji Xerox’s research lab.

Current Students

PhD - McGill University
Master's Research - McGill University
Co-supervisor :
Postdoctorate - McGill University
Co-supervisor :
Master's Research - McGill University
PhD - McGill University
Principal supervisor :
PhD - McGill University
Master's Research - McGill University
Co-supervisor :
Master's Research - McGill University
Master's Research - McGill University

Publications

Science-Software Linkage: The Challenges of Traceability between Scientific Knowledge and Software Artifacts
Hideaki Hata
Raula Gaikovina Kula
Christoph Treude
Although computer science papers are often accompanied by software artifacts, connecting research papers to their software artifacts and vice versa is not always trivial. First of all, there is a lack of well-accepted standards for how such links should be provided. Furthermore, the provided links, if any, often become outdated: they are affected by link rot when pre-prints are removed, when repositories are migrated, or when papers and repositories evolve independently. In this paper, we summarize the state of the practice of linking research papers and associated source code, highlighting the recent efforts towards creating and maintaining such links. We also report on the results of several empirical studies focusing on the relationship between scientific papers and associated software artifacts, and we outline challenges related to traceability and opportunities for overcoming these challenges.
Issue Link Label Recovery and Prediction for Open Source Software
Alexander Nicholson
Jin L.C. Guo
Modern open source software development heavily relies on issue tracking systems to manage feature requests, bug reports, tasks, and other similar artifacts. Together, those “issues” form a complex network with links to each other. The heterogeneous character of issues inherently results in varied link types and therefore poses a great challenge for users to create and maintain the label of the link manually. Most existing automated issue link construction techniques stop at examining the existence of links between issues. In this work, we focus on the next important question of whether we can assess the type of issue link automatically through a data-driven method. We analyze the links between issues and their labels used in the issue tracking systems of 66 open source projects. Using three projects, we demonstrate promising results when using supervised machine learning classification for the task of link label recovery with careful model selection and tuning, achieving F1 scores between 0.56 and 0.70 for the three studied projects. Further, the performance of our method for future link label prediction is convincing when there is sufficient historical data. Our work signifies the first step toward systematically managing and maintaining issue links in practice.
DoMoBOT: a bot for automated and interactive domain modelling
Rijul Saini
Gunter Mussbacher
Jörg Kienzle
Domain modelling transforms domain problem descriptions written in natural language (NL) into analyzable and concise domain models (class diagrams) during requirements analysis or the early stages of design in software development. Since the practice of domain modelling requires time in addition to modelling skills and experience, several approaches have been proposed to automate or semi-automate the construction of domain models from problem descriptions expressed in NL. Despite the existing work on domain model extraction, some significant challenges remain unaddressed: (i) the extracted domain models are not accurate enough to be used directly or with minor modifications in software development, (ii) existing approaches do not facilitate the tracing of the rationale behind the modelling decisions taken by the model extractor, and (iii) existing approaches do not provide interactive interfaces to update the extracted domain models. Therefore, in this paper, we introduce a domain modelling bot called DoMoBOT, explain its architecture, and implement it in the form of a web-based prototype tool. The bot automatically extracts a domain model from a problem description written in NL with an accuracy higher than existing approaches. Furthermore, the bot enables modellers to update a part of the extracted domain model and in response the bot re-configures the other parts of the domain model pro-actively. To improve the accuracy of extracted domain models, we combine the techniques of Natural Language Processing and Machine Learning. Finally, we evaluate the accuracy of the extracted domain models.
Traceability Network Analysis: A Case Study of Links in Issue Tracking Systems
Alexander Nicholson
Deeksha M. Arya
Traceability links between software artifacts serve as an invaluable resource for reasoning about software products and their development process. Most conventional methods for capturing traceability are based on pair-wise artifact relations such as trace matrices or navigable links between two directly related artifacts. However, this limited view of trace links ignores the propagating effect of artifact connections as well as the trace link properties at a project level. In this work, we propose the use of network structures to provide another perspective from which reasoning on a collective of trace events is possible. We explore various network analysis techniques in the issue tracking system of sixty-six open source projects. Our observation reveals two salient properties of the traceability network, i.e. scale free and triadic closure. These properties provide a strong indication of the applicability of network analysis tools and can be used to identify and examine important "hub" issues. As a stepping stone, these properties can further support project status analysis and link type prediction. As a proof-of-concept, we demonstrate the effectiveness of applying the triadic closure property to link type prediction.
A Neural Network Based Approach to Domain Modelling Relationships and Patterns Recognition
Rijul Saini
Gunter Mussbacher
Jörg Kienzle
Model-Driven Software Engineering advocates the use of models and their transformations across different stages of software engineering to better understand and analyze systems under development. Domain modelling is used during requirements analysis or the early stages of design to transform informal requirements written in natural language to domain models which are analyzable and more concise. Since domain modelling is time-consuming and requires modelling skills and experience, many approaches have been proposed to extract domain concepts and relationships automatically using extraction rules. However, relationships and patterns are often hidden in the sentences of a problem description. Automatic recognition of relationships or patterns in those cases requires context information and external knowledge of participating domain concepts, which goes beyond what is possible with extraction rules. In this paper, we draw on recent work on domain model extraction and envision a novel technique where sentence boundaries are customized and clusters of sentences are created for domain concepts. The technique further exploits a BiLSTM neural network model to identify relationships and patterns among domain concepts. We also present a classification strategy for relationships and patterns and use it to instantiate our technique. Preliminary results indicate that this novel idea is promising and warrants further research.
Information correspondence between types of documentation for APIs
Deeksha M. Arya
Martin P. Robillard
Historical Issue Data of Projects on Jira
A. Nicholson
Deeksha M. Arya
Material for IEEE Software paper "How Do Open Source Software Contributors Perceive and Address Usability?"
Wenting Wang
Jinghui Cheng
Software Engineering Event Modeling using Relative Time in Temporal Knowledge Graphs
Kian Ahrabian
Daniel Tarlow
Hehuimin Cheng
We present a multi-relational temporal Knowledge Graph based on the daily interactions between artifacts in GitHub, one of the largest social coding platforms. Such representation enables posing many user-activity and project management questions as link prediction and time queries over the knowledge graph. In particular, we introduce two new datasets for i) interpolated time-conditioned link prediction and ii) extrapolated time-conditioned link/time prediction queries, each with distinguished properties. Our experiments on these datasets highlight the potential of adapting knowledge graphs to answer broad software engineering questions. Meanwhile, it also reveals the unsatisfactory performance of existing temporal models on extrapolated queries and time prediction queries in general. To overcome these shortcomings, we introduce an extension to current temporal models using relative temporal information with regards to past events.
ArguLens: Anatomy of Community Opinions On Usability Issues Using Argumentation Models
Wenting Wang
Deeksha M. Arya
Nicole Novielli
Jinghui Cheng
In open-source software (OSS), the design of usability is often influenced by the discussions among community members on platforms such as issue tracking systems (ITSs). However, digesting the rich information embedded in issue discussions can be a major challenge due to the vast number and diversity of the comments. We propose and evaluate ArguLens, a conceptual framework and automated technique leveraging an argumentation model to support effective understanding and consolidation of community opinions in ITSs. Through content analysis, we anatomized highly discussed usability issues from a large, active OSS project, into their argumentation components and standpoints. We then experimented with supervised machine learning techniques for automated argument extraction. Finally, through a study with experienced ITS users, we show that the information provided by ArguLens supported the digestion of usability-related opinions and facilitated the review of lengthy issues. ArguLens provides the direction of designing valuable tools for high-level reasoning and effective discussion about usability.
GitHub Repositories with Links to Academic Papers: Open Access, Traceability, and Evolution
Supatsara Wattanakriengkrai
Bodin Chinthanet
Hideaki Hata
Raula Gaikovina Kula
Christoph Treude
Ken-ichi Matsumoto
Traceability between published scientific breakthroughs and their implementation is essential, especially in the case of Open Source Software that implements bleeding edge science into its code. However, aligning the link between GitHub repositories and academic papers can prove difficult, and the link impact remains unknown. This paper investigates the role of academic paper references contained in these repositories. We conducted a large-scale study of 20 thousand GitHub repositories to establish the prevalence of references to academic papers. We use a mixed-methods approach to identify Open Access (OA), traceability and evolutionary aspects of the links. Although referencing a paper is not typical, we find that a vast majority of referenced academic papers are OA. In terms of traceability, our analysis revealed that machine learning is the most prevalent topic of repositories. These repositories tend to be affiliated with academic communities. More than half of the papers do not link back to any repository. A case study of referenced arXiv papers shows that most of these papers are high-impact and influential and do align with academia, referenced by repositories written in different programming languages. From the evolutionary aspect, we find very few changes of papers being referenced and links to them.
SENET: A Semantic Web for Supporting Automation of Software Engineering Tasks
Yalin Liu
Jinfeng Lin
Jane Cleland-Huang
Michael Vierhauser
Sugandha Lohar
The use of Natural Language (NL) interfaces to allow devices and applications to respond to verbal commands or free-form textual queries is becoming increasingly prevalent in our society. To a large extent, their success in interpreting and responding to a request is dependent upon rich underlying ontologies and conceptual models that understand the technical or domain specific vocabulary of diverse users. The effective use of NL interfaces in the Software Engineering (SE) domains requires its own ontology models focusing upon software related terms and concepts. While many SE glossaries exist, they are often incomplete and tend to define the vocabulary for specific sub-fields without capturing associations between terms and phrases. This limits their usefulness for supporting NL-related tasks. In this paper we propose an approach for constructing and evolving a semantic network of software engineering concepts and phrases. Our approach starts with a set of existing SE glossaries, uses the existing glossary terms and explicitly defined associations as a starting point, uses machine learning-based techniques to dynamically identify and document additional associations between terms, leverages the network to interpret NL queries in the SE domain, and finally augments the resulting semantic network with feedback provided by users. We evaluate the viability of our approach within the sub-domain of Agile Software Development, focusing on requirements related queries, and show that the semantic network enhances the ability of an NL interface to correctly interpret and execute user queries.