Jin Guo

Associate Academic Member
Assistant Professor, McGill University, School of Computer Science
Research Topics
Human-AI interaction
Human-Centered AI
Human-Computer Interaction (HCI)
Privacy
Responsible AI

Biography

Jin L.C. Guo is an assistant professor at the School of Computer Science, McGill University.

She is interested in using AI techniques to solve software engineering problems. Her recent research focuses on mining domain knowledge from software traceability data and using such knowledge to facilitate automated SE tasks, such as trace retrieval and project Q&A.

Guo completed her PhD at the University of Notre Dame. Prior to that, she worked on image processing and computer vision in Fuji Xerox’s research lab.

Current Students

PhD - McGill University
Master's Research - McGill University
Co-supervisor :
Postdoctorate - McGill University
Co-supervisor :
PhD - McGill University
Principal supervisor :
Master's Research - McGill University
Co-supervisor :
Master's Research - McGill University
Co-supervisor :
Master's Research - McGill University
Master's Research - McGill University

Publications

Historical Issue Data of Projects on Jira
A. Nicholson
Deeksha M. Arya
Material for IEEE Software paper "How Do Open Source Software Contributors Perceive and Address Usability?"
Wenting Wang
Jinghui Cheng
Software Engineering Event Modeling using Relative Time in Temporal Knowledge Graphs
Kian Ahrabian
Danny Tarlow
Hehuimin Cheng
We present a multi-relational temporal Knowledge Graph based on the daily interactions between artifacts in GitHub, one of the largest social coding platforms. Such a representation enables posing many user-activity and project management questions as link prediction and time queries over the knowledge graph. In particular, we introduce two new datasets for i) interpolated time-conditioned link prediction and ii) extrapolated time-conditioned link/time prediction queries, each with distinguished properties. Our experiments on these datasets highlight the potential of adapting knowledge graphs to answer broad software engineering questions. Meanwhile, they also reveal the unsatisfactory performance of existing temporal models on extrapolated queries and time prediction queries in general. To overcome these shortcomings, we introduce an extension to current temporal models using relative temporal information with regard to past events.
ArguLens: Anatomy of Community Opinions On Usability Issues Using Argumentation Models
Wenting Wang
Deeksha M. Arya
Nicole Novielli
Jinghui Cheng
In open-source software (OSS), the design of usability is often influenced by the discussions among community members on platforms such as issue tracking systems (ITSs). However, digesting the rich information embedded in issue discussions can be a major challenge due to the vast number and diversity of the comments. We propose and evaluate ArguLens, a conceptual framework and automated technique leveraging an argumentation model to support effective understanding and consolidation of community opinions in ITSs. Through content analysis, we anatomized highly discussed usability issues from a large, active OSS project, into their argumentation components and standpoints. We then experimented with supervised machine learning techniques for automated argument extraction. Finally, through a study with experienced ITS users, we show that the information provided by ArguLens supported the digestion of usability-related opinions and facilitated the review of lengthy issues. ArguLens provides the direction of designing valuable tools for high-level reasoning and effective discussion about usability.
GitHub Repositories with Links to Academic Papers: Open Access, Traceability, and Evolution
Supatsara Wattanakriengkrai
Bodin Chinthanet
Hideaki Hata
Raula Gaikovina Kula
Christoph Treude
Ken-ichi Matsumoto
Traceability between published scientific breakthroughs and their implementation is essential, especially in the case of Open Source Software that implements bleeding edge science into its code. However, aligning the link between GitHub repositories and academic papers can prove difficult, and the link impact remains unknown. This paper investigates the role of academic paper references contained in these repositories. We conducted a large-scale study of 20 thousand GitHub repositories to establish the prevalence of references to academic papers. We use a mixed-methods approach to identify Open Access (OA), traceability and evolutionary aspects of the links. Although referencing a paper is not typical, we find that a vast majority of referenced academic papers are OA. In terms of traceability, our analysis revealed that machine learning is the most prevalent topic of repositories. These repositories tend to be affiliated with academic communities. More than half of the papers do not link back to any repository. A case study of referenced arXiv papers shows that most of these papers are high-impact and influential and do align with academia, referenced by repositories written in different programming languages. From the evolutionary aspect, we find very few changes of papers being referenced and links to them.
SENET: A Semantic Web for Supporting Automation of Software Engineering Tasks
Yalin Liu
Jinfeng Lin
Jane Cleland-Huang
Michael Vierhauser
Sugandha Lohar
The use of Natural Language (NL) interfaces to allow devices and applications to respond to verbal commands or free-form textual queries is becoming increasingly prevalent in our society. To a large extent, their success in interpreting and responding to a request is dependent upon rich underlying ontologies and conceptual models that understand the technical or domain specific vocabulary of diverse users. The effective use of NL interfaces in the Software Engineering (SE) domains requires its own ontology models focusing upon software related terms and concepts. While many SE glossaries exist, they are often incomplete and tend to define the vocabulary for specific sub-fields without capturing associations between terms and phrases. This limits their usefulness for supporting NL-related tasks. In this paper we propose an approach for constructing and evolving a semantic network of software engineering concepts and phrases. Our approach starts with a set of existing SE glossaries, uses the existing glossary terms and explicitly defined associations as a starting point, uses machine learning-based techniques to dynamically identify and document additional associations between terms, leverages the network to interpret NL queries in the SE domain, and finally augments the resulting semantic network with feedback provided by users. We evaluate the viability of our approach within the sub-domain of Agile Software Development, focusing on requirements related queries, and show that the semantic network enhances the ability of an NL interface to correctly interpret and execute user queries.
Towards Queryable and Traceable Domain Models
Rijul Saini
Gunter Mussbacher
Jörg Kienzle
Model-Driven Software Engineering encompasses various modelling formalisms for supporting software development. One such formalism is domain modelling which bridges the gap between requirements expressed in natural language and analyzable and more concise domain models expressed in class diagrams. Due to the lack of modelling skills among novice modellers and time constraints in industrial projects, it is often not possible to build an accurate domain model manually. To address this challenge, we aim to develop an approach to extract domain models from problem descriptions written in natural language by combining rules based on natural language processing with machine learning. As a first step, we report on an automated and tool-supported approach with an accuracy of extracted domain models higher than existing approaches. In addition, the approach generates trace links for each model element of a domain model. The trace links enable novice modellers to execute queries on the extracted domain models to gain insights into the modelling decisions taken for improving their modelling skills. Furthermore, to evaluate our approach, we propose a novel comparison metric and discuss our experimental design. Finally, we present a research agenda detailing research directions and discuss corresponding challenges.
SST'19 - Software and Systems Traceability
Jan-Philipp Steghöfer
Nan Niu
Anas Mahmoud
Traceability is the ability to relate different artifacts during the development and operation of a system to each other. It enables program comprehension and change impact analysis, and facilitates the cooperation of engineers from different disciplines. The 10th International Workshop on Software and Systems Traceability (formerly the International Workshop on Traceability in Emerging Forms of Software Engineering, TEFSE) explored the role and impact of traceability in modern software and systems development. The event brought together researchers and practitioners to examine the challenges of recovering, maintaining, and utilizing traceability for the myriad forms of software and systems engineering artifacts. SST'19 was a highly interactive working event focused on discussing the main problems related to software traceability, in particular in the context of opportunities and challenges posed by the recent progress in Artificial Intelligence techniques, and proposing possible solutions for such problems.
Teaching Modelling Literacy: An Artificial Intelligence Approach
Rijul Saini
Gunter Mussbacher
Jörg Kienzle
In Model-Driven Engineering (MDE), models are used to build and analyze complex systems. In the last decades, different modelling formalisms have been proposed for supporting software development. However, their adoption and practice strongly rely on mastering essential modelling skills to develop a complete and coherent model-based system. Moreover, it is often difficult for novice modellers to get direct and timely feedback and recommendations on their modelling strategies and decisions, particularly in large classroom settings which hinders their learning. Certainly, there is an opportunity to apply Artificial Intelligence (AI) techniques to an MDE learning environment to empower the provisioning of automated and intelligent modelling advocacy. In this paper, we propose a framework called ModBud (a modelling buddy) to educate novice modellers about the art of abstraction. ModBud uses natural language processing (NLP) and machine learning (ML) to create modelling bots with the aim of improving the modelling skills of novice modellers and assisting other practitioners, too. These bots could be used to support teaching with automatic creation or grading of models and enhance learning beyond the traditional classroom-based MDE education with timely feedback and personalized tutoring. Research challenges for the proposed framework are discussed and a research roadmap is presented.
Activity-Based Analysis of Open Source Software Contributors: Roles and Dynamics
Jinghui Cheng
Contributors to open source software (OSS) communities assume diverse roles to take different responsibilities. One major limitation of the current OSS tools and platforms is that they provide a uniform user interface regardless of the activities performed by the various types of contributors. This paper serves as a non-trivial first step towards resolving this challenge by demonstrating a methodology and establishing knowledge to understand how the contributors' roles and their dynamics, reflected in the activities contributors perform, are exhibited in OSS communities. Based on an analysis of user action data from 29 GitHub projects, we extracted six activities that distinguished four Active roles and five Supporting roles of OSS contributors, as well as patterns in role changes. Through the lens of the Activity Theory, these findings provided rich design guidelines for OSS tools to support diverse contributor roles.
Analysis and Detection of Information Types of Open Source Software Issue Discussions
Deeksha M. Arya
Wenting Wang
Jinghui Cheng
Most modern Issue Tracking Systems (ITSs) for open source software (OSS) projects allow users to add comments to issues. Over time, these comments accumulate into discussion threads embedded with rich information about the software project, which can potentially satisfy the diverse needs of OSS stakeholders. However, discovering and retrieving relevant information from the discussion threads is a challenging task, especially when the discussions are lengthy and the number of issues in ITSs is vast. In this paper, we address this challenge by identifying the information types presented in OSS issue discussions. Through qualitative content analysis of 15 complex issue threads across three projects hosted on GitHub, we uncovered 16 information types and created a labeled corpus containing 4656 sentences. Our investigation of supervised, automated classification techniques indicated that, when prior knowledge about the issue is available, Random Forest can effectively detect most sentence types using conversational features such as the sentence length and its position. When classifying sentences from new issues, Logistic Regression can yield satisfactory performance using textual features for certain information types, while falling short on others. Our work represents a nontrivial first step towards tools and techniques for identifying and obtaining the rich information recorded in the ITSs to support various software engineering activities and to satisfy the diverse needs of OSS stakeholders.
Usability of Virtual Reality Application Through the Lens of the User Community: A Case Study
Wenting Wang
Jinghui Cheng
The increasing availability and diversity of virtual reality (VR) applications highlighted the importance of their usability. Function-oriented VR applications posed new challenges that are not well studied in the literature. Moreover, user feedback becomes readily available thanks to modern software engineering tools, such as app stores and open source platforms. Using Firefox Reality as a case study, we explored the major types of VR usability issues raised in these platforms. We found that 77% of usability feedback items can be mapped to Nielsen's heuristics while few were mappable to VR-specific heuristics. This result indicates that Nielsen's heuristics could potentially help developers address the usability of this VR application in its early development stage. This work paves the road for exploring tools leveraging the community effort to promote the usability of function-oriented VR applications.