
Jin Guo

Associate Academic Member
Assistant Professor, McGill University, School of Computer Science
Research Topics
Information Retrieval
Natural Language Processing

Biography

Jin L.C. Guo received her PhD from the University of Notre Dame. She is interested in using artificial intelligence techniques to solve software engineering problems. Her recent research focuses on mining domain knowledge from software traceability data and on using that knowledge to facilitate automated software engineering tasks such as trace retrieval and project question answering. Before her PhD, she worked at the Fuji Xerox research lab on image processing and computer vision.

Current Students

Research Master's - McGill
Co-supervisor:
Postdoctorate - McGill
Co-supervisor:
PhD - McGill
Principal supervisor:
Research Master's - McGill
Co-supervisor:
Research Master's - McGill
Co-supervisor:
Research Master's - McGill
Research Master's - McGill

Publications

Historical Issue Data of Projects on Jira
A. Nicholson
Deeksha M. Arya
Material for IEEE Software paper "How Do Open Source Software Contributors Perceive and Address Usability?"
Wenting Wang
Jinghui Cheng
Software Engineering Event Modeling using Relative Time in Temporal Knowledge Graphs
Kian Ahrabian
Danny Tarlow
Hehuimin Cheng
We present a multi-relational temporal Knowledge Graph based on the daily interactions between artifacts in GitHub, one of the largest social coding platforms. Such a representation enables posing many user-activity and project management questions as link prediction and time queries over the knowledge graph. In particular, we introduce two new datasets for i) interpolated time-conditioned link prediction and ii) extrapolated time-conditioned link/time prediction queries, each with distinguished properties. Our experiments on these datasets highlight the potential of adapting knowledge graphs to answer broad software engineering questions. Meanwhile, they also reveal the unsatisfactory performance of existing temporal models on extrapolated queries and time prediction queries in general. To overcome these shortcomings, we introduce an extension to current temporal models using relative temporal information with regard to past events.
ArguLens: Anatomy of Community Opinions On Usability Issues Using Argumentation Models
Wenting Wang
Deeksha M. Arya
Nicole Novielli
Jinghui Cheng
In open-source software (OSS), the design of usability is often influenced by the discussions among community members on platforms such as issue tracking systems (ITSs). However, digesting the rich information embedded in issue discussions can be a major challenge due to the vast number and diversity of the comments. We propose and evaluate ArguLens, a conceptual framework and automated technique leveraging an argumentation model to support effective understanding and consolidation of community opinions in ITSs. Through content analysis, we anatomized highly discussed usability issues from a large, active OSS project, into their argumentation components and standpoints. We then experimented with supervised machine learning techniques for automated argument extraction. Finally, through a study with experienced ITS users, we show that the information provided by ArguLens supported the digestion of usability-related opinions and facilitated the review of lengthy issues. ArguLens provides the direction of designing valuable tools for high-level reasoning and effective discussion about usability.
GitHub Repositories with Links to Academic Papers: Open Access, Traceability, and Evolution
Supatsara Wattanakriengkrai
Bodin Chinthanet
Hideaki Hata
Raula Gaikovina Kula
Christoph Treude
Ken-ichi Matsumoto
Traceability between published scientific breakthroughs and their implementation is essential, especially when Open Source Software implements bleeding-edge science in its code. However, aligning the link between GitHub repositories and academic papers can prove difficult, and the link impact remains unknown. This paper investigates the role of academic paper references contained in these repositories. We conducted a large-scale study of 20 thousand GitHub repositories to establish the prevalence of references to academic papers. We use a mixed-methods approach to identify Open Access (OA), traceability and evolutionary aspects of the links. Although referencing a paper is not typical, we find that a vast majority of referenced academic papers are OA. In terms of traceability, our analysis revealed that machine learning is the most prevalent topic of repositories. These repositories tend to be affiliated with academic communities. More than half of the papers do not link back to any repository. A case study of referenced arXiv papers shows that most of these papers are high-impact and influential and do align with academia, referenced by repositories written in different programming languages. From the evolutionary aspect, we find very few changes of papers being referenced and links to them.
SENET: A Semantic Web for Supporting Automation of Software Engineering Tasks
Yalin Liu
Jinfeng Lin
Jane Cleland-Huang
Michael Vierhauser
Sugandha Lohar
The use of Natural Language (NL) interfaces to allow devices and applications to respond to verbal commands or free-form textual queries is becoming increasingly prevalent in our society. To a large extent, their success in interpreting and responding to a request is dependent upon rich underlying ontologies and conceptual models that understand the technical or domain specific vocabulary of diverse users. The effective use of NL interfaces in the Software Engineering (SE) domains requires its own ontology models focusing upon software related terms and concepts. While many SE glossaries exist, they are often incomplete and tend to define the vocabulary for specific sub-fields without capturing associations between terms and phrases. This limits their usefulness for supporting NL-related tasks. In this paper we propose an approach for constructing and evolving a semantic network of software engineering concepts and phrases. Our approach starts with a set of existing SE glossaries, uses the existing glossary terms and explicitly defined associations as a starting point, uses machine learning-based techniques to dynamically identify and document additional associations between terms, leverages the network to interpret NL queries in the SE domain, and finally augments the resulting semantic network with feedback provided by users. We evaluate the viability of our approach within the sub-domain of Agile Software Development, focusing on requirements related queries, and show that the semantic network enhances the ability of an NL interface to correctly interpret and execute user queries.
Towards Queryable and Traceable Domain Models
Rijul Saini
Gunter Mussbacher
Jörg Kienzle
Model-Driven Software Engineering encompasses various modelling formalisms for supporting software development. One such formalism is domain modelling which bridges the gap between requirements expressed in natural language and analyzable and more concise domain models expressed in class diagrams. Due to the lack of modelling skills among novice modellers and time constraints in industrial projects, it is often not possible to build an accurate domain model manually. To address this challenge, we aim to develop an approach to extract domain models from problem descriptions written in natural language by combining rules based on natural language processing with machine learning. As a first step, we report on an automated and tool-supported approach with an accuracy of extracted domain models higher than existing approaches. In addition, the approach generates trace links for each model element of a domain model. The trace links enable novice modellers to execute queries on the extracted domain models to gain insights into the modelling decisions taken for improving their modelling skills. Furthermore, to evaluate our approach, we propose a novel comparison metric and discuss our experimental design. Finally, we present a research agenda detailing research directions and discuss corresponding challenges.
SST'19 - Software and Systems Traceability
Jan-Philipp Steghöfer
Nan Niu
Anas Mahmoud
Traceability is the ability to relate different artifacts during the development and operation of a system to each other. It enables program comprehension, change impact analysis, and facilitates the cooperation of engineers from different disciplines. The 10th International Workshop on Software and Systems Traceability (formerly the International Workshop on Traceability in Emerging Forms of Software Engineering, TEFSE) explored the role and impact of traceability in modern software and systems development. The event brought together researchers and practitioners to examine the challenges of recovering, maintaining, and utilizing traceability for the myriad forms of software and systems engineering artifacts. SST'19 was a highly interactive working event focused on discussing the main problems related to software traceability, in particular in the context of opportunities and challenges posed by the recent progress in Artificial Intelligence techniques, and proposing possible solutions for such problems.
Teaching Modelling Literacy: An Artificial Intelligence Approach
Rijul Saini
Gunter Mussbacher
Jörg Kienzle
In Model-Driven Engineering (MDE), models are used to build and analyze complex systems. In the last decades, different modelling formalisms have been proposed for supporting software development. However, their adoption and practice strongly rely on mastering essential modelling skills to develop a complete and coherent model-based system. Moreover, it is often difficult for novice modellers to get direct and timely feedback and recommendations on their modelling strategies and decisions, particularly in large classroom settings, which hinders their learning. Certainly, there is an opportunity to apply Artificial Intelligence (AI) techniques to an MDE learning environment to empower the provisioning of automated and intelligent modelling advocacy. In this paper, we propose a framework called ModBud (a modelling buddy) to educate novice modellers about the art of abstraction. ModBud uses natural language processing (NLP) and machine learning (ML) to create modelling bots with the aim of improving the modelling skills of novice modellers and assisting other practitioners, too. These bots could be used to support teaching with automatic creation or grading of models and enhance learning beyond the traditional classroom-based MDE education with timely feedback and personalized tutoring. Research challenges for the proposed framework are discussed and a research roadmap is presented.
Activity-Based Analysis of Open Source Software Contributors: Roles and Dynamics
Jinghui Cheng
Contributors to open source software (OSS) communities assume diverse roles to take different responsibilities. One major limitation of the current OSS tools and platforms is that they provide a uniform user interface regardless of the activities performed by the various types of contributors. This paper serves as a non-trivial first step towards resolving this challenge by demonstrating a methodology and establishing knowledge to understand how the contributors' roles and their dynamics, reflected in the activities contributors perform, are exhibited in OSS communities. Based on an analysis of user action data from 29 GitHub projects, we extracted six activities that distinguished four Active roles and five Supporting roles of OSS contributors, as well as patterns in role changes. Through the lens of the Activity Theory, these findings provided rich design guidelines for OSS tools to support diverse contributor roles.
Analysis and Detection of Information Types of Open Source Software Issue Discussions
Deeksha M. Arya
Wenting Wang
Jinghui Cheng
Most modern Issue Tracking Systems (ITSs) for open source software (OSS) projects allow users to add comments to issues. Over time, these comments accumulate into discussion threads embedded with rich information about the software project, which can potentially satisfy the diverse needs of OSS stakeholders. However, discovering and retrieving relevant information from the discussion threads is a challenging task, especially when the discussions are lengthy and the number of issues in ITSs is vast. In this paper, we address this challenge by identifying the information types presented in OSS issue discussions. Through qualitative content analysis of 15 complex issue threads across three projects hosted on GitHub, we uncovered 16 information types and created a labeled corpus containing 4656 sentences. Our investigation of supervised, automated classification techniques indicated that, when prior knowledge about the issue is available, Random Forest can effectively detect most sentence types using conversational features such as the sentence length and its position. When classifying sentences from new issues, Logistic Regression can yield satisfactory performance using textual features for certain information types, while falling short on others. Our work represents a nontrivial first step towards tools and techniques for identifying and obtaining the rich information recorded in the ITSs to support various software engineering activities and to satisfy the diverse needs of OSS stakeholders.
Usability of Virtual Reality Application Through the Lens of the User Community: A Case Study
Wenting Wang
Jinghui Cheng
The increasing availability and diversity of virtual reality (VR) applications highlighted the importance of their usability. Function-oriented VR applications posed new challenges that are not well studied in the literature. Moreover, user feedback becomes readily available thanks to modern software engineering tools, such as app stores and open source platforms. Using Firefox Reality as a case study, we explored the major types of VR usability issues raised in these platforms. We found that 77% of usability feedback can be mapped to Nielsen's heuristics while little was mappable to VR-specific heuristics. This result indicates that Nielsen's heuristics could potentially help developers address the usability of this VR application in its early development stage. This work paves the road for exploring tools leveraging the community effort to promote the usability of function-oriented VR applications.