Portrait of Yuyan Chen is unavailable

Yuyan Chen

PhD - McGill University
Supervisor
Research Topics
Computer Vision
Deep Learning

Publications

Open-Insect: Benchmarking Open-Set Recognition of Novel Species in Biodiversity Monitoring
Nico Lang
B. Christian Schmidt
Yves Basset
Sara Beery
Maxim Larrivée
Global biodiversity is declining at an unprecedented rate, yet little information is known about most species and how their populations are … (see more)changing. Indeed, some 90% of Earth's species are estimated to be completely unknown. Machine learning has recently emerged as a promising tool to facilitate long-term, large-scale biodiversity monitoring, including algorithms for fine-grained classification of species from images. However, such algorithms typically are not designed to detect examples from categories unseen during training -- the problem of open-set recognition (OSR) -- limiting their applicability for highly diverse, poorly studied taxa such as insects. To address this gap, we introduce Open-Insect, a large-scale, fine-grained dataset to evaluate unknown species detection across different geographic regions with varying difficulty. We benchmark 38 OSR algorithms across three categories: post-hoc, training-time regularization, and training with auxiliary data, finding that simple post-hoc approaches remain a strong baseline. We also demonstrate how to leverage auxiliary data to improve species discovery in regions with limited data. Our results provide insights to guide the development of computer vision methods for biodiversity monitoring and species discovery.
MAPO: Boosting Large Language Model Performance with Model-Adaptive Prompt Optimization
Zhihao Wen
Ge Fan
Zhengyu Chen
Wei Wu
Dayiheng Liu
Zhixu Li
Yanghua Xiao
XMQAs: Constructing Complex-Modified Question-Answering Dataset for Robust Question Understanding
Yanghua Xiao
Zhixu Li
Question understanding is an important issue to the success of a Knowledge-based Question Answering (KBQA) system.However, the existing stud… (see more)y does not pay enough attention to this issue given that the questions in the existing KBQA datasets are usually expressed in simple and straightforward way. This is not in line with the actual linguistic conventions, which often use a lot of modifiers. To facilitate the study on evaluating and enhancing the question understanding ability of the KBQA systems, this paper proposes to construct a complex-modified question-answering (XMQAs) dataset based on existing KBQA datasets. With the help of knowledge bases and dictionaries, three kinds of modifiers are defined and applied to original simple-expressed questions. These modifiers could make the expression of these questions complex without changing their semantics. Based on XMQAs, we then propose a novel question understanding algorithm upon existing KBQA models, which greatly improves the robustness of their question understanding abilities. We conduct extensive experiments on XMQAs and two widely acknowledged KBQA datasets. The empirical results demonstrate that our proposed algorithm can improve the performance of KBQA models on not only the complex-modified questions, but also simple-expressed questions.
Grow-and-Clip: Informative-yet-Concise Evidence Distillation for Answer Explanation
Yanghua Xiao
Interpreting the predictions of existing Question Answering (QA) models is critical to many real-world intelligent applications, such as QA … (see more)systems for healthcare, education, and finance. However, existing QA models lack interpretability and provide no feedback or explanation for end-users to help them understand why a specific prediction is the answer to a question. In this research, we argue that the evidences of an answer is critical to enhancing the interpretability of QA models. Unlike previous research that simply extracts several sentence(s) in the context as evidence, we are the first to explicitly define the concept of evidence as the supporting facts in a context which are informative, concise, and readable. Besides, we provide effective strategies to quantitatively measure the informativeness, conciseness and readability of evidence. Furthermore, we propose Grow-and-Clip Evidence Distillation (GCED) algorithm to extract evidences from the contexts by trade-off informativeness, conciseness, and readability. We conduct extensive experiments on the SQuAD and TriviaQA datasets with several baseline models to evaluate the effect of GCED on interpreting answers to questions. Human evaluation are also carried out to check the quality of distilled evidences. Experimental results show that automatic distilled evidences have human-like informativeness, conciseness and readability, which can enhance the interpretability of the answers to questions.