Xiaoqiang Wang

OSCAR: Operating System Control via State-Aware Reasoning and Re-Planning

Large language models (LLMs) and large multimodal models (LMMs) have shown great potential in automating complex tasks like web browsing and gaming. However, their ability to generalize across diverse applications remains limited, hindering broader utility. To address this challenge, we present OSCAR: Operating System Control via state-Aware reasoning and Re-planning. OSCAR is a generalist agent designed to autonomously navigate and interact with various desktop and mobile applications through standardized controls, such as mouse and keyboard inputs, while processing screen images to fulfill user commands. OSCAR translates human instructions into executable Python code, enabling precise control over graphical user interfaces (GUIs). To enhance stability and adaptability, OSCAR operates as a state machine, equipped with error-handling mechanisms and dynamic task re-planning, allowing it to efficiently adjust to real-time feedback and exceptions. We demonstrate OSCAR’s effectiveness through extensive experiments on diverse benchmarks across desktop and mobile platforms, where it transforms complex workflows into simple natural language commands, significantly boosting user productivity. Our code will be open-source upon publication.

2025-01-22

ICLR.cc/2025/Conference (poster)

openreview.net

SkillQG: Learning to Generate Question for Reading Comprehension Assessment

Xiaoqiang Wang

Bang Liu

Siliang Tang

Lingfei Wu

2023-07-01

Findings of the Association for Computational Linguistics: ACL 2023 (published)

doi.org

arxiv.org

QRelScore: Better Evaluating Generated Questions with Deeper Understanding of Context-aware Relevance

Xiaoqiang Wang

Bang Liu

Siliang Tang

Lingfei Wu

Existing metrics for assessing question generation not only require costly human references but also fail to take into account the input context of generation, and therefore lack a deep understanding of the relevance between the generated questions and input contexts. As a result, they may wrongly penalize a legitimate and reasonable candidate question when it (1) involves complicated reasoning with the context or (2) can be grounded by multiple pieces of evidence in the context. In this paper, we propose QRelScore, a context-aware Relevance evaluation metric for Question Generation. Based on off-the-shelf language models such as BERT and GPT2, QRelScore employs both word-level hierarchical matching and sentence-level prompt-based generation to cope with the complicated reasoning and diverse generation from multiple pieces of evidence, respectively. Compared with existing metrics, our experiments demonstrate that QRelScore is able to achieve a higher correlation with human judgments while being much more robust to adversarial samples.

2022-12-01

Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (published)

doi.org

arxiv.org

Feeding What You Need by Understanding What You Learned

Xiaoqiang Wang

Bang Liu

Fangli Xu

Bo Long

Siliang Tang

Lingfei Wu

2022-01-01