Xiaoqiang Wang

Mem-$π$: Adaptive Memory through Learning When and What to Generate

Xiaoqiang Wang

Chao Wang

Hadi Nekoei

Christopher Pal

Alexandre Lacoste

Spandana Gella

Bang Liu

Perouz Taslakian

We present Mem-…

2026-05-19

arXiv (preprint)

doi.org

arxiv.org

System-1.5 Reasoning: Traversal in Language and Latent Spaces with Dynamic Shortcuts

Xiaoqiang Wang

Suyuchen Wang

Yun Zhu

Bang Liu

Chain-of-thought (CoT) reasoning enables large language models (LLMs) to move beyond fast System-1 responses and engage in deliberative System-2 reasoning. However, this comes at the cost of significant inefficiency due to verbose intermediate output. Recent latent-space reasoning methods improve efficiency by operating on hidden states without decoding into language, yet they treat all steps uniformly, failing to distinguish critical deductions from auxiliary steps and resulting in suboptimal use of computational resources. In this paper, we propose System-1.5 Reasoning, an adaptive reasoning framework that dynamically allocates computation across reasoning steps through shortcut paths in latent space. Specifically, System-1.5 Reasoning introduces two types of dynamic shortcuts. The model depth shortcut (DS) adaptively reasons along the vertical depth by early exiting non-critical tokens through lightweight adapter branches, while allowing critical tokens to continue through deeper Transformer layers. The step shortcut (SS) reuses hidden states across the decoding steps to skip trivial steps and reason horizontally in latent space. Training System-1.5 Reasoning involves a two-stage self-distillation process: first distilling natural language CoT into latent-space continuous thought, and then distilling full-path System-2 latent reasoning into adaptive shortcut paths (System-1.5 Reasoning). Experiments on reasoning tasks demonstrate the superior performance of our method. For example, on GSM8K, System-1.5 Reasoning achieves reasoning performance comparable to traditional CoT fine-tuning methods while accelerating inference by over 20x and reducing token generation by 92.31% on average.

2025-09-17

NeurIPS.cc/2025/Conference (poster)

doi.org

openreview.net

OSCAR: Operating System Control via State-Aware Reasoning and Re-Planning

Xiaoqiang Wang

Bang Liu

Large language models (LLMs) and large multimodal models (LMMs) have shown great potential in automating complex tasks like web browsing and gaming. However, their ability to generalize across diverse applications remains limited, hindering broader utility. To address this challenge, we present OSCAR: Operating System Control via state-Aware reasoning and Re-planning. OSCAR is a generalist agent designed to autonomously navigate and interact with various desktop and mobile applications through standardized controls, such as mouse and keyboard inputs, while processing screen images to fulfill user commands. OSCAR translates human instructions into executable Python code, enabling precise control over graphical user interfaces (GUIs). To enhance stability and adaptability, OSCAR operates as a state machine, equipped with error-handling mechanisms and dynamic task re-planning, allowing it to efficiently adjust to real-time feedback and exceptions. We demonstrate OSCAR’s effectiveness through extensive experiments on diverse benchmarks across desktop and mobile platforms, where it transforms complex workflows into simple natural language commands, significantly boosting user productivity. Our code will be open-source upon publication.

2025-01-21

ICLR.cc/2025/Conference (poster)

openreview.net

SkillQG: Learning to Generate Question for Reading Comprehension Assessment

Xiaoqiang Wang

Bang Liu

Siliang Tang

Lingfei Wu

2023-06-30

Findings of the Association for Computational Linguistics: ACL 2023 (published)

doi.org

arxiv.org

QRelScore: Better Evaluating Generated Questions with Deeper Understanding of Context-aware Relevance

Xiaoqiang Wang

Bang Liu

Siliang Tang

Lingfei Wu

Existing metrics for assessing question generation not only require costly human references but also fail to take into account the input context of generation, resulting in a lack of deep understanding of the relevance between the generated questions and input contexts. As a result, they may wrongly penalize a legitimate and reasonable candidate question when it (1) involves complicated reasoning with the context or (2) can be grounded by multiple pieces of evidence in the context. In this paper, we propose QRelScore, a context-aware Relevance evaluation metric for Question Generation. Based on off-the-shelf language models such as BERT and GPT2, QRelScore employs both word-level hierarchical matching and sentence-level prompt-based generation to cope with the complicated reasoning and diverse generation from multiple pieces of evidence, respectively. Compared with existing metrics, our experiments demonstrate that QRelScore is able to achieve a higher correlation with human judgments while being much more robust to adversarial samples.

2022-11-30

Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (published)

doi.org

arxiv.org

Feeding What You Need by Understanding What You Learned

Xiaoqiang Wang

Bang Liu

Fangli Xu

Bo Long

Siliang Tang

Lingfei Wu

2021-12-31