
Bang Liu

Associate Academic Member
Canada CIFAR AI Chair
Assistant Professor, Université de Montréal, Department of Computer Science and Operations Research
Research Topics
Data Mining
Deep Learning
Generative Models
Learning on Graphs
Natural Language Processing

Biography

Bang Liu is an assistant professor in the Department of Computer Science and Operations Research (DIRO), and a core member of the Applied Research in Computational Linguistics Lab (RALI) at Université de Montréal. He is also an associate academic member of Mila – Quebec Artificial Intelligence Institute and a Canada CIFAR AI Chair.

Liu received his BEng from the University of Science and Technology of China in 2013, and his MSc and PhD degrees from the University of Alberta in 2015 and 2020, respectively. His research interests lie primarily in the areas of natural language processing, multimodal and embodied learning, theory and techniques for AGI (e.g., understanding and improving large language models), and AI for science (e.g., health, material science, XR).

Current Students

PhD - Université de Montréal
Postdoctorate - Université de Montréal
PhD - Université de Montréal
PhD - Université de Montréal
PhD - Université de Montréal
PhD - Université de Montréal
PhD - Université de Montréal
PhD - Université de Montréal
PhD - Université de Montréal
PhD - Université de Montréal
PhD - Université de Montréal
PhD - Université de Montréal
Research Intern - McGill University
Master's Research - Université de Montréal
Master's Research - Université de Montréal

Publications

ReCode: Unify Plan and Action for Universal Granularity Control
Zhaoyang Yu
Jiayi Zhang
Huixue Su
Yufan Zhao
Yifan Wu
Mingyi Deng
Jinyu Xiang
Yizhang Lin
Fanqi Kong
Lingxiao Tang
Yuyu Luo
Chenglin Wu
Real-world tasks require decisions at varying granularities, and humans excel at this by leveraging a unified cognitive representation where planning is fundamentally understood as a high-level form of action. However, current Large Language Model (LLM)-based agents lack this crucial capability to operate fluidly across decision granularities. This limitation stems from existing paradigms that enforce a rigid separation between high-level planning and low-level action, which impairs dynamic adaptability and limits generalization. We propose ReCode (Recursive Code Generation), a novel paradigm that addresses this limitation by unifying planning and action within a single code representation. In this representation, ReCode treats high-level plans as abstract placeholder functions, which the agent then recursively decomposes into finer-grained sub-functions until reaching primitive actions. This recursive approach dissolves the rigid boundary between plan and action, enabling the agent to dynamically control its decision granularity. Furthermore, the recursive structure inherently generates rich, multi-granularity training data, enabling models to learn hierarchical decision-making processes. Extensive experiments show that ReCode significantly surpasses advanced baselines in inference performance and demonstrates exceptional data efficiency in training, validating our core insight that unifying planning and action through recursive code generation is a powerful and effective approach to achieving universal granularity control.
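The recursive decomposition described above can be illustrated with a minimal Python sketch. It assumes a toy setting: the `expand` callable stands in for the LLM that writes sub-functions, and `PRIMITIVES`, `TOY_EXPANSIONS`, and `recode` are hypothetical names, not ReCode's actual code.

```python
from typing import Callable, Dict, List

# Hypothetical primitive actions that the environment can execute directly.
PRIMITIVES = {"open_fridge", "take_milk", "close_fridge", "pour_milk"}

# Hypothetical stand-in for the LLM: maps an abstract placeholder
# function to a list of finer-grained sub-functions.
TOY_EXPANSIONS: Dict[str, List[str]] = {
    "make_breakfast": ["get_milk", "pour_milk"],
    "get_milk": ["open_fridge", "take_milk", "close_fridge"],
}


def recode(node: str, expand: Callable[[str], List[str]],
           depth: int = 0, max_depth: int = 10) -> List[str]:
    """Recursively expand a plan node until only primitive actions remain."""
    if node in PRIMITIVES:
        return [node]  # primitive: execute as-is
    if depth >= max_depth:
        raise RecursionError(f"could not ground {node!r} within {max_depth} levels")
    actions: List[str] = []
    for child in expand(node):  # a plan is treated as a higher-level action
        actions.extend(recode(child, expand, depth + 1, max_depth))
    return actions


print(recode("make_breakfast", TOY_EXPANSIONS.__getitem__))
# ['open_fridge', 'take_milk', 'close_fridge', 'pour_milk']
```

Because plans and actions share one representation here, the decision granularity for each node is simply how deep the recursion goes; `max_depth` is an added safeguard for the sketch, not part of the described method.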
Mem-π: Adaptive Memory through Learning When and What to Generate
Chao Wang
Christopher Pal
Alexandre Lacoste
We present Mem-…
Scalable Environments Drive Generalizable Agents
Jiayi Zhang
Fanqi Kong
Guibin Zhang
Maojia Song
Zhaoyang Yu
Jianhao Ruan
Jinyu Xiang
Chenglin Wu
Yuyu Luo
Generalizable agents should adapt to diverse tasks and unseen environments beyond their training distribution. This position paper argues that such generalization requires environment scaling: expanding the distribution of executable rule-sets that agents interact with, rather than only increasing trajectories or tasks within fixed benchmarks. Current scaling practices largely focus on collecting more experience or broader task sets under fixed interaction rules, leaving agents brittle when underlying interfaces, dynamics, observations, or feedback signals change. The core challenge is therefore a world-level distribution shift: agents need systematic exposure to environments with meaningfully different executable rule-sets. To clarify this challenge, we propose a unified taxonomy that separates trajectory scaling, task scaling, and environment scaling by their primary deliverables and by what changes in the executable rule-set. Building on this taxonomy, we synthesize construction paradigms for scalable environments, contrasting programmatic generators that prioritize controllability and verifiability with generative world models that offer broader coverage and open-endedness. We further outline how environment scaling can be coupled with stateful learning mechanisms, emphasizing learned update rules for cross-environment adaptation. We conclude by discussing alternative perspectives and argue that scalable environments provide the essential substrate for measurable and controllable progress toward robust general agents.
EIAN: Explicit Interaction-aware Attention Network for Interpretable Event Modeling
Jiping Zhang
Hua Zhu
Hong Huang
Yi Zhou
Kehan Yin
Event sequences are integral to domains such as e-commerce, social networks, and healthcare. Traditional point process models, like Poisson and Hawkes processes, are foundational but limited by rigid parametric assumptions, constraining their flexibility in complex real-world scenarios. Neural point processes offer a more adaptable alternative, but typically perform implicit sequence modeling, which does not fully exploit critical event interaction patterns and limits transparency. To address these challenges, we introduce the Explicit Interaction-aware Attention Network (EIAN), a novel model that enhances event modeling by explicitly capturing both intra-type and cross-type event interactions. Specifically, EIAN employs two key components: an intra-type temporal encoder that preserves the unique temporal dynamics within each event type, and a cross-type interaction decoder that highlights interactions across event types. Furthermore, two temporal encoding mechanisms are integrated into the interaction decoder to handle irregular inter-event intervals in diverse temporal scenarios. Extensive experiments show that EIAN consistently outperforms existing models in predictive performance and provides deeper insights into event interaction patterns, advancing both flexibility and interpretability. Our code is available at https://github.com/CGCL-codes/EIAN.git.
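Handling irregular inter-event intervals usually starts from a continuous-valued time encoding. The sketch below shows one common choice, a sinusoidal encoding evaluated at real-valued times (here, the gap before each event). Whether EIAN's two temporal encoding mechanisms take this form is an assumption, and `temporal_encoding` and `interval_encodings` are hypothetical helpers.

```python
import math
from typing import List


def temporal_encoding(t: float, d_model: int, base: float = 10000.0) -> List[float]:
    """Sinusoidal encoding of a real-valued time t (even dims: sin, odd dims: cos)."""
    enc = []
    for i in range(d_model):
        freq = 1.0 / base ** ((i - i % 2) / d_model)  # each sin/cos pair shares a frequency
        enc.append(math.sin(t * freq) if i % 2 == 0 else math.cos(t * freq))
    return enc


def interval_encodings(timestamps: List[float], d_model: int) -> List[List[float]]:
    """Encode the (irregular) gap before each event; the first event gets gap 0."""
    gaps = [0.0] + [b - a for a, b in zip(timestamps, timestamps[1:])]
    return [temporal_encoding(g, d_model) for g in gaps]


# Irregularly spaced events: gaps of roughly 0.0, 0.5, and 3.2 time units.
for vec in interval_encodings([1.0, 1.5, 4.7], d_model=4):
    print([round(x, 3) for x in vec])
```

Unlike integer position indices, the encoding is defined for any real gap, so two events 0.5 and 3.2 units apart receive distinct vectors without discretizing time.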
Co-Evolution of Policy and Internal Reward for Language Agents
Xinyu Wang
Hanwei Wu
Jingwei Song
Jiayi Zhang
Fanqi Kong
Tung Sum Thomas Kwok
Xiao-Wen Chang
Yuyu Luo
Chenglin Wu
Large language model (LLM) agents learn by interacting with environments, but long-horizon training remains fundamentally bottlenecked by sparse and delayed rewards. Existing methods typically address this challenge through post-hoc credit assignment or external reward models, which provide limited guidance at inference time and often separate reward improvement from policy improvement. We propose Self-Guide, a self-generated internal reward for language agents that supports both inference-time guidance and training-time supervision. Specifically, the agent uses Self-Guide as a short self-guidance signal to steer the next action during inference, and converts the same signal into step-level internal reward for denser policy optimization during training. This creates a co-evolving loop: a better policy produces better guidance, and better guidance further improves the policy as an internal reward. Across three agent benchmarks, inference-time self-guidance already yields clear gains, while jointly evolving policy and internal reward with GRPO brings further improvements (8%) over baselines trained solely with environment reward. Overall, our results suggest that language agents can improve not only by collecting more experience, but also by learning to generate and refine their own internal reward during acting and learning.
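The benefit of a denser reward for GRPO can be seen in a small numeric sketch. GRPO computes each rollout's advantage relative to the other rollouts in its group, so rollouts whose environment rewards tie receive identical advantages. The `shaped_return` rule below (environment reward plus a weighted sum of step-level internal rewards, with a hypothetical weight `lam`) is an illustrative assumption, not Self-Guide's exact formulation.

```python
import statistics
from typing import List


def group_relative_advantages(returns: List[float], eps: float = 1e-8) -> List[float]:
    """GRPO-style advantage: standardize each rollout's return within its group."""
    mean = statistics.fmean(returns)
    std = statistics.pstdev(returns)  # population std over the group
    return [(r - mean) / (std + eps) for r in returns]


def shaped_return(env_reward: float, step_rewards: List[float], lam: float = 0.1) -> float:
    """Hypothetical shaping: sparse environment reward + lam * dense internal rewards."""
    return env_reward + lam * sum(step_rewards)


env_rewards = [1.0, 0.0, 0.0, 0.0]                            # sparse: three rollouts tie
internal = [[1.0, 1.0], [1.0, 0.0], [0.0, 0.0], [-1.0, 0.0]]  # per-step self-guidance scores

print(group_relative_advantages(env_rewards))
print(group_relative_advantages([shaped_return(e, s) for e, s in zip(env_rewards, internal)]))
```

With environment reward alone, the three failed rollouts are indistinguishable; once step-level internal rewards are added, they are ranked against each other, which is the kind of denser learning signal the abstract describes.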
What to Ask Next? Probing the Imaginative Reasoning of LLMs with TurtleSoup Puzzles
Mi Zhou
H. Zhang
Qi Sima
We investigate the capacity of Large Language Models (LLMs) for imaginative reasoning—the proactive construction, testing, and revision of hypotheses in information-sparse environments. Existing benchmarks, often static or focused on social deduction, fail to capture the dynamic, exploratory nature of this reasoning process. To address this gap, we introduce a comprehensive research framework based on the classic "Turtle Soup" game, integrating a benchmark, an agent, and an evaluation protocol. We present TurtleSoup-Bench, the first large-scale, bilingual, interactive benchmark for imaginative reasoning, comprising 800 turtle soup stories sourced from both the Internet and expert authors. We also propose Mosaic-Agent, a novel agent designed to assess LLMs' performance in this setting. To evaluate reasoning quality, we develop a multi-dimensional protocol measuring logical consistency, detail completion, and conclusion alignment. Experiments with leading LLMs reveal clear capability limits, common failure patterns, and a significant performance gap compared to humans. Our work offers new insights into LLMs' imaginative reasoning and establishes a foundation for future research on exploratory agent behavior.
TRACE: Temporal Rule-Anchored Chain-of-Evidence on Knowledge Graphs for Interpretable Stock Movement Prediction
Luis Castejón Lozano
Miguel Conner
Juan Abia
Luis Gallego-Ledesma
Joshua Fellowes
Gerard Conangla Planes
Adam Elwood
We present a Temporal Rule-Anchored Chain-of-Evidence (TRACE) on knowledge graphs for interpretable stock movement prediction that unifies symbolic relational priors, dynamic graph exploration, and LLM-guided decision making in a single end-to-end pipeline. The approach performs rule-guided multi-hop exploration restricted to admissible relation sequences, grounds candidate reasoning chains in contemporaneous news, and aggregates fully grounded evidence into auditable UP/DOWN verdicts with human-readable paths connecting text and structure. On an S&P 500 benchmark, the method achieves 55.1% accuracy, 55.7% precision, 71.5% recall, and 60.8% F1, surpassing strong baselines and improving recall and F1 over the best graph baseline under identical evaluation. The gains stem from (i) rule-guided exploration that focuses search on economically meaningful motifs rather than arbitrary walks, and (ii) text-grounded consolidation that selectively aggregates high-confidence, fully grounded hypotheses instead of uniformly pooling weak signals. Together, these choices yield higher sensitivity without sacrificing selectivity, delivering predictive lift with faithful, auditably interpretable explanations.
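Rule-guided exploration restricted to admissible relation sequences can be sketched as a level-by-level walk that only follows edges matching the next relation in a rule. The toy graph, entity and relation names, and the `rule_guided_paths` helper are all hypothetical; TRACE's actual rules, news grounding, and LLM verdict step are not modeled here.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]  # (relation, target entity)


def rule_guided_paths(graph: Dict[str, List[Edge]], start: str,
                      rules: List[Tuple[str, ...]]) -> List[List[str]]:
    """Return every path from `start` whose relation sequence matches an admissible rule."""
    results: List[List[str]] = []
    for rule in rules:
        frontier = [[start]]
        for relation in rule:
            # Expand only along edges whose relation is the next one the rule allows.
            frontier = [path + [rel, target]
                        for path in frontier
                        for rel, target in graph.get(path[-1], [])
                        if rel == relation]
        results.extend(frontier)
    return results


toy_kg = {
    "ChipCo": [("supplied_by", "FabCo"), ("competitor_of", "RivalCo")],
    "FabCo": [("located_in", "RegionX")],
    "RivalCo": [("supplied_by", "FabCo")],
}
admissible = [("supplied_by", "located_in"), ("competitor_of", "supplied_by")]
for p in rule_guided_paths(toy_kg, "ChipCo", admissible):
    print(" -> ".join(p))
# ChipCo -> supplied_by -> FabCo -> located_in -> RegionX
# ChipCo -> competitor_of -> RivalCo -> supplied_by -> FabCo
```

Restricting expansion to rule-matching edges keeps the search focused on a small set of meaningful motifs instead of enumerating every walk of the same length.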
Latent Action Reparameterization for Efficient Agent Inference
Qingwen Zeng
Zerui Xu
Zijie Guo
Yu Sun
Siru Ouyang
Jiri Gesi
Fang Wu
Jiayi Zhang
Chenglin Wu
Xiangru Tang
Large language model (LLM) agents often rely on long sequences of low-level textual actions, resulting in large effective decision horizons and high inference cost. While prior work has focused on improving inference efficiency through system-level optimizations or prompt engineering, we argue that a key bottleneck lies in the representation of the action space itself. We propose Latent Action Reparameterization (LAR), a framework that learns a compact latent action space in which each latent action corresponds to a multi-step semantic behavior. By reparameterizing agent actions into latent units, LAR enables decision making over a shorter effective horizon while preserving the expressiveness of the original action space. Unlike hand-crafted macros or hierarchical controllers, latent actions are learned from agent trajectories and integrated directly into the model, allowing both planning and execution to operate over abstract action representations. Across a range of LLM-based agent benchmarks, LAR significantly reduces the effective action horizon and improves inference efficiency under fixed compute budgets. As a consequence, our approach achieves substantial reductions in action tokens and corresponding wall-clock inference time, while maintaining or improving task success rates. These results suggest that action representation learning is a critical and underexplored factor in scaling efficient LLM agent inference, complementary to advances in model architecture and hardware.
Titanium nanotube arrays promote the activity of anastomotic healing-related cells by increasing fibronectin adsorption and activating the RGD–integrin pathway
Pengyu Chen
Yijia Li
Yahui Hu
Weihua Fu
The smooth titanium staples of stapling devices cannot reduce the incidence of gastrointestinal anastomotic leakage due to their bioinert nature and lack of active wound-healing promotion capability. This study aims to investigate whether titanium nanotube arrays (TNTs) can enhance the activity of cells involved in gastrointestinal anastomotic healing and further explore the potential mechanisms. TNTs were fabricated on pure titanium sheets via anodic oxidation, and characterized using scanning electron microscopy, roughness analysis, contact angle measurement, and X-ray photoelectron spectroscopy. Cell adhesion, proliferation, spreading, collagen secretion, and integrin expression were evaluated using methods such as CCK-8, immunofluorescence, qPCR, enzyme-linked immunosorbent assay (ELISA), and Western blot. Fibronectin (FN) adsorption and Arg-Gly-Asp tripeptide sequence (RGD domain) exposure were detected via bicinchoninic acid assay, fluorescent staining, and ELISA. The role of the RGD-integrin pathway was further investigated by supplementing serum-reduced medium with exogenous FN and using RGD-specific antagonists. The results showed that TNTs increased the roughness, hydrophilicity, and surface free energy of titanium surfaces. Compared with smooth pure titanium, TNTs promoted the adhesion, proliferation, spreading, and integrin expression of gastric mucosal epithelial cells and fibroblasts, while enhancing the collagen secretion capacity of fibroblasts. Moreover, TNTs adsorbed more FN and exposed more RGD domains, thereby upregulating integrin α5β1 expression. The RGD antagonist could reverse these enhanced cellular responses, confirming the pivotal role of the FN–RGD–integrin pathway. In conclusion, TNTs enhance the adhesion, proliferation, and functional activity of gastrointestinal anastomosis-related cells by promoting FN adsorption and activating the RGD–integrin pathway, which demonstrates that TNT-modified titanium materials hold significant potential for developing bioactive anastomotic devices and promoting tissue healing.
InfoPO: Information-Driven Policy Optimization for User-Centric Agents
Fanqi Kong
Jiayi Zhang
Mingyi Deng
Chenglin Wu
Yuyu Luo
Real-world user requests to LLM agents are often underspecified. Agents must interact to acquire missing information and make correct downstream decisions. However, current multi-turn GRPO-based methods often rely on trajectory-level reward computation, which leads to credit assignment problems and insufficient advantage signals within rollout groups. A feasible approach is to identify valuable interaction turns at a fine granularity to drive more targeted learning. To address this, we introduce InfoPO (Information-Driven Policy Optimization), which frames multi-turn interaction as a process of active uncertainty reduction and computes an information-gain reward that credits turns whose feedback measurably changes the agent's subsequent action distribution compared to a masked-feedback counterfactual. It then combines this signal with task outcomes via an adaptive variance-gated fusion to identify information importance while maintaining task-oriented goal direction. Across diverse tasks, including intent clarification, collaborative coding, and tool-augmented decision making, InfoPO consistently outperforms prompting and multi-turn RL baselines. It also demonstrates robustness under user simulator shifts and generalizes effectively to environment-interactive tasks. Overall, InfoPO provides a principled and scalable mechanism for optimizing complex agent-user collaboration. Code is available at https://github.com/kfq20/InfoPO.
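The information-gain idea can be made concrete by comparing the agent's next-action distribution with and without a turn's feedback. The sketch below scores a turn by the KL divergence between the two distributions. Both the choice of KL and the `gated_fusion` rule (weighting the information-gain term more heavily when task outcomes within a rollout group barely vary, with a hypothetical scale `tau`) are assumptions, not InfoPO's published formulas.

```python
import math
import statistics
from typing import List, Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) between two discrete distributions over the same actions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def info_gain_reward(p_with_feedback: Sequence[float], p_masked: Sequence[float]) -> float:
    """Credit a turn by how much its feedback shifts the next-action distribution."""
    return kl_divergence(p_with_feedback, p_masked)


def gated_fusion(task_rewards: List[float], info_rewards: List[float],
                 tau: float = 0.01) -> List[float]:
    """Hypothetical variance gate: lean on info gain when task outcomes barely differ."""
    var = statistics.pvariance(task_rewards)
    w = 1.0 / (1.0 + var / tau)  # w -> 1 as variance -> 0, w -> 0 as variance grows
    return [t + w * i for t, i in zip(task_rewards, info_rewards)]


# A clarifying answer that makes the agent far more confident in action 0:
print(info_gain_reward([0.9, 0.1], [0.5, 0.5]))  # positive
print(info_gain_reward([0.5, 0.5], [0.5, 0.5]))  # 0.0: feedback changed nothing
```

The gate reflects the credit-assignment problem named in the abstract: when every rollout in a group has the same outcome, the outcome term alone carries no advantage signal, so the information-gain term is allowed to dominate.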
Sparsity-Aware Evolution for Model Merging
Yanjian Zhang
Nadi Tomeh
Guillaume Wisniewski
We propose a sparsity-aware evolutionary (SAE) framework for model merging that uses iterative pruning-merging cycles as a novel mutation operator. We incorporate sparsity constraints into the score function, which steers the evolutionary process to favor sparser models in addition to conventional performance scores. Interestingly, the by-product of competition for sparsity introduces an extra local attraction and interplay into the evolutionary process: if one competitor has more zero elements, the other competitor's non-zero elements will occupy those positions, even though the less sparse competitor loses to the sparser competitor in other positions. The proposed pipeline is evaluated on a variety of large-scale LLM benchmarks. Experiments demonstrate that our approach can improve model merging reliability across multiple benchmarks, and it is easy to incorporate because it is simple and orthogonal to most existing approaches.
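The pruning-merging mutation and sparsity-aware score can be illustrated on flat weight vectors. This is a sketch under stated assumptions: magnitude pruning, the fill-in merge rule (a non-zero entry from one parent occupies positions where the other is zero, and overlapping non-zeros are averaged), and the additive score `perf + lam * sparsity` are hypothetical choices meant to mirror the behavior described above, not the paper's exact operators.

```python
from typing import Callable, List

Vector = List[float]


def magnitude_prune(w: Vector, keep_ratio: float) -> Vector:
    """Keep the largest-magnitude entries (ties at the threshold are all kept)."""
    k = max(1, round(keep_ratio * len(w)))
    threshold = sorted((abs(x) for x in w), reverse=True)[k - 1]
    return [x if abs(x) >= threshold else 0.0 for x in w]


def merge(a: Vector, b: Vector) -> Vector:
    """Fill-in merge: one parent's non-zeros occupy the other's zeros; overlaps are averaged."""
    return [y if x == 0.0 else x if y == 0.0 else (x + y) / 2 for x, y in zip(a, b)]


def sparsity(w: Vector) -> float:
    """Fraction of exactly-zero entries."""
    return sum(x == 0.0 for x in w) / len(w)


def sae_score(w: Vector, perf: Callable[[Vector], float], lam: float = 0.5) -> float:
    """Sparsity-aware score: task performance plus a bonus for zero entries."""
    return perf(w) + lam * sparsity(w)


def mutate(a: Vector, b: Vector, keep_ratio: float = 0.5) -> Vector:
    """One pruning-merging cycle used as a mutation operator."""
    return merge(magnitude_prune(a, keep_ratio), magnitude_prune(b, keep_ratio))


child = mutate([0.1, -2.0, 0.5, 3.0], [1.0, 0.05, -0.02, 1.5])
print(child)  # [1.0, -2.0, 0.0, 2.25]
```

In the example, the first parent's pruned zero at position 0 is filled by the second parent's 1.0, a toy version of the "attraction" effect: a sparser competitor leaves room for the other's non-zero weights.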
AOrchestra: Automating Sub-Agent Creation for Agentic Orchestration
Jianhao Ruan
Zhihao Xu
Yiran Peng
Fashen Ren
Zhaoyang Yu
Xinbing Liang
Jinyu Xiang
Yongru Chen
Chenglin Wu
Yuyu Luo
Jiayi Zhang
Language agents have shown strong promise for task automation. Realizing this promise for increasingly complex, long-horizon tasks has driven the rise of a sub-agent-as-tools paradigm for multi-turn task solving. However, existing designs still lack a dynamic abstraction view of sub-agents, which hurts adaptability. We address this challenge with a unified, framework-agnostic agent abstraction that models any agent as a tuple ⟨Instruction, Context, Tools, Model⟩. This tuple acts as a compositional recipe for capabilities, enabling the system to spawn specialized executors for each task on demand. Building on this abstraction, we introduce an agentic system, AOrchestra, in which the central orchestrator concretizes the tuple at each step: it curates task-relevant context, selects tools and models, and delegates execution via on-the-fly automatic agent creation. This design reduces human engineering effort and remains framework-agnostic, with plug-and-play support for diverse agents as task executors. It also enables a controllable performance-cost trade-off, allowing the system to approach Pareto efficiency. Across three challenging benchmarks (GAIA, SWE-Bench, Terminal-Bench), AOrchestra achieves a 16.28% relative improvement over the strongest baseline when paired with Gemini-3-Flash. The code is available at: https://github.com/FoundationAgents/AOrchestra
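The Instruction, Context, Tools, Model abstraction maps naturally onto a small record type that an orchestrator fills in at each step. In the sketch below, the keyword-overlap context and tool selection, the budget-based model choice (treating cost as a rough proxy for capability), and all names (`AgentSpec`, `spawn_sub_agent`, the tool and model identifiers) are hypothetical simplifications, not AOrchestra's implementation.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class AgentSpec:
    """A sub-agent described as the tuple (Instruction, Context, Tools, Model)."""
    instruction: str
    context: Tuple[str, ...]
    tools: Tuple[str, ...]
    model: str


def spawn_sub_agent(task: str, memory: List[str], tool_tags: Dict[str, Set[str]],
                    model_costs: Dict[str, float], budget: float) -> AgentSpec:
    """Concretize the tuple for one step of the orchestration loop."""
    words = set(task.lower().split())
    # Curate context: keep memory entries sharing at least one word with the task.
    context = tuple(m for m in memory if words & set(m.lower().split()))
    # Select tools whose keyword tags overlap the task description.
    tools = tuple(name for name, tags in tool_tags.items() if words & tags)
    # Pick the costliest model within budget; fall back to the cheapest one.
    affordable = [m for m, c in model_costs.items() if c <= budget]
    if affordable:
        model = max(affordable, key=model_costs.__getitem__)
    else:
        model = min(model_costs, key=model_costs.__getitem__)
    return AgentSpec(task, context, tools, model)


spec = spawn_sub_agent(
    task="search the web for flights",
    memory=["flights to Paris are cheap in May", "repo uses pytest"],
    tool_tags={"browser": {"search", "web"}, "bash": {"run", "shell"}},
    model_costs={"small-model": 1.0, "large-model": 10.0},
    budget=5.0,
)
print(spec)
```

Raising `budget` to 10.0 or more would select `large-model` instead, which is one simple way to expose the performance-cost trade-off the abstract mentions.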