
Bang Liu

Associate Academic Member
Canada CIFAR AI Chair
Assistant Professor, Université de Montréal, Department of Computer Science and Operations Research
Research Topics
Data Mining
Deep Learning
Generative Models
Learning on Graphs
Natural Language Processing

Biography

Bang Liu is an assistant professor in the Department of Computer Science and Operations Research (DIRO), and a core member of the Applied Research in Computational Linguistics Lab (RALI) at Université de Montréal. He is also an associate academic member of Mila – Quebec Artificial Intelligence Institute and a Canada CIFAR AI Chair.

Liu received his BEng from the University of Science and Technology of China in 2013, and his MSc and PhD degrees from the University of Alberta in 2015 and 2020, respectively. His research interests lie primarily in the areas of natural language processing, multimodal and embodied learning, theory and techniques for AGI (e.g., understanding and improving large language models), and AI for science (e.g., health, materials science, XR).

Current Students

PhD - Université de Montréal
Postdoctorate - Université de Montréal
PhD - Université de Montréal
PhD - Université de Montréal
PhD - Université de Montréal
PhD - Université de Montréal
PhD - Université de Montréal
PhD - Université de Montréal
PhD - Université de Montréal
PhD - Université de Montréal
PhD - Université de Montréal
PhD - Université de Montréal
Research Intern - McGill University
Master's Research - Université de Montréal
Master's Research - Université de Montréal

Publications

Co-Evolution of Policy and Internal Reward for Language Agents
Xinyu Wang
Hanwei Wu
Jingwei Song
Jiayi Zhang
Fanqi Kong
Tung Sum Thomas Kwok
Xiao-Wen Chang
Yuyu Luo
Chenglin Wu
Large language model (LLM) agents learn by interacting with environments, but long-horizon training remains fundamentally bottlenecked by sparse and delayed rewards. Existing methods typically address this challenge through post-hoc credit assignment or external reward models, which provide limited guidance at inference time and often separate reward improvement from policy improvement. We propose Self-Guide, a self-generated internal reward for language agents that supports both inference-time guidance and training-time supervision. Specifically, the agent uses Self-Guide as a short self-guidance signal to steer the next action during inference, and converts the same signal into a step-level internal reward for denser policy optimization during training. This creates a co-evolving loop: a better policy produces better guidance, and better guidance further improves the policy as internal reward. Across three agent benchmarks, inference-time self-guidance already yields clear gains, while jointly evolving the policy and internal reward with GRPO brings further improvements (8%) over baselines trained solely with environment reward. Overall, our results suggest that language agents can improve not only by collecting more experience, but also by learning to generate and refine their own internal reward during acting and learning.
What to Ask Next? Probing the Imaginative Reasoning of LLMs with TurtleSoup Puzzles
Mi Zhou
H. Zhang
Qi Sima
We investigate the capacity of Large Language Models (LLMs) for imaginative reasoning—the proactive construction, testing, and revision of hypotheses in information-sparse environments. Existing benchmarks, often static or focused on social deduction, fail to capture the dynamic, exploratory nature of this reasoning process. To address this gap, we introduce a comprehensive research framework based on the classic "Turtle Soup" game, integrating a benchmark, an agent, and an evaluation protocol. We present TurtleSoup-Bench, the first large-scale, bilingual, interactive benchmark for imaginative reasoning, comprising 800 turtle soup stories sourced from both the Internet and expert authors. We also propose Mosaic-Agent, a novel agent designed to assess LLMs' performance in this setting. To evaluate reasoning quality, we develop a multi-dimensional protocol measuring logical consistency, detail completion, and conclusion alignment. Experiments with leading LLMs reveal clear capability limits, common failure patterns, and a significant performance gap compared to humans. Our work offers new insights into LLMs' imaginative reasoning and establishes a foundation for future research on exploratory agent behavior.
TRACE: Temporal Rule-Anchored Chain-of-Evidence on Knowledge Graphs for Interpretable Stock Movement Prediction
Luis Castejón Lozano
Miguel Conner
Juan Abia
Luis Gallego-Ledesma
Joshua Fellowes
Gerard Conangla Planes
Adam Elwood
We present a Temporal Rule-Anchored Chain-of-Evidence (TRACE) on knowledge graphs for interpretable stock movement prediction that unifies symbolic relational priors, dynamic graph exploration, and LLM-guided decision making in a single end-to-end pipeline. The approach performs rule-guided multi-hop exploration restricted to admissible relation sequences, grounds candidate reasoning chains in contemporaneous news, and aggregates fully grounded evidence into auditable UP/DOWN verdicts with human-readable paths connecting text and structure. On an S&P 500 benchmark, the method achieves 55.1% accuracy, 55.7% precision, 71.5% recall, and 60.8% F1, surpassing strong baselines and improving recall and F1 over the best graph baseline under identical evaluation. The gains stem from (i) rule-guided exploration that focuses search on economically meaningful motifs rather than arbitrary walks, and (ii) text-grounded consolidation that selectively aggregates high-confidence, fully grounded hypotheses instead of uniformly pooling weak signals. Together, these choices yield higher sensitivity without sacrificing selectivity, delivering predictive lift with faithful, auditably interpretable explanations.
Latent Action Reparameterization for Efficient Agent Inference
Qingwen Zeng
Zerui Xu
Zijie Guo
Yu Sun
Siru Ouyang
Jiri Gesi
Fang Wu
Jiayi Zhang
Chenglin Wu
Xiangru Tang
Large language model (LLM) agents often rely on long sequences of low-level textual actions, resulting in large effective decision horizons and high inference cost. While prior work has focused on improving inference efficiency through system-level optimizations or prompt engineering, we argue that a key bottleneck lies in the representation of the action space itself. We propose Latent Action Reparameterization (LAR), a framework that learns a compact latent action space in which each latent action corresponds to a multi-step semantic behavior. By reparameterizing agent actions into latent units, LAR enables decision making over a shorter effective horizon while preserving the expressiveness of the original action space. Unlike hand-crafted macros or hierarchical controllers, latent actions are learned from agent trajectories and integrated directly into the model, allowing both planning and execution to operate over abstract action representations. Across a range of LLM-based agent benchmarks, LAR significantly reduces the effective action horizon and improves inference efficiency under fixed compute budgets. As a consequence, our approach achieves substantial reductions in action tokens and corresponding wall-clock inference time, while maintaining or improving task success rates. These results suggest that action representation learning is a critical and underexplored factor in scaling efficient LLM agent inference, complementary to advances in model architecture and hardware.
Titanium nanotube arrays promote the activity of anastomotic healing-related cells by increasing fibronectin adsorption and activating the RGD–integrin pathway
Pengyu Chen
Yijia Li
Yahui Hu
Weihua Fu
The smooth titanium staples of stapling devices cannot reduce the incidence of gastrointestinal anastomotic leakage due to their bioinert nature and lack of active wound-healing promotion capability. This study aims to investigate whether titanium nanotube arrays (TNTs) can enhance the activity of cells involved in gastrointestinal anastomotic healing and further explore the potential mechanisms. TNTs were fabricated on pure titanium sheets via anodic oxidation, and characterized using scanning electron microscopy, roughness analysis, contact angle measurement, and X-ray photoelectron spectroscopy. Cell adhesion, proliferation, spreading, collagen secretion, and integrin expression were evaluated using methods such as CCK-8, immunofluorescence, qPCR, enzyme-linked immunosorbent assay (ELISA), and Western blot. Fibronectin (FN) adsorption and Arg-Gly-Asp tripeptide sequence (RGD domain) exposure were detected via bicinchoninic acid assay, fluorescent staining, and ELISA. The role of the RGD-integrin pathway was further investigated by supplementing serum-reduced medium with exogenous FN and using RGD-specific antagonists. The results showed that TNTs increased the roughness, hydrophilicity, and surface free energy of titanium surfaces. Compared with smooth pure titanium, TNTs promoted the adhesion, proliferation, spreading, and integrin expression of gastric mucosal epithelial cells and fibroblasts, while enhancing the collagen secretion capacity of fibroblasts. Moreover, TNTs adsorbed more FN and exposed more RGD domains, thereby upregulating integrin α5β1 expression. The RGD antagonist could reverse these enhanced cellular responses, confirming the pivotal role of the FN–RGD–integrin pathway.
The conclusion indicates that TNTs enhance the adhesion, proliferation, and functional activity of gastrointestinal anastomosis-related cells by promoting FN adsorption and activating the RGD–integrin pathway, which demonstrates that TNT-modified titanium materials hold significant potential for developing bioactive anastomotic devices and promoting tissue healing.
InfoPO: Information-Driven Policy Optimization for User-Centric Agents
Fanqi Kong
Jiayi Zhang
Mingyi Deng
Chenglin Wu
Yuyu Luo
Real-world user requests to LLM agents are often underspecified. Agents must interact to acquire missing information and make correct downstream decisions. However, current multi-turn GRPO-based methods often rely on trajectory-level reward computation, which leads to credit assignment problems and insufficient advantage signals within rollout groups. A feasible approach is to identify valuable interaction turns at a fine granularity to drive more targeted learning. To address this, we introduce InfoPO (Information-Driven Policy Optimization), which frames multi-turn interaction as a process of active uncertainty reduction and computes an information-gain reward that credits turns whose feedback measurably changes the agent's subsequent action distribution compared to a masked-feedback counterfactual. It then combines this signal with task outcomes via an adaptive variance-gated fusion to identify information importance while maintaining task-oriented goal direction. Across diverse tasks, including intent clarification, collaborative coding, and tool-augmented decision making, InfoPO consistently outperforms prompting and multi-turn RL baselines. It also demonstrates robustness under user simulator shifts and generalizes effectively to environment-interactive tasks. Overall, InfoPO provides a principled and scalable mechanism for optimizing complex agent-user collaboration. Code is available at https://github.com/kfq20/InfoPO.
Sparsity-Aware Evolution for Model Merging
Yanjian Zhang
Nadi Tomeh
Guillaume Wisniewski
We propose a sparsity-aware evolutionary (SAE) framework for model merging that involves iterative pruning-merging cycles to act as a novel mutation operator. We incorporate the sparsity constraints into the score function, which steers the evolutionary process to favor more sparse models, in addition to other conventional performance scores. Interestingly, the by-product of competition for sparsity introduces an extra local attraction and interplay into the evolutionary process: if one competitor has more zero elements, the other competitor's non-zero elements will occupy those positions, even though the less sparse competitor loses to the more sparse competitor in other positions. The proposed pipeline is evaluated on a variety of large-scale LLM benchmarks. Experiments demonstrate that our approach can improve model merging reliability across multiple benchmarks, and is easy to incorporate due to its simplicity and being orthogonal to most existing approaches.
AOrchestra: Automating Sub-Agent Creation for Agentic Orchestration
Jianhao Ruan
Zhihao Xu
Yiran Peng
Fashen Ren
Zhaoyang Yu
Xinbing Liang
Jinyu Xiang
Yongru Chen
Chenglin Wu
Yuyu Luo
Jiayi Zhang
Language agents have shown strong promise for task automation. Realizing this promise for increasingly complex, long-horizon tasks has driven the rise of a sub-agent-as-tools paradigm for multi-turn task solving. However, existing designs still lack a dynamic abstraction view of sub-agents, thereby hurting adaptability. We address this challenge with a unified, framework-agnostic agent abstraction that models any agent as a tuple ⟨Instruction, Context, Tools, Model⟩. This tuple acts as a compositional recipe for capabilities, enabling the system to spawn specialized executors for each task on demand. Building on this abstraction, we introduce an agentic system AOrchestra, where the central orchestrator concretizes the tuple at each step: it curates task-relevant context, selects tools and models, and delegates execution via on-the-fly automatic agent creation. Such designs reduce human engineering effort, and remain framework-agnostic with plug-and-play support for diverse agents as task executors. They also enable a controllable performance-cost trade-off, allowing the system to approach Pareto efficiency. Across three challenging benchmarks (GAIA, SWE-Bench, Terminal-Bench), AOrchestra achieves a 16.28% relative improvement against the strongest baseline when paired with Gemini-3-Flash. The code is available at: https://github.com/FoundationAgents/AOrchestra
Scalable Heterogeneous Graph Learning via Heterogeneous-aware Orthogonal Prototype Experts
Wei Zhou
Hong Huang
Ruize Shi
Heterogeneous Graph Neural Networks (HGNNs) have advanced mainly through better encoders, yet their decoding/projection stage still relies on a single shared linear head, assuming it can map rich node embeddings to labels. We call this the Linear Projection Bottleneck: in heterogeneous graphs, contextual diversity and long-tail shifts make a global head miss fine semantics, overfit hub nodes, and underserve tail nodes. While Mixture-of-Experts (MoE) could help, naively applying it clashes with structural imbalance and risks expert collapse. We propose a Heterogeneous-aware Orthogonal Prototype Experts framework named HOPE, a plug-and-play replacement for the standard prediction head. HOPE uses learnable prototype-based routing to assign instances to experts by similarity, letting expert usage follow the natural long-tail distribution, and adds expert orthogonalization to encourage diversity and prevent collapse. Experiments on four real datasets show consistent gains across SOTA HGNN backbones with minimal overhead.
Evolving Programmatic Skill Networks
Xingdi Yuan
We study continual skill acquisition in open-ended embodied environments where an agent must construct, refine, and reuse an expanding library of executable skills. We introduce the Programmatic Skill Network (PSN), a framework in which skills are executable symbolic programs forming a compositional network that evolves through experience. PSN defines three core mechanisms instantiated via large language models: (1) REFLECT for structured fault localization over skill compositions, (2) progressive optimization with maturity-aware update gating that stabilizes reliable skills while maintaining plasticity for uncertain ones, and (3) canonical structural refactoring under rollback validation that maintains network compactness. We further show that PSN's learning dynamics exhibit structural parallels to neural network training. Experiments on MineDojo and Crafter demonstrate robust skill reuse, rapid adaptation, and strong generalization across open-ended task distributions. We plan to open-source the code.
GraphOmni: A Comprehensive and Extensible Benchmark Framework for Large Language Models on Graph-theoretic Tasks
Hao Xu
Xiangru Jian
Xinjian Zhao
Wei Pang
Chao Zhang
Qixin Zhang
Zhengyuan Dong
Joao Monteiro
Qiuzhuang Sun
Tianshu Yu
This paper introduces GraphOmni, a comprehensive benchmark designed to evaluate the reasoning capabilities of LLMs on graph-theoretic tasks articulated in natural language. GraphOmni spans diverse graph types, serialization formats, and prompting schemes, substantially extending upon prior efforts in both scope and depth. Through systematic evaluation, we uncover critical interactions among these dimensions, revealing their decisive impact on model performance. Our experiments show that state-of-the-art closed-source models such as Claude-3.5 and o4-mini consistently lead overall, yet still leave considerable headroom, while open-source models display pronounced sensitivity to various design choices. Beyond the standard scope, larger graphs, real-world graphs, and additional NP-hard tasks are further discussed. We further analyze efficiency via output token usage, highlighting cost–accuracy trade-offs, and introduce a reinforcement learning-based optimizer that adaptively selects factor combinations, reducing evaluation cost by 75% while retaining strong accuracy. This flexible and extensible benchmark not only deepens understanding of LLM performance on structured graph reasoning but also establishes a robust foundation for advancing model design and evaluation. The code and datasets are available at https://anonymous.4open.science/r/ID-14092.
Accelerated Inorganic Materials Design with Generative AI Agents
Teruyasu Mizoguchi
Designing inorganic crystalline materials with tailored properties is critical to technological innovation, yet current generative computational methods often struggle to efficiently explore desired targets with sufficient interpretability. Here, we present MatAgent, a generative approach for inorganic materials discovery that harnesses the powerful reasoning capabilities of large language models (LLMs). By combining a diffusion-based generative model for crystal structure estimation with a predictive model for property evaluation, MatAgent uses iterative, feedback-driven guidance to steer material exploration precisely toward user-defined targets. Integrated with external cognitive tools, including short-term memory, long-term memory, the periodic table, and a comprehensive materials knowledge base, MatAgent emulates human expert reasoning to vastly expand the accessible compositional space. Our results demonstrate that MatAgent robustly directs exploration toward desired properties while consistently achieving high compositional validity, uniqueness, and material novelty. This framework thus provides a highly interpretable, practical, and versatile AI-driven solution to accelerate the discovery and design of next-generation inorganic materials.