Xujie Si

PhD - McGill

Ray Luo

PhD - McGill

Co-supervisor:

Publications

DreamProver: Evolving Transferable Lemma Libraries via a Wake-Sleep Theorem-Proving Agent

Youyuan Zhang

Jialiang Sun

Hangrui Bi

Wenjie Ma

Zhaoyu Li

We introduce DreamProver, an agentic framework that leverages a "wake-sleep" program induction paradigm to discover reusable lemmas for formal theorem proving. Existing approaches either rely on fixed lemma libraries, which limit adaptability, or synthesize highly specific intermediate lemmas tailored to individual theorems, thereby lacking generality. DreamProver addresses this gap through an iterative two-stage process. In the wake stage, DreamProver attempts to prove theorems from a training set using the current lemma library while proposing new candidate lemmas. In the "sleep" stage, it abstracts, refines, and consolidates these candidates to compress and optimize the library. Through this alternating cycle, DreamProver progressively evolves a compact set of high-level, transferable lemmas that can be effectively used to prove unseen theorems in related domains. Experimental results demonstrate that DreamProver substantially improves proof success rates across a diverse set of mathematical benchmarks, while also producing more concise proofs and reducing computational cost.

2026-04-28

arXiv (preprint)

CuTeGen: An LLM-Based Agentic Framework for Generation and Optimization of High-Performance GPU Kernels using CuTe

Tara Saba

Anne Ouyang

Fan Long

High-performance GPU kernels are critical to modern machine learning systems, yet developing efficient implementations remains a challenging, expert-driven process due to the tight coupling between algorithmic structure, memory hierarchy usage, and hardware-specific optimizations. Recent work has explored using large language models (LLMs) to generate GPU kernels automatically, but generated implementations often struggle to maintain correctness and achieve competitive performance across iterative refinements. We present CuTeGen, an agentic framework for automated generation and optimization of GPU kernels that treats kernel development as a structured generate-test-refine workflow. Unlike approaches that rely on one-shot generation or large-scale search over candidate implementations, CuTeGen focuses on progressive refinement of a single evolving kernel through execution-based validation, structured debugging, and staged optimization. A key design choice is to generate kernels using the CuTe abstraction layer, which exposes performance-critical structures such as tiling and data movement while providing a more stable representation for iterative modification. To guide performance improvement, CuTeGen incorporates workload-aware optimization prompts and delayed integration of profiling feedback. Experimental results on matrix multiplication and activation workloads demonstrate that the framework produces functionally correct kernels and achieves competitive performance relative to optimized library implementations.

2026-03-31

arXiv (preprint)

Beyond Message Passing: A Symbolic Alternative for Expressive and Interpretable Graph Learning

Li Zhang

Haolin Ye

Ziyu Zhao

Yuhe Jiang

Tara Saba

Xinyu Wang

Graph Neural Networks (GNNs) have become essential in high-stakes domains such as drug discovery, yet their black-box nature remains a significant barrier to trustworthiness. While self-explainable GNNs attempt to bridge this gap, they often rely on standard message-passing backbones that inherit fundamental limitations, including the 1-Weisfeiler-Lehman (1-WL) expressivity barrier and a lack of fine-grained interpretability. To address these challenges, we propose SymGraph, a symbolic framework designed to transcend these constraints. By replacing continuous message passing with discrete structural hashing and topological role-based aggregation, our architecture theoretically surpasses the 1-WL barrier, achieving superior expressiveness without the overhead of differentiable optimization. Extensive empirical evaluations demonstrate that SymGraph achieves state-of-the-art performance, outperforming existing self-explainable GNNs. Notably, SymGraph delivers 10x to 100x speedups in training time using only CPU execution. Furthermore, SymGraph generates rules with superior semantic granularity compared to existing rule-based methods, offering great potential for scientific discovery and explainable AI.

2026-02-17

arXiv (preprint)

Neural Proposals, Symbolic Guarantees: Neuro-Symbolic Graph Generation with Hard Constraints

Li Zhang

Mark Zhang

Haolin Ye

Ziyu Zhao

We challenge black-box purely deep neural approaches for molecules and graph generation, which are limited in controllability and lack formal guarantees. We introduce Neuro-Symbolic Graph Generative Modeling (NSGGM), a neurosymbolic framework that reapproaches molecule generation as a scaffold and interaction learning task with symbolic assembly. An autoregressive neural model proposes scaffolds and refines interaction signals, and a CPU-efficient SMT solver constructs full graphs while enforcing chemical validity, structural rules, and user-specific constraints, yielding molecules that are correct by construction and interpretable control that pure neural methods cannot provide. NSGGM delivers strong performance on both unconstrained generation and constrained generation tasks, demonstrating that neuro-symbolic modeling can match state-of-the-art generative performance while offering explicit controllability and guarantees. To evaluate more nuanced controllability, we also introduce a Logical-Constraint Molecular Benchmark, designed to test strict hard-rule satisfaction in workflows that require explicit, interpretable specifications together with verifiable compliance.

2026-02-17

Open MIND (preprint)

LogicXGNN: Grounded Logical Rules for Explaining Graph Neural Networks

Ziyu Zhao

Zhaoyue Wang

Haolin Ye

Yuhe Jiang

Existing rule-based explanations for Graph Neural Networks (GNNs) provide global interpretability but often optimize and assess fidelity in an intermediate, uninterpretable concept space, overlooking the grounding quality of the final subgraph explanations for end users. This gap yields explanations that may appear faithful yet be unreliable in practice. To this end, we propose LogicXGNN, a post hoc framework that constructs logical rules over reliable predicates explicitly designed to capture the GNN's message-passing structure, thereby ensuring effective grounding. We further introduce data-grounded fidelity (

2025-12-31

International Conference on Learning Representations (Accept (Poster))

Learning Minimal Neural Specifications

Zhaoyue Wang

Haolin Ye

2025-07-07

Proceedings of the International Conference on Neuro-symbolic Systems (published)

proceedings.mlr.press

Understanding Behavioral Metric Learning: A Large-Scale Study on Distracting Reinforcement Learning Environments

Ziyan "Ray" Luo

A key approach to state abstraction is approximating behavioral metrics (notably, bisimulation metrics) in the observation space and embedding these learned distances in the representation space. While promising for robustness to task-irrelevant noise, as shown in prior work, accurately estimating these metrics remains challenging, requiring various design choices that create gaps between theory and practice. Prior evaluations focus mainly on final returns, leaving the quality of learned metrics and the source of performance gains unclear. To systematically assess how metric learning works in deep reinforcement learning (RL), we evaluate five recent approaches, unified conceptually as isometric embeddings with varying design choices. We benchmark them with baselines across 20 state-based and 14 pixel-based tasks, spanning 370 task configurations with diverse noise settings. Beyond final returns, we introduce the evaluation of a denoising factor to quantify the encoder's ability to filter distractions. To further isolate the effect of metric learning, we propose and evaluate an isolated metric estimation setting, in which the encoder is influenced solely by the metric loss. Finally, we release an open-source, modular codebase to improve reproducibility and support future research on metric learning in deep RL.

2025-05-08

rl-conference.cc/RLC/2025/Conference (accepted)

TypyBench: Evaluating LLM Type Inference for Untyped Python Repositories

Honghua Dong

Jiacheng Yang

Xun Deng

Yuhe Jiang

Gennady Pekhimenko

Fan Long

2025-04-30

ICML.cc/2025/Conference (poster)

proceedings.mlr.press

Proving Olympiad Inequalities by Synergizing LLMs and Symbolic Reasoning

Zenan Li

Zhaoyu Li

Wen Tang

Xian Zhang

Yuan Yao

Fan Yang

Kaiyu Yang

Xiaoxing Ma

Large language models (LLMs) can prove mathematical theorems formally by generating proof steps (a.k.a. tactics) within a proof system. However, the space of possible tactics is vast and complex, while the available training data for formal proofs is limited, posing a significant challenge to LLM-based tactic generation. To address this, we introduce a neuro-symbolic tactic generator that synergizes the mathematical intuition learned by LLMs with domain-specific insights encoded by symbolic methods. The key aspect of this integration is identifying which parts of mathematical reasoning are best suited to LLMs and which to symbolic methods. While the high-level idea of neuro-symbolic integration is broadly applicable to various mathematical problems, in this paper, we focus specifically on Olympiad inequalities (Figure 1). We analyze how humans solve these problems and distill the techniques into two types of tactics: (1) scaling, handled by symbolic methods, and (2) rewriting, handled by LLMs. In addition, we combine symbolic tools with LLMs to prune and rank the proof goals for efficient proof search. We evaluate our framework on 161 challenging inequalities from multiple mathematics competitions, achieving state-of-the-art performance and significantly outperforming existing LLM and symbolic approaches without requiring additional training data.

2025-01-21

ICLR.cc/2025/Conference (poster)

Library Learning Doesn’t: The Curious Case of the Single-Use “Library”

Ian Berlot-Attwell

Frank Rudzicz

Advances in Large Language Models (LLMs) have spurred a wave of LLM library learning systems for mathematical reasoning. These systems aim to learn a reusable library of *tools*, such as formal Isabelle lemmas or Python programs that are tailored to a family of tasks. Many of these systems are inspired by the human structuring of knowledge into reusable and extendable concepts, but do current methods actually learn reusable libraries of tools? We study two library learning systems for mathematics which both reported increased accuracy: LEGO-Prover and TroVE. We find that function reuse is extremely infrequent on miniF2F and MATH. Our followup ablation experiments suggest that, rather than reuse, self-correction and self-consistency are the primary drivers of the observed performance gains.

2024-10-08

NeurIPS.cc/2024/Workshop/MATH-AI (accepted)

LogiCity: Advancing Neuro-Symbolic AI with Abstract Urban Simulation

Bowen Li

Zhaoyu Li

Qiwei Du

Jinqi Luo

Wenshan Wang

Yaqi Xie

Simon Stepputtis

Chen Wang

Katia P. Sycara

Pradeep Kumar Ravikumar

Alexander G. Gray

Sebastian Scherer

Recent years have witnessed the rapid development of Neuro-Symbolic (NeSy) AI systems, which integrate symbolic reasoning into deep neural networks. However, most of the existing benchmarks for NeSy AI fail to provide long-horizon reasoning tasks with complex multi-agent interactions. Furthermore, they are usually constrained by fixed and simplistic logical rules over limited entities, making them far from real-world complexities. To address these crucial gaps, we introduce LogiCity, the first simulator based on customizable first-order logic (FOL) for an urban-like environment with multiple dynamic agents. LogiCity models diverse urban elements using semantic and spatial concepts, such as

2024-09-25

Datasets and Benchmarks Track @ Neural Information Processing Systems (poster)

Code Repair with LLMs gives an Exploration-Exploitation Tradeoff

Hao Tang

Keya Hu

Jin Peng Zhou

Si Cheng Zhong

Wei-Long Zheng

Kevin Ellis

2024-09-24

NeurIPS.cc/2024/Conference (poster)