Portrait de Xujie Si

Xujie Si

Membre affilié
Chaire en IA Canada-CIFAR
Professeur adjoint, University of Toronto, Département d'informatique
Sujets de recherche
Apprentissage de la programmation
Apprentissage de représentations
Raisonnement

Biographie

Xujie Si est professeur adjoint au Département d'informatique de l'Université de Toronto. Il est également membre affilié de la faculté de l'Institut Vector et membre affilié de Mila – Institut québécois d’intelligence artificielle, où il est titulaire d'une chaire en IA Canada-CIFAR. Il a obtenu un doctorat de l'Université de Pennsylvanie en 2020. Il est également détenteur d’une maîtrise de l'Université Vanderbilt et d’une licence (avec mention) de l'Université de Nankai. Ses recherches se situent à l'intersection des langages de programmation et de l'intelligence artificielle. Il s'intéresse au développement de techniques basées sur l'apprentissage pour aider les programmeurs à construire plus facilement de meilleurs logiciels, à l'intégration de la programmation logique à des systèmes d'apprentissage différentiables afin de permettre un raisonnement interprétable et évolutif, et à l'exploitation des abstractions de programmation pour un apprentissage fiable et efficace en matière de données. Ses travaux ont été récompensés par le Prix du service distingué ACM-SIGPLAN et ont été présentés lors de conférences sur les langages de programmation et l'apprentissage automatique.

Étudiants actuels

Postdoctorat - McGill
Superviseur⋅e principal⋅e :
Doctorat - McGill
Doctorat - McGill
Co-superviseur⋅e :
Maîtrise recherche - McGill
Maîtrise recherche - McGill
Maîtrise recherche - McGill

Publications

Library Learning Doesn’t: The Curious Case of the Single-Use “Library”
Ian Berlot-Attwell
Frank Rudzicz
Advances in Large Language Models (LLMs) have spurred a wave of LLM library learning systems for mathematical reasoning. These systems aim … (voir plus)to learn a reusable library of *tools*, such as formal Isabelle lemmas or Python programs that are tailored to a family of tasks. Many of these systems are inspired by the human structuring of knowledge into reusable and extendable concepts, but do current methods actually learn reusable libraries of tools? We study two library learning systems for mathematics which both reported increased accuracy: LEGO-Prover and TroVE. We find that function reuse is extremely infrequent on miniF2F and MATH. Our followup ablation experiments suggest that, rather than reuse, self-correction and self-consistency are the primary drivers of the observed performance gains.
Code Repair with LLMs gives an Exploration-Exploitation Tradeoff
Hao Tang
Keya Hu
Jin Peng Zhou
Si Cheng Zhong
Wei-Long Zheng
Kevin Ellis
Towards Robust Saliency Maps
Nham Le
Arie Gurfinkel
Chuqin Geng
Saliency maps are one of the most popular tools to interpret the operation of a neural network: they compute input features deemed relevant … (voir plus)to the final prediction, which are often subsets of pixels that are easily understandable by a human being. However, it is known that relying solely on human assessment to judge a saliency map method can be misleading. In this work, we propose a new neural network verification specification called saliency-robustness, which aims to use formal methods to prove a relationship between Vanilla Gradient (VG) -- a simple yet surprisingly effective saliency map method -- and the network's prediction: given a network, if an input
Chronosymbolic Learning: Efficient CHC Solving with Symbolic Reasoning and Inductive Learning
Ziyan Luo
Solving Constrained Horn Clauses (CHCs) is a fundamental challenge behind a wide range of verification and analysis tasks. Data-driven appro… (voir plus)aches show great promise in improving CHC solving without the painstaking manual effort of creating and tuning various heuristics. However, a large performance gap exists between data-driven CHC solvers and symbolic reasoning-based solvers. In this work, we develop a simple but effective framework,"Chronosymbolic Learning", which unifies symbolic information and numerical data points to solve a CHC system efficiently. We also present a simple instance of Chronosymbolic Learning with a data-driven learner and a BMC-styled reasoner. Despite its great simplicity, experimental results show the efficacy and robustness of our tool. It outperforms state-of-the-art CHC solvers on a dataset consisting of 288 benchmarks, including many instances with non-linear integer arithmetics.
A Survey on Deep Learning for Theorem Proving
Zhaoyu Li
Jialiang Sun
Logan Murphy
Qidong Su
Zenan Li
Xian Zhang
Kaiyu Yang
Autoformalizing Euclidean Geometry
Logan Murphy
Kaiyu Yang
Jialiang Sun
Zhaoyu Li
Animashree Anandkumar
Autoformalization involves automatically translating informal math into formal theorems and proofs that are machine-verifiable. Euclidean ge… (voir plus)ometry provides an interesting and controllable domain for studying autoformalization. In this paper, we introduce a neuro-symbolic framework for autoformalizing Euclidean geometry, which combines domain knowledge, SMT solvers, and large language models (LLMs). One challenge in Euclidean geometry is that informal proofs rely on diagrams, leaving gaps in texts that are hard to formalize. To address this issue, we use theorem provers to fill in such diagrammatic information automatically, so that the LLM only needs to autoformalize the explicit textual steps, making it easier for the model. We also provide automatic semantic evaluation for autoformalized theorem statements. We construct LeanEuclid, an autoformalization benchmark consisting of problems from Euclid’s Elements and the UniGeo dataset formalized in the Lean proof assistant. Experiments with GPT-4 and GPT-4V show the capability and limitations of state-of-the-art LLMs on autoformalizing geometry problems. The data and code are available at https://github.com/loganrjmurphy/LeanEuclid.
APPL: A Prompt Programming Language for Harmonious Integration of Programs and Large Language Model Prompts
Honghua Dong
Qidong Su
Yubo Gao
Zhaoyu Li
Yangjun Ruan
Gennady G. Pekhimenko
Chris J. Maddison
Large Language Models (LLMs) have become increasingly capable of handling diverse tasks with the aid of well-crafted prompts and integration… (voir plus) of external tools, but as task complexity rises, the workflow involving LLMs can be complicated and thus challenging to implement and maintain. To address this challenge, we propose APPL, A Prompt Programming Language that acts as a bridge between computer programs and LLMs, allowing seamless embedding of prompts into Python functions, and vice versa. APPL provides an intuitive and Python-native syntax, an efficient parallelized runtime with asynchronous semantics, and a tracing module supporting effective failure diagnosis and replaying without extra costs. We demonstrate that APPL programs are intuitive, concise, and efficient through three representative scenarios: Chain-of-Thought with self-consistency (CoT-SC), ReAct tool use agent, and multi-agent chat. Experiments on three parallelizable workflows further show that APPL can effectively parallelize independent LLM calls, with a significant speedup ratio that almost matches the estimation.
Learning Minimal NAP Specifications for Neural Network Verification
Chuqin Geng
Zhaoyue Wang
Haolin Ye
Saifei Liao
Specifications play a crucial role in neural network verification. They define the precise input regions we aim to verify, typically represe… (voir plus)nted as L-infinity norm balls. While recent research suggests using neural activation patterns (NAPs) as specifications for verifying unseen test set data, it focuses on computing the most refined NAPs, often limited to very small regions in the input space. In this paper, we study the following problem: Given a neural network, find a minimal (coarsest) NAP that is sufficient for formal verification of the network's robustness. Finding the minimal NAP specification not only expands verifiable bounds but also provides insights into which neurons contribute to the model's robustness. To address this problem, we propose several exact and approximate approaches. Our exact approaches leverage the verification tool to find minimal NAP specifications in either a deterministic or statistical manner. Whereas the approximate methods efficiently estimate minimal NAPs using adversarial examples and local gradients, without making calls to the verification tool. This allows us to inspect potential causal links between neurons and the robustness of state-of-the-art neural networks, a task for which existing verification frameworks fail to scale. Our experimental results suggest that minimal NAP specifications require much smaller fractions of neurons compared to the most refined NAP specifications, yet they can significantly expand the verifiable boundaries to several orders of magnitude larger.
SAT-DIFF: A Tree Diffing Framework Using SAT Solving
Chuqin Geng
Haolin Ye
Yihan Zhang
Brigitte Pientka
Computing differences between tree-structured data is a critical but challenging problem in software analysis. In this paper, we propose a n… (voir plus)ovel tree diffing approach called SatDiff, which reformulates the structural diffing problem into a MaxSAT problem. By encoding the necessary transformations from the source tree to the target tree, SatDiff generates correct, minimal, and type safe low-level edit scripts with formal guarantees. We then synthesize concise high-level edit scripts by effectively merging low-level edits in the appropriate topological order. Our empirical results demonstrate that SatDiff outperforms existing heuristic-based approaches by a significant margin in terms of conciseness while maintaining a reasonable runtime.
Enhancing and Evaluating Logical Reasoning Abilities of Large Language Models
Shujie Deng
Honghua Dong
G4SATBench: Benchmarking and Advancing SAT Solving with Graph Neural Networks
Zhaoyu Li
Jinpei Guo
Graph neural networks (GNNs) have recently emerged as a promising approach for solving the Boolean Satisfiability Problem (SAT), offering po… (voir plus)tential alternatives to traditional backtracking or local search SAT solvers. However, despite the growing volume of literature in this field, there remains a notable absence of a unified dataset and a fair benchmark to evaluate and compare existing approaches. To address this crucial gap, we present G4SATBench, the first benchmark study that establishes a comprehensive evaluation framework for GNN-based SAT solvers. In G4SATBench, we meticulously curate a large and diverse set of SAT datasets comprising 7 problems with 3 difficulty levels and benchmark a broad range of GNN models across various prediction tasks, training objectives, and inference algorithms. To explore the learning abilities and comprehend the strengths and limitations of GNN-based SAT solvers, we also compare their solving processes with the heuristics in search-based SAT solvers. Our empirical results provide valuable insights into the performance of GNN-based SAT solvers and further suggest that existing GNN models can effectively learn a solving strategy akin to greedy local search but struggle to learn backtracking search in the latent space.
Scalar Invariant Networks with Zero Bias
Chuqin Geng
Xiaojie Xu
Haolin Ye
Just like weights, bias terms are the learnable parameters of many popular machine learning models, including neural networks. Biases are th… (voir plus)ought to enhance the representational power of neural networks, enabling them to solve a variety of tasks in computer vision. However, we argue that biases can be disregarded for some image-related tasks such as image classification, by considering the intrinsic distribution of images in the input space and desired model properties from first principles. Our findings suggest that zero-bias neural networks can perform comparably to biased networks for practical image classification tasks. We demonstrate that zero-bias neural networks possess a valuable property called scalar (multiplication) invariance. This means that the prediction of the network remains unchanged when the contrast of the input image is altered. We extend scalar invariance to more general cases, enabling formal verification of certain convex regions of the input space. Additionally, we prove that zero-bias neural networks are fair in predicting the zero image. Unlike state-of-the-art models that may exhibit bias toward certain labels, zero-bias networks have uniform belief in all labels. We believe dropping bias terms can be considered as a geometric prior in designing neural network architecture for image classification, which shares the spirit of adapting convolutions as the transnational invariance prior. The robustness and fairness advantages of zero-bias neural networks may also indicate a promising path towards trustworthy and ethical AI.