
Xue (Steve) Liu

Associate Academic Member
Full Professor, McGill University, School of Computer Science
Vice President Research and Development, Chief Scientist and Co-Director, Samsung's Montreal AI Center
Research Topics
Deep Learning

Biography

Xue (Steve) Liu is an associate academic member of Mila – Quebec Artificial Intelligence Institute and full professor at McGill University’s School of Computer Science.

He is also a William Dawson Scholar at McGill, as well as a professor (courtesy appointment) in the Department of Mathematics and Statistics, associate member of the Centre for Intelligent Machines (CIM), and associate member of the Centre for Advanced Systems and Technologies in Communications (SYTACom).

Liu is VP of R&D, chief scientist and co-director of Samsung AI Center Montréal. Before that, he was chief scientist in charge of research and innovation at Tinder Inc., the world’s largest dating and social discovery app, then valued at over US$10 billion.

He is a Fellow of the IEEE and the Canadian Academy of Engineering in addition to being the recipient of many awards, including the 2017 Mitacs Award for Exceptional Leadership – Professor; Outstanding Young Canadian Computer Science Researcher Prize from the Canadian Association of Computer Science (2014); and McGill’s Tomlinson Scientist Award for “recognition of excellence and scientific leadership.” He founded McGill’s Cyber-Physical Intelligence Lab in 2007 and still serves as its director.

Liu also briefly served as Samuel R. Thompson Chair Associate Professor in the Department of Computer Science and Engineering at the University of Nebraska-Lincoln, and worked at Hewlett-Packard Labs in Palo Alto (California) and IBM’s Thomas J. Watson Research Center (New York).

Current Students

PhD - McGill University (10 students, 3 co-supervised)
Master's Research - McGill University (4 students)
Postdoctorate - McGill University (1 student, co-supervised)

Publications

Importance-aware Co-teaching for Offline Model-based Optimization
Ye Yuan
Can Chen
Zixuan Liu
Willie Neiswanger
Offline model-based optimization aims to find a design that maximizes a property of interest using only an offline dataset, with applications in robot, protein, and molecule design, among others. A prevalent approach is gradient ascent, where a proxy model is trained on the offline dataset and then used to optimize the design. This method suffers from an out-of-distribution issue, where the proxy is not accurate for unseen designs. To mitigate this issue, we explore using a pseudo-labeler to generate valuable data for fine-tuning the proxy. Specifically, we propose …
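
The gradient-ascent baseline this abstract builds on is easy to sketch. Below is a minimal, illustrative PyTorch version: a small proxy network is fit to the offline dataset, then candidate designs are pushed along the proxy's gradient. All names (Proxy, fit_proxy, gradient_ascent) are ours for illustration, not the paper's code.

```python
# Minimal sketch of proxy-based gradient ascent for offline model-based
# optimization. Illustrative only; the paper's method additionally uses a
# pseudo-labeler to fine-tune the proxy.
import torch

class Proxy(torch.nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(dim, 64), torch.nn.ReLU(), torch.nn.Linear(64, 1)
        )

    def forward(self, x):
        return self.net(x).squeeze(-1)

def fit_proxy(proxy, x, y, steps=500, lr=1e-3):
    # Standard regression fit of the proxy on the offline dataset (x, y).
    opt = torch.optim.Adam(proxy.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        torch.nn.functional.mse_loss(proxy(x), y).backward()
        opt.step()

def gradient_ascent(proxy, x_init, steps=100, lr=0.05):
    # Move designs toward higher predicted scores; accuracy degrades as x
    # drifts out of distribution, which is the issue the paper mitigates.
    x = x_init.clone().requires_grad_(True)
    for _ in range(steps):
        grad, = torch.autograd.grad(proxy(x).sum(), x)
        with torch.no_grad():
            x += lr * grad
    return x.detach()
```
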
Parallel-mentoring for Offline Model-based Optimization
Can Chen
Christopher Beckham
Zixuan Liu
We study offline model-based optimization to maximize a black-box objective function with a static dataset of designs and scores. These designs encompass a variety of domains, including materials, robots, DNA sequences, and proteins. A common approach trains a proxy on the static dataset and performs gradient ascent to obtain new designs. However, this often results in poor designs due to the proxy's inaccuracy for out-of-distribution designs. Recent studies indicate that (a) gradient ascent with a mean ensemble of proxies generally outperforms simple gradient ascent, and (b) a trained proxy provides weak ranking supervision signals for design selection. Motivated by (a) and (b), we propose …
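
Observations (a) and (b) translate directly into a mean-ensemble variant of the same baseline. A hedged sketch follows, reusing the single-proxy pattern from the previous entry; this is the baseline the paper improves on, not the parallel-mentoring algorithm itself, and all names are illustrative.

```python
# Sketch: gradient ascent on the mean of an ensemble of trained proxies (a),
# plus mean scores as a weak ranking signal for design selection (b).
import torch

def ensemble_score(proxies, x):
    # Average predictions over independently trained proxy networks.
    return torch.stack([p(x) for p in proxies], dim=0).mean(dim=0)

def ensemble_ascent(proxies, x_init, steps=100, lr=0.05):
    x = x_init.clone().requires_grad_(True)
    for _ in range(steps):
        grad, = torch.autograd.grad(ensemble_score(proxies, x).sum(), x)
        with torch.no_grad():
            x += lr * grad
    return x.detach()

def rank_designs(proxies, candidates):
    # Weak ranking supervision: sort candidate designs by mean proxy score.
    with torch.no_grad():
        scores = ensemble_score(proxies, candidates)
    return candidates[scores.argsort(descending=True)]
```
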
Retrieval-Augmented Multiple Instance Learning
Yufei Cui
Ziquan Liu
Yixin CHEN
Yuchen Lu
Xinyue Yu
Tei-Wei Kuo
Miguel R. D. Rodrigues
Chun Jason Xue
Antoni B. Chan
Multiple Instance Learning (MIL) is a crucial weakly supervised learning method applied across various domains, e.g., medical diagnosis based on whole slide images (WSIs). Recent advancements in MIL algorithms have yielded exceptional performance when the training and test data originate from the same domain, such as WSIs obtained from the same hospital. However, this paper reveals a performance deterioration of MIL models when tested on an out-of-domain test set, exemplified by WSIs sourced from a novel hospital. To address this challenge, this paper introduces the Retrieval-AugMented MIL (RAM-MIL) framework, which integrates Optimal Transport (OT) as the distance metric for nearest neighbor retrieval. The development of RAM-MIL is driven by two key insights. First, a theoretical discovery indicates that reducing the input's intrinsic dimension can minimize the approximation error in attention-based MIL. Second, previous studies highlight a link between input intrinsic dimension and the feature merging process with the retrieved data. Empirical evaluations conducted on WSI classification demonstrate that the proposed RAM-MIL framework achieves state-of-the-art performance in both in-domain scenarios, where the training and retrieval data are in the same domain, and more crucially, in out-of-domain scenarios, where the (unlabeled) retrieval data originates from a different domain. Furthermore, the use of the transportation matrix derived from OT renders the retrieval results interpretable at the instance level, in contrast to the vanilla …
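
The OT retrieval step can be illustrated with a generic entropic-regularized (Sinkhorn) distance between two bags of instance features; the transport plan it produces is what makes retrieval interpretable at the instance level. This is a textbook Sinkhorn sketch in NumPy, not the RAM-MIL implementation, and all names are illustrative.

```python
# Sketch: Sinkhorn OT distance between bags of instance features, used as
# the metric for nearest-neighbor bag retrieval.
import numpy as np

def sinkhorn_distance(X, Y, eps=0.1, iters=200):
    # X: (n, d) instances of one bag; Y: (m, d) instances of another.
    C = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=-1) ** 2
    a = np.full(len(X), 1.0 / len(X))   # uniform mass over instances
    b = np.full(len(Y), 1.0 / len(Y))
    K = np.exp(-C / eps)
    u = np.ones_like(a)
    for _ in range(iters):              # Sinkhorn fixed-point iterations
        v = b / (K.T @ u)
        u = a / (K @ v)
    P = u[:, None] * K * v[None, :]     # transport plan: instance-level matching
    return float((P * C).sum())

def retrieve_nearest(query_bag, database, k=3):
    dists = [sinkhorn_distance(query_bag, bag) for bag in database]
    return np.argsort(dists)[:k]        # indices of the k closest bags
```
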
Towards Hybrid-grained Feature Interaction Selection for Deep Sparse Network
Fuyuan Lyu
Xing Tang
Dugang Liu
Chen Ma
Weihong Luo
Liang Chen
xiuqiang He
Deep sparse networks are widely investigated as a neural network architecture for prediction tasks with high-dimensional sparse features, for which feature interaction selection is a critical component. While previous methods primarily focus on how to search feature interactions in a coarse-grained space, less attention has been given to a finer granularity. In this work, we introduce a hybrid-grained feature interaction selection approach that targets both feature field and feature value for deep sparse networks. To explore such an expansive space, we propose a decomposed space which is calculated on the fly. We then develop a selection algorithm called OptFeature, which efficiently selects feature interactions from both the feature field and the feature value simultaneously. Results from experiments on three large real-world benchmark datasets demonstrate that OptFeature performs well in terms of accuracy and efficiency. Additional studies support the feasibility of our method. All source code is publicly available (https://anonymous.4open.science/r/OptFeature-Anonymous).
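
For intuition, field-level (coarse-grained) interaction selection is often implemented as a learnable gate over pairwise field interactions; the paper's contribution is extending selection to value-level granularity and searching the decomposed hybrid space efficiently. Below is a minimal PyTorch sketch of the field-level gate only, with illustrative names, not the OptFeature implementation.

```python
# Sketch: soft field-level feature-interaction selection via learnable gates.
import torch

class GatedInteractions(torch.nn.Module):
    def __init__(self, num_fields):
        super().__init__()
        # One gate logit per field pair; sigmoid(gate) ~ keep probability.
        self.gates = torch.nn.Parameter(torch.zeros(num_fields, num_fields))
        self.register_buffer(
            "pairs", torch.triu_indices(num_fields, num_fields, offset=1)
        )

    def forward(self, emb):                       # emb: (batch, fields, dim)
        i, j = self.pairs
        inter = (emb[:, i] * emb[:, j]).sum(-1)   # pairwise inner products
        return inter * torch.sigmoid(self.gates[i, j])  # gated interactions
```
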
Teacher-Student Architecture for Knowledge Distillation: A Survey
Chengming Hu
Xuan Li
Danyang Liu
Haolun Wu
Xi Chen
Ju Wang
Although deep neural networks (DNNs) have shown a strong capacity to solve large-scale problems in many areas, such DNNs are hard to deploy in real-world systems due to their voluminous parameters. To tackle this issue, Teacher-Student architectures were proposed, where simple student networks with a few parameters can achieve comparable performance to deep teacher networks with many parameters. Recently, Teacher-Student architectures have been effectively and widely embraced for various knowledge distillation (KD) objectives, including knowledge compression, knowledge expansion, knowledge adaptation, and knowledge enhancement. With the help of Teacher-Student architectures, current studies are able to achieve multiple distillation objectives through lightweight and generalized student networks. Different from existing KD surveys that primarily focus on knowledge compression, this survey first explores Teacher-Student architectures across multiple distillation objectives. This survey presents an introduction to various knowledge representations and their corresponding optimization objectives. Additionally, we provide a systematic overview of Teacher-Student architectures with representative learning algorithms and effective distillation schemes. This survey also summarizes recent applications of Teacher-Student architectures across multiple purposes, including classification, recognition, generation, ranking, and regression. Lastly, potential research directions in KD are investigated, focusing on architecture design, knowledge quality, and theoretical studies of regression-based learning. Through this comprehensive survey, industry practitioners and the academic community can gain valuable insights and guidelines for effectively designing, learning, and applying Teacher-Student architectures to various distillation objectives.
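
Most of the surveyed work starts from the classic knowledge-compression objective: the student matches the teacher's temperature-softened logits alongside the ground-truth labels. A standard sketch follows; the temperature and mixing weight are illustrative defaults, not values prescribed by the survey.

```python
# Sketch: classic Teacher-Student distillation loss (softened logits + labels).
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    # KL between temperature-softened distributions; T*T restores gradient scale.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)  # supervised term
    return alpha * soft + (1 - alpha) * hard
```
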
Variational Nested Dropout
Yufei Cui
Yu Mao
Ziquan Liu
Qiao Li
Antoni B. Chan
Tei-Wei Kuo
Chun Jason Xue
Nested dropout is a variant of the dropout operation that can order network parameters or features by pre-defined importance during training. It has been explored for: I. Constructing nested nets (Cui et al. 2020; Cui et al. 2021): nested nets are neural networks whose architectures can be adjusted instantly during testing time, e.g., based on computational constraints. Nested dropout implicitly ranks the network parameters, generating a set of sub-networks such that any smaller sub-network forms the basis of a larger one. II. Learning ordered representations (Rippel et al. 2014): nested dropout applied to the latent representation of a generative model (e.g., an auto-encoder) ranks the features, enforcing an explicit order of the dense representation over dimensions. However, the dropout rate is fixed as a hyper-parameter during the whole training process. For nested nets, when network parameters are removed, performance decays along a human-specified trajectory rather than a trajectory learned from data. For generative models, the importance of features is specified as a constant vector, restraining the flexibility of representation learning. To address this problem, we focus on the probabilistic counterpart of nested dropout. We propose a variational nested dropout (VND) operation that draws samples of multi-dimensional ordered masks at a low cost, providing useful gradients to the parameters of nested dropout. Based on this approach, we design a Bayesian nested neural network that learns the order knowledge of the parameter distributions. We further exploit VND under different generative models for learning ordered latent distributions. In experiments, we show that the proposed approach outperforms the nested network in terms of accuracy, calibration, and out-of-domain detection in classification tasks. It also outperforms related generative models on data generation tasks.
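
The fixed-rate limitation the paper addresses is easiest to see in the original, non-variational mask: a cut index is drawn from a fixed geometric prior and every unit after it is dropped as a block, which imposes an order on units. A short illustrative sketch of that baseline (VND instead learns multi-dimensional ordered masks with useful gradients):

```python
# Sketch: ordered mask of classic nested dropout with a fixed geometric prior.
import torch

def nested_dropout_mask(dim, p=0.1):
    # p is the fixed hyper-parameter the variational approach replaces.
    k = int(torch.distributions.Geometric(probs=p).sample().item()) + 1
    mask = torch.zeros(dim)
    mask[:min(k, dim)] = 1.0           # keep the first k units, drop the rest
    return mask

h = torch.randn(8)
print(h * nested_dropout_mask(8))      # trailing dimensions zeroed as a block
```
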
Bidirectional Learning for Offline Model-based Biological Sequence Design
Can Chen
Yingxue Zhang
Reinforcement Learning-Based Adaptive Feature Boosting for Smart Grid Intrusion Detection
Chengming Hu
Jun Yan
Intrusion detection systems (IDSs) are crucial in security monitoring for the smart grid, given its increasing machine-to-machine communications and the cyber threats that follow. However, the multi-sourced, correlated, and heterogeneous smart grid data pose significant challenges to accurate attack detection by IDSs. To improve attack detection, this paper proposes Reinforcement Learning-based Adaptive Feature Boosting, which leverages a series of AutoEncoders (AEs) to capture critical features from multi-sourced smart grid data for the classification of normal, fault, and attack events. Multiple AEs are utilized to extract representative features from different feature sets that are automatically generated through a weighted feature sampling process; each AE-extracted feature set is then applied to build a Random Forest (RF) base classifier. In the feature sampling process, Deep Deterministic Policy Gradient (DDPG) is introduced to dynamically determine the feature sampling probability based on classification accuracy. Critical features that improve classification accuracy are assigned larger sampling probabilities and increasingly participate in the training of the next AE, increasing the presence of critical features in event classification over the multi-sourced smart grid data. Considering potentially different alarms among base classifiers, an ensemble classifier is further built to distinguish normal, fault, and attack events. Our proposed approach is evaluated on two realistic datasets collected from the Hardware-In-the-Loop (HIL) and WUSTL-IIOT-2021 security testbeds, respectively. The evaluation on the HIL security dataset shows that our approach achieves a classification accuracy of 97.28%, an effective 5.5% increase over vanilla Adaptive Feature Boosting. Moreover, on the WUSTL-IIOT-2021 dataset the proposed approach not only selects critical features accurately and stably, reflected in the significant gap in sampling probabilities between critical and uncritical features (greater than 0.08 versus less than 0.01), but also outperforms the other best-performing approaches with an 8.03% increase in Matthews Correlation Coefficient (MCC).
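
The adaptive sampling loop at the core of the method can be caricatured in a few lines: sample feature subsets with adaptive probabilities, fit a base classifier per subset, and raise the sampling probability of features that improved accuracy. In the sketch below the DDPG policy is replaced with a simple multiplicative update and the AE feature extractor is omitted, so this is an illustrative simplification, not the paper's system.

```python
# Sketch: weighted feature sampling with an accuracy-driven probability update.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def adaptive_feature_boosting(X, y, rounds=5, subset=10, lr=0.1, seed=0):
    rng = np.random.default_rng(seed)
    probs = np.full(X.shape[1], 1.0 / X.shape[1])  # uniform start
    ensemble, best = [], 0.0
    for _ in range(rounds):
        idx = rng.choice(X.shape[1], size=subset, replace=False, p=probs)
        clf = RandomForestClassifier(n_estimators=100, random_state=seed)
        clf.fit(X[:, idx], y)              # one base classifier per subset
        acc = clf.score(X[:, idx], y)      # stand-in for validation accuracy
        if acc >= best:                    # reward the features that helped
            probs[idx] *= 1.0 + lr
            probs /= probs.sum()
            best = acc
        ensemble.append((idx, clf))
    return ensemble, probs                 # base classifiers + learned probs
```
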