Weihua Shi

PhD - McGill University
Supervisor
Research Topics
Deep Learning
Optimization
Reinforcement Learning

Publications

Beyond Go/No-Go Decisions: A Regional Selection Framework for Uncertainty-Aware Molecule Screening
Tian Bai
Kaiqiong Zhao
Marc‐André Legault
Hui Peng
Yue Zhao
Eric D. Kolaczyk
Xiang Yu
Archer Y. Yang
In drug discovery, quantitative structure–activity relationship (QSAR) models are widely used to guide Go/No-Go decisions within the Design–Make–Test–Analyze (DMTA) cycle. However, conventional decision heuristics typically rely on a single cutoff, leading to a rigid binary select/discard paradigm. This approach is particularly ill-suited for borderline compounds near the decision boundary, where screening decisions are especially sensitive to prediction uncertainty and premature choices may either discard viable leads or advance likely failures, thereby increasing downstream assay costs. To address this limitation, we propose Regional Selection (RS), an uncertainty-aware three-way decision framework that partitions compounds into Predicted Pass, Predicted Fail, and Predicted Indeterminate regions. By explicitly reserving high-uncertainty compounds for targeted follow-up, RS avoids the pitfalls of premature binary classification. We formalize this framework through Regional Selection Inference (RSI), which casts region assignment as a multiple-hypothesis testing problem. We develop two implementations of RSI: an empirical calibration-based method (RSI-EC), which thresholds uncertainty-normalized scores via empirical calibration, and a conformal selection-based method (RSI-CS), which constructs conformal p-values for region assignment. RSI-EC is supported by large-sample calibration arguments, whereas RSI-CS provides finite-sample, distribution-free guarantees under exchangeability. Extensive evaluations across 15 high-dimensional QSAR benchmarks show that both RSI procedures reliably control the false discovery rate while maintaining high screening power. In limited-data regimes, RSI-CS yields particularly stable FDR control, whereas RSI-EC can be slightly less conservative; both perform strongly as sample sizes increase.
We further study a cost-aware extension that incorporates asymmetric downstream costs through the score construction while keeping the nominal FDR target fixed. This extension introduces a tuning parameter that can reduce realized downstream cost, with dataset-dependent trade-offs against screening power. Overall, RSI offers a mathematically grounded and resource-aware alternative to single-threshold screening, allowing discovery teams to better balance decision confidence with assay budgets.
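
To make the idea of conformal p-values for three-way region assignment concrete, the sketch below shows one generic way it could look. It is not the RSI-CS procedure from the paper, whose scores and thresholds are not given in the abstract. It assumes a held-out calibration set with model predictions and measured activities, a single activity cutoff, and the standard Benjamini-Hochberg step applied separately to the pass and fail hypotheses. All function and variable names (`conformal_pvalues`, `assign_regions`, `cutoff`, `alpha`) are hypothetical.

```python
import numpy as np


def conformal_pvalues(cal_pred, cal_null, test_pred):
    """Conformal p-values for 'test point is a null', where large predictions
    count as evidence against the null.

    p_j = (1 + #{calibration nulls with prediction >= test prediction j}) / (n + 1),
    with n the full calibration-set size. Ties are counted conservatively.
    """
    cal_pred = np.asarray(cal_pred, dtype=float)
    test_pred = np.asarray(test_pred, dtype=float)
    null_pred = np.sort(cal_pred[np.asarray(cal_null, dtype=bool)])
    # searchsorted(side="left") is the index of the first null prediction >= value
    n_ge = null_pred.size - np.searchsorted(null_pred, test_pred, side="left")
    return (1.0 + n_ge) / (cal_pred.size + 1.0)


def benjamini_hochberg(pvals, alpha):
    """Boolean mask of hypotheses rejected by the Benjamini-Hochberg step-up rule."""
    pvals = np.asarray(pvals, dtype=float)
    m = pvals.size
    order = np.argsort(pvals)
    below = pvals[order] <= alpha * np.arange(1, m + 1) / m
    selected = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()  # largest rank that meets its threshold
        selected[order[: k + 1]] = True
    return selected


def assign_regions(cal_pred, cal_y, test_pred, cutoff, alpha=0.1):
    """Label each test compound 'pass', 'fail', or 'indeterminate' (illustrative only)."""
    cal_pred = np.asarray(cal_pred, dtype=float)
    cal_y = np.asarray(cal_y, dtype=float)
    test_pred = np.asarray(test_pred, dtype=float)
    # Pass test: the null is "activity <= cutoff"; high predictions argue against it.
    p_pass = conformal_pvalues(cal_pred, cal_y <= cutoff, test_pred)
    # Fail test: the null is "activity > cutoff"; negate so low predictions argue against it.
    p_fail = conformal_pvalues(-cal_pred, cal_y > cutoff, -test_pred)
    is_pass = benjamini_hochberg(p_pass, alpha)
    is_fail = benjamini_hochberg(p_fail, alpha)
    regions = np.full(test_pred.size, "indeterminate", dtype=object)
    regions[is_pass & ~is_fail] = "pass"
    regions[is_fail & ~is_pass] = "fail"
    return regions


if __name__ == "__main__":
    # Toy data: predictions equal the measured activities 0..99, cutoff at 49.5.
    cal = np.arange(100)
    print(assign_regions(cal, cal, [99, 50, 0], cutoff=49.5, alpha=0.1))
```

In this toy run the borderline compound with prediction 50 is rejected by both tests and is therefore left in the indeterminate region, which is the behaviour a three-way scheme is meant to expose. A real pipeline would likely use an uncertainty-normalized score rather than the raw prediction.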