Publications

PAC-X: Fuzzy Explainable AI for Multi-Class Malware Detection
Mohd Saqib
Benjamin C. M. Fung
Philippe Charland
PheCode-guided multi-modal topic modeling of electronic health records improves disease incidence prediction and GWAS discovery from UK Biobank
Ziqi Yang
Ziyang Song
Phenome-wide association studies rely on disease definitions derived from diagnostic codes, often failing to leverage the full richness of electronic health records (EHR). We present MixEHR-SAGE, a PheCode-guided multi-modal topic model that integrates diagnoses, procedures, and medications to enhance phenotyping from large-scale EHRs. By combining expert-informed priors with probabilistic inference, MixEHR-SAGE identifies over 1000 interpretable phenotype topics from UK Biobank data. Applied to 350 000 individuals with high-quality genetic data, MixEHR-SAGE-derived risk scores accurately predict incident type 2 diabetes (T2D) and leukemia diagnoses. Subsequent genome-wide association studies using these continuous risk scores uncovered novel disease-associated loci, including PPP1R15A for T2D and JMJD6/SRSF2 for leukemia, that were missed by traditional binary case definitions. These results highlight the potential of probabilistic phenotyping from multi-modal EHRs to improve genetic discovery. The MixEHR-SAGE software is publicly available at: https://github.com/li-lab-mcgill/MixEHR-SAGE.
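The abstract's key contrast is testing genetic association against a continuous risk score rather than a binary case definition. A minimal sketch of a single-SNP association test on a continuous phenotype, using only NumPy (all data here are synthetic and hypothetical, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy cohort: minor-allele counts (0/1/2) for one SNP, and a continuous
# phenotype risk score with a weak simulated genetic effect.
n = 2000
genotype = rng.integers(0, 3, size=n).astype(float)
risk_score = 0.05 * genotype + rng.normal(0.0, 1.0, size=n)

def linear_assoc(g, y):
    """Slope and t-statistic of a simple linear regression y ~ g."""
    g = g - g.mean()
    y = y - y.mean()
    beta = (g @ y) / (g @ g)
    resid = y - beta * g
    se = np.sqrt((resid @ resid) / (len(y) - 2) / (g @ g))
    return beta, beta / se

beta, t = linear_assoc(genotype, risk_score)
```

A continuous score retains dose-response information that is thresholded away by a binary case/control label, which is the intuition behind the reported gain in GWAS power.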
Piezoelectric tuning of thermal conductivity in nano-architected gallium nitride metamaterials
Jun Cai
Alireza Seyedkanani
Benyamin Shahryari
Abdolhamid Akbarzadeh
Practical Solutions to Volt-var Optimization under Uncertainty via Blackbox Optimization
In this work, we propose an optimal reactive power dispatch (ORPD) stochastic program for volt-var optimization (VVO) of power distribution networks. The formulation considers not only control settings of conventional VVO devices, e.g., voltage regulators, capacitor banks, and on-load tap changers, but also optimal settings for volt-var droop curves of distributed energy resources (DERs), compliant with the IEEE 1547-2018 standard. Instead of including the power flow equations in the optimization problem, which would make it nonlinear and nonconvex, a power flow solver is utilized and the problem is solved by blackbox optimization (BBO). The feasibility of the derived solution is improved by using unbalanced power flow simulations. The solution remains effective across various demand and DER generation scenarios without frequent changes to device settings, making it practical for in-field implementations. Through numerical simulations on IEEE test feeders, we illustrate the performance of our approach on both in-sample and out-of-sample scenarios. We show that our approach outperforms a benchmark reinforcement learning method, and is also scalable to large-scale distribution networks.
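The core idea of treating VVO as blackbox optimization is that the power flow solver is queried as an oracle, with no gradients, and constraint violations enter the objective as penalties. A minimal derivative-free sketch (the surrogate solver, settings vector, and penalty weight below are hypothetical stand-ins, not the paper's formulation):

```python
import numpy as np

rng = np.random.default_rng(1)

def power_flow_surrogate(settings):
    """Hypothetical stand-in for an unbalanced power flow solver: maps device
    settings (tap position, capacitor step, droop slope) to per-bus voltages
    (p.u.) and network losses."""
    taps, caps, droop = settings
    voltages = (1.0 + 0.02 * taps + 0.01 * caps - 0.005 * droop
                + rng.normal(0.0, 1e-4, size=5))
    losses = 0.03 + 0.01 * droop**2 + 0.005 * (taps - 0.2)**2
    return voltages, losses

def objective(settings):
    """Losses plus a penalty for voltages outside the [0.95, 1.05] p.u. band."""
    v, losses = power_flow_surrogate(settings)
    violation = np.maximum(v - 1.05, 0.0) + np.maximum(0.95 - v, 0.0)
    return losses + 100.0 * violation.sum()

# Derivative-free (blackbox) search: sample candidate settings, keep the best.
lo, hi = np.array([-1.0, 0.0, 0.0]), np.array([1.0, 1.0, 1.0])
best, best_val = None, np.inf
for _ in range(500):
    cand = rng.uniform(lo, hi)
    val = objective(cand)
    if val < best_val:
        best, best_val = cand, val
```

Random search is only a placeholder here; dedicated BBO solvers (e.g., mesh-adaptive direct search) would replace the sampling loop, while the solver-as-oracle structure stays the same.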
Press Start to Charge: Videogaming the Online Centralized Charging Scheduling Problem
Alireza Ghahtarani
Martin Cousineau
Jorge E. Mendoza
Property-Driven Protein Inverse Folding with Multi-Objective Preference Alignment
Junqi Liu
Xiaoyang Hou
Xin Liu
Zhi Yang
Protein sequence design must balance designability, defined as the ability to recover a target backbone, with multiple, often competing, developability properties such as solubility, thermostability, and expression. Existing approaches address these properties through post hoc mutation, inference-time biasing, or retraining on property-specific subsets, yet they are target dependent and demand substantial domain expertise or careful hyperparameter tuning. In this paper, we introduce ProtAlign, a multi-objective preference alignment framework that fine-tunes pretrained inverse folding models to satisfy diverse developability objectives while preserving structural fidelity. ProtAlign employs a semi-online Direct Preference Optimization strategy with a flexible preference margin to mitigate conflicts among competing objectives and constructs preference pairs using in silico property predictors. Applied to the widely used ProteinMPNN backbone, the resulting model MoMPNN enhances developability without compromising designability across tasks including sequence design for CATH 4.3 crystal structures, de novo generated backbones, and real-world binder design scenarios, making it an appealing framework for practical protein sequence design.
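The "flexible preference margin" modifies the standard DPO objective by requiring a minimum implicit-reward gap between the preferred and rejected sequence. A minimal NumPy sketch of this margin form (the exact margin parameterization in ProtAlign may differ; all values below are illustrative):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def dpo_margin_loss(logp_w, logp_l, ref_logp_w, ref_logp_l,
                    beta=0.1, margin=0.0):
    """DPO loss with an additive preference margin. logp_* are sequence
    log-probabilities under the fine-tuned model, ref_logp_* under the
    frozen reference (pretrained inverse folding) model."""
    reward_w = beta * (logp_w - ref_logp_w)  # implicit reward, preferred seq
    reward_l = beta * (logp_l - ref_logp_l)  # implicit reward, rejected seq
    return -np.log(sigmoid(reward_w - reward_l - margin))

# A larger margin demands a bigger reward gap before a preference pair
# stops contributing to the loss, which can soften conflicts between
# competing developability objectives.
loss_tight = dpo_margin_loss(-10.0, -12.0, -11.0, -11.0, margin=0.0)
loss_loose = dpo_margin_loss(-10.0, -12.0, -11.0, -11.0, margin=0.5)
```

In practice the preference pairs would come from in silico property predictors ranking candidate sequences for the same backbone, as the abstract describes.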
RAEE: A Robust Retrieval-Augmented Early Exit Framework for Efficient Inference
Lianming Huang
Shangyu Wu
Yufei Cui
Ying Xiong
Haibo Hu
Xue Liu
Tei-Wei Kuo
Nan Guan
Chun Jason Xue
Deploying large language model inference remains challenging due to their high computational overhead. Early exit optimizes model inference by adaptively reducing the number of inference layers. Current methods typically train internal classifiers or use heuristic methods to determine the exit layer. However, these methods either introduce significant training overheads or lead to performance degradation. To address these limitations, this paper proposes RAEE, a robust Retrieval-Augmented Early Exit framework that not only enables early exit but also enhances model performance through corrective exit information at intermediate layers. This paper first demonstrates that the early exit problem can be effectively modeled as a distribution prediction problem, in which the distribution can be further approximated through the exit information of similar data. Subsequently, this paper introduces the process of collecting exit information of correct predictions and the steps to construct the retrieval database. Finally, leveraging the pre-constructed retrieval database, RAEE utilizes the exit information from retrieved similar data to guide the backbone model's exit. Experimental results demonstrate that RAEE not only accelerates inference but also achieves robust zero-shot performance across eight downstream tasks.
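The retrieval-guided exit can be sketched as a nearest-neighbor lookup: embed the new input, retrieve stored inputs with known good exit layers, and aggregate their recorded exits. A minimal sketch (database contents, embedding dimension, and the median aggregation rule are illustrative assumptions, not RAEE's exact design):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical retrieval database: embeddings of past inputs paired with
# the shallowest layer at which the model's prediction was already correct.
db_embeddings = rng.normal(size=(1000, 64))
db_exit_layers = rng.integers(4, 24, size=1000)  # layers 4..23

def predict_exit_layer(query_emb, k=8):
    """Retrieve the k nearest stored inputs and aggregate their recorded
    exit layers (here via the median) to pick an exit for the new input."""
    dists = np.linalg.norm(db_embeddings - query_emb, axis=1)
    nearest = np.argsort(dists)[:k]
    return int(np.median(db_exit_layers[nearest]))

layer = predict_exit_layer(rng.normal(size=64))
```

RAEE itself models a distribution over exit layers approximated from the neighbors' exit information; the median here is a simple point-estimate stand-in for that distributional prediction.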
Recall, Robustness, and Lexicographic Evaluation
Michael D. Ekstrand
Bhaskar Mitra
Reply to comment on "Medication-based mortality prediction in COPD using machine learning and conventional statistical methods".
Ana Paula Pena-Gralle
Amélie Forget
Yohann Moanahere Chiu
M. Beauchesne
Lucie Blais
RetINaBox: A Hands-On Learning Tool for Experimental Neuroscience
Brune Bettler
Flavia Arias Armas
Vanessa Bordonaro
Megan Q. Liu
Mingyu Wan
Aude Villemain
Blake A. Richards
Stuart Trenholm
An exciting aspect of neuroscience is developing and testing hypotheses via experimentation. However, due to logistical and financial hurdles, the experiment and discovery component of neuroscience is generally lacking in classroom and outreach settings. To address this issue, here we introduce RetINaBox: a low-cost open-source electronic visual system simulator that provides users with a hands-on tool to discover how the visual system builds feature detectors. RetINaBox includes an LED array for generating visual stimuli and photodiodes that act as an array of model photoreceptors. Custom software on a Raspberry Pi computer reads out responses from model photoreceptors and allows users to control the polarity and delay of the signal transfer from model photoreceptors to model retinal ganglion cells. Interactive lesson plans are provided, guiding users to discover different types of visual feature detectors—including ON/OFF, center-surround, orientation-selective, and direction-selective receptive fields—as well as their underlying circuit computations.
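The circuit computation users discover with RetINaBox can be illustrated in code: a center-surround receptive field is an excitatory center pixel minus an inhibitory surround average, with the sign set by the signal polarity. A minimal sketch (the 3x3 photodiode layout and weights are illustrative assumptions, not the device's actual wiring):

```python
import numpy as np

def center_surround_response(photoreceptors, polarity=+1):
    """Model an ON-center (polarity=+1) or OFF-center (polarity=-1) retinal
    ganglion cell over a 3x3 patch of model photoreceptor readings:
    center pixel minus the mean of the 8 surround pixels."""
    center = photoreceptors[1, 1]
    surround = (photoreceptors.sum() - center) / 8.0
    return polarity * (center - surround)

# A bright spot on the center strongly drives an ON-center cell ...
spot = np.array([[0, 0, 0],
                 [0, 1, 0],
                 [0, 0, 0]], dtype=float)
on_response = center_surround_response(spot, polarity=+1)

# ... while uniform illumination cancels out (center and surround balance).
flat = np.ones((3, 3))
flat_response = center_surround_response(flat, polarity=+1)
```

Flipping the polarity gives the OFF-center response, mirroring the polarity control the RetINaBox software exposes for the photoreceptor-to-ganglion-cell signal transfer.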
Robust Fine-Tuning from Non-Robust Pretrained Models: Mitigating Suboptimal Transfer With Epsilon-Scheduling
Fine-tuning pretrained models is a standard and effective workflow in modern machine learning. However, robust fine-tuning (RFT), which aims to simultaneously achieve adaptation to a downstream task and robustness to adversarial examples, remains challenging. Despite the abundance of non-robust pretrained models in open-source repositories, their potential for RFT is less understood. We address this knowledge gap by systematically examining RFT from such non-robust models. Our experiments reveal that fine-tuning non-robust models with a robust objective, even under small perturbations, can lead to poor performance, a phenomenon that we dub _suboptimal transfer_. In challenging scenarios (e.g., difficult tasks, high perturbation), the resulting performance can be so low that it may be considered a transfer failure. We find that fine-tuning using a robust objective impedes task adaptation at the beginning of training and eventually prevents optimal transfer. To address this, we propose a novel heuristic, _Epsilon-Scheduling_, a schedule over perturbation strength used during training that promotes optimal transfer. Additionally, we introduce _expected robustness_, a metric that captures performance across a range of perturbations, providing a more comprehensive evaluation of the accuracy-robustness trade-off of diverse models at test-time. Extensive experiments on a wide range of configurations (six pretrained models and five datasets) show that _Epsilon-Scheduling_ successfully prevents _suboptimal transfer_ and consistently improves expected robustness.
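The mechanism described above, where a full-strength robust objective impedes early task adaptation, suggests starting fine-tuning with little or no perturbation and growing epsilon over training. A minimal sketch of such a schedule (a linear warm-up is one plausible choice; the paper's exact schedule may differ):

```python
def epsilon_schedule(step, total_steps, eps_max, warmup_frac=0.5):
    """Ramp the adversarial perturbation strength from 0 to eps_max over
    the first warmup_frac of training, so early steps behave like standard
    fine-tuning and later steps train at full robust strength."""
    warmup_steps = int(warmup_frac * total_steps)
    if step >= warmup_steps:
        return eps_max
    return eps_max * step / warmup_steps
```

In an adversarial training loop, `epsilon_schedule(step, ...)` would set the perturbation radius passed to the attack (e.g., a PGD step) at each iteration, rather than using a fixed epsilon from step 0.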
Scalable Tree Ensemble Proximities in Python
Kevin R. Moon
Jake S. Rhodes