Publications

PDB-Struct: A Comprehensive Benchmark for Structure-based Protein Design
Chuanrui Wang
Bozitao Zhong
Zuobai Zhang
Narendra Chaudhary
Sanchit Misra
Structure-based protein design has attracted increasing interest, with numerous methods being introduced in recent years. However, a universally accepted method for evaluation has not been established, since the wet-lab validation can be overly time-consuming for the development of new algorithms, and the
Role of Structural and Conformational Diversity for Machine Learning Potentials
Nikhil Shenoy
Prudencio Tossou
Emmanuel Noutahi
Hadrien Mary
Jiarui Ding
In the field of Machine Learning Interatomic Potentials (MLIPs), understanding the intricate relationship between data biases, specifically conformational and structural diversity, and model generalization is critical in improving the quality of Quantum Mechanics (QM) data generation efforts. We investigate these dynamics through two distinct experiments: a fixed budget one, where the dataset size remains constant, and a fixed molecular set one, which focuses on fixed structural diversity while varying conformational diversity. Our results reveal nuanced patterns in generalization metrics. Notably, for optimal structural and conformational generalization, a careful balance between structural and conformational diversity is required, but existing QM datasets do not meet that trade-off. Additionally, our results highlight the limitation of the MLIP models at generalizing beyond their training distribution, emphasizing the importance of defining applicability domain during model deployment. These findings provide valuable insights and guidelines for QM data generation efforts.
Understanding Graph Neural Networks with Generalized Geometric Scattering Transforms
Michael Perlmutter
Alexander Tong
Feng Gao
Matthew Hirn
The scattering transform is a multilayered wavelet-based deep learning architecture that acts as a model of convolutional neural networks. Recently, several works have introduced generalizations of the scattering transform for non-Euclidean settings such as graphs. Our work builds upon these constructions by introducing windowed and non-windowed geometric scattering transforms for graphs based upon a very general class of asymmetric wavelets. We show that these asymmetric graph scattering transforms have many of the same theoretical guarantees as their symmetric counterparts. As a result, the proposed construction unifies and extends known theoretical results for many of the existing graph scattering architectures. In doing so, this work helps bridge the gap between geometric scattering and other graph neural networks by introducing a large family of networks with provable stability and invariance guarantees. These results lay the groundwork for future deep learning architectures for graph-structured data that have learned filters and also provably have desirable theoretical properties.
Improving Robustness and Reliability in Medical Image Classification with Latent-Guided Diffusion and Nested-Ensembles
Xing Shen
Hengguan Huang
Brennan Nichyporuk
Validation of ANG-1 and P-SEL as biomarkers of post-COVID-19 conditions using data from the Biobanque québécoise de la COVID-19 (BQC-19)
Eric Yamga
Antoine Soulé
Alain Piché
Madeleine Durand
Simon Rousseau
Causal machine learning for single-cell genomics
Alejandro Tejada-Lapuerta
Paul Bertin
Stefan Bauer
Hananeh Aliee
Fabian J. Theis
Distributional Robustness and Inequity Mitigation in Disaster Preparedness of Humanitarian Operations
Hongming Li
Ning Zhu
Michael Pinedo
Shoufeng Ma
Problem definition: In this paper, we study a predisaster relief network design problem with uncertain demands. The aim is to determine the prepositioning and reallocation of relief supplies. Motivated by the call of the International Federation of Red Cross and Red Crescent Societies (IFRC) to leave no one behind, we consider three important practical aspects of humanitarian operations: shortages, equity, and uncertainty. Methodology/results: We first employ a form of robust satisficing measure, which we call the shortage severity measure, to evaluate the severity of the shortage caused by uncertain demand in a context with limited distribution information. Because shortages often raise concerns about equity, we then formulate a mixed-integer lexicographic optimization problem with nonconvex objectives and design a new branch-and-bound algorithm to identify the exact solution. We also propose two approaches for identifying optimal postdisaster adaptable resource reallocation: an exact approach and a conservative approximation that is more computationally efficient. Our case study considers the 2010 Yushu earthquake, which occurred in northwestern China, and demonstrates the value of our methodology in mitigating geographical inequities and reducing shortages. Managerial implications: In our case study, we show that (i) incorporating equity in both predisaster deployment and postdisaster reallocation can produce substantially more equitable shortage prevention strategies while sacrificing only a reasonable amount of total shortage; (ii) increasing donations/budgets may not necessarily alleviate the shortage suffered by the most vulnerable individuals if equity is not fully considered; and (iii) exploiting disaster magnitude information when quantifying uncertainty can help alleviate geographical inequities caused by uncertain relief demands.
Funding: This work was supported by the Natural Sciences and Engineering Research Council of Canada [Grant RGPIN-2016-05208], the National Natural Science Foundation of China [Grants 71971154, 72010107004, 72091214, and 72122015], and the Canada Research Chairs [Grant CRC-2018-00105]. Supplemental Material: The online appendices are available at https://doi.org/10.1287/msom.2023.1230.
Dosimetry of [18F]TRACK, the first PET tracer for imaging of TrkB/C receptors in humans
Alexander Thiel
Alexey Kostikov
Hailey Ahn
Youstina Daoud
Jean-Paul Soucy
Stephan Blinder
Carolin Jaworski
Carmen Wängler
Björn Wängler
Freimut Juengling
Ralf Schirrmacher
Ghost on the Shell: An Expressive Representation of General 3D Shapes
Zhen Liu
Yao Feng
Yuliang Xiu
Weiyang Liu
Michael J. Black
Bernhard Schölkopf
Gradient Masked Averaging for Federated Learning
Irene Tenison
Sai Aravind Sreeramadas
Vaikkunth Mugunthan
Edouard Oyallon
Federated learning (FL) is an emerging paradigm that permits a large number of clients with heterogeneous data to coordinate learning of a unified global model without the need to share data amongst each other. A major challenge in federated learning is the heterogeneity of data across clients, which can degrade the performance of standard FL algorithms. Standard FL algorithms involve averaging of model parameters or gradient updates to approximate the global model at the server. However, we argue that in heterogeneous settings, averaging can result in information loss and lead to poor generalization due to the bias induced by dominant client gradients. We hypothesize that to generalize better across non-i.i.d. datasets, the algorithms should focus on learning the invariant mechanism that is constant while ignoring spurious mechanisms that differ across clients. Inspired by recent work on Out-of-Distribution generalization, we propose a gradient masked averaging approach for FL as an alternative to the standard averaging of client updates. This aggregation technique for client updates can be adapted as a drop-in replacement in most existing federated algorithms. We perform extensive experiments on multiple FL algorithms with in-distribution, real-world, feature-skewed out-of-distribution, and quantity-imbalanced datasets and show that it provides consistent improvements, particularly in the case of heterogeneous clients.
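The abstract above does not spell out how the mask is built, so the following is only a minimal sketch of one plausible reading: coordinates whose update signs agree across clients are kept, and the rest are damped. The function name `gradient_masked_average`, the sign-agreement score, and the threshold `tau` are assumptions made for illustration, not the authors' stated formulation.

```python
import numpy as np

def gradient_masked_average(client_updates, tau=0.6):
    """Aggregate client updates with a sign-agreement mask (hypothetical sketch)."""
    # Rows are clients, columns are model parameters.
    updates = np.asarray(client_updates, dtype=float)
    # Agreement score per coordinate: |mean of update signs|, which lies in [0, 1].
    agreement = np.abs(np.sign(updates).mean(axis=0))
    # Assumed rule: coordinates at or above tau pass unchanged,
    # the others are scaled down by their agreement score.
    mask = np.where(agreement >= tau, 1.0, agreement)
    # Apply the mask to the plain average that standard FL would use.
    return mask * updates.mean(axis=0)

# Four clients, three parameters; clients disagree on the sign of the second one.
updates = [
    [0.5, 1.0, -0.2],
    [0.3, -1.0, -0.4],
    [0.4, 1.0, -0.6],
    [0.4, -1.0, -0.4],
]
aggregated = gradient_masked_average(updates, tau=0.6)
print(aggregated)
```

In this toy input the first and third coordinates have consistent signs and keep their plain average, while the second coordinate, where clients split evenly, is masked to zero. Because the function only replaces the averaging step, it could in principle be swapped into an existing server-side aggregation loop, which is the drop-in property the abstract describes.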
Hazards from Increasingly Accessible Fine-Tuning of Downloadable Foundation Models
Alan Chan
Benjamin Bucknall
Herbie Bradley