Publications

PDB-Struct: A Comprehensive Benchmark for Structure-based Protein Design

Chuanrui Wang

Bozitao Zhong

Zuobai Zhang

Narendra Chaudhary

Sanchit Misra

Jian Tang

Structure-based protein design has attracted increasing interest, with numerous methods being introduced in recent years. However, a universally accepted method for evaluation has not been established, since the wet-lab validation can be overly time-consuming for the development of new algorithms, and the

2023-10-25

NeurIPS.cc/2023/Workshop/AI4D3 (poster)

Role of Structural and Conformational Diversity for Machine Learning Potentials

Nikhil Shenoy

Prudencio Tossou

Emmanuel Noutahi

Hadrien Mary

Dominique Beaini

Jiarui Ding

In the field of Machine Learning Interatomic Potentials (MLIPs), understanding the intricate relationship between data biases, specifically conformational and structural diversity, and model generalization is critical in improving the quality of Quantum Mechanics (QM) data generation efforts. We investigate these dynamics through two distinct experiments: a fixed budget one, where the dataset size remains constant, and a fixed molecular set one, which focuses on fixed structural diversity while varying conformational diversity. Our results reveal nuanced patterns in generalization metrics. Notably, for optimal structural and conformational generalization, a careful balance between structural and conformational diversity is required, but existing QM datasets do not meet that trade-off. Additionally, our results highlight the limitation of the MLIP models at generalizing beyond their training distribution, emphasizing the importance of defining applicability domain during model deployment. These findings provide valuable insights and guidelines for QM data generation efforts.

2023-10-25

NeurIPS.cc/2023/Workshop/AI4D3 (poster)

Understanding Graph Neural Networks with Generalized Geometric Scattering Transforms

Michael Perlmutter

Alexander Tong

Feng Gao

Guy Wolf

Matthew Hirn

The scattering transform is a multilayered wavelet-based deep learning architecture that acts as a model of convolutional neural networks. Recently, several works have introduced generalizations of the scattering transform for non-Euclidean settings such as graphs. Our work builds upon these constructions by introducing windowed and non-windowed geometric scattering transforms for graphs based upon a very general class of asymmetric wavelets. We show that these asymmetric graph scattering transforms have many of the same theoretical guarantees as their symmetric counterparts. As a result, the proposed construction unifies and extends known theoretical results for many of the existing graph scattering architectures. In doing so, this work helps bridge the gap between geometric scattering and other graph neural networks by introducing a large family of networks with provable stability and invariance guarantees. These results lay the groundwork for future deep learning architectures for graph-structured data that have learned filters and also provably have desirable theoretical properties.

2023-10-25

SIAM Journal on Mathematics of Data Science (published)

Improving Robustness and Reliability in Medical Image Classification with Latent-Guided Diffusion and Nested-Ensembles

Xing Shen

Hengguan Huang

Brennan Nichyporuk

Tal Arbel

2023-10-24

ArXiv (preprint)

Validation of ANG-1 and P-SEL as biomarkers of post-COVID-19 conditions using data from the Biobanque québécoise de la COVID-19 (BQC-19)

Eric Yamga

Antoine Soulé

Alain Piché

Amin Emad

Madeleine Durand

Simon Rousseau

2023-10-24

Clinical Proteomics (published)

Causal machine learning for single-cell genomics

Alejandro Tejada-Lapuerta

Paul Bertin

Stefan Bauer

Hananeh Aliee

Yoshua Bengio

Fabian J. Theis

2023-10-23

ArXiv (preprint)

Distributional Robustness and Inequity Mitigation in Disaster Preparedness of Humanitarian Operations

Hongming Li

Érick Delage

Ning Zhu

Michael Pinedo

Shoufeng Ma

Problem definition: In this paper, we study a predisaster relief network design problem with uncertain demands. The aim is to determine the prepositioning and reallocation of relief supplies. Motivated by the call of the International Federation of Red Cross and Red Crescent Societies (IFRC) to leave no one behind, we consider three important practical aspects of humanitarian operations: shortages, equity, and uncertainty. Methodology/results: We first employ a form of robust satisficing measure, which we call the shortage severity measure, to evaluate the severity of the shortage caused by uncertain demand in a context with limited distribution information. Because shortages often raise concerns about equity, we then formulate a mixed-integer lexicographic optimization problem with nonconvex objectives and design a new branch-and-bound algorithm to identify the exact solution. We also propose two approaches for identifying optimal postdisaster adaptable resource reallocation: an exact approach and a conservative approximation that is more computationally efficient. Our case study considers the 2010 Yushu earthquake, which occurred in northwestern China, and demonstrates the value of our methodology in mitigating geographical inequities and reducing shortages. Managerial implications: In our case study, we show that (i) incorporating equity in both predisaster deployment and postdisaster reallocation can produce substantially more equitable shortage prevention strategies while sacrificing only a reasonable amount of total shortage; (ii) increasing donations/budgets may not necessarily alleviate the shortage suffered by the most vulnerable individuals if equity is not fully considered; and (iii) exploiting disaster magnitude information when quantifying uncertainty can help alleviate geographical inequities caused by uncertain relief demands.
Funding: This work was supported by the Natural Sciences and Engineering Research Council of Canada [Grant RGPIN-2016-05208], the National Natural Science Foundation of China [Grants 71971154, 72010107004, 72091214, and 72122015], and the Canada Research Chairs [Grant CRC-2018-00105]. Supplemental Material: The online appendices are available at https://doi.org/10.1287/msom.2023.1230.

2023-10-23

Manufacturing & Service Operations Management (published)

Dosimetry of [18F]TRACK, the first PET tracer for imaging of TrkB/C receptors in humans

Alexander Thiel

Alexey Kostikov

Hailey Ahn

Youstina Daoud

Jean-Paul Soucy

Stephan Blinder

Carolin Jaworski

Carmen Wängler

Björn Wängler

Freimut Juengling

Shirin A. Enger

Ralf Schirrmacher

2023-10-23

EJNMMI Radiopharmacy and Chemistry (published)

Ghost on the Shell: An Expressive Representation of General 3D Shapes

Zhen Liu

Yao Feng

Yuliang Xiu

Weiyang Liu

Liam Paull

Michael J. Black

Bernhard Schölkopf

2023-10-23

ArXiv (preprint)

Gradient Masked Averaging for Federated Learning

Irene Tenison

Sai Aravind Sreeramadas

Vaikkunth Mugunthan

Edouard Oyallon

Eugene Belilovsky

Irina Rish

Federated learning (FL) is an emerging paradigm that permits a large number of clients with heterogeneous data to coordinate learning of a unified global model without the need to share data amongst each other. A major challenge in federated learning is the heterogeneity of data across clients, which can degrade the performance of standard FL algorithms. Standard FL algorithms involve averaging of model parameters or gradient updates to approximate the global model at the server. However, we argue that in heterogeneous settings, averaging can result in information loss and lead to poor generalization due to the bias induced by dominant client gradients. We hypothesize that to generalize better across non-i.i.d. datasets, the algorithms should focus on learning the invariant mechanism that is constant while ignoring spurious mechanisms that differ across clients. Inspired by recent works in Out-of-Distribution generalization, we propose a gradient masked averaging approach for FL as an alternative to the standard averaging of client updates. This aggregation technique for client updates can be adapted as a drop-in replacement in most existing federated algorithms. We perform extensive experiments on multiple FL algorithms with in-distribution, real-world, feature-skewed out-of-distribution, and quantity imbalanced datasets and show that it provides consistent improvements, particularly in the case of heterogeneous clients.

2023-10-23

TMLR (accepted)

Hazards from Increasingly Accessible Fine-Tuning of Downloadable Foundation Models

Alan Chan

Benjamin Bucknall

Herbie Bradley

David Scott Krueger

2023-10-23

NeurIPS.cc/2023/Workshop/SoLaR (spotlight)