Publications

Reaction-conditioned De Novo Enzyme Design with GENzyme

Chenqing Hua

Jiarui Lu

Yong Liu

Odin Zhang

Jian Tang

Rex Ying

Wengong Jin

Guy Wolf

Doina Precup

Shuangjia Zheng

The introduction of models like RFDiffusionAA, AlphaFold3, AlphaProteo, and Chai1 has revolutionized protein structure modeling and interaction prediction, primarily from a binding perspective, focusing on creating ideal lock-and-key models. However, these methods can fall short for enzyme-substrate interactions, where perfect binding models are rare and induced-fit states are more common. To address this, we shift to a functional perspective for enzyme design, where the enzyme function is defined by the reaction it catalyzes. Here, we introduce GENzyme, a de novo enzyme design model that takes a catalytic reaction as input and generates the catalytic pocket, full enzyme structure, and enzyme-substrate binding complex. GENzyme is an end-to-end, three-stage model that integrates (1) a catalytic pocket generation and sequence co-design module, (2) a pocket inpainting and enzyme inverse folding module, and (3) a binding and screening module to optimize and predict enzyme-substrate complexes. The entire design process is driven by the catalytic reaction being targeted. This reaction-first approach allows for more accurate and biologically relevant enzyme design, potentially surpassing structure-based and binding-focused models in creating enzymes capable of catalyzing specific reactions. We provide GENzyme code at https://github.com/WillHua127/GENzyme.

2024-11-10

arXiv (preprint)

doi.org

arxiv.org

Towards Enhancing the Reproducibility of Deep Learning Bugs: An Empirical Study

Mehil B. Shah

Mohammad Masudur Rahman

Foutse Khomh

2024-11-09

Empirical Software Engineering (published)

doi.org

arxiv.org

Deep Learning Unlocks the True Potential of Organ Donation after Circulatory Death with Accurate Prediction of Time-to-Death

Xingzhi Sun

Edward De Brouwer

Chen Liu

Smita Krishnaswamy

Ramesh Batra

Increasing the number of organ donations after circulatory death (DCD) has been identified as one of the most important ways of addressing the ongoing organ shortage. While recent technological advances in organ transplantation have increased their success rate, a substantial challenge in increasing the number of DCD donations resides in the uncertainty regarding the timing of cardiac death after terminal extubation, which affects the risk of prolonged ischemic organ injury and negatively affects post-transplant outcomes. In this study, we trained and externally validated an ODE-RNN model, which combines a recurrent neural network with neural ordinary differential equations and excels in processing irregularly sampled time series data. The model is designed to predict time-to-death following terminal extubation in the intensive care unit (ICU) using the last 24 hours of clinical observations. Our model was trained on a cohort of 3,238 patients from Yale New Haven Hospital, and validated on an external cohort of 1,908 patients from six hospitals across Connecticut. The model achieved accuracies of 95.3 ± 1.0% and 95.4 ± 0.7% for predicting whether death would occur in the first 30 and 60 minutes, respectively, with a calibration error of 0.024 ± 0.009. Heart rate, respiratory rate, mean arterial blood pressure (MAP), oxygen saturation (SpO2), and Glasgow Coma Scale (GCS) scores were identified as the most important predictors. Surpassing existing clinical scores, our model sets the stage for reduced organ acquisition costs and improved post-transplant outcomes.

2024-11-08

medRxiv (preprint)

doi.org
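
The abstract above reports a calibration error but does not say which metric was used. As an illustration only, the sketch below computes one common choice, the expected calibration error (ECE) with equal-width bins for binary predictions. The function name, the bin count, and the toy inputs are assumptions made for this example and are not taken from the paper.

```python
def expected_calibration_error(probs, labels, n_bins=10):
    """Binary ECE: bin-size-weighted mean of
    |observed positive rate - mean predicted probability| per bin.

    Illustrative sketch; the paper's exact metric is not specified here.
    """
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        # Equal-width bins on [0, 1]; p == 1.0 falls in the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))

    total = len(probs)
    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        mean_prob = sum(p for p, _ in members) / len(members)
        pos_rate = sum(y for _, y in members) / len(members)
        ece += len(members) / total * abs(pos_rate - mean_prob)
    return ece


# Toy data (hypothetical): predicted probability of the event, and outcome.
toy_probs = [0.9, 0.9, 0.1, 0.1]
toy_labels = [1, 1, 0, 0]
print(expected_calibration_error(toy_probs, toy_labels))
```

A lower value means predicted probabilities track observed frequencies more closely; a value of zero means they match exactly within every bin.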

A new species of Hoplostethus from Sumatra, eastern Indian Ocean, with comments on its most similar congeners (Trachichthyiformes: Trachichthyidae).

Yo Su

Alexander N. Kotlyar

Hsiu-Chin Lin

Toshio Kawai

Hsuan-Ching Ho

2024-11-08

Journal of Fish Biology (published)

doi.org

Optimal Approximate Minimization of One-Letter Weighted Finite Automata

Clara Lacroce

Borja Balle

Prakash Panangaden

Guillaume Rabusseau

2024-11-08

Mathematical Structures in Computer Science (published)

doi.org

arxiv.org

Robustness of Neural Ratio and Posterior Estimators to Distributional Shifts for Population-Level Dark Matter Analysis in Strong Gravitational Lensing

Andreas Filipp

Yashar Hezaveh

Laurence Perreault-Levasseur

2024-11-08

arXiv (preprint)

arxiv.org

A Guide to Misinformation Detection Data and Evaluation

Camille Thibault

Jacob-Junqi Tian

Gabrielle Péloquin-Skulski

Taylor Lynn Curtis

James Zhou

Florence Laflamme

Yuxiang Guan

Reihaneh Rabbany

Jean-François Godbout

Kellin Pelrine

2024-11-07

arXiv (preprint)

arxiv.org

Solving Hidden Monotone Variational Inequalities with Surrogate Losses

Ryan D'Orazio

Danilo Vucetic

Zichu Liu

Junhyung Lyle Kim

Ioannis Mitliagkas

Gauthier Gidel

Deep learning has proven to be effective in a wide variety of loss minimization problems. However, many applications of interest, like minimizing projected Bellman error and min-max optimization, cannot be modelled as minimizing a scalar loss function but instead correspond to solving a variational inequality (VI) problem. This difference in setting has caused many practical challenges, as naive gradient-based approaches from supervised learning tend to diverge and cycle in the VI case. In this work, we propose a principled surrogate-based approach compatible with deep learning to solve VIs. We show that our surrogate-based approach has three main benefits: (1) under assumptions that are realistic in practice (when hidden monotone structure is present, interpolation, and sufficient optimization of the surrogates), it guarantees convergence, (2) it provides a unifying perspective of existing methods, and (3) it is amenable to existing deep learning optimizers like Adam. Experimentally, we demonstrate that our surrogate-based approach is effective in min-max optimization and minimizing projected Bellman error. Furthermore, in the deep reinforcement learning case, we propose a novel variant of TD(0) which is more compute and sample efficient.

2024-11-07

arXiv (preprint)

doi.org

arxiv.org
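
The abstract above notes that naive gradient-based approaches tend to diverge and cycle on variational inequality problems. A minimal sketch of that failure mode, using the standard textbook bilinear game f(x, y) = x * y rather than anything from the paper: simultaneous gradient descent-ascent multiplies the distance to the solution (0, 0) by sqrt(1 + lr^2) at every step, so the iterates spiral outward. The function name, step size, and starting point are choices made for this illustration; the paper's surrogate-based method is not shown.

```python
import math


def simultaneous_gda(x, y, lr=0.1, steps=100):
    """Simultaneous gradient descent-ascent on f(x, y) = x * y.

    x minimizes f and y maximizes f; the unique solution is (0, 0).
    Returns the distance to the solution after each step.
    """
    norms = [math.hypot(x, y)]
    for _ in range(steps):
        grad_x, grad_y = y, x  # df/dx = y, df/dy = x
        # Both players update from the same (old) iterate.
        x, y = x - lr * grad_x, y + lr * grad_y
        norms.append(math.hypot(x, y))
    return norms


norms = simultaneous_gda(1.0, 1.0)
print(f"distance to solution: start {norms[0]:.3f}, end {norms[-1]:.3f}")
```

The growth factor follows from expanding the update: (x - lr*y)^2 + (y + lr*x)^2 = (1 + lr^2)(x^2 + y^2), so no positive step size makes this scheme converge on the bilinear game.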

Unlearning in- vs. out-of-distribution data in LLMs under gradient-based method

Teodora Baluta

Pascal Lamblin

Daniel Tarlow

Fabian Pedregosa

Gintare Karolina Dziugaite

Machine unlearning aims to solve the problem of removing the influence of selected training examples from a learned model. Despite the increasing attention to this problem, it remains an open research question how to evaluate unlearning in large language models (LLMs), and which critical properties of the data to be unlearned affect the quality and efficiency of unlearning. This work formalizes a metric to evaluate unlearning quality in generative models, and uses it to assess the trade-offs between unlearning quality and performance. We demonstrate that unlearning out-of-distribution examples requires more unlearning steps but presents a better trade-off overall. For in-distribution examples, however, we observe a rapid decay in performance as unlearning progresses. We further evaluate how an example's memorization and difficulty affect unlearning under a classical gradient ascent-based approach.

2024-11-07

arXiv (preprint)

doi.org

arxiv.org
