Arian Rokkum Jamasb

Alumni

Publications

Evaluating Representation Learning on the Protein Structure Universe
Alex Morehead
Chaitanya K. Joshi
Kieran Didi
Simon V Mathis
Charles Harris
Jianlin Cheng
Pietro Lio
Tom Leon Blundell
GAUCHE: A Library for Gaussian Processes in Chemistry
Leo Klarner
Henry Moss
Aditya Ravuri
Sang T. Truong
Bojana Rankovic
Samuel Don Stanton
Yuanqi Du
Gary Tom
Julius Schwartz
Austin Tripp
Aryan Deshwal
Gregory Kell
Anthony Bourached
Alex James Chan
Jacob Moss
Chengzhi Guo
Simon Frieder
Alpha Lee
Philippe Schwaller
Johannes P. Dürholt
Saudamini Chaurasia
Ji Won Park
Felix Strieth-Kalthoff
Bingqing Cheng
Alan Aspuru-Guzik
We introduce GAUCHE, a library for GAUssian processes in CHEmistry. Gaussian processes have long been a cornerstone of probabilistic machine learning, affording particular advantages for uncertainty quantification and Bayesian optimisation. Extending Gaussian processes to chemical representations, however, is nontrivial, necessitating kernels defined over structured inputs such as graphs, strings and bit vectors. By defining such kernels in GAUCHE, we seek to open the door to powerful tools for uncertainty quantification and Bayesian optimisation in chemistry. Motivated by scenarios frequently encountered in experimental chemistry, we showcase applications for GAUCHE in molecular discovery and chemical reaction optimisation. The codebase is made available at https://github.com/leojklarner/gauche
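One kernel commonly used over bit-vector molecular fingerprints is the Tanimoto (Jaccard) kernel. The sketch below is a generic NumPy illustration of such a kernel plugged into a GP posterior mean, not the GAUCHE implementation itself; the fingerprints and property values are toy data, and `tanimoto_kernel` is a hypothetical helper name.

```python
import numpy as np

def tanimoto_kernel(A, B):
    """Pairwise Tanimoto similarity between rows of two binary
    fingerprint matrices A (n x d) and B (m x d)."""
    dot = A @ B.T                               # pairwise intersections
    norm_a = (A * A).sum(axis=1)[:, None]       # bits set in each row of A
    norm_b = (B * B).sum(axis=1)[None, :]       # bits set in each row of B
    return dot / (norm_a + norm_b - dot)

# Toy binary "fingerprints" for three molecules and a made-up property.
X = np.array([[1, 1, 0, 1],
              [1, 0, 0, 1],
              [0, 1, 1, 0]], dtype=float)
y = np.array([0.9, 0.7, 0.1])

# GP posterior mean at a query fingerprint (observation noise 1e-2).
x_star = np.array([[1, 1, 0, 0]], dtype=float)
K = tanimoto_kernel(X, X) + 1e-2 * np.eye(len(X))
k_star = tanimoto_kernel(x_star, X)
mean = k_star @ np.linalg.solve(K, y)
```

Because the Tanimoto kernel is positive semi-definite on binary vectors, it can be dropped into standard GP machinery unchanged; GAUCHE packages kernels of this kind for graphs, strings and bit vectors.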
Protein Representation Learning by Geometric Structure Pretraining
Vijil Chenthamarakshan
Aurelie Lozano
Payel Das
Learning effective protein representations is critical in a variety of tasks in biology, such as predicting protein function or structure. Existing approaches usually pretrain protein language models on a large number of unlabeled amino acid sequences and then finetune the models with labeled data on downstream tasks. Despite the effectiveness of sequence-based approaches, the power of pretraining on known protein structures, which are available only in much smaller numbers, has not been explored for protein property prediction, even though protein structures are known determinants of protein function. In this paper, we propose to pretrain protein representations according to their 3D structures. We first present a simple yet effective encoder to learn the geometric features of a protein. We pretrain the protein graph encoder by leveraging multiview contrastive learning and different self-prediction tasks. Experimental results on both function prediction and fold classification tasks show that our proposed pretraining methods outperform or are on par with the state-of-the-art sequence-based methods, while using much less pretraining data. Our implementation is available at https://github.com/DeepGraphLearning/GearNet.
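The multiview contrastive objective mentioned above can be sketched as an InfoNCE loss over paired views of the same protein, where matching views are pulled together and all other proteins in the batch act as negatives. This is a generic NumPy illustration, not the GearNet implementation; the function name `info_nce` and the random embeddings are assumptions for the example.

```python
import numpy as np

def info_nce(z1, z2, tau=0.1):
    """InfoNCE loss for two batches of embeddings, where z1[i] and
    z2[i] are two views (e.g. sampled subgraphs) of the same protein."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)   # L2-normalise
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / tau                              # cosine sims / temperature
    # Positive pairs sit on the diagonal; off-diagonal entries are negatives.
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))

rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))
loss_aligned = info_nce(z, z)                    # identical views: easy positives
loss_random = info_nce(z, rng.normal(size=(8, 16)))  # unrelated views
```

Minimising this loss over augmented structure views is what drives the encoder to produce embeddings that agree across views of the same protein, before any labeled finetuning.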