Publications

Enhancing Protein Language Model with Structure-based Encoder and Pre-training

Zuobai Zhang

Minghao Xu

Aurelie Lozano

Vijil Chenthamarakshan

Payel Das

Jian Tang

Protein language models (PLMs) pre-trained on large-scale protein sequence corpora have achieved impressive performance on various downstream protein understanding tasks. Despite their ability to implicitly capture inter-residue contact information, transformer-based PLMs cannot explicitly encode protein structures to produce better structure-aware protein representations. Moreover, pre-training on available protein structures has not been explored as a way to improve these PLMs, even though structures are important in determining functions. To tackle these limitations, in this work we enhance the PLM with a structure-based encoder and structure-based pre-training. We first explore feasible model architectures to combine the advantages of a state-of-the-art PLM (i.e., ESM-1b) and a state-of-the-art protein structure encoder (i.e., GearNet). We empirically verify that ESM-GearNet, which connects the two encoders in series, is the most effective combination model. To further improve the effectiveness of ESM-GearNet, we pre-train it on massive unlabeled protein structures with contrastive learning, which aligns representations of co-occurring subsequences so as to capture their biological correlation. Extensive experiments on EC and GO protein function prediction benchmarks demonstrate the superiority of ESM-GearNet over previous PLMs and structure encoders, and structure-based pre-training on top of ESM-GearNet yields further clear performance gains. The source code will be made public upon acceptance.

2023-03-06

ICLR.cc/2023/Workshop/MLDD (poster)

openreview.net
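The contrastive pre-training step described in the abstract can be sketched as follows. This is a minimal sketch, not the authors' code: it assumes each protein is already encoded into per-residue vectors (standing in for ESM-GearNet outputs), that the two "co-occurring subsequences" are random contiguous crops of the same protein, that each crop is summarized by mean pooling, and that the alignment objective is an InfoNCE loss with a hypothetical temperature of 0.1. The names `random_crop`, `mean_pool`, `info_nce` and `contrastive_loss` are illustrative.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]

def random_crop(length: int, crop_len: int, rng: random.Random) -> Tuple[int, int]:
    """Pick a contiguous [start, end) window of at most crop_len residues (crop_len is hypothetical)."""
    crop_len = min(crop_len, length)
    start = rng.randint(0, length - crop_len)
    return start, start + crop_len

def mean_pool(residues: Sequence[Vector], start: int, end: int) -> Vector:
    """Average the per-residue embeddings inside the window into one subsequence vector."""
    window = residues[start:end]
    return [sum(col) / len(window) for col in zip(*window)]

def normalize(v: Vector) -> Vector:
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def info_nce(views_a: List[Vector], views_b: List[Vector], temperature: float = 0.1) -> float:
    """Mean InfoNCE loss: views_a[i] should score highest against views_b[i] (same protein)."""
    a = [normalize(v) for v in views_a]
    b = [normalize(v) for v in views_b]
    total = 0.0
    for i, anchor in enumerate(a):
        logits = [sum(x * y for x, y in zip(anchor, other)) / temperature for other in b]
        top = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = top + math.log(sum(math.exp(z - top) for z in logits))
        total += log_sum_exp - logits[i]
    return total / len(a)

def contrastive_loss(proteins: List[List[Vector]], crop_len: int, rng: random.Random) -> float:
    """Two random crops per protein form a positive pair; other proteins in the batch are negatives."""
    views_a, views_b = [], []
    for residues in proteins:
        views_a.append(mean_pool(residues, *random_crop(len(residues), crop_len, rng)))
        views_b.append(mean_pool(residues, *random_crop(len(residues), crop_len, rng)))
    return info_nce(views_a, views_b)
```

Minimizing this loss pulls the two crops of the same protein together in embedding space while pushing apart crops of different proteins in the batch.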

EurNet: Efficient Multi-Range Relational Modeling of Protein Structure

Minghao Xu

Yuanfan Guo

Yi Xu

Jian Tang

Xinlei Chen

Yuandong Tian

Modeling the 3D structures of proteins is critical for obtaining effective protein structure representations, which in turn improve protein function understanding. Existing protein structure encoders mainly model short-range interactions within protein structures and neglect interactions at multiple length scales, which together form the complete interaction patterns in protein structures. To model these interactions completely while keeping computation efficient, we introduce EurNet for Efficient multi-range relational modeling. In EurNet, we represent the protein structure as a multi-relational residue-level graph with different types of edges for modeling short-range, medium-range and long-range interactions. To efficiently process these different interactive relations, we propose a novel modeling layer, called Gated Relational Message Passing (GRMP), as the basic building block of EurNet. GRMP can capture multiple interactive relations in protein structures with little extra computational cost. We verify the state-of-the-art performance of EurNet on EC and GO protein function prediction benchmarks, and show that the proposed GRMP layer achieves a better efficiency-performance trade-off than the widely used relational graph convolution.

2023-03-06

ICLR.cc/2023/Workshop/MLDD (poster)

openreview.net
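The multi-relational residue graph that EurNet takes as input can be sketched as below. This is a minimal sketch under stated assumptions: residues are given as C-alpha coordinates, an edge's type is chosen by sequence separation using bins borrowed from the common contact-prediction convention (short 6-11, medium 12-23, long 24+), and only pairs within a hypothetical 10 Å spatial cutoff are connected. These thresholds are not taken from EurNet, and the GRMP layer itself is not shown.

```python
import math
from typing import Dict, List, Optional, Tuple

Coord = Tuple[float, float, float]
Edge = Tuple[int, int]

# Hypothetical sequence-separation bins (inclusive), borrowed from the usual
# contact-prediction convention; EurNet's own range definitions may differ.
RANGES: Dict[str, Tuple[int, Optional[int]]] = {
    "short": (6, 11),
    "medium": (12, 23),
    "long": (24, None),
}

def relation_for(separation: int) -> Optional[str]:
    """Map a sequence separation |i - j| to a relation name, or None if no bin applies."""
    for name, (lo, hi) in RANGES.items():
        if separation >= lo and (hi is None or separation <= hi):
            return name
    return None

def build_multirelational_graph(coords: List[Coord], cutoff: float = 10.0) -> Dict[str, List[Edge]]:
    """One undirected edge list per relation; a pair (i, j), i < j, is kept only if
    its C-alpha distance is within `cutoff` (a hypothetical value in angstroms)."""
    graph: Dict[str, List[Edge]] = {name: [] for name in RANGES}
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            rel = relation_for(j - i)
            if rel is None:
                continue  # separations below 6 get no edge in this sketch
            if math.dist(coords[i], coords[j]) <= cutoff:
                graph[rel].append((i, j))
    return graph
```

Each relation's edge list could then feed a relation-specific message-passing step; per the abstract, GRMP is designed to handle these multiple relations with little extra cost compared with relational graph convolution.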

Privacy-Preserving Fair Item Ranking

Jiajun Sun

Sikha Pentyala

Martine De Cock

Golnoosh Farnadi

Users worldwide access massive amounts of curated data in the form of rankings on a daily basis. The societal impact of this ease of access has been studied, and various notions of fairness in rankings have been proposed and enforced. Current computational methods for fair item ranking rely on disclosing user data to a centralized server, which raises privacy concerns for the users. This work is the first to advance research at the intersection of producer (item) fairness and consumer (user) privacy in rankings by exploring the incorporation of privacy-preserving techniques, specifically differential privacy and secure multi-party computation. Our work extends the equity of amortized attention ranking mechanism to be privacy-preserving, and we evaluate its effects with respect to privacy, fairness, and ranking quality. Our results on real-world datasets show that we can effectively preserve the privacy of users and mitigate unfairness of items without additional sacrifices in ranking quality compared with the ranking mechanism in the clear.

2023-03-06

ArXiv (preprint)

doi.org

arxiv.org
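Equity of amortized attention asks that the attention an item accumulates over a sequence of rankings track the relevance it accumulates. A minimal sketch of measuring that gap, together with a Laplace mechanism applied to relevance scores, follows. The DCG-style position-attention model, the per-round normalization, and the `epsilon`/`sensitivity` parameters are illustrative assumptions; the paper's actual privacy protocol (which also uses secure multi-party computation) and its ranking procedure are not reproduced here.

```python
import math
import random
from typing import Dict, List, Sequence

def position_attention(k: int) -> List[float]:
    """Hypothetical attention model: DCG-style weights 1 / log2(rank + 1), normalized to sum to 1."""
    raw = [1.0 / math.log2(rank + 1) for rank in range(1, k + 1)]
    total = sum(raw)
    return [w / total for w in raw]

def amortized_unfairness(rankings: Sequence[Sequence[str]],
                         relevances: Sequence[Dict[str, float]]) -> float:
    """Sum over items of |cumulative attention - cumulative relevance|, where both
    quantities are normalized to sum to 1 within each ranking round."""
    attention: Dict[str, float] = {}
    relevance: Dict[str, float] = {}
    for ranking, rel in zip(rankings, relevances):
        weights = position_attention(len(ranking))
        rel_total = sum(rel[item] for item in ranking) or 1.0
        for item, w in zip(ranking, weights):
            attention[item] = attention.get(item, 0.0) + w
            relevance[item] = relevance.get(item, 0.0) + rel[item] / rel_total
    return sum(abs(attention[item] - relevance[item]) for item in attention)

def laplace_noise(scale: float, rng: random.Random) -> float:
    """Laplace(0, scale) sample: the difference of two independent Exp(1) draws, scaled."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))

def privatize(relevance: Dict[str, float], epsilon: float, sensitivity: float,
              rng: random.Random) -> Dict[str, float]:
    """Laplace mechanism on the relevance vector: scale = sensitivity / epsilon, where
    `sensitivity` is the assumed L1 sensitivity of the whole vector. Clipping at 0 is
    post-processing and does not weaken the guarantee."""
    scale = sensitivity / epsilon
    return {item: max(0.0, score + laplace_noise(scale, rng)) for item, score in relevance.items()}
```

Rotating equally relevant items across rounds drives this quantity toward zero, while always showing the same item first does not.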

Relationship between prediction accuracy and feature importance reliability: An empirical and theoretical study

Jianzhong Chen

Leon Qi Rong Ooi

Trevor Wei Kiat Tan

Shaoshi Zhang

Jingwei Li

Christopher L. Asplund

Simon B. Eickhoff

Danilo Bzdok

Avram J. Holmes

B.T. Thomas Yeo

2023-03-06

NeuroImage (published)

doi.org

Task-Agnostic Graph Neural Network Evaluation via Adversarial Collaboration

Xiangyu Zhao

Hannes Stärk

Dominique Beaini

Pietro Lio

Yiren Zhao

2023-03-06

ICLR.cc/2023/Workshop/MLDD (poster)

doi.org

openreview.net

Improved Robustness Against Adaptive Attacks With Ensembles and Error-Correcting Output Codes

Thomas Philippon

Christian Gagné

Neural network ensembles have been studied extensively in the context of adversarial robustness, yet most ensemble-based approaches remain vulnerable to adaptive attacks. In this paper, we investigate the robustness of Error-Correcting Output Codes (ECOC) ensembles through architectural improvements and ensemble diversity promotion. We perform a comprehensive robustness assessment against adaptive attacks and investigate the relationship between ensemble diversity and robustness. Our results demonstrate the benefits of ECOC ensembles for adversarial robustness compared to regular ensembles of convolutional neural networks (CNNs) and show why the robustness of previous implementations is limited. We also propose an adversarial training method specific to ECOC ensembles that further improves robustness to adaptive attacks.

2023-03-04

ArXiv (preprint)

doi.org

arxiv.org
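In an ECOC ensemble, each class is assigned a binary codeword, each output bit is predicted separately, and the class whose codeword is closest to the predicted bits wins. The decoding step can be sketched as below, assuming a hypothetical 4-class, 6-bit code matrix and soft bit outputs in [0, 1] decoded by L1 distance. The paper's architectural changes, diversity promotion, and ECOC-specific adversarial training are not shown.

```python
from typing import List, Sequence

# Hypothetical 4-class code matrix: one row (codeword) per class,
# one column per binary output / ensemble member.
CODE_MATRIX: List[List[int]] = [
    [0, 0, 0, 1, 1, 1],
    [0, 1, 1, 0, 0, 1],
    [1, 0, 1, 0, 1, 0],
    [1, 1, 0, 1, 0, 0],
]

def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions where two codewords differ."""
    return sum(x != y for x, y in zip(a, b))

def min_pairwise_distance(codes: List[List[int]]) -> int:
    """Minimum Hamming distance d between codewords; up to (d - 1) // 2 flipped bits can be corrected."""
    return min(hamming(codes[i], codes[j]) for i in range(len(codes)) for j in range(i + 1, len(codes)))

def decode(bit_probs: Sequence[float], codes: List[List[int]] = CODE_MATRIX) -> int:
    """Return the class whose codeword is closest (L1 distance) to the predicted bit probabilities."""
    distances = [sum(abs(p - c) for p, c in zip(bit_probs, code)) for code in codes]
    return min(range(len(codes)), key=lambda k: distances[k])
```

With a minimum codeword distance of 4, a single wrong bit still decodes to the correct class, so an attacker has to flip several outputs at once; this is one intuition for why ECOC ensembles may help robustness.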

Objectives Matter: Understanding the Impact of Self-Supervised Objectives on Vision Transformer Representations

Shashank Shekhar

Florian Bordes

Pascal Vincent

Ari S. Morcos

Joint-embedding based learning (e.g., SimCLR, MoCo, DINO) and reconstruction-based learning (e.g., BEiT, SimMIM, MAE) are the two leading paradigms for self-supervised learning of vision transformers, but they differ substantially in their transfer performance. Here, we aim to explain these differences by analyzing the impact of these objectives on the structure and transferability of their representations. Our analysis reveals that reconstruction-based learning features are significantly dissimilar to joint-embedding based learning features and that models trained with similar objectives learn similar features even across architectures. These differences arise early in the network, primarily driven by attention and normalization layers. We find that joint-embedding features yield better linear probe transfer for classification because the different objectives drive different distributions of information and invariances in the representation. These differences explain opposite trends in transfer performance for downstream tasks that require spatial specificity in features. Finally, we address how fine-tuning changes reconstructive representations to enable better transfer, showing that it re-organizes the information to be more similar to pre-trained joint embedding models.

2023-03-04

ICLR.cc/2023/Workshop/ME-FoMo (spotlight)

doi.org

openreview.net
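The abstract reports that features learned with different objectives are dissimilar. One common way to quantify similarity between two sets of features computed on the same inputs is linear centered kernel alignment (CKA); the abstract does not name the metric the authors used, so treat this as an illustrative choice rather than their method. Rows are examples and columns are feature dimensions.

```python
import math
from typing import List

Matrix = List[List[float]]  # rows = examples, columns = feature dimensions

def center_columns(x: Matrix) -> Matrix:
    """Subtract each feature's mean over the examples."""
    means = [sum(col) / len(x) for col in zip(*x)]
    return [[v - m for v, m in zip(row, means)] for row in x]

def cross_frobenius_sq(a: Matrix, b: Matrix) -> float:
    """Squared Frobenius norm of A^T B for A (n x p) and B (n x q) with matching rows."""
    total = 0.0
    for i in range(len(a[0])):
        for j in range(len(b[0])):
            s = sum(ra[i] * rb[j] for ra, rb in zip(a, b))
            total += s * s
    return total

def linear_cka(x: Matrix, y: Matrix) -> float:
    """Linear CKA = ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F) on column-centered features."""
    xc, yc = center_columns(x), center_columns(y)
    numerator = cross_frobenius_sq(xc, yc)
    denominator = math.sqrt(cross_frobenius_sq(xc, xc) * cross_frobenius_sq(yc, yc))
    return numerator / denominator
```

Linear CKA equals 1 for identical representations and is invariant to isotropic scaling and orthogonal transformations, which makes it usable for comparing layers of different widths or models with different architectures.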

Out-of-context Meta-learning in Large Language Models

Dmitrii Krasheninnikov

Egor Krasheninnikov

David Scott Krueger

Brown et al. (2020) famously introduced the phenomenon of in-context meta-learning in large language models (LLMs). Our work establishes the existence of a phenomenon we call out-of-context meta-learning via carefully designed synthetic experiments with large language models. We argue that out-of-context meta-learning is an important and surprising capability of LLMs, which may lead them to more readily "internalize" the semantic content of text that is, or appears to be, broadly useful (such as true statements, or text from authoritative sources) and apply it in appropriate contexts. We also raise the question of how this phenomenon emerges, and discuss two possible explanations: one relying on the way LLMs store knowledge in their parameters, and another suggesting that the implicit gradient alignment bias of gradient-descent-based methods may be responsible. Finally, we reflect on what our results might imply about capabilities of future AI systems, and discuss potential risks.

2023-03-04

ICLR.cc/2023/Workshop/ME-FoMo (poster)

openreview.net

Robustifying Language Models with Test-Time Adaptation

Noah Thomas McDermott

Junfeng Yang

Chengzhi Mao

Large-scale language models have achieved state-of-the-art performance on a number of language tasks. However, they fail on adversarial language examples, which are sentences optimized to fool the language models while keeping similar semantic meanings for humans. While prior work focuses on making language models robust at training time, retraining for robustness is often unrealistic for large-scale foundation models. Instead, we propose to make language models robust at test time. By dynamically adapting the input sentence with predictions from masked words, we show that we can reverse many language adversarial attacks. Since our approach does not require any training, it works for novel tasks at test time and can adapt to novel adversarial corruptions. Visualizations and empirical results on two popular sentence classification datasets demonstrate that our method can repair adversarial language attacks over 65% o…

2023-03-04

ICLR.cc/2023/Workshop/Trustworthy_ML (poster)

doi.org

openreview.net

Identifying Different Student Clusters in Functional Programming Assignments: From Quick Learners to Struggling Students

Chuqin Geng

Wenwen Xu

Yingjie Xu

Brigitte Pientka

Xujie Si

Instructors and students alike often focus on the grade in programming assignments as a key measure of how well a student is mastering the material and whether a student is struggling. This can, however, be misleading: grades may be heavily skewed, especially when students have access to auto-graders. In this paper, we analyze student assignment submission data collected from a functional programming course taught at McGill University, incorporating a wide range of features. In addition to the grade, we consider activity time data, time spent, and the number of static errors. This allows us to identify four clusters of students ("Quick-learning", "Hardworking", "Satisficing", and "Struggling") using clustering algorithms. We then analyze how work habits, working duration, the range of errors, and the ability to fix errors affect different clusters of students. This structured analysis provides valuable insights for instructors to actively help different types of students and emphasize different aspects of their overall course design. It also helps students understand which aspects they still struggle with, seek clarification, and adjust their work habits.

2023-03-03

Proceedings of the 54th ACM Technical Symposium on Computer Science Education V. 1 (published)

doi.org

arxiv.org
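The clustering step can be sketched as below, assuming each student is described by a numeric feature vector such as [grade, time spent, number of static errors] (a hypothetical feature order), that features are z-scored, and that plain k-means with k = 4 is used. The abstract only says "clustering algorithms", so k-means and the deterministic farthest-first seeding here are illustrative assumptions; labelling a cluster "Quick-learning" or "Struggling" would still be done by inspecting its centroid.

```python
import math
from typing import List, Sequence, Tuple

Point = List[float]

def standardize(rows: Sequence[Sequence[float]]) -> List[Point]:
    """Z-score each feature column so grades, minutes and error counts are on one scale."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0 for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

def farthest_first(points: List[Point], k: int) -> List[Point]:
    """Deterministic seeding: start from the first point, then repeatedly add
    the point farthest from all centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids

def kmeans(points: List[Point], k: int, iters: int = 100) -> Tuple[List[int], List[Point]]:
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid re-estimation."""
    centroids = farthest_first(points, k)
    labels: List[int] = []
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing, so the centroids are stable too
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids
```

Deterministic seeding keeps the result reproducible across runs; k-means++ is the usual randomized alternative.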
