Transfer Entropy Bottleneck: Learning Sequence to Sequence Information Transfer
Damjan Kalajdzievski
Ximeng Mao
Pascal Fortier-Poisson
When presented with a data stream of two statistically dependent variables, predicting the future of one of the variables (the target stream) can benefit from information about both its history and the history of the other variable (the source stream). For example, fluctuations in temperature at a weather station can be predicted using both past temperatures and barometric readings. However, a challenge when modelling such data is that it is easy for a neural network to rely on the strongest joint correlations within the target stream, which may cause it to ignore a crucial but small information transfer from the source to the target stream. In addition, the target stream may already have been modelled on its own, and it would be useful to use that model to inform a new joint model. Here, we develop an information bottleneck approach for conditional learning on two dependent streams of data. Our method, which we call Transfer Entropy Bottleneck (TEB), allows one to learn a model that bottlenecks the directed information transferred from the source variable to the target variable, while quantifying this information transfer within the model. TEB thus provides a useful new information bottleneck approach for modelling two statistically dependent streams of data in order to make predictions about one of them.
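For reference, a minimal way to write down the quantity being bottlenecked is the standard transfer entropy from the source X to the target Y, i.e. the mutual information between the source history and the target's next value, conditioned on the target's own history (generic notation, not necessarily the paper's):

```latex
% Standard transfer entropy from source X to target Y (generic notation):
% the extra predictive information that X's history provides about Y's next
% value beyond what Y's own history already provides.
\mathrm{TE}_{X \to Y}
  = I\!\left(Y_{t+1};\, X_{1:t} \mid Y_{1:t}\right)
  = \mathbb{E}\!\left[\log \frac{p\!\left(y_{t+1} \mid y_{1:t},\, x_{1:t}\right)}
                               {p\!\left(y_{t+1} \mid y_{1:t}\right)}\right].
```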
Can We Scale Transformers to Predict Parameters of Diverse ImageNet Models?
Boris Knyazev
Doha Hwang
Pretraining a neural network on a large dataset is becoming a cornerstone of machine learning, but it is within the reach of only a few communities with large resources. We aim at the ambitious goal of democratizing pretraining. Towards that goal, we train and release a single neural network that can predict high-quality ImageNet parameters of other neural networks. By using predicted parameters for initialization, we are able to boost the training of diverse ImageNet models available in PyTorch. When transferred to other datasets, models initialized with predicted parameters also converge faster and reach competitive final performance.
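As a rough illustration of the intended workflow (all names below, including the predictor loading call, are placeholders rather than the released API), the idea is to query a pretrained parameter predictor for an initialization of an arbitrary PyTorch model and then train as usual:

```python
# Hypothetical sketch: the loading call and the predictor interface are
# illustrative placeholders, not the authors' released code.
import torch
import torchvision.models as models

target = models.resnet50()  # any of the diverse ImageNet models in PyTorch

# Assumed interface: a pretrained hypernetwork that maps the target's
# architecture to a full set of predicted ImageNet parameters.
predictor = torch.hub.load('example/parameter-predictor', 'predictor')  # placeholder repo
predicted_state = predictor(target)  # assumed: dict of parameter name -> predicted tensor

# Use the predicted parameters as an initialization, then fine-tune as usual.
target.load_state_dict(predicted_state, strict=False)
optimizer = torch.optim.SGD(target.parameters(), lr=0.01, momentum=0.9)
```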
Enhancing Protein Language Model with Structure-based Encoder and Pre-training
Zuobai Zhang
Minghao Xu
Aurelie Lozano
Vijil Chenthamarakshan
Payel Das
Protein language models (PLMs) pre-trained on large-scale protein sequence corpora have achieved impressive performance on various downstream protein understanding tasks. Despite their ability to implicitly capture inter-residue contact information, transformer-based PLMs cannot encode protein structures explicitly for better structure-aware protein representations. Moreover, the power of pre-training on available protein structures has not been explored for improving these PLMs, even though structures are important for determining function. To tackle these limitations, in this work we enhance the PLM with a structure-based encoder and structure-based pre-training. We first explore feasible model architectures for combining the advantages of a state-of-the-art PLM (i.e., ESM-1b) and a state-of-the-art protein structure encoder (i.e., GearNet). We empirically find that ESM-GearNet, which connects the two encoders in series, is the most effective combination. To further improve ESM-GearNet, we pre-train it on massive unlabeled protein structures with contrastive learning, which aligns representations of co-occurring subsequences so as to capture their biological correlation. Extensive experiments on EC and GO protein function prediction benchmarks demonstrate the superiority of ESM-GearNet over previous PLMs and structure encoders, and clear performance gains are further achieved by structure-based pre-training on top of ESM-GearNet. The source code will be made public upon acceptance.
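A minimal sketch of the "series" combination described above (class and argument names are our own assumptions, not the authors' code): the sequence encoder produces per-residue embeddings, which are then fed as the input node features of the structure encoder.

```python
# Illustrative sketch of a series PLM + structure-encoder combination;
# encoder interfaces are assumptions, not the authors' implementation.
import torch
import torch.nn as nn

class SeriesProteinEncoder(nn.Module):
    def __init__(self, seq_encoder: nn.Module, struct_encoder: nn.Module):
        super().__init__()
        self.seq_encoder = seq_encoder        # e.g. an ESM-1b-style PLM
        self.struct_encoder = struct_encoder  # e.g. a GearNet-style structure GNN

    def forward(self, tokens, residue_graph):
        # Per-residue sequence representations: (batch, length, d_seq)
        h_seq = self.seq_encoder(tokens)
        # The structure encoder refines them over the residue-level graph
        # built from 3D coordinates (assumed keyword interface).
        h = self.struct_encoder(residue_graph, node_features=h_seq)
        return h  # structure-aware residue embeddings
```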
EurNet: Efficient Multi-Range Relational Modeling of Protein Structure
Minghao Xu
Yuanfan Guo
Yi Xu
Xinlei Chen
Yuandong Tian
Modeling the 3D structures of proteins is critical for obtaining effective protein structure representations, which in turn boost protein function understanding. Existing protein structure encoders mainly focus on modeling short-range interactions within protein structures, while neglecting the interactions at multiple length scales that together constitute the complete interaction patterns in protein structures. To attain complete interaction modeling with efficient computation, we introduce EurNet for Efficient multi-range relational modeling. In EurNet, we represent the protein structure as a multi-relational residue-level graph with different types of edges for modeling short-range, medium-range and long-range interactions. To efficiently process these different interactive relations, we propose a novel modeling layer, called Gated Relational Message Passing (GRMP), as the basic building block of EurNet. GRMP can capture multiple interactive relations in protein structures with little extra computational cost. We verify the state-of-the-art performance of EurNet on EC and GO protein function prediction benchmarks, and the proposed GRMP layer is shown to achieve a better efficiency-performance trade-off than the widely used relational graph convolution.
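To make the GRMP idea concrete, here is an illustrative gated relational message-passing layer based on our reading of the abstract (not the authors' implementation): each edge type (short-, medium- and long-range) has its own message transform, and edge-wise gates control how much each relation contributes to the node update.

```python
# Illustrative sketch of gated relational message passing over a
# multi-relational residue graph; details are assumptions for exposition.
import torch
import torch.nn as nn

class GatedRelationalMessagePassing(nn.Module):
    def __init__(self, dim, num_relations=3):
        super().__init__()
        self.msg = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_relations))
        self.gate = nn.ModuleList(nn.Linear(2 * dim, 1) for _ in range(num_relations))
        self.update = nn.Linear(dim, dim)

    def forward(self, h, edges_by_relation):
        # h: (num_residues, dim)
        # edges_by_relation: list of (src, dst) index tensors, one per edge type
        agg = torch.zeros_like(h)
        for r, (src, dst) in enumerate(edges_by_relation):
            m = self.msg[r](h[src])                                          # per-edge messages
            g = torch.sigmoid(self.gate[r](torch.cat([h[dst], m], dim=-1)))  # edge-wise gates
            agg.index_add_(0, dst, g * m)                                    # gated aggregation
        return h + torch.relu(self.update(agg))                              # residual node update
```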
Learning Multi-Objective Curricula for Robotic Policy Learning
Jikun Kang
Miao Liu
Abhinav Gupta
Jie Fu
Privacy-Preserving Fair Item Ranking
Jiajun Sun
Sikha Pentyala
Martine De Cock
Users worldwide access massive amounts of curated data in the form of rankings on a daily basis. The societal impact of this ease of access has been studied, and work has been done to propose and enforce various notions of fairness in rankings. Current computational methods for fair item ranking rely on disclosing user data to a centralized server, which gives rise to privacy concerns for the users. This work is the first to advance research at the intersection of producer (item) fairness and consumer (user) privacy in rankings by exploring the incorporation of privacy-preserving techniques, specifically differential privacy and secure multi-party computation. Our work extends the equity-of-amortized-attention ranking mechanism to be privacy-preserving, and we evaluate its effects with respect to privacy, fairness, and ranking quality. Our results on real-world datasets show that we are able to effectively preserve the privacy of users and mitigate unfairness towards items without additional sacrifices in ranking quality compared to the ranking mechanism in the clear.
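As a toy illustration of the differential-privacy component only (the paper additionally relies on secure multi-party computation; the function below is a generic sketch, not the paper's protocol), per-item attention and relevance tallies can be perturbed with calibrated Laplace noise before the server uses them for a fairness-aware re-ranking:

```python
# Generic epsilon-DP sketch for per-item tallies; not the paper's protocol.
import numpy as np

def dp_item_tallies(attention, relevance, epsilon=1.0, sensitivity=1.0):
    """Return Laplace-noised per-item attention and relevance sums."""
    scale = sensitivity / epsilon
    noisy_attention = attention + np.random.laplace(0.0, scale, size=attention.shape)
    noisy_relevance = relevance + np.random.laplace(0.0, scale, size=relevance.shape)
    return noisy_attention, noisy_relevance

# Items whose accumulated attention lags their accumulated relevance would then
# be promoted when producing the next ranking (equity-of-attention style).
```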
Relationship between prediction accuracy and feature importance reliability: An empirical and theoretical study
Jianzhong Chen
Leon Qi Rong Ooi
Trevor Wei Kiat Tan
Shaoshi Zhang
Jingwei Li
Christopher L. Asplund
Simon B. Eickhoff
Danilo Bzdok
Avram J. Holmes
B.T. Thomas Yeo
Task-Agnostic Graph Neural Network Evaluation via Adversarial Collaboration
Xiangyu Zhao
Hannes Stärk
Pietro Lio
Yiren Zhao
Improved Robustness Against Adaptive Attacks With Ensembles and Error-Correcting Output Codes
Thomas Philippon
Neural network ensembles have been studied extensively in the context of adversarial robustness, and most ensemble-based approaches remain vulnerable to adaptive attacks. In this paper, we investigate the robustness of Error-Correcting Output Codes (ECOC) ensembles through architectural improvements and ensemble diversity promotion. We perform a comprehensive robustness assessment against adaptive attacks and investigate the relationship between ensemble diversity and robustness. Our results demonstrate the benefits of ECOC ensembles for adversarial robustness compared to regular ensembles of convolutional neural networks (CNNs) and show why the robustness of previous implementations is limited. We also propose an adversarial training method specific to ECOC ensembles that further improves robustness to adaptive attacks.
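For readers unfamiliar with ECOC, here is the generic decoding step the approach builds on (a textbook sketch, not the authors' specific architecture): each class is assigned a binary codeword, the ensemble members predict the individual bits, and the predicted class is the codeword closest to the ensemble output.

```python
# Generic ECOC decoding sketch; codebook and scores are made-up examples.
import numpy as np

def ecoc_predict(bit_outputs, codebook):
    """bit_outputs: (n_bits,) scores in [0, 1] from the per-bit classifiers.
    codebook: (n_classes, n_bits) binary codeword matrix."""
    distances = np.abs(codebook - bit_outputs).sum(axis=1)  # soft Hamming distance
    return int(np.argmin(distances))

# Example: 4 classes encoded with 6-bit codewords.
codebook = np.array([[0, 0, 0, 1, 1, 1],
                     [0, 1, 1, 0, 0, 1],
                     [1, 0, 1, 0, 1, 0],
                     [1, 1, 0, 1, 0, 0]])
print(ecoc_predict(np.array([0.9, 0.8, 0.2, 0.1, 0.9, 0.1]), codebook))  # -> 2
```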
Objectives Matter: Understanding the Impact of Self-Supervised Objectives on Vision Transformer Representations
Shashank Shekhar
Florian Bordes
Ari S. Morcos
Joint-embedding-based learning (e.g., SimCLR, MoCo, DINO) and reconstruction-based learning (e.g., BEiT, SimMIM, MAE) are the two leading paradigms for self-supervised learning of vision transformers, but they differ substantially in their transfer performance. Here, we aim to explain these differences by analyzing the impact of these objectives on the structure and transferability of the learned representations. Our analysis reveals that reconstruction-based features are significantly dissimilar to joint-embedding-based features, and that models trained with similar objectives learn similar features even across architectures. These differences arise early in the network and are primarily driven by attention and normalization layers. We find that joint-embedding features yield better linear-probe transfer for classification because the different objectives drive different distributions of information and invariances in the representation. These differences explain opposite trends in transfer performance for downstream tasks that require spatial specificity in features. Finally, we address how fine-tuning changes reconstructive representations to enable better transfer, showing that it re-organizes the information to be more similar to that of pre-trained joint-embedding models.
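For context, the linear-probe transfer evaluation referred to above typically follows the standard protocol sketched below (a generic sketch, not this paper's code): freeze the pretrained backbone, extract features, and fit a linear classifier on top.

```python
# Generic linear-probe sketch for frozen self-supervised features.
import torch
from sklearn.linear_model import LogisticRegression

@torch.no_grad()
def extract_features(backbone, loader, device="cpu"):
    backbone.eval().to(device)
    feats, labels = [], []
    for x, y in loader:
        feats.append(backbone(x.to(device)).flatten(1).cpu())  # frozen features
        labels.append(y)
    return torch.cat(feats).numpy(), torch.cat(labels).numpy()

# probe = LogisticRegression(max_iter=1000).fit(train_feats, train_labels)
# accuracy = probe.score(test_feats, test_labels)
```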