Publications

Learning on Large-scale Text-attributed Graphs via Variational Inference
Jianan Zhao
Meng Qu
Chaozhuo Li
Hao Yan
Qian Liu
Rui Li
Xing Xie
This paper studies learning on text-attributed graphs (TAGs), where each node is associated with a text description. An ideal solution for such a problem would be integrating both the text and graph structure information with large language models and graph neural networks (GNNs). However, the problem becomes very challenging when graphs are large, due to the high computational complexity brought by training large language models and GNNs together. In this paper, we propose an efficient and effective solution to learning on large text-attributed graphs by fusing graph structure and language learning with a variational Expectation-Maximization (EM) framework, called GLEM. Instead of simultaneously training large language models and GNNs on big graphs, GLEM alternately updates the two modules in the E-step and M-step. Such a procedure allows the two modules to be trained separately while still interacting and mutually enhancing each other. Extensive experiments on multiple datasets demonstrate the efficiency and effectiveness of the proposed approach.
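As a rough illustration of the alternating update described in the abstract, the sketch below alternates a language-model step and a GNN step, each fitting a mix of gold labels and the other module's pseudo-labels. The `lm` and `gnn` callables, the mixing weight `alpha`, and all interfaces are illustrative assumptions, not the authors' code.

```python
# A rough sketch of one alternating E/M round (optimizer steps omitted).
import torch
import torch.nn.functional as F

def em_round(lm, gnn, graph, texts, labels, train_idx, alpha=0.5):
    # E-step: train the LM on gold labels plus the GNN's pseudo-labels.
    with torch.no_grad():
        gnn_pseudo = gnn(graph).softmax(dim=-1)          # soft targets
    lm_logits = lm(texts)                                # (num_nodes, C)
    lm_loss = (alpha * F.cross_entropy(lm_logits[train_idx], labels[train_idx])
               + (1 - alpha) * F.kl_div(lm_logits.log_softmax(-1), gnn_pseudo,
                                        reduction="batchmean"))
    lm_loss.backward()

    # M-step: train the GNN on gold labels plus the LM's pseudo-labels.
    with torch.no_grad():
        lm_pseudo = lm(texts).softmax(dim=-1)
    gnn_logits = gnn(graph)
    gnn_loss = (alpha * F.cross_entropy(gnn_logits[train_idx], labels[train_idx])
                + (1 - alpha) * F.kl_div(gnn_logits.log_softmax(-1), lm_pseudo,
                                         reduction="batchmean"))
    gnn_loss.backward()
```

Because each step freezes the other module, neither the LM nor the GNN ever backpropagates through the other, which is what makes the framework tractable on large graphs.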
MeshDiffusion: Score-based Generative 3D Mesh Modeling
Zhen Liu
Yao Feng
Michael J. Black
Weiyang Liu
We consider the task of generating realistic 3D shapes, which is useful for a variety of applications such as automatic scene generation and physical simulation. Compared to other 3D representations like voxels and point clouds, meshes are more desirable in practice, because (1) they enable easy and arbitrary manipulation of shapes for relighting and simulation, and (2) they can fully leverage the power of modern graphics pipelines, which are mostly optimized for meshes. Previous scalable methods for generating meshes typically rely on sub-optimal post-processing, and they tend to produce overly smooth or noisy surfaces without fine-grained geometric details. To overcome these shortcomings, we take advantage of the graph structure of meshes and use a simple yet very effective generative modeling method to generate 3D meshes. Specifically, we represent meshes with deformable tetrahedral grids, and then train a diffusion model on this direct parameterization. We demonstrate the effectiveness of our model on multiple generative tasks.
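The abstract's key move is to run a standard diffusion model directly on the tetrahedral-grid parameterization. The sketch below shows a generic DDPM-style training loss on such a tensor; the `denoiser` network, tensor layout, and noise schedule are assumptions for illustration, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def ddpm_loss(denoiser, grid, num_steps=1000):
    """grid: (batch, channels, res, res, res) tensor of per-vertex SDF values
    and deformations on the tetrahedral grid (layout assumed)."""
    b = grid.shape[0]
    t = torch.randint(0, num_steps, (b,), device=grid.device)
    betas = torch.linspace(1e-4, 0.02, num_steps, device=grid.device)
    alpha_bar = torch.cumprod(1.0 - betas, dim=0)[t].view(b, 1, 1, 1, 1)
    noise = torch.randn_like(grid)
    noisy = alpha_bar.sqrt() * grid + (1.0 - alpha_bar).sqrt() * noise
    # Standard epsilon-prediction objective: recover the injected noise.
    return F.mse_loss(denoiser(noisy, t), noise)
```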
Metadata Archaeology: Unearthing Data Subsets by Leveraging Training Dynamics
Shoaib Ahmed Siddiqui
Nitarshan Rajkumar
Sara Hooker
Modern machine learning research relies on relatively few carefully curated datasets. Even in these datasets, and typically in 'untidy' or raw data, practitioners are faced with significant issues of data quality and diversity which can be prohibitively labor-intensive to address. Existing methods for dealing with these challenges tend to make strong assumptions about the particular issues at play, and often require a priori knowledge or metadata such as domain labels. Our work is orthogonal to these methods: we instead focus on providing a unified and efficient framework for Metadata Archaeology -- uncovering and inferring metadata of examples in a dataset. We curate different subsets of data that might exist in a dataset (e.g., mislabeled, atypical, or out-of-distribution examples) using simple transformations, and leverage differences in learning dynamics between these probe suites to infer metadata of interest. Our method is on par with far more sophisticated mitigation methods across different tasks: identifying and correcting mislabeled examples, classifying minority-group samples, prioritizing points relevant for training, and enabling scalable human auditing of relevant examples.
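One way to read "leverage differences in learning dynamics between these probe suites" is as nearest-neighbour matching in the space of per-example loss trajectories. A hedged sketch of that reading, with all names illustrative:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def infer_metadata(probe_trajectories, probe_types, data_trajectories, k=20):
    """Each trajectory is a (num_epochs,) vector of one example's losses
    recorded over training; probe_types are labels such as 'mislabeled'."""
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(np.asarray(probe_trajectories), np.asarray(probe_types))
    # An unlabeled example inherits the metadata of the probes whose
    # learning dynamics it most resembles.
    return knn.predict(np.asarray(data_trajectories))
```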
Molecular Geometry Pretraining with SE(3)-Invariant Denoising Distance Matching
Shengchao Liu
Hongyu Guo
Molecular representation pretraining is critical in various applications for drug and material discovery due to the limited number of labeled molecules, and most existing work focuses on pretraining on 2D molecular graphs. However, the power of pretraining on 3D geometric structures has been less explored, owing to the difficulty of finding a suitable proxy task that empowers the pretraining to effectively extract essential features from the geometric structures. Motivated by the dynamic nature of 3D molecules, where the continuous motion of a molecule in 3D Euclidean space forms a smooth potential energy surface, we propose GeoSSL, a 3D coordinate denoising pretraining framework that models such an energy landscape. Further, by leveraging an SE(3)-invariant score matching method, we propose GeoSSL-DDM, in which the coordinate denoising proxy task is effectively boiled down to denoising the pairwise atomic distances in a molecule. Our comprehensive experiments confirm the effectiveness and robustness of our proposed method.
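Since pairwise atomic distances are SE(3)-invariant, denoising them instead of raw coordinates is the natural reading of the proxy task above. Below is a simplified regression variant of that idea (the paper uses score matching); the `model` interface is an assumption for illustration.

```python
import torch
import torch.nn.functional as F

def distance_denoising_loss(model, pos, sigma=0.1):
    """pos: (num_atoms, 3) coordinates of one molecular conformation."""
    clean_dist = torch.cdist(pos, pos)               # SE(3)-invariant target
    noisy_pos = pos + sigma * torch.randn_like(pos)  # perturb the geometry
    pred_dist = model(noisy_pos)                     # (num_atoms, num_atoms)
    return F.mse_loss(pred_dist, clean_dist)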
Neural Networks Efficiently Learn Low-Dimensional Representations with SGD
Alireza Mousavi-Hosseini
Sejun Park
Manuela Girotti
Murat A Erdogdu
We study the problem of training a two-layer neural network (NN) of arbitrary width using stochastic gradient descent (SGD) where the input …
Organizing Principles of Astrocytic Nanoarchitecture in the Mouse Cerebral Cortex
Christopher K. Salmon
Tabish A Syed
J. Benjamin Kacerovsky
Nensi Alivodej
Alexandra L. Schober
Tyler F. W. Sloan
Michael T. Pratte
Michael P. Rosen
Miranda Green
Adario DasGupta
Hojatollah Vali
Craig A. Mandato
Shaurya Mehta
Affan Jilani
Keith K. Murai
Yanan Wang
Preclinical-to-clinical Anti-cancer Drug Response Prediction and Biomarker Identification Using TINDL
David Earl Hostallero
Lixuan Wei
Liewei Wang
Junmei Cairns
Protein Representation Learning by Geometric Structure Pretraining
Zuobai Zhang
Minghao Xu
Arian Rokkum Jamasb
Vijil Chenthamarakshan
Aurelie Lozano
Payel Das
Learning effective protein representations is critical in a variety of tasks in biology, such as predicting protein function or structure. Existing approaches usually pretrain protein language models on a large number of unlabeled amino acid sequences and then finetune the models with some labeled data in downstream tasks. Despite the effectiveness of sequence-based approaches, the power of pretraining on known protein structures, which are available only in smaller numbers, has not been explored for protein property prediction, even though protein structures are known to be determinants of protein function. In this paper, we propose to pretrain protein representations according to their 3D structures. We first present a simple yet effective encoder to learn the geometric features of a protein. We pretrain the protein graph encoder by leveraging multiview contrastive learning and different self-prediction tasks. Experimental results on both function prediction and fold classification tasks show that our proposed pretraining methods outperform or are on par with the state-of-the-art sequence-based methods, while using much less pretraining data. Our implementation is available at https://github.com/DeepGraphLearning/GearNet.
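Multiview contrastive pretraining of the kind mentioned above is typically an InfoNCE loss between embeddings of two random views (e.g. crops) of the same protein. A minimal sketch, assuming an encoder that has already produced the view embeddings `z1` and `z2`:

```python
import torch
import torch.nn.functional as F

def info_nce(z1, z2, tau=0.07):
    """z1, z2: (batch, dim) embeddings of two views of the same proteins."""
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / tau        # pairwise cosine similarities
    targets = torch.arange(z1.shape[0], device=z1.device)
    # Matching views are positives; other proteins in the batch are negatives.
    return F.cross_entropy(logits, targets)
```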
Protein Sequence and Structure Co-Design with Equivariant Translation
Chence Shi
Chuanrui Wang
Jiarui Lu
Bozitao Zhong
Proteins are macromolecules that perform essential functions in all living organisms. Designing novel proteins with specific structures and desired functions has been a long-standing challenge in the field of bioengineering. Existing approaches generate both protein sequence and structure using either autoregressive models or diffusion models, both of which suffer from high inference costs. In this paper, we propose a new approach capable of protein sequence and structure co-design, which iteratively translates both protein sequence and structure into the desired state from random initialization, based on context features given a priori. Our model consists of a trigonometry-aware encoder that reasons about geometric constraints and interactions from context features, and a roto-translation equivariant decoder that translates protein sequence and structure interdependently. Notably, all protein amino acids are updated in one shot in each translation step, which significantly accelerates the inference process. Experimental results across multiple tasks show that our model outperforms previous state-of-the-art baselines by a large margin, and is able to design proteins of high fidelity in terms of both sequence and structure, with running times orders of magnitude shorter than those of sampling-based methods.
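The iterative, one-shot-per-step translation process can be pictured as the loop below, where an assumed `encoder`/`decoder` pair updates every residue's amino-acid logits and coordinates simultaneously at each step; all interfaces here are hypothetical stand-ins, not the authors' API.

```python
import torch

def co_design(encoder, decoder, context, num_residues, steps=10):
    seq_logits = torch.randn(num_residues, 20)   # random amino-acid logits
    coords = torch.randn(num_residues, 3)        # random initial structure
    for _ in range(steps):
        h = encoder(context, seq_logits, coords)             # constraints
        seq_logits, coords = decoder(h, seq_logits, coords)  # one-shot update
    return seq_logits.argmax(dim=-1), coords
```

Updating all residues at once each step, rather than one residue at a time, is what gives the non-autoregressive speedup claimed in the abstract.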
Predictive Inference with Feature Conformal Prediction
Jiaye Teng
Chuan Wen
Dinghuai Zhang
Yang Gao
Yang Yuan
Reliability of CKA as a Similarity Measure in Deep Learning
MohammadReza Davari
Stefan Horoi
Amine Natik
Comparing learned neural representations in neural networks is a challenging but important problem, which has been approached in different ways. The Centered Kernel Alignment (CKA) similarity metric, particularly its linear variant, has recently become a popular approach and has been widely used to compare representations of a network's different layers, of architecturally similar networks trained differently, or of models with different architectures trained on the same data. A wide variety of claims about similarity and dissimilarity of these various representations have been made using CKA results. In this work, we present analysis that formally characterizes CKA's sensitivity to a large class of simple transformations, which can naturally occur in the context of modern machine learning. This provides a concrete explanation of CKA's sensitivity to outliers, which has been observed in past works, and to transformations that preserve the linear separability of the data, an important generalization attribute. We empirically investigate several weaknesses of the CKA similarity metric, demonstrating situations in which it gives unexpected or counterintuitive results. Finally, we study approaches for modifying representations to maintain functional behaviour while changing the CKA value. Our results illustrate that, in many cases, the CKA value can be easily manipulated without substantial changes to the functional behaviour of the models, and they call for caution when leveraging activation alignment metrics.
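For reference, the linear CKA variant discussed above has a short closed form (Kornblith et al., 2019): with column-centered activation matrices X and Y, CKA(X, Y) = ||YᵀX||_F² / (||XᵀX||_F ||YᵀY||_F). A minimal NumPy version:

```python
import numpy as np

def linear_cka(X, Y):
    """X: (n, d1) and Y: (n, d2) activations for the same n examples."""
    X = X - X.mean(axis=0)                      # center each feature
    Y = Y - Y.mean(axis=0)
    hsic = np.linalg.norm(Y.T @ X, "fro") ** 2  # unnormalized alignment
    return hsic / (np.linalg.norm(X.T @ X, "fro") *
                   np.linalg.norm(Y.T @ Y, "fro"))
```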
Revisiting Populations in Multi-Agent Communication
Paul Michel
Mathieu Rita
Olivier Tieleman
Angeliki Lazaridou
Despite evidence from the cognitive sciences that larger groups of speakers tend to develop more structured languages in human communication, scaling up to populations has failed to yield significant benefits in emergent multi-agent communication. In this paper, we advocate for an alternate population-level training paradigm for referential games based on the idea of "partitioning" the agents into sender-receiver pairs and limiting co-adaptation across pairs. We show that this results in optimizing a different objective at the population level, where agents maximize (1) their respective "internal" communication accuracy and (2) some measure of alignment between agents. In experiments, we find that this leads to the emergence of languages that are significantly more compositional. Moreover, when agents are trained in populations that are not fully connected (i.e., not all agent pairs interact at training time), this approach reduces multi-linguality and improves zero-shot communication with new agents (i.e., agents are able to communicate successfully with other agents outside their training partners).
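The "partitioning" idea reduces, in code, to fixing a one-to-one matching between senders and receivers and only ever training matched pairs. A minimal sketch, with `train_pair` standing in for one assumed referential-game update:

```python
import random

def train_partitioned(senders, receivers, num_updates, train_pair):
    receivers = list(receivers)
    random.shuffle(receivers)
    pairs = list(zip(senders, receivers))  # fixed one-to-one partition
    for _ in range(num_updates):
        s, r = random.choice(pairs)        # co-adaptation stays within a pair
        train_pair(s, r)
```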