Publications

Learning with Rejection for Abstractive Text Summarization

Meng Cao

Yue Dong

Jingyi He

Jackie Cheung

2022-01-01

EMNLP (published)

doi.org

arxiv.org

Long Range Graph Benchmark

Vijay Prakash Dwivedi

Anh Tuan Luu

Graph Neural Networks (GNNs) that are based on the message passing (MP) paradigm generally exchange information between 1-hop neighbors to build node representations at each layer. In principle, such networks are not able to capture long-range interactions (LRI) that may be desired or necessary for learning a given task on graphs. Recently, there has been an increasing interest in development of Transformer-based methods for graphs that can consider full node connectivity beyond the original sparse structure, thus enabling the modeling of LRI. However, MP-GNNs that simply rely on 1-hop message passing often fare better in several existing graph benchmarks when combined with positional feature representations, among other innovations, hence limiting the perceived utility and ranking of Transformer-like architectures. Here, we present the Long Range Graph Benchmark (LRGB) with 5 graph learning datasets: PascalVOC-SP, COCO-SP, PCQM-Contact, Peptides-func and Peptides-struct that arguably require LRI reasoning to achieve strong performance in a given task. We benchmark both baseline GNNs and Graph Transformer networks to verify that the models which capture long-range dependencies perform significantly better on these tasks. Therefore, these datasets are suitable for benchmarking and exploration of MP-GNNs and Graph Transformer architectures that are intended to capture LRI.

openreview.net

MCVD: Masked Conditional Video Diffusion for Prediction, Generation, and Interpolation

Vikram Voleti

Alexia Jolicoeur-Martineau

Chris Pal

Video prediction is a challenging task. The quality of video frames from current state-of-the-art (SOTA) generative models tends to be poor and generalization beyond the training data is difficult. Furthermore, existing prediction frameworks are typically not capable of simultaneously handling other video-related tasks such as unconditional generation or interpolation. In this work, we devise a general-purpose framework called Masked Conditional Video Diffusion (MCVD) for all of these video synthesis tasks using a probabilistic conditional score-based denoising diffusion model, conditioned on past and/or future frames. We train the model in a manner where we randomly and independently mask all the past frames or all the future frames. This novel but straightforward setup allows us to train a single model that is capable of executing a broad range of video tasks, specifically: future/past prediction -- when only future/past frames are masked; unconditional generation -- when both past and future frames are masked; and interpolation -- when neither past nor future frames are masked. Our experiments show that this approach can generate high-quality frames for diverse types of videos. Our MCVD models are built from simple non-recurrent 2D-convolutional architectures, conditioning on blocks of frames and generating blocks of frames. We generate videos of arbitrary lengths autoregressively in a block-wise manner. Our approach yields SOTA results across standard video prediction and interpolation benchmarks, with computation times for training models measured in 1-12 days using

openreview.net

Metro: Memory-Enhanced Transformer for Retrosynthetic Planning via Reaction Tree

Songtao Liu

Zhitao Ying

Rex Ying

Zuobai Zhang

Peilin Zhao

Jian Tang

Lu Lin

Dinghao Wu

Retrosynthetic planning plays a critical role in drug discovery and organic chemistry. Starting from a target molecule as the root node, it aims to find a complete reaction tree subject to the constraint that all leaf nodes belong to a set of starting materials. The multi-step reactions are crucial because they determine the flow chart in the production of the Organic Chemical Industry. However, existing datasets lack curation of tree-structured multi-step reactions, and fail to provide such reaction trees, limiting models’ understanding of organic molecule transformations. In this work, we first develop a benchmark curated for the retrosynthetic planning task, which consists of 124,869 reaction trees retrieved from the public USPTO-full dataset. On top of that, we propose Metro: Memory-Enhanced Transformer for RetrOsynthetic planning. Specifically, the dependency among molecules in the reaction tree is captured as context information for multi-step retrosynthesis predictions through transformers with a memory module. Extensive experiments show that Metro dramatically outperforms existing single-step retrosynthesis models by at least 10.7% in top-1 accuracy. The experiments demonstrate the superiority of exploiting context information in the retrosynthetic planning task. Moreover, the proposed model can be directly used for synthetic accessibility analysis, as it is trained on reaction trees with the shortest depths. Our work is the first step towards a brand new formulation for retrosynthetic planning in the aspects of data construction, model design, and evaluation. Code is available at https://github.com/SongtaoLiu0823/metro.

2022-01-01

arXiv.org (preprint)

doi.org

openreview.net

Is a Modular Architecture Enough?

Sarthak Mittal

Yoshua Bengio

Guillaume Lajoie

Inspired from human cognition, machine learning systems are gradually revealing advantages of sparser and more modular architectures. Recent work demonstrates that not only do some modular architectures generalize well, but they also lead to better out of distribution generalization, scaling properties, learning speed, and interpretability. A key intuition behind the success of such systems is that the data generating system for most real-world settings is considered to consist of sparse modular connections, and endowing models with similar inductive biases will be helpful. However, the field has been lacking in a rigorous quantitative assessment of such systems because these real-world data distributions are complex and unknown. In this work, we provide a thorough assessment of common modular architectures, through the lens of simple and known modular data distributions. We highlight the benefits of modularity and sparsity and reveal insights on the challenges faced while optimizing modular systems. In doing so, we propose evaluation metrics that highlight the benefits of modularity, the regimes in which these benefits are substantial, as well as the sub-optimality of current end-to-end learned modular systems as opposed to their claimed potential.

openreview.net

Multi-Head Adapter Routing for Data-Efficient Fine-Tuning

Lucas Caccia

Edoardo Ponti

Li Li

Matheus Pereira

Nicolas Le Roux

Alessandro Sordoni

Parameter-efficient fine-tuning (PEFT) methods can adapt large language models to downstream tasks by training a small amount of newly added parameters. In multi-task settings, PEFT adapters typically train on each task independently, inhibiting transfer across tasks, or on the concatenation of all tasks, which can lead to negative interference. To address this, Polytropon [Ponti et al., 2022] jointly learns an inventory of PEFT adapters and a routing function to share variable-size sets of adapters across tasks. Subsequently, adapters can be re-combined and fine-tuned on novel tasks even with limited data. In this paper, we investigate to what extent the ability to control which adapters are active for each task leads to sample-efficient generalization. Thus, we propose less expressive variants where we perform weighted averaging of the adapters before few-shot adaptation (Poly-µ) instead of learning a routing function. Moreover, we introduce more expressive variants where finer-grained task–adapter allocation is learned through a multi-head routing function (Poly-S). We test these variants on three separate benchmarks for multi-task learning. We find that Poly-S achieves gains on all three (up to 5.3 points on average) over strong baselines, while incurring a negligible additional cost in parameter count. In particular, we find that instruction tuning, where models are fully fine-tuned on natural language instructions for each task, is inferior to modular methods such as Polytropon and our proposed variants.

2022-01-01

arXiv.org (preprint)

doi.org

Multilingual Language Model Adaptive Fine-Tuning: A Study on African Languages

Jesujoba Oluwadara Alabi

David Ifeoluwa Adelani

Marius Mosbach

Dietrich Klakow

and XLM-R) and three NLP tasks (NER, news topic classification, and sentiment classification) shows that our approach is competitive to applying LAFT on individual languages while requiring significantly less disk space. Finally, we show that our adapted PLM also improves the zero-shot cross-lingual transfer abilities of parameter efficient fine-tuning methods.

2022-01-01

arXiv.org (preprint)

doi.org

Neural Attentive Circuits

Nasim Rahaman

Martin Weiss

Francesco Locatello

Chris Pal

Yoshua Bengio

Bernhard Schölkopf

Li Erran Li

Nicolas Ballas

Recent work has seen the development of general purpose neural architectures that can be trained to perform tasks across diverse data modalities. General purpose models typically make few assumptions about the underlying data-structure and are known to perform well in the large-data regime. At the same time, there has been growing interest in modular neural architectures that represent the data using sparsely interacting modules. These models can be more robust out-of-distribution, computationally efficient, and capable of sample-efficient adaptation to new data. However, they tend to make domain-specific assumptions about the data, and present challenges in how module behavior (i.e., parameterization) and connectivity (i.e., their layout) can be jointly learned. In this work, we introduce a general purpose, yet modular neural architecture called Neural Attentive Circuits (NACs) that jointly learns the parameterization and a sparse connectivity of neural modules without using domain knowledge. NACs are best understood as the combination of two systems that are jointly trained end-to-end: one that determines the module configuration and the other that executes it on an input. We demonstrate qualitatively that NACs learn diverse and meaningful module configurations on the NLVR2 dataset without additional supervision. Quantitatively, we show that by incorporating modularity in this way, NACs improve upon a strong non-modular baseline in terms of low-shot adaptation on CIFAR and CUBs dataset by about 10%, and OOD robustness on Tiny ImageNet-R by about 2.5%. Further, we find that NACs can achieve an 8x speedup at inference time while losing less than 3% performance. Finally, we find NACs to yield competitive results on diverse data modalities spanning point-cloud classification, symbolic processing and text-classification from ASCII bytes, thereby confirming its general purpose nature.

openreview.net

Optimizing deep learning for Magnetoencephalography (MEG): From sensory perception to sex prediction and brain fingerprinting

Arthur Dehgan

Irina Rish

Karim Jerbi

2022-01-01

2022 Conference on Cognitive Computational Neuroscience (published)

doi.org

Orientation and Context Entangled Network for Retinal Vessel Segmentation

Xinxu Wei

Kaifu Yang

Danilo Bzdok

Yongming Li

2022-01-01

SSRN Electronic Journal (published)

doi.org

arxiv.org

Overcoming challenges in leveraging GANs for few-shot data augmentation

Christopher Beckham

Issam Hadj Laradji

Pau Rodriguez

David Vázquez

Derek Nowrouzezahrai

Chris Pal

2022-01-01

CoLLAs (published)

proceedings.mlr.press

arxiv.org

PEER: A Comprehensive and Multi-Task Benchmark for Protein Sequence Understanding

Yang Zhang

We are now witnessing significant progress of deep learning methods in a variety of tasks (or datasets) of proteins. However, there is a lack of a standard benchmark to evaluate the performance of different methods, which hinders the progress of deep learning in this field. In this paper, we propose such a benchmark called PEER, a comprehensive and multi-task benchmark for Protein sEquence undERstanding. PEER provides a set of diverse protein understanding tasks including protein function prediction, protein localization prediction, protein structure prediction, protein-protein interaction prediction, and protein-ligand interaction prediction. We evaluate different types of sequence-based methods for each task including traditional feature engineering approaches, different sequence encoding methods as well as large-scale pre-trained protein language models. In addition, we also investigate the performance of these methods under the multi-task learning setting. Experimental results show that large-scale pre-trained protein language models achieve the best performance for most individual tasks, and jointly training multiple tasks further boosts the performance. The datasets and source codes of this benchmark are all available at https://github.com/DeepGraphLearning/PEER_Benchmark

openreview.net
