Publications

S5 Framework: A Review of Self-Supervised Shared Semantic Space Optimization for Multimodal Zero-Shot Learning
Clst
Yonatan Bisk
Ari Holtzman
Jesse Thomason
Jacob
Lapata
Angeliki Lazaridou
Jonathan May
Aleksandr Nisnevich
Nicolas Pinto
Joseph Turian
Ting Chen
Simon Kornblith
Mohammad Norouzi
Yen-Chun Chen
Linjie Li
Licheng Yu
Ahmed El Kholy
Faisal Ahmed
Zhe Gan
Yu Cheng
Zihan Dai
Hanxiao Liu
Quoc V. Le
Jia Deng
Wei Dong
Richard Socher
Li-Jia Li
K. Liu
Jacob Devlin
Ming-Wei Chang
Kenton Lee
Jesse Dodge
Maarten Sap
Ana Marasovic
Agnew
Gabriel Ilharco
Dirk Groeneveld
Matt
Li Dong
Nan Yang
Wenhui Wang
Furu Wei
Yang Liu
Wang
Jianfeng Gao
Ming Zhou
Xiaoyi Dong
Jia Bao
Ting Zhang
Dongdong Chen
Weiming Zhang
Lu Yuan
Dong Chen
Fang
Da-cheng Juan
Chuntian Lu
Zhen Li
Futang Peng
Aleksei Timofeev
Yi-Ting Chen
Yaxi Gao
Tom Duerig
Andrew Tomkins
Sujith Ravi
Lukasz Kaiser
Aidan N. Gomez
Noam M. Shazeer
Vaswani
Niki Parmar
Llion Jones
Jakob Uszko-
Alex G. Kendall
Yarin Gal
Roberto Cipolla
Salman H. Khan
Muzammal Naseer
Munawar Hayat
Waqas Zamir
Fahad Shahbaz Khan
Ranjay Krishna
Yuke Zhu
Oliver Groth
Justin John-
Kenji Hata
Joshua Kravitz
Stephanie Chen
Mike Lewis
Yinhan Liu
Naman Goyal
Marjan Ghazvininejad
Abdelrahman Mohamed
Omer Levy
Luke Zettlemoyer
Bohan Li
Hao Zhou
Jun-Tao He
Mingxuan Wang
Liunian Harold Li
Mark Yatskar
Da Yin
Cho-Jui
Kai-Wei Chang
Visualbert
In this review, we aim to inspire research into Self-Supervised Shared Semantic Space (S5) multimodal learning problems. We equip non-expert researchers with a framework of informed modeling decisions via an extensive literature review, an actionable modeling checklist, as well as a series of novel zero-shot evaluation tasks. The core idea for our S5 checklist lies in learning contextual multimodal interactions at various granularity levels via a shared Transformer encoder with a denoising loss term, which is also regularized by a contrastive loss term to induce a semantic alignment prior on the contextual embedding space. Essentially, we aim to model human concept understanding and thus learn to “put a name to a face”. This ultimately enables interpretable zero-shot S5 generalization on a variety of novel downstream tasks. In summary, this review provides sufficient background and actionable strategies for training cutting-edge S5 multimodal networks.
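The contrastive regularizer described above can be illustrated with a symmetric InfoNCE loss, a common choice for aligning paired embeddings. The abstract does not name the exact contrastive loss, so InfoNCE itself, the `temperature` of 0.1, and the hypothetical `weight` that adds it to the denoising loss are all assumptions. A minimal pure-Python sketch:

```python
import math
from typing import List

Vector = List[float]


def _normalize(v: Vector) -> Vector:
    """Scale a vector to unit L2 norm (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def _cross_entropy_on_diagonal(logits: List[List[float]]) -> float:
    """Mean of -log softmax(row)[i], where the correct class of row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[i]
    return total / len(logits)


def info_nce(text_emb: List[Vector], image_emb: List[Vector], temperature: float = 0.1) -> float:
    """Symmetric InfoNCE: text_emb[i] and image_emb[i] form the i-th matched pair."""
    t = [_normalize(v) for v in text_emb]
    im = [_normalize(v) for v in image_emb]
    # Cosine-similarity logits scaled by the temperature.
    logits = [[sum(a * b for a, b in zip(ti, ij)) / temperature for ij in im] for ti in t]
    transposed = [list(col) for col in zip(*logits)]
    # Average the text-to-image and image-to-text directions.
    return 0.5 * (_cross_entropy_on_diagonal(logits) + _cross_entropy_on_diagonal(transposed))


def combined_loss(denoising_loss: float, contrastive_loss: float, weight: float = 0.5) -> float:
    """Hypothetical total objective: denoising term regularized by a weighted contrastive term."""
    return denoising_loss + weight * contrastive_loss
```

Matched pairs sit on the diagonal of the similarity matrix, so the loss falls as each text embedding moves closer to its own image embedding than to the others in the batch.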
FusionRetro: Molecule Representation Fusion via In-Context Learning for Retrosynthetic Planning
Songtao Liu
Zhengkai Tu
Lu Lin
Rex Ying
Peilin Zhao
Dinghao Wu
Retrosynthetic planning aims to devise a complete multi-step synthetic route from starting materials to a target molecule. Current strategies use a decoupled approach of single-step retrosynthesis models and search algorithms, taking only the product as the input to predict the reactants for each planning step and ignoring valuable context information along the synthetic route. In this work, we propose a novel framework that utilizes context information for improved retrosynthetic planning. We view synthetic routes as reaction graphs and propose to incorporate context through three principled steps: encode molecules into embeddings, aggregate information over routes, and read out to predict reactants. Our approach is the first attempt to utilize in-context learning for retrosynthesis prediction in retrosynthetic planning. The entire framework can be efficiently optimized in an end-to-end fashion and produce more practical and accurate predictions. Comprehensive experiments demonstrate that by fusing context information over routes, our model significantly improves the performance of retrosynthetic planning over baselines that are not context-aware, especially for long synthetic routes. Code is available at https://github.com/SongtaoLiu0823/FusionRetro.
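The encode, aggregate, and read-out steps can be illustrated with a deliberately tiny sketch. Everything here is hypothetical and is not FusionRetro's model: `encode` uses hashed character counts in place of a learned molecular encoder, `aggregate` mean-pools the molecules already on the route, and `readout` scores candidate reactants (given as SMILES strings) by their dot product with that context.

```python
from typing import List, Sequence

Vector = List[float]


def encode(smiles: str, dim: int = 8) -> Vector:
    """Hypothetical stand-in encoder: hashed character counts, not a learned network."""
    vec = [0.0] * dim
    for ch in smiles:
        vec[ord(ch) % dim] += 1.0
    return vec


def aggregate(route: Sequence[str], dim: int = 8) -> Vector:
    """Mean-pool the embeddings of the molecules already on the route (the context)."""
    embeddings = [encode(m, dim) for m in route]
    return [sum(column) / len(embeddings) for column in zip(*embeddings)]


def readout(context: Vector, candidates: Sequence[str]) -> str:
    """Return the candidate whose embedding has the largest dot product with the context."""
    def score(candidate: str) -> float:
        return sum(a * b for a, b in zip(context, encode(candidate, len(context))))
    return max(candidates, key=score)


# Same candidates, different route context, different prediction.
print(readout(aggregate(["CC", "CO"]), ["C", "O"]))  # C
print(readout(aggregate(["OO"]), ["C", "O"]))        # O
```

The two calls show why context matters in this framing: an identical candidate list yields a different prediction when the route so far differs.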
GANSpiration: Balancing Targeted and Serendipitous Inspiration in User Interface Design with Style-Based Generative Adversarial Network
Mohammad Amin Mozaffari
Xinyuan Zhang
Jinghui Cheng
Jin L.C. Guo
Inspiration from design examples plays a crucial role in the creative process of user interface design. However, current tools and techniques that support inspiration usually focus only on example browsing with limited user control or on similarity-based example retrieval, leading to undesirable design outcomes such as focus drift and design fixation. To address these issues, we propose the GANSpiration approach, which suggests design examples for both targeted and serendipitous inspiration by leveraging a style-based Generative Adversarial Network. A quantitative evaluation revealed that the outputs of GANSpiration-based example suggestion approaches are relevant to the input design and at the same time include diverse instances. A user study with professional UI/UX practitioners showed that the examples suggested by our approach serve as viable sources of inspiration for overall design concepts and specific design elements. Overall, our work paves the way for using advanced generative machine learning techniques to support creative design practice.
A general class of surrogate functions for stable and efficient reinforcement learning
Olivier Bachem
Robert Müller
Shivam Garg
Matthieu Geist
Marlos C. Machado
Common policy gradient methods rely on the maximization of a sequence of surrogate functions. In recent years, many such surrogate functions have been proposed, most without strong theoretical guarantees, leading to algorithms such as TRPO, PPO or MPO. Rather than design yet another surrogate function, we instead propose a general framework (FMA-PG) based on functional mirror ascent that gives rise to an entire family of surrogate functions. We construct surrogate functions that enable policy improvement guarantees, a property not shared by most existing surrogate functions. Crucially, these guarantees hold regardless of the choice of policy parameterization. Moreover, a particular instantiation of FMA-PG recovers important implementation heuristics (e.g., using forward vs reverse KL divergence) resulting in a variant of TRPO with additional desirable properties. Via experiments on simple bandit problems, we evaluate the algorithms instantiated by FMA-PG. The proposed framework also suggests an improved variant of PPO, whose robustness and efficiency we empirically demonstrate on the MuJoCo suite.
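For intuition about the mirror-ascent view, consider its simplest case: a bandit with a tabular (direct) policy, an exact gradient, and the KL divergence as the mirror map. Under these assumptions each functional mirror-ascent step reduces to the exponentiated-gradient rule pi'(a) ∝ pi(a) exp(eta r(a)). This sketch is not FMA-PG's surrogate for parameterized policies, and the reward vector and step size are made up.

```python
import math
from typing import List


def mirror_ascent_step(policy: List[float], rewards: List[float], step_size: float) -> List[float]:
    """One KL mirror-ascent step on the probability simplex.

    For J(pi) = sum_a pi(a) * r(a) the functional gradient is r(a), so the step
    is the exponentiated-gradient update pi'(a) ∝ pi(a) * exp(step_size * r(a)).
    """
    weights = [p * math.exp(step_size * r) for p, r in zip(policy, rewards)]
    total = sum(weights)
    return [w / total for w in weights]


def expected_reward(policy: List[float], rewards: List[float]) -> float:
    return sum(p * r for p, r in zip(policy, rewards))


rewards = [0.2, 0.5, 0.9]  # made-up mean rewards of a 3-armed bandit
policy = [1.0 / 3] * 3     # start from the uniform policy
history = [expected_reward(policy, rewards)]
for _ in range(50):
    policy = mirror_ascent_step(policy, rewards, step_size=0.5)
    history.append(expected_reward(policy, rewards))
```

Because the update reweights actions by exp(eta r(a)), the expected reward never decreases from one step to the next, a toy analogue of the policy-improvement guarantees discussed in the abstract.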
Generating physically-consistent high-resolution climate data with hard-constrained neural networks
Prasanna Sattegeri
Campbell Watson
D. Szwarcman
The availability of reliable, high-resolution climate and weather data is important to inform long-term decisions on climate adaptation and mitigation and to guide rapid responses to extreme events. Forecasting models are limited by computational costs and therefore often can only make coarse-resolution predictions. Statistical downscaling can provide an efficient method of upsampling low-resolution data. In this field, deep learning has been applied successfully, often using image super-resolution methods from computer vision. Despite achieving visually compelling results in some cases, such models often violate conservation laws when predicting physical variables. In order to conserve important physical quantities, we develop methods that guarantee physical constraints are satisfied by a deep downscaling model while also increasing its performance according to traditional metrics. We introduce two ways of constraining the network: a renormalization layer added to the end of the neural network and a successive approach that scales with increasing upsampling factors. We show the applicability of our methods across different popular architectures and upsampling factors using ERA5 reanalysis data.
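A renormalization layer of the kind described can be sketched as a final step that rescales each upsampled block so that its mean matches the corresponding low-resolution pixel, which conserves the coarse-grid quantity exactly. The multiplicative form, the block-mean conservation target, and the function name `renormalize` are assumptions; the paper's layer may use a different operator, and this version requires non-negative predictions with a non-zero mean in every block.

```python
from typing import List

Grid = List[List[float]]


def renormalize(high_res: Grid, low_res: Grid, factor: int) -> Grid:
    """Rescale every factor x factor block so its mean equals the matching coarse pixel.

    Assumes non-negative predictions and a non-zero mean in every block.
    """
    out = [row[:] for row in high_res]
    for i, coarse_row in enumerate(low_res):
        for j, coarse_value in enumerate(coarse_row):
            cells = [(i * factor + di, j * factor + dj) for di in range(factor) for dj in range(factor)]
            block_mean = sum(high_res[r][c] for r, c in cells) / len(cells)
            scale = coarse_value / block_mean  # multiplicative correction keeps values >= 0
            for r, c in cells:
                out[r][c] = high_res[r][c] * scale
    return out


coarse = [[2.0, 4.0]]
prediction = [[1.0, 1.0, 3.0, 3.0],
              [1.0, 1.0, 1.0, 1.0]]
print(renormalize(prediction, coarse, factor=2))
# [[2.0, 2.0, 6.0, 6.0], [2.0, 2.0, 2.0, 2.0]]
```

The spatial pattern inside each block is preserved while its mean is forced to the coarse value, so averaging the output back down reproduces the input exactly.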
GitHub repositories with links to academic papers: Public access, traceability, and evolution
Supatsara Wattanakriengkrai
Bodin Chinthanet
Hideaki Hata
Raula Gaikovina Kula
Christoph Treude
Jin L.C. Guo
Kenichi Matsumoto
Goal-driven optimization of single-neuron properties in artificial networks reveals regularization role of neural diversity and adaptation in the brain
Neurons in the brain have rich and adaptive input-output properties. Features such as diverse f-I curves and spike frequency adaptation are known to place single neurons in optimal coding regimes when facing changing stimuli. Yet, it is still unclear how brain circuits exploit single-neuron flexibility, and how network-level requirements may have shaped such cellular function. To answer this question, a multi-scaled approach is needed in which the computations of single neurons and of neural circuits are considered as a complete system. In this work, we use artificial neural networks to systematically investigate single-neuron input-output adaptive mechanisms, optimized in an end-to-end fashion. Throughout the optimization process, each neuron is free to modify its nonlinear activation function, parametrized to mimic the f-I curves of biological neurons, and to learn adaptation strategies that modify activation functions in real time during a task. We find that such networks show much-improved robustness to noise and changes in input statistics. Importantly, we find that this procedure recovers precise coding strategies found in biological neurons, such as gain scaling and fractional-order differentiation/integration. Using tools from dynamical systems theory, we analyze the role of these emergent single-neuron properties and argue that neural diversity and adaptation play an active regularization role that enables neural circuits to optimally propagate information across time.
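To make "learnable activation with adaptation" concrete, here is a toy unit whose f-I curve is a scaled softplus and whose adaptation variable grows with recent output, so a constant input produces a decaying output rate (spike frequency adaptation). The softplus shape, the parameters `gain`, `threshold`, `strength`, and `tau`, and the update rule are hypothetical choices, not the paper's parameterization; in an end-to-end setting such parameters would be optimized along with the network weights.

```python
import math
from dataclasses import dataclass


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


@dataclass
class AdaptiveUnit:
    gain: float = 1.0        # slope of the f-I curve
    threshold: float = 0.0   # input level where the curve bends
    strength: float = 0.5    # how strongly output drives adaptation
    tau: float = 10.0        # adaptation time constant, in steps
    adaptation: float = 0.0  # internal adaptation state

    def step(self, current: float) -> float:
        """Return the output rate, then let the adaptation state relax toward strength * rate."""
        rate = self.gain * softplus(current - self.threshold - self.adaptation)
        self.adaptation += (self.strength * rate - self.adaptation) / self.tau
        return rate


unit = AdaptiveUnit()
rates = [unit.step(2.0) for _ in range(30)]  # constant input: the rate decays over time
```

With these values the adaptation state rises monotonically toward a fixed point, so the output rate falls at every step and settles at a lower, non-zero level.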
Gradient Descent Is Optimal Under Lower Restricted Secant Inequality And Upper Error Bound
The study of first-order optimization is sensitive to the assumptions made on the objective functions. These assumptions induce complexity classes which play a key role in worst-case analysis, including the fundamental concept of algorithm optimality. Recent work argues that strong convexity and smoothness, popular assumptions in the literature, lead to a pathological definition of the condition number (Guille-Escuret et al., 2021). Motivated by this result, we focus on the class of functions satisfying a lower restricted secant inequality and an upper error bound. On top of being robust to the aforementioned pathological behavior and including some non-convex functions, this pair of conditions displays interesting geometrical properties. In particular, the necessary and sufficient conditions to interpolate a set of points and their gradients within the class can be separated into simple conditions on each sampled gradient. This allows the performance estimation problem (PEP, Drori and Teboulle (2012)) to be solved analytically, leading to a lower bound on the convergence rate that proves gradient descent to be exactly optimal on this class of functions among all first-order algorithms.
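The class in question is defined by two inequalities on the distance to the set of minimizers. The standard one-step argument below shows that gradient descent contracts that distance with factor 1 - μ²/L² when the step size is η = μ/L². Here x_p denotes the projection of x onto the set of minimizers and x_{k+1} = x_k - η∇f(x_k); this notation is assumed rather than taken from the paper. The paper's contribution is the lower bound showing that no first-order method does better, which this derivation does not reproduce.

```latex
\begin{aligned}
&\text{RSI}^-(\mu):\ \langle \nabla f(x),\, x - x_p \rangle \ge \mu\, \|x - x_p\|^2,
\qquad \text{EB}^+(L):\ \|\nabla f(x)\| \le L\, \|x - x_p\|,\\[4pt]
&\|x_{k+1} - (x_{k+1})_p\|^2 \le \|x_{k+1} - (x_k)_p\|^2
= \|x_k - (x_k)_p\|^2 - 2\eta\, \langle \nabla f(x_k),\, x_k - (x_k)_p \rangle + \eta^2\, \|\nabla f(x_k)\|^2\\
&\qquad \le \left(1 - 2\eta\mu + \eta^2 L^2\right) \|x_k - (x_k)_p\|^2,\\[4pt]
&\eta = \frac{\mu}{L^2} \;\Longrightarrow\; \|x_{k+1} - (x_{k+1})_p\|^2 \le \left(1 - \frac{\mu^2}{L^2}\right) \|x_k - (x_k)_p\|^2 .
\end{aligned}
```

The first inequality holds because (x_{k+1})_p is the closest minimizer to x_{k+1}; the step size minimizes the quadratic 1 - 2ημ + η²L². Combining the two conditions through Cauchy-Schwarz gives μ ≤ L, so the factor lies in [0, 1).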
Graph-Based Active Machine Learning Method for Diverse and Novel Antimicrobial Peptides Generation and Selection
Bonaventure F. P. Dossou
Dianbo Liu
Almer M. van der Sloot
Roger Palou
Michael Tyers
As antibiotic-resistant bacterial strains are rapidly spreading worldwide, infections caused by these strains are emerging as a global crisis causing the death of millions of people every year. Antimicrobial Peptides (AMPs) are one of the candidates to tackle this problem because of their potential diversity and their ability to favorably modulate the host immune response. However, large-scale screening of new AMP candidates is expensive, time-consuming, and not affordable in developing countries, which need the treatments the most. In this work, we propose a novel active machine learning-based framework that statistically minimizes the number of wet-lab experiments needed to design new AMPs, while ensuring high diversity and novelty of the generated AMP sequences, in multi-round wet-lab AMP screening settings. Combining recurrent neural network models and a graph-based filter (GraphCC), our proposed approach delivers novel and diverse candidates and demonstrates better performance according to our defined metrics.
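The abstract does not spell out how GraphCC works. Reading "CC" as connected components (our assumption), a diversity filter could link candidate peptides whose normalized Levenshtein similarity reaches a threshold and keep one representative per connected component. The similarity measure, the 0.7 threshold, and the first-seen representative rule are all hypothetical.

```python
from collections import deque
from typing import Dict, List, Set


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def similarity(a: str, b: str) -> float:
    return 1.0 - levenshtein(a, b) / max(len(a), len(b), 1)


def diverse_representatives(seqs: List[str], threshold: float = 0.7) -> List[str]:
    """Keep the first sequence of each connected component of the similarity graph."""
    adjacency: Dict[int, Set[int]] = {i: set() for i in range(len(seqs))}
    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            if similarity(seqs[i], seqs[j]) >= threshold:
                adjacency[i].add(j)
                adjacency[j].add(i)
    seen: Set[int] = set()
    representatives: List[str] = []
    for start in range(len(seqs)):
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        while queue:  # breadth-first search marks the whole component
            node = queue.popleft()
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        representatives.append(seqs[start])
    return representatives
```

Near-duplicate generations collapse into one component, so only one of them would be sent to the wet lab; a real filter would likely also score candidates for predicted activity.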
GrowSpace: Learning How to Shape Plants
Plants are dynamic systems that are integral to our existence and survival. Plants face environmental changes and adapt over time to their surrounding conditions. We argue that plant responses to an environmental stimulus are a good example of a real-world problem that can be approached within a reinforcement learning (RL) framework. With the objective of controlling a plant by moving the light source, we propose GrowSpace as a new RL benchmark. The back-end of the simulator is implemented using the Space Colonisation Algorithm, a plant-growing model based on competition for space. Compared to video game RL environments, this simulator addresses a real-world problem and serves as a test bed to visualize plant growth and movement faster than physical experiments. GrowSpace is composed of a suite of challenges that tackle several problems such as control, multi-stage learning, fairness, and multi-objective learning. We provide agent baselines alongside case studies to demonstrate the difficulty of the proposed benchmark.
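The Space Colonisation Algorithm itself can be sketched in 2-D: each attraction point pulls its nearest tree node within an influence radius, every pulled node grows a new segment along the normalized sum of its pull directions, and attraction points within a kill distance of the tree are removed. This is the textbook form of the algorithm with made-up radii and segment length; GrowSpace's own implementation details (for example, how the movable light source affects growth) are not described in the abstract and are not modeled here.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def colonise_step(nodes: List[Point], attractors: List[Point], influence: float = 3.0,
                  kill: float = 0.5, segment: float = 0.5) -> Tuple[List[Point], List[Point]]:
    """One iteration of the (2-D) Space Colonisation Algorithm."""
    # 1. Each attraction point pulls only its nearest node within the influence radius.
    pulls: Dict[int, Tuple[float, float]] = {}
    for ax, ay in attractors:
        best, best_d = None, influence
        for idx, (nx, ny) in enumerate(nodes):
            d = math.hypot(ax - nx, ay - ny)
            if d < best_d:
                best, best_d = idx, d
        if best is not None and best_d > 0:
            nx, ny = nodes[best]
            px, py = pulls.get(best, (0.0, 0.0))
            pulls[best] = (px + (ax - nx) / best_d, py + (ay - ny) / best_d)
    # 2. Each pulled node grows one segment along the normalised sum of unit pull vectors.
    grown = list(nodes)
    for idx, (dx, dy) in pulls.items():
        norm = math.hypot(dx, dy)
        if norm > 0:
            nx, ny = nodes[idx]
            grown.append((nx + segment * dx / norm, ny + segment * dy / norm))
    # 3. Attraction points within the kill distance of any node are consumed.
    remaining = [a for a in attractors
                 if all(math.hypot(a[0] - n[0], a[1] - n[1]) > kill for n in grown)]
    return grown, remaining


nodes: List[Point] = [(0.0, 0.0)]       # seed of the plant
attractors: List[Point] = [(0.0, 2.0)]  # one attraction point above it
for _ in range(5):
    nodes, attractors = colonise_step(nodes, attractors)
print(nodes)  # [(0.0, 0.0), (0.0, 0.5), (0.0, 1.0), (0.0, 1.5)]
```

Growth stops once every attraction point is consumed, which is the "competition for space" the abstract refers to: branches only extend toward space that has not yet been claimed.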
Harvesting Mature Relation Extraction Models from Limited Seed Knowledge: A Self-Development Framework for DS Rule Expansion
Raphael Hoffmann
Congle Zhang
Xiao Ling
Yankai Lin
Shiqi Shen
Zhiyuan Liu
Huanbo Luan
Christopher D Manning
M. Surdeanu
John Bauer
Adriana Romero
Pietro Lio’
Xuanhui Wang
Cheng Li
Nadav Golbandi
Bendersky
Marc Najork
Wentao Wu
Hongsong Li
Haixun Wang
Distantly-supervised relation extraction (DSRE) is an effective method to scale relation extraction (RE) to large unlabeled corpora with the utilization of knowledge bases (KBs), but it suffers from the scale of KBs and the noise it introduces. To alleviate these two problems, we propose a novel framework called Self-development Rule Expansion (SOUP), which starts from a limited amount of labeled data and continuously produces low-noise labels on large-scale unlabeled data using a growing set of learnable logical rules. Specifically, SOUP achieves a mutual enhancement of the RE model and the logical rule set: first, an RE model is trained on the labeled data to summarize the knowledge; then, this knowledge is used to explore candidate rules from unlabeled data; finally, high-quality candidates are selected in a graph-based ranking manner to extend the logical rule set, and the new rule-labeled data are used for better RE model training. Experiments on the wiki20 dataset demonstrate that, with limited seed knowledge from small-scale manually labeled data, SOUP achieves significant improvement over baselines by producing continuous growth of both the logical rules and the RE model, and that SOUP's labeling noise is much lower than that of DS. Furthermore, an RE model enhanced by SOUP with 1.6k logical rules learned from prior knowledge performs on par with a model trained on data labeled in the DS manner using 72k relational facts from KBs.
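The abstract selects candidate rules "in a graph-based ranking manner" without naming the algorithm. As one assumed instantiation, PageRank over a hypothetical rule graph (an edge u → v meaning that rule u's matches lend support to rule v) ranks the candidates, and the top k are added to the rule set. The graph construction, damping factor, and k are illustrative only.

```python
from typing import Dict, List


def pagerank(graph: Dict[str, List[str]], damping: float = 0.85, iterations: int = 100) -> Dict[str, float]:
    """Power-iteration PageRank; every edge target must also be a key of `graph`."""
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        new_rank = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            targets = graph[v] or nodes  # dangling node: spread its mass uniformly
            share = damping * rank[v] / len(targets)
            for w in targets:
                new_rank[w] += share
        rank = new_rank
    return rank


# Hypothetical rule graph: an edge u -> v means rule u's matches support rule v.
rule_graph = {"r1": ["r2"], "r2": ["r1"], "r3": ["r1"]}
scores = pagerank(rule_graph)
top_rules = sorted(scores, key=scores.get, reverse=True)[:2]
print(top_rules)  # ['r1', 'r2']
```

Ranking by graph centrality rather than by individual rule confidence favors rules that are corroborated by many others, which is one way to keep newly added labels low-noise.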
Heterogeneous Supervised Topic Models
Hal Daumé III
David Blei
Researchers in the social sciences are often interested in the relationship between text and an outcome of interest, where the goal is to both uncover latent patterns in the text and predict outcomes for unseen texts. To this end, this paper develops the heterogeneous supervised topic model (HSTM), a probabilistic approach to text analysis and prediction. HSTMs posit a joint model of text and outcomes to find heterogeneous patterns that help with both text analysis and prediction. The main benefit of HSTMs is that they capture heterogeneity in the relationship between text and the outcome across latent topics. To fit HSTMs, we develop a variational inference algorithm based on the auto-encoding variational Bayes framework. We study the performance of HSTMs on eight datasets and find that they consistently outperform related methods, including fine-tuned black-box models. Finally, we apply HSTMs to analyze news articles labeled with pro- or anti-tone. We find evidence of differing language used to signal pro- and anti-tone.
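The heterogeneity HSTMs capture can be illustrated with an outcome score in which each word's effect is weighted by the document's topic proportions, so the same word can push the prediction in opposite directions in different topics. This scoring function and its parameter names (`gamma`, `topic_word_weights`, `background_weights`) are a hypothetical simplification for binary outcomes such as pro- versus anti-tone, not the paper's exact likelihood, and the sketch omits the variational inference used to fit the model.

```python
import math
from typing import List


def outcome_logit(theta: List[float], word_freq: List[float], gamma: List[float],
                  topic_word_weights: List[List[float]], background_weights: List[float]) -> float:
    """Hypothetical heterogeneous outcome score for one document.

    theta: topic proportions; word_freq: normalized word counts over the vocabulary.
    """
    score = sum(t * g for t, g in zip(theta, gamma))                    # per-topic effects
    score += sum(b * w for b, w in zip(background_weights, word_freq))  # topic-independent word effects
    for t, weights in zip(theta, topic_word_weights):                   # word effects that vary by topic
        score += t * sum(a * w for a, w in zip(weights, word_freq))
    return score


def outcome_probability(logit: float) -> float:
    """Logistic link for a binary outcome such as pro- vs anti-tone."""
    return 1.0 / (1.0 + math.exp(-logit))


# One-word vocabulary whose effect flips sign between topic 0 and topic 1.
weights = [[2.0], [-2.0]]
doc_a = outcome_logit([1.0, 0.0], [1.0], [0.0, 0.0], weights, [0.0])  # all topic 0
doc_b = outcome_logit([0.0, 1.0], [1.0], [0.0, 0.0], weights, [0.0])  # all topic 1
print(doc_a, doc_b)  # 2.0 -2.0
```

A single global word weight could not express this sign flip, which is the kind of heterogeneity across latent topics the abstract describes.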