Introducing Coordination in Concurrent Reinforcement Learning
Adrien Ali Taiga
Google Brain
Research on exploration in reinforcement learning has mostly focused on problems with a single agent interacting with an environment. However, many problems are better addressed by the concurrent reinforcement learning paradigm, where multiple agents operate in a common environment. Recent work has tackled the challenge of exploration in this particular setting (Dimakopoulou & Van Roy, 2018; Dimakopoulou et al., 2018). Nonetheless, these methods do not fully leverage the characteristics of this framework, and agents end up behaving independently of each other. In this work we argue that coordination among concurrent agents is crucial for efficient exploration. We introduce coordination in Thompson Sampling-based methods by drawing correlated samples from an agent’s posterior. We apply this idea to extend existing exploration schemes such as randomized least squares value iteration (RLSVI). Empirical results on simple toy tasks emphasize the merits of our approach and call attention to coordination as a key objective for efficient exploration in concurrent reinforcement learning.
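The idea of drawing correlated posterior samples for concurrent agents can be illustrated with a minimal sketch. It assumes a Gaussian posterior over a parameter vector and one particular coupling (mean-centered noise, giving a pairwise correlation of -1/(K-1) across K agents). The abstract does not specify the coupling the paper uses, so the function name `correlated_posterior_samples` and the centering scheme are assumptions made for illustration only.

```python
import numpy as np


def correlated_posterior_samples(mu, sigma, num_agents, rng):
    """Draw one parameter sample per agent from N(mu, sigma).

    Hypothetical coupling (an assumption, not the paper's scheme): the
    standard-normal noise vectors are mean-centered across agents and
    rescaled, so each agent's sample is still marginally N(mu, sigma)
    while the agents' noise is negatively correlated and sums to zero.
    """
    if num_agents < 2:
        raise ValueError("need at least two agents to correlate samples")
    dim = mu.shape[0]
    z = rng.standard_normal((num_agents, dim))
    # Centering makes the agents' perturbations cancel out on average.
    z = z - z.mean(axis=0, keepdims=True)
    # Rescale so every row has unit marginal variance again.
    z = z * np.sqrt(num_agents / (num_agents - 1))
    chol = np.linalg.cholesky(sigma)
    return mu + z @ chol.T


rng = np.random.default_rng(0)
mu = np.array([0.5, -1.0, 2.0])
sigma = np.array([[1.0, 0.2, 0.0],
                  [0.2, 2.0, 0.3],
                  [0.0, 0.3, 0.5]])
samples = correlated_posterior_samples(mu, sigma, num_agents=4, rng=rng)
```

With independent sampling, several agents can by chance draw nearly identical parameters and duplicate each other's exploration. Under the coupling assumed here, the perturbations are spread around the posterior mean by construction.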
Offline Retrieval Evaluation Without Evaluation Metrics
Andres Ferraro
Offline evaluation of information retrieval and recommendation has traditionally focused on distilling the quality of a ranking into a scalar metric such as average precision or normalized discounted cumulative gain. We can use this metric to compare the performance of multiple systems for the same request. Although evaluation metrics provide a convenient summary of system performance, they also collapse subtle differences across users into a single number and can carry assumptions about user behavior and utility not supported across retrieval scenarios. We propose recall-paired preference (RPP), a metric-free evaluation method based on directly computing a preference between ranked lists. RPP simulates multiple user subpopulations per query and compares systems across these pseudo-populations. Our results across multiple search and recommendation tasks demonstrate that RPP substantially improves discriminative power while correlating well with existing metrics and being equally robust to incomplete data.
QEN: Applicable Taxonomy Completion via Evaluating Full Taxonomic Relations
Suyuchen Wang
Ruihui Zhao
Yefeng Zheng
Taxonomy is a fundamental type of knowledge graph for a wide range of web applications like searching and recommendation systems. To keep a taxonomy automatically updated with the latest concepts, the taxonomy completion task matches a pair of proper hypernym and hyponym in the original taxonomy with the new concept as its parent and child. Previous solutions utilize term embeddings as input and only evaluate the parent-child relations between the new concept and the hypernym-hyponym pair. Such methods ignore the important sibling relations, and are not applicable in reality since term embeddings are not available for the latest concepts. They also suffer from the relational noise of the “pseudo-leaf” node, which is a null node acting as a node’s hyponym to enable the new concept to be a leaf node. To tackle the above drawbacks, we propose the Quadruple Evaluation Network (QEN), a novel taxonomy completion framework that utilizes easily accessible term descriptions as input, and applies a pretrained language model and code attention for accurate inference while reducing online computation. QEN evaluates both parent-child and sibling relations to both enhance the accuracy and reduce the noise brought by the pseudo-leaf. Extensive experiments on three real-world datasets in different domains with different sizes and term description sources prove the effectiveness and robustness of QEN on overall performance and especially the performance for adding non-leaf nodes, which largely surpasses previous methods and achieves the new state of the art on the task.
Rare CNVs and phenome-wide profiling: a tale of brain-structural divergence and phenotypical convergence
J. Kopal
Kuldeep Kumar
Karin Saltoun
Claudia Modenato
Clara A. Moreau
Sandra Martin-Brevet
Guillaume Huguet
Martineau Jean-Louis
C.O. Martin
Zohra Saci
Nadine Younis
Petra Tamer
Elise Douard
Anne M. Maillard
Borja Rodriguez-Herreros
Aurélie Pain
Sonia Richetin
Leila Kushan
Ana I. Silva
Marianne B.M. van den Bree
David E.J. Linden
M. J. Owen
Jeremy Hall
Sarah Lippé
Bogdan Draganski
Ida E. Sønderby
Ole A. Andreassen
David C. Glahn
Paul M. Thompson
Carrie E. Bearden
Sébastien Jacquemont
Copy number variations (CNVs) are rare genomic deletions and duplications that can exert profound effects on brain and behavior. Previous reports of pleiotropy in CNVs imply that they converge on shared mechanisms at some level of pathway cascades, from genes to large-scale neural circuits to the phenome. However, studies to date have primarily examined single CNV loci in small clinical cohorts. It remains unknown how distinct CNVs escalate the risk for the same developmental and psychiatric disorders. Here, we quantitatively dissect the impact on brain organization and behavioral differentiation across eight key CNVs. In 534 clinical CNV carriers from multiple sites, we explored CNV-specific brain morphology patterns. We extensively annotated these CNV-associated patterns with deep phenotyping assays through the UK Biobank resource. Although the eight CNVs cause disparate brain changes, they are tied to similar phenotypic profiles across ∼1000 lifestyle indicators. Our population-level investigation established brain structural divergences and phenotypical convergences of CNVs, with direct relevance to major brain disorders.
Shared and unique brain network features predict cognitive, personality, and mental health scores in the ABCD study
Jianzhong Chen
Angela Tam
Valeria Kebets
Csaba Orban
L.Q.R. Ooi
Christopher L Asplund
Scott A. Marek
N. Dosenbach
Simon B. Eickhoff
Avram J. Holmes
B.T. Thomas Yeo
Staged independent learning: Towards decentralized cooperative multi-agent Reinforcement Learning
Hadi Nekoei
Akilesh Badrinaaraayanan
Amit Sinha
Mohammad Amini
Janarthanan Rajendran
We empirically show that classic ideas from two-time scale stochastic approximation (Borkar, 1997) can be combined with sequential iterative best response (SIBR) to solve complex cooperative multi-agent reinforcement learning (MARL) problems. We start with a multi-agent estimation problem as a motivating example where SIBR converges while parallel iterative best response (PIBR) does not. Then we present a general implementation of staged multi-agent RL algorithms based on SIBR and multi-time scale stochastic approximation, and show that our new methods, which we call Staged Independent Proximal Policy Optimization (SIPPO) and Staged Independent Q-learning (SIQL), outperform state-of-the-art independent learning on almost all the tasks in the epymarl (Papoudakis et al., 2020) benchmark. This can be seen as a first step towards more decentralized MARL methods based on SIBR and multi-time scale learning.
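The contrast between sequential and parallel iterative best response can be seen in a minimal sketch on a two-agent coordination game. This toy game is an assumption chosen for illustration. It is not the multi-agent estimation problem from the paper, and the function names are hypothetical.

```python
import numpy as np

# Shared payoff (hypothetical toy game): both agents receive 1 when their
# actions match and 0 otherwise. Rows index agent 0, columns index agent 1.
PAYOFF = np.array([[1.0, 0.0],
                   [0.0, 1.0]])


def best_response(agent, other_action):
    """Return the action maximizing the shared payoff given the other agent's action."""
    if agent == 0:
        return int(np.argmax(PAYOFF[:, other_action]))
    return int(np.argmax(PAYOFF[other_action, :]))


def parallel_ibr(actions, steps):
    """Both agents best-respond simultaneously to the previous joint action."""
    a0, a1 = actions
    history = [(a0, a1)]
    for _ in range(steps):
        # The right-hand side is evaluated with the old values of a0 and a1.
        a0, a1 = best_response(0, a1), best_response(1, a0)
        history.append((a0, a1))
    return history


def sequential_ibr(actions, steps):
    """Agents take turns: only one agent updates per step, the other stays fixed."""
    a0, a1 = actions
    history = [(a0, a1)]
    for t in range(steps):
        if t % 2 == 0:
            a0 = best_response(0, a1)
        else:
            a1 = best_response(1, a0)
        history.append((a0, a1))
    return history


pibr_history = parallel_ibr((0, 1), steps=4)
sibr_history = sequential_ibr((0, 1), steps=4)
```

Starting from the mismatched joint action (0, 1), the parallel update swaps both actions at every step and never coordinates, while the sequential update reaches the matching joint action (1, 1) after the first step and stays there.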
VisPaD: Visualization and Pattern Discovery for Fighting Human Trafficking
Pratheeksha Nair
Yifei Li
Catalina Vajiac
Andreas Olligschlaeger
Meng-Chieh Lee
Namyong Park
Duen Horng Chau
Christos Faloutsos
Chieh Lee
Human trafficking analysts investigate groups of related online escort advertisements (called micro-clusters) to detect suspicious activities and identify various modus operandi. This task is complex as it requires finding patterns and linked meta-data across micro-clusters such as the geographical spread of ads, cluster sizes, etc. Additionally, drawing insights from the data is challenging without visualizing these micro-clusters. To address this, in close collaboration with domain experts, we built VisPaD, a novel interactive way for characterizing and visualizing micro-clusters and their associated meta-data, all in one place. VisPaD helps discover underlying patterns in the data by projecting micro-clusters in a lower dimensional space. It also allows the user to select micro-clusters involved in suspicious patterns and interactively examine them, leading to faster detection and identification of trends in the data. A demo of VisPaD is also released.
RetroGNN: Fast Estimation of Synthesizability for Virtual Screening and De Novo Design by Learning from Slow Retrosynthesis Software
Cheng-Hao Liu
Maksym Korablyov
Stanisław Jastrzębski
Paweł Włodarczyk-Pruszyński
Marwin Segler
Learning how to Interact with a Complex Interface using Hierarchical Reinforcement Learning
Gheorghe Comanici
Amelia Glaese
Anita Gergely
Daniel Toyama
Zafarali Ahmed
Tyler Jackson
Philippe Hamel
Hierarchical Reinforcement Learning (HRL) allows interactive agents to decompose complex problems into a hierarchy of sub-tasks. Higher-level tasks can invoke the solutions of lower-level tasks as if they were primitive actions. In this work, we study the utility of hierarchical decompositions for learning an appropriate way to interact with a complex interface. Specifically, we train HRL agents that can interface with applications in a simulated Android device. We introduce a Hierarchical Distributed Deep Reinforcement Learning architecture that learns (1) subtasks corresponding to simple finger gestures, and (2) how to combine these gestures to solve several Android tasks. Our approach relies on goal conditioning and can be used more generally to convert any base RL agent into an HRL agent. We use the AndroidEnv environment to evaluate our approach. For the experiments, the HRL agent uses a distributed version of the popular DQN algorithm to train different components of the hierarchy. While the native action space is completely intractable for simple DQN agents, our architecture can be used to establish an effective way to interact with different tasks, significantly improving the performance of the same DQN agent over different levels of abstraction.