Publications

Pretraining Representations for Data-Efficient Reinforcement Learning

Max Schwarzer

Nitarshan Rajkumar

Michael Noukhovitch

Ankesh Anand

Laurent Charlin

(Rex) Devon Hjelm

Philip Bachman

Aaron Courville

Data efficiency is a key challenge for deep reinforcement learning. We address this problem by using unlabeled data to pretrain an encoder which is then finetuned on a small amount of task-specific data. To encourage learning representations which capture diverse aspects of the underlying MDP, we employ a combination of latent dynamics modelling and unsupervised goal-conditioned RL. When limited to 100k steps of interaction on Atari games (equivalent to two hours of human experience), our approach significantly surpasses prior work combining offline representation pretraining with task-specific finetuning, and compares favourably with other pretraining methods that require orders of magnitude more data. Our approach shows particular promise when combined with larger models as well as more diverse, task-aligned observational data, approaching human-level performance and data-efficiency on Atari in our best setting.

openreview.net

TRAFFICVIS: Fighting Human Trafficking through Visualization

Catalina Vajiac

Andreas Olligschlaeger

Yifei Li

Pratheeksha Nair

Meng-Chieh Lee

Namyong Park

Reihaneh Rabbany

Duen Horng Chau

Christos Faloutsos

Law enforcement can detect human trafficking (HT) in online escort websites by analyzing suspicious clusters of connected ads. Given such clusters, how can we interactively visualize potential evidence for law enforcement and domain experts? We present TRAFFICVIS, which, to our knowledge, is the first interface for cluster-level HT detection and labeling. It builds on state-of-the-art HT clustering algorithms by incorporating metadata as a signal of organized and potentially suspicious activity. Also, domain experts can label clusters as HT, spam, and more, efficiently creating labeled datasets to enable further HT research. TRAFFICVIS has been built in close collaboration with domain experts, who estimate that TRAFFICVIS provides a median 36x speedup over manual labeling.

2021-01-01

(published)

www.semanticscholar.org

A relaxed technical assumption for posterior sampling-based reinforcement learning for control of unknown linear systems

Mukul Gagrani

Sagar Sudhakara

Aditya Mahajan

Ashutosh Nayyar

Yi Ouyang

We revisit the Thompson sampling algorithm to control an unknown linear quadratic (LQ) system recently proposed by Ouyang et al. [1]. The regret bound of the algorithm was derived under a technical assumption on the induced norm of the closed loop system. In this technical note, we show that by making a minor modification in the algorithm (in particular, ensuring that an episode does not end too soon), this technical assumption on the induced norm can be replaced by a milder assumption in terms of the spectral radius of the closed loop system. The modified algorithm has the same Bayesian regret of Õ(√T), where T is the time-horizon and the Õ(·) notation hides logarithmic terms in T.

2021-01-01

arXiv.org (preprint)

dblp.uni-trier.de

Rethinking Graph Transformers with Spectral Attention

Devin Kreuzer

Dominique Beaini

William Hamilton

Vincent Létourneau

Prudencio Tossou

In recent years, the Transformer architecture has proven to be very successful in sequence processing, but its application to other data structures, such as graphs, has remained limited due to the difficulty of properly defining positions. Here, we present the

openreview.net

Routine Bandits: Minimizing Regret on Recurring Problems

Hassan Saber

Léo Saci

Odalric-Ambrym Maillard

Audrey Durand

2021-01-01

ECML/PKDD (published)

doi.org

Scalable Change Point Detection for Dynamic Graphs

Shenyang Huang

Guillaume Rabusseau

Reihaneh Rabbany

Real world networks often evolve in complex ways over time. Understanding anomalies in dynamic networks is crucial for applications such as traffic accident detection, intrusion identification and detection of ecosystem disturbances. In this work, we focus on the problem of change point detection in dynamic graphs. The goal is to identify time steps where the graph structure deviates significantly from the norm. Despite the empirical success of recent methods, building a change point detection method for real world dynamic graphs, which often scale to millions of nodes, remains an open question. To fill this gap, we propose LADdos, a scalable method for change point detection in dynamic graphs. LADdos brings together ideas from two recent works: an accurate change point detection method for graphs called LAD [10], which detects the changes in the full Laplacian spectrum of the graph in each timestamp, and the general framework of network density of states (DOS) [5], which models the distribution of the singular values through efficient approximation methods. In experiments with two common graph models, the Stochastic Block Model (SBM) and the Barabási-Albert (BA) model, we show that LADdos has equal performance to LAD, which is the current state-of-the-art, while being orders of magnitude faster. For instance, on a dynamic graph with a total of 21 million edges over 150 timestamps, LADdos achieves a 100x speedup when compared to LAD.

2021-01-01

(published)

www.semanticscholar.org

Temporally Abstract Partial Models

Khimya Khetarpal

Zafarali Ahmed

Gheorghe Comanici

Doina Precup

Humans and animals have the ability to reason and make predictions about different courses of action at many time scales. In reinforcement learning, option models (Sutton, Precup & Singh, 1999; Precup, 2000) provide the framework for this kind of temporally abstract prediction and reasoning. Natural intelligent agents are also able to focus their attention on courses of action that are relevant or feasible in a given situation, sometimes termed affordable actions. In this paper, we define a notion of affordances for options, and develop temporally abstract partial option models that take into account the fact that an option might be affordable only in certain situations. We analyze the trade-offs between estimation and approximation error in planning and learning when using such models, and identify some interesting special cases. Additionally, we empirically demonstrate the ability to learn both affordances and partial option models online, resulting in improved sample efficiency and planning time in the Taxi domain.

openreview.net

Textual Time Travel: A Temporally Informed Approach to Theory of Mind

Akshatha Arodi

Jackie Cheung

Natural language processing systems such as dialogue agents should be able to reason about other people’s beliefs, intentions and desires. This capability, called theory of mind (ToM), is crucial, as it allows a model to predict and interpret the needs of users based on their mental states. A recent line of research evaluates the ToM capability of existing memory-augmented neural models through question answering. These models perform poorly on false belief tasks where beliefs differ from reality, especially when the dataset contains distracting sentences. In this paper, we propose a new temporally informed approach for improving the ToM capability of memory-augmented neural models. Our model incorporates priors about the entities’ minds and tracks their mental states as they evolve over time through an extended passage. It then responds to queries through textual time travel, i.e., by accessing the stored memory of an earlier time step. We evaluate our model on ToM datasets and find that this approach improves performance, particularly by correcting the predicted mental states to match the false belief.

2021-01-01

Conference on Empirical Methods in Natural Language Processing (published)

doi.org

The Machine Learning for Combinatorial Optimization Competition (ML4CO): Results and Insights

Maxime Gasse

Simon Bowly

Quentin Cappart

Jonas Charfreitag

Laurent Charlin

Didier Chételat

Antonia Chmiela

Justin Dumouchelle

Ambros Gleixner

Aleksandr Kazachkov

Elias Boutros Khalil

Paweł Lichocki

Andrea Lodi

Miles Lubin

Chris J. Maddison

Christopher Morris

D. Papageorgiou

Augustin Parjadis

Sebastian Pokutta

Antoine Prouvost

Lara Scavuzzo

Giulia Zarpellon

Linxin Yang

Sha Lai

Akang Wang

Xiaodong Luo

Xiang Zhou

Haohan Huang

Sheng Cheng Shao

Yuanming Zhu

Dong Dong Zhang

Tao Manh Quan

Zixuan Cao

Yang Xu

Zhewei Huang

Shuchang Zhou

C. Binbin

He Minggui

Haoren Ren Hao

Zhang Zhiyu

An Zhiwu

Mao Kun

Combinatorial optimization is a well-established area in operations research and computer science. Until recently, its methods have focused on solving problem instances in isolation, ignoring that they often stem from related data distributions in practice. However, recent years have seen a surge of interest in using machine learning as a new approach for solving combinatorial problems, either directly as solvers or by enhancing exact solvers. Based on this context, the ML4CO competition aims at improving state-of-the-art combinatorial optimization solvers by replacing key heuristic components. The competition featured three challenging tasks: finding the best feasible solution, producing the tightest optimality certificate, and giving an appropriate solver configuration. Three realistic datasets were considered: balanced item placement, workload apportionment, and maritime inventory routing. This last dataset was kept anonymous for the contestants.

2021-01-01

NeurIPS (Competition and Demos) (published)

doi.org

arxiv.org

The Topic Confusion Task: A Novel Scenario for Authorship Attribution

Malik H. Altakrori

Jackie Cheung

Benjamin Fung

Authorship attribution is the problem of identifying the most plausible author of an anonymous text from a set of candidate authors. Researchers have investigated same-topic and cross-topic scenarios of authorship attribution, which differ according to whether unseen topics are used in the testing phase. However, neither scenario allows us to explain whether errors are caused by failure to capture authorship style, by the topic shift or by other factors. Motivated by this, we propose the topic confusion task, where we switch the author-topic configuration between the training and testing sets. This setup allows us to probe errors in the attribution process. We investigate the accuracy and two error measures: one caused by the models’ confusion by the switch because the features capture the topics, and one caused by the features’ inability to capture the writing styles, leading to weaker models. By evaluating different features, we show that stylometric features with part-of-speech tags are less susceptible to topic variations and can increase the accuracy of the attribution process. We further show that combining them with word-level n-grams can outperform the state-of-the-art technique in the cross-topic scenario. Finally, we show that pretrained language models such as BERT and RoBERTa perform poorly on this task, and are outperformed by simple n-gram features.

2021-01-01

arXiv.org (preprint)

dblp.uni-trier.de

A Theoretical Analysis of Catastrophic Forgetting through the NTK Overlap Matrix

Thang Doan

Mehdi Abbana Bennani

Bogdan Mazoure

Guillaume Rabusseau

Pierre Alquier

2021-01-01

AISTATS (published)

proceedings.mlr.press

arxiv.org

Towards a Trace-Preserving Tensor Network Representation of Quantum Channels

Siddarth Srinivasan

Sandesh M. Adhikary

Jacob Miller

Bibek Pokharel

Guillaume Rabusseau

Byron Boots

The problem of characterizing quantum channels arises in a number of contexts such as quantum process tomography and quantum error correction. However, direct approaches to parameterizing and optimizing the Choi matrix representation of quantum channels face a curse of dimensionality: the number of parameters scales exponentially in the number of qubits. Recently, Torlai et al. [2020] proposed using locally purified density operators (LPDOs), a tensor network representation of Choi matrices, to overcome the unfavourable scaling in parameters. While the LPDO structure allows it to satisfy a ‘complete positivity’ (CP) constraint required of physically valid quantum channels, it makes no guarantees about a similarly required ‘trace preservation’ (TP) constraint. In practice, the TP constraint is violated, and the learned quantum channel may even be trace-increasing, which is non-physical. In this work, we present the problem of optimizing over TP LPDOs, discuss two approaches to characterizing the TP constraints on LPDOs, and outline the next steps for developing an optimization scheme.

2021-01-01

(published)

www.semanticscholar.org
