Personalized Medicine for OSA Syndrome in a Nutshell: Conceptual Clarification for Integration.
Christophe Gauld
Marie Darrason
Jean‐Arthur Micoulaud‐Franchi
Post-Editing Extractive Summaries by Definiteness Prediction
Jad Kabbara
Extractive summarization has been the mainstay of automatic summarization for decades. Despite all the progress, extractive summarizers still suffer from shortcomings, including coreference issues arising from extracting sentences away from their original context in the source document. This affects the coherence and readability of extractive summaries. In this work, we propose a lightweight post-editing step for extractive summaries that centers around a single linguistic decision: the definiteness of noun phrases. We conduct human evaluation studies that show that human expert judges substantially prefer the output of our proposed system over the original summaries. Moreover, based on an automatic evaluation study, we provide evidence for our system's ability to generate linguistic decisions that lead to improved extractive summaries. We also draw insights into how the automatic system exploits local cues related to the writing style of the main article or summary texts to make its decisions, rather than reasoning pragmatically about the contexts.
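A minimal sketch of what a definiteness-based post-editing pass could look like, assuming a classifier supplies a definite/indefinite label for each noun phrase; the `predict_definiteness` heuristic and the simple article-rewriting regex below are hypothetical stand-ins, not the paper's actual model or pipeline.

```python
# Illustrative sketch: rewrite leading articles of simple "article + noun"
# phrases in an extracted sentence so their definiteness matches a prediction.
import re

def predict_definiteness(noun_phrase: str, summary_so_far: str) -> str:
    """Hypothetical classifier: mark an NP definite if its head noun was
    already mentioned earlier in the summary, otherwise indefinite."""
    head = noun_phrase.split()[-1].lower()
    return "definite" if head in summary_so_far.lower() else "indefinite"

def post_edit(sentence: str, summary_so_far: str) -> str:
    def fix(match: re.Match) -> str:
        article, noun = match.group(1), match.group(2)
        label = predict_definiteness(noun, summary_so_far)
        new_article = "the" if label == "definite" else (
            "an" if noun[0].lower() in "aeiou" else "a")
        if article[0].isupper():             # preserve sentence-initial casing
            new_article = new_article.capitalize()
        return f"{new_article} {noun}"
    return re.sub(r"\b([Aa]n?|[Tt]he)\s+(\w+)", fix, sentence)

summary = "The committee proposed a merger."
extracted = "A merger was approved despite a objection from shareholders."
print(post_edit(extracted, summary))
# -> "The merger was approved despite an objection from shareholders."
```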
Predicting Unreliable Predictions by Shattering a Neural Network
Xu Ji
Andrea Vedaldi
Balaji Lakshminarayanan
Piecewise linear neural networks can be split into subfunctions, each with its own activation pattern, domain, and empirical error. Empirical error for the full network can be written as an expectation over empirical error of subfunctions. Constructing a generalization bound on subfunction empirical error indicates that the more densely a subfunction is surrounded by training samples in representation space, the more reliable its predictions are. Further, it suggests that models with fewer activation regions generalize better, and models that abstract knowledge to a greater degree generalize better, all else equal. We propose not only a theoretical framework to reason about subfunction error bounds but also a pragmatic way of approximately evaluating it, which we apply to predicting which samples the network will not successfully generalize to. We test our method on detection of misclassification and out-of-distribution samples, finding that it performs competitively in both cases. In short, some network activation patterns are associated with higher reliability than others, and these can be identified using subfunction error bounds.
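A small numpy sketch of the core intuition: a test point whose ReLU activation region is densely surrounded by training samples in representation space is presumed more reliable. The tiny random MLP and the k-nearest-neighbour density proxy are illustrative assumptions, not the paper's exact estimator.

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(8, 2)), rng.normal(size=8)   # toy 2-8-1 ReLU network
W2, b2 = rng.normal(size=(1, 8)), rng.normal(size=1)

def forward(x):
    h = np.maximum(W1 @ x + b1, 0.0)          # hidden representation
    pattern = (h > 0).astype(int)             # activation pattern defines the subfunction
    return (W2 @ h + b2).item(), h, pattern

X_train = rng.normal(size=(500, 2))
H_train = np.array([forward(x)[1] for x in X_train])

def reliability_score(x, k=10):
    """Proxy for subfunction reliability: mean distance from x's hidden
    representation to its k nearest training representations (lower = denser
    neighbourhood = presumed more reliable prediction)."""
    _, h, _ = forward(x)
    dists = np.linalg.norm(H_train - h, axis=1)
    return np.sort(dists)[:k].mean()

x_in  = np.array([0.1, -0.2])     # near the training distribution
x_out = np.array([8.0, 9.0])      # far outside it
print(f"in-dist score:  {reliability_score(x_in):.3f}")
print(f"out-dist score: {reliability_score(x_out):.3f}   (higher = less reliable)")
```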
Preferential Temporal Difference Learning
Nishanth Anand
Pretraining Representations for Data-Efficient Reinforcement Learning
Max Schwarzer
Nitarshan Rajkumar
Michael Noukhovitch
Ankesh Anand
Philip Bachman
Data efficiency is a key challenge for deep reinforcement learning. We address this problem by using unlabeled data to pretrain an encoder which is then finetuned on a small amount of task-specific data. To encourage learning representations which capture diverse aspects of the underlying MDP, we employ a combination of latent dynamics modelling and unsupervised goal-conditioned RL. When limited to 100k steps of interaction on Atari games (equivalent to two hours of human experience), our approach significantly surpasses prior work combining offline representation pretraining with task-specific finetuning, and compares favourably with other pretraining methods that require orders of magnitude more data. Our approach shows particular promise when combined with larger models as well as more diverse, task-aligned observational data -- approaching human-level performance and data-efficiency on Atari in our best setting.
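A condensed PyTorch sketch of the pretrain-then-finetune recipe described above: an encoder is first trained with a latent dynamics loss on unlabeled transitions, then reused and finetuned by a small task head. The network sizes, random stand-in data, and single-step forward-prediction loss are illustrative assumptions, not the paper's exact objectives.

```python
import torch
import torch.nn as nn

obs_dim, act_dim, latent = 16, 4, 32
encoder = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, latent))
dynamics = nn.Sequential(nn.Linear(latent + act_dim, 64), nn.ReLU(), nn.Linear(64, latent))
opt = torch.optim.Adam(list(encoder.parameters()) + list(dynamics.parameters()), lr=1e-3)

# --- Stage 1: unsupervised pretraining on unlabeled transitions (o, a, o') ---
for _ in range(200):
    o = torch.randn(128, obs_dim)                     # stand-in for logged observations
    a = torch.randn(128, act_dim)
    o_next = o + 0.1 * torch.randn(128, obs_dim)      # stand-in for next observations
    z, z_next = encoder(o), encoder(o_next).detach()  # stop-gradient on the target
    loss = ((dynamics(torch.cat([z, a], dim=-1)) - z_next) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()

# --- Stage 2: finetune encoder + small head on limited task-specific data ---
head = nn.Linear(latent, act_dim)                     # e.g. Q-values or policy logits
ft_opt = torch.optim.Adam(list(encoder.parameters()) + list(head.parameters()), lr=1e-4)
for _ in range(50):
    o = torch.randn(32, obs_dim)
    target = torch.randn(32, act_dim)                 # stand-in for task targets
    loss = ((head(encoder(o)) - target) ** 2).mean()
    ft_opt.zero_grad(); loss.backward(); ft_opt.step()
print("pretrained encoder reused for the downstream task")
```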
TRAFFICVIS: Fighting Human Trafficking through Visualization
Catalina Vajiac
Andreas Olligschlaeger
Yifei Li
Pratheeksha Nair
Meng-Chieh Lee
Namyong Park
Duen Horng Chau
Christos Faloutsos
Law enforcement can detect human trafficking (HT) in online escort websites by analyzing suspicious clusters of connected ads. Given such clusters, how can we interactively visualize potential evidence for law enforcement and domain experts? We present TRAFFICVIS, which, to our knowledge, is the first interface for cluster-level HT detection and labeling. It builds on state-of-the-art HT clustering algorithms by incorporating metadata as a signal of organized and potentially suspicious activity. Also, domain experts can label clusters as HT, spam, and more, efficiently creating labeled datasets to enable further HT research. TRAFFICVIS has been built in close collaboration with domain experts, who estimate that TRAFFICVIS provides a median 36x speedup over manual labeling.
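A small sketch of the metadata signal the tool builds on: ads that share contact metadata are grouped into clusters that an expert could then label as HT, spam, and so on. The ad records and the single `phone` field are hypothetical illustrations, not TRAFFICVIS's actual schema, clustering algorithm, or labeling workflow.

```python
from collections import defaultdict

ads = [
    {"id": 1, "phone": "555-0101", "city": "A"},
    {"id": 2, "phone": "555-0101", "city": "B"},   # same phone, different city
    {"id": 3, "phone": "555-0199", "city": "A"},
]

clusters = defaultdict(list)
for ad in ads:
    clusters[ad["phone"]].append(ad["id"])          # shared metadata => same cluster

labels = {}
for phone, ad_ids in clusters.items():
    # An expert would inspect the cluster in the interface; here we just
    # flag multi-ad clusters for review as a toy stand-in for that step.
    labels[phone] = "review" if len(ad_ids) > 1 else "likely benign"

print(dict(clusters))
print(labels)
```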
Randomized Exploration in Reinforcement Learning with General Value Function Approximation
Haque Ishfaq
Qiwen Cui
Viet Bang Nguyen
Alex Ayoub
Zhuoran Yang
Zhaoran Wang
Lin Yang
Randomized Least Squares Policy Optimization
Haque Ishfaq
Zhuoran Yang
Andrei-Stefan Lupu
Viet Bang Nguyen
Lewis Liu
Riashat Islam
Zhaoran Wang
Policy Optimization (PO) methods with function approximation are one of the most popular classes of Reinforcement Learning (RL) algorithms. However, designing provably efficient policy optimization algorithms remains a challenge. Recent work in this area has focused on incorporating upper confidence bound (UCB)-style bonuses to drive exploration in policy optimization. In this paper, we present Randomized Least Squares Policy Optimization (RLSPO), which is inspired by Thompson Sampling. We prove that, in an episodic linear kernel MDP setting, RLSPO achieves $\widetilde{O}(d^{3/2} H^{3/2} \sqrt{T})$ worst-case (frequentist) regret, where H is the number of episodes, T is the total number of steps and d is the feature dimension. Finally, we evaluate RLSPO empirically and show that it is competitive with existing provably efficient PO algorithms.
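A minimal numpy sketch of the Thompson-sampling flavour of exploration the abstract describes: instead of adding a UCB bonus, sample value-function weights from a Gaussian centred at the regularized least-squares solution. The feature map, noise scale, and bandit-style loop are simplifying assumptions for illustration only, not the RLSPO algorithm itself.

```python
import numpy as np

rng = np.random.default_rng(0)
d, lam, sigma = 3, 1.0, 0.5
A = lam * np.eye(d)                 # regularized Gram matrix  Phi^T Phi + lam * I
b = np.zeros(d)                     # Phi^T y

def true_value(phi):                # unknown environment, linear in features
    return phi @ np.array([1.0, -2.0, 0.5])

for t in range(200):
    candidates = rng.normal(size=(10, d))          # feature vectors of available actions
    w_hat = np.linalg.solve(A, b)                  # least-squares estimate
    cov = sigma**2 * np.linalg.inv(A)              # posterior-style covariance
    w_sample = rng.multivariate_normal(w_hat, cov) # randomized (sampled) weights
    choice = candidates[np.argmax(candidates @ w_sample)]
    reward = true_value(choice) + 0.1 * rng.normal()
    A += np.outer(choice, choice)                  # rank-one update of the regression
    b += reward * choice

print("estimated weights:", np.linalg.solve(A, b).round(2))
```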
A relaxed technical assumption for posterior sampling-based reinforcement learning for control of unknown linear systems
Mukul Gagrani
Sagar Sudhakara
Ashutosh Nayyar
Yi Ouyang
We revisit the Thompson sampling algorithm to control an unknown linear quadratic (LQ) system recently proposed by Ouyang et al. [1]. The regret bound of the algorithm was derived under a technical assumption on the induced norm of the closed loop system. In this technical note, we show that by making a minor modification in the algorithm (in particular, ensuring that an episode does not end too soon), this technical assumption on the induced norm can be replaced by a milder assumption in terms of the spectral radius of the closed loop system. The modified algorithm has the same Bayesian regret of $\widetilde{O}(\sqrt{T})$, where T is the time horizon and the $\widetilde{O}(\cdot)$ notation hides logarithmic terms in T.
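An illustrative sketch of the kind of episode-stopping rule used in Thompson-sampling control of linear systems, with the note's key tweak of not letting an episode end too soon (here rendered as a simple minimum-length guard). The exact stopping criteria, variable names, and constants below are assumptions for illustration; see the paper for the precise modified rule.

```python
def episode_should_end(steps_in_episode, prev_episode_length,
                       cov_det_now, cov_det_at_episode_start,
                       min_length):
    """Return True when the current episode should terminate and a new
    parameter sample should be drawn."""
    if steps_in_episode < min_length:
        return False                                  # the 'not too soon' guard
    length_exceeded = steps_in_episode > prev_episode_length
    information_doubled = cov_det_now < 0.5 * cov_det_at_episode_start
    return length_exceeded or information_doubled

# Example: 3 steps in, previous episode lasted 10, posterior barely changed.
print(episode_should_end(3, 10, cov_det_now=0.9, cov_det_at_episode_start=1.0,
                         min_length=5))   # False -> keep the current policy
```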
Rethinking Graph Transformers with Spectral Attention
Devin Kreuzer
William L. Hamilton
Vincent Létourneau
Prudencio Tossou
In recent years, the Transformer architecture has proven to be very successful in sequence processing, but its application to other data structures, such as graphs, has remained limited due to the difficulty of properly defining positions. Here, we present the Spectral Attention Network (SAN), which uses a learned positional encoding based on the full Laplacian spectrum to learn the position of each node in a given graph.
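A small numpy sketch of the spectral idea behind the approach: eigenvectors of the graph Laplacian give each node a position-like encoding that a Transformer can attend over. The toy 4-node path graph and the choice of the k lowest-frequency eigenvectors are illustrative; the paper's learned positional encoding is considerably more involved.

```python
import numpy as np

# Adjacency matrix of a 4-node path graph: 0 - 1 - 2 - 3
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
D = np.diag(A.sum(axis=1))
L = D - A                                   # (unnormalized) graph Laplacian

eigvals, eigvecs = np.linalg.eigh(L)        # eigenvalues sorted ascending
k = 2
pos_enc = eigvecs[:, 1:k + 1]               # skip the constant eigenvector

# Each row is a spectral "position" for one node, which could be concatenated
# to node features before feeding the graph into a Transformer encoder.
print(np.round(pos_enc, 3))
```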
Routine Bandits: Minimizing Regret on Recurring Problems
Hassan Saber
Léo Saci
Odalric-Ambrym Maillard
Saliency is a Possible Red Herring When Diagnosing Poor Generalization
Joseph D Viviano
Becks Simpson
Francis Dutil
Joseph Paul Cohen
Poor generalization is one symptom of models that learn to predict target variables using spuriously-correlated image features present only in the training distribution instead of the true image features that denote a class. It is often thought that this can be diagnosed visually using attribution (aka saliency) maps. We study if this assumption is correct. In some prediction tasks, such as for medical images, one may have some images with masks drawn by a human expert, indicating a region of the image containing relevant information to make the prediction. We study multiple methods that take advantage of such auxiliary labels, by training networks to ignore distracting features which may be found outside of the region of interest. This mask information is only used during training and has an impact on generalization accuracy depending on the severity of the shift between the training and test distributions. Surprisingly, while these methods improve generalization performance in the presence of a covariate shift, there is no strong correspondence between the correction of attribution towards the features a human expert has labelled as important and generalization performance. These results suggest that the root cause of poor generalization may not always be spatially defined, and raise questions about the utility of masks as 'attribution priors' as well as saliency maps for explainable predictions.
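A minimal PyTorch sketch of one "attribution prior"-style training setup of the kind discussed above: alongside the usual classification loss, penalize input-gradient attribution that falls outside the expert-drawn mask. The tiny model, random stand-in data, and penalty weight are illustrative assumptions, not any specific method evaluated in the paper.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Flatten(), nn.Linear(8 * 8, 32), nn.ReLU(), nn.Linear(32, 2))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

images = torch.randn(16, 1, 8, 8)                   # stand-in for training images
labels = torch.randint(0, 2, (16,))
masks = torch.zeros(16, 1, 8, 8)
masks[:, :, 2:6, 2:6] = 1.0                         # expert region-of-interest masks

for _ in range(100):
    images.requires_grad_(True)
    logits = model(images)
    task_loss = nn.functional.cross_entropy(logits, labels)
    # Saliency = gradient of the labelled-class score w.r.t. the input pixels.
    grads = torch.autograd.grad(logits.gather(1, labels[:, None]).sum(),
                                images, create_graph=True)[0]
    outside_roi = (grads.abs() * (1.0 - masks)).mean()   # attribution outside the mask
    loss = task_loss + 1.0 * outside_roi
    opt.zero_grad(); loss.backward(); opt.step()
    images = images.detach()

print("trained with a penalty on attribution outside the expert mask")
```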