Optimization of Artificial Neural Network Hyperparameters For Processing Retrospective Information
A. Rogachev
F. Scholle
Yann LeCun
I. L. Kashirin
M. Demchenko
Justification of the selection of the architecture and hyperparameters of artificial neural networks (ANN), focused on solving various classes of applied problems, is a scientific and methodological problem. Optimizing the selection of ANN hyperparameters can improve both the quality and the speed of ANN training. Various methods for optimizing the selection of ANN hyperparameters are known, such as evolutionary computation and genetic algorithms, but they require additional software. To optimize the process of selecting ANN hyperparameters, Google Research has developed the KerasTuner software tool, a platform for automated search for optimal combinations of hyperparameters. KerasTuner supports several search methods: random search, Bayesian optimization, and Hyperband. In the numerical experiments conducted by the author, 14 hyperparameters were varied, including the number of blocks of convolutional layers and of the filters forming them, the type of activation function, and the parameters of the dropout layers, among others. The studied tools proved highly efficient while simultaneously varying more than a dozen optimized parameters of the convolutional network. For the various combined ANN architectures studied, including recurrent (RNN) networks, computation on the Colaboratory platform took several hours, even with GPU accelerators. For ANNs focused on processing and recognizing retrospective information, recognition quality was increased to between 80% and 95%.
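KerasTuner itself needs TensorFlow/Keras and a real training loop, so as a dependency-free illustration here is a minimal sketch of the simplest of the strategies mentioned, random search. The search space (conv_blocks, filters, activation, dropout) is hypothetical and only loosely mirrors the kinds of hyperparameters the abstract lists, and toy_objective is a synthetic stand-in for validation accuracy; none of this is KerasTuner's API or the author's setup.

```python
import random
from typing import Callable, Dict, List, Optional, Tuple, Union

Value = Union[int, float, str]
Config = Dict[str, Value]

# Hypothetical search space; names and value grids are illustrative only.
SEARCH_SPACE: Dict[str, List[Value]] = {
    "conv_blocks": [1, 2, 3],
    "filters": [16, 32, 64],
    "activation": ["relu", "tanh", "elu"],
    "dropout": [0.0, 0.25, 0.5],
}


def sample_config(space: Dict[str, List[Value]], rng: random.Random) -> Config:
    """Draw one value uniformly at random for every hyperparameter."""
    return {name: rng.choice(values) for name, values in space.items()}


def random_search(
    objective: Callable[[Config], float],
    space: Dict[str, List[Value]],
    n_trials: int,
    seed: int = 0,
) -> Tuple[Optional[Config], float]:
    """Evaluate n_trials random configurations and keep the best-scoring one."""
    rng = random.Random(seed)
    best_config: Optional[Config] = None
    best_score = float("-inf")
    for _ in range(n_trials):
        config = sample_config(space, rng)
        score = objective(config)  # in practice: train the model, return val. accuracy
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


def toy_objective(config: Config) -> float:
    """Synthetic stand-in for validation accuracy; no network is trained."""
    score = 0.5 + 0.1 * int(config["conv_blocks"]) + int(config["filters"]) / 640
    score += 0.05 if config["activation"] == "relu" else 0.0
    score -= abs(float(config["dropout"]) - 0.25)
    return score


if __name__ == "__main__":
    best, score = random_search(toy_objective, SEARCH_SPACE, n_trials=30)
    print(best, round(score, 3))
```

Bayesian optimization and Hyperband differ from this loop in how the next configuration is chosen and how much training budget each trial receives; the evaluate-and-keep-the-best skeleton stays the same.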
Overview of the TREC 2021 Fair Ranking Track
Asia J. Biega
Michael D. Ekstrand
Sebastian Kohlmeier
The TREC Fair Ranking Track aims to provide a platform for participants to develop and evaluate novel retrieval algorithms that can provide fair exposure to a mixture of demographics or attributes, such as ethnicity, that are represented by relevant documents in response to a search query. For example, particular demographics or attributes can be represented by the documents' topical content or authors. The 2021 Fair Ranking Track adopted a resource allocation task focused on supporting Wikipedia editors who want to improve the encyclopedia's coverage of topics under the purview of a WikiProject. WikiProject coordinators and/or Wikipedia editors search for Wikipedia documents that need editing to improve article quality. The 2021 Fair Ranking Track aimed to ensure that documents that are about, or somehow represent, certain protected characteristics receive fair exposure to Wikipedia editors, so that these documents have a fair opportunity to be improved and, therefore, to be well represented in Wikipedia. The under-representation of particular protected characteristics in Wikipedia can result in systematic biases that can have a negative human, social, and economic impact, particularly for disadvantaged or protected societal groups.
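To make "fair exposure" concrete, the sketch below computes the share of exposure each group receives in a single ranking, assuming a position-based attention model with a 1/log2(rank + 1) discount, and compares it with a target distribution by L1 distance. The discount, the gap measure, and the function names are illustrative assumptions, not the track's official evaluation metric.

```python
import math
from collections import defaultdict
from typing import Dict, List


def group_exposure(ranking: List[str], groups: Dict[str, str]) -> Dict[str, float]:
    """Share of total exposure each group receives under a log2 position discount."""
    exposure: Dict[str, float] = defaultdict(float)
    for rank, doc in enumerate(ranking, start=1):
        # Assumed attention model: a document at rank r gets weight 1 / log2(r + 1).
        exposure[groups[doc]] += 1.0 / math.log2(rank + 1)
    total = sum(exposure.values())
    return {g: e / total for g, e in exposure.items()}


def exposure_gap(shares: Dict[str, float], target: Dict[str, float]) -> float:
    """L1 distance between achieved exposure shares and a target distribution."""
    keys = set(shares) | set(target)
    return sum(abs(shares.get(k, 0.0) - target.get(k, 0.0)) for k in keys)


if __name__ == "__main__":
    docs = {"a": "x", "b": "y", "c": "x", "d": "y"}  # hypothetical doc -> group labels
    shares = group_exposure(["a", "b", "c", "d"], docs)
    print(shares, exposure_gap(shares, {"x": 0.5, "y": 0.5}))
```

Even with perfectly alternating groups, the group holding rank 1 receives more than half the exposure, which is why fair-ranking work often evaluates expected exposure over a distribution of rankings rather than a single list.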
Personalized Medicine for OSA Syndrome in a Nutshell: Conceptual Clarification for Integration.
Christophe Gauld
Marie Darrason
Jean‐Arthur Micoulaud‐Franchi
Post-Editing Extractive Summaries by Definiteness Prediction
Jad Kabbara
Extractive summarization has been the mainstay of automatic summarization for decades. Despite all the progress, extractive summarizers still suffer from shortcomings, including coreference issues that arise from extracting sentences away from their original context in the source document. This affects the coherence and readability of extractive summaries. In this work, we propose a lightweight post-editing step for extractive summaries that centers on a single linguistic decision: the definiteness of noun phrases. Human evaluation studies show that expert judges substantially prefer the output of our proposed system over the original summaries. Moreover, an automatic evaluation study provides evidence of our system's ability to make linguistic decisions that lead to improved extractive summaries. We also draw insights into how the automatic system exploits local cues related to the writing style of the main article texts or summary texts to make its decisions, rather than reasoning about the context pragmatically.
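The single decision being edited is the choice of article in front of a noun phrase. As a rough illustration only, here is a rule-based sketch (not the paper's learned predictor): a noun is made indefinite on its first mention in the summary and definite afterwards. The assumption that the token right after the article is the head noun is a deliberate simplification; it mishandles adjectives ("the new policy") and unique entities ("the president").

```python
from typing import List

ARTICLES = {"a", "an", "the"}


def indefinite_article(word: str) -> str:
    """Crude a/an choice based on the first letter (ignores 'hour', 'user', etc.)."""
    return "an" if word[0].lower() in "aeiou" else "a"


def post_edit_definiteness(sentences: List[str]) -> List[str]:
    """Rewrite articles so first mentions are indefinite and later mentions definite."""
    seen = set()
    edited = []
    for sentence in sentences:
        tokens = sentence.split()
        out = []
        for i, tok in enumerate(tokens):
            nxt = tokens[i + 1] if i + 1 < len(tokens) else ""
            head = nxt.lower().strip(".,;:!?")  # simplification: next token = head noun
            if tok.lower() in ARTICLES and head:
                new = "the" if head in seen else indefinite_article(head)
                out.append(new.capitalize() if tok[0].isupper() else new)
                seen.add(head)
            else:
                out.append(tok)
        edited.append(" ".join(out))
    return edited


if __name__ == "__main__":
    summary = ["The company announced a merger.", "A merger was approved."]
    print(post_edit_definiteness(summary))
```

On the example, the out-of-context "The company" becomes "A company", and the second mention of the merger becomes definite, which is the kind of coreference repair the abstract motivates.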
Predicting Unreliable Predictions by Shattering a Neural Network
Xu Ji
Andrea Vedaldi
Balaji Lakshminarayanan
Piecewise linear neural networks can be split into subfunctions, each with its own activation pattern, domain, and empirical error. Empirical error for the full network can be written as an expectation over empirical error of subfunctions. Constructing a generalization bound on subfunction empirical error indicates that the more densely a subfunction is surrounded by training samples in representation space, the more reliable its predictions are. Further, it suggests that models with fewer activation regions generalize better, and models that abstract knowledge to a greater degree generalize better, all else equal. We propose not only a theoretical framework to reason about subfunction error bounds but also a pragmatic way of approximately evaluating it, which we apply to predicting which samples the network will not successfully generalize to. We test our method on detection of misclassification and out-of-distribution samples, finding that it performs competitively in both cases. In short, some network activation patterns are associated with higher reliability than others, and these can be identified using subfunction error bounds.
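In a ReLU network, each subfunction is indexed by its activation pattern (which units are on). The sketch below computes that pattern for a small fully connected ReLU network and, as a crude proxy for "how densely a subfunction is surrounded by training samples", counts how many training inputs share a test input's pattern. This count is a simplification for illustration; it is not the paper's subfunction error bound, which also accounts for distances in representation space. The weights are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def activation_pattern(
    x: Sequence[float], weights: List[Matrix], biases: List[List[float]]
) -> Tuple[bool, ...]:
    """On/off state of every ReLU unit; each distinct pattern indexes one linear subfunction."""
    pattern: List[bool] = []
    h = list(x)
    for W, b in zip(weights, biases):
        pre = [sum(w * v for w, v in zip(row, h)) + bias for row, bias in zip(W, b)]
        on = [p > 0 for p in pre]
        pattern.extend(on)
        h = [p if active else 0.0 for p, active in zip(pre, on)]  # ReLU
    return tuple(pattern)


def region_support(
    train_x: List[List[float]],
    test_x: List[List[float]],
    weights: List[Matrix],
    biases: List[List[float]],
) -> List[int]:
    """For each test input, count training inputs in the same activation region."""
    counts = Counter(activation_pattern(x, weights, biases) for x in train_x)
    return [counts[activation_pattern(x, weights, biases)] for x in test_x]


if __name__ == "__main__":
    W = [[[1.0, 0.0], [0.0, 1.0]]]  # one hypothetical hidden layer
    b = [[0.0, 0.0]]
    print(region_support([[1, 1], [2, 3], [1, -1]], [[0.5, 0.5], [-1, -1]], W, b))
```

A test input whose region contains no training samples (support 0) would be flagged as a likely unreliable prediction under this proxy.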
Pretraining Representations for Data-Efficient Reinforcement Learning
Max Schwarzer
Nitarshan Rajkumar
Michael Noukhovitch
Ankesh Anand
Philip Bachman
Data efficiency is a key challenge for deep reinforcement learning. We address this problem by using unlabeled data to pretrain an encoder which is then finetuned on a small amount of task-specific data. To encourage learning representations which capture diverse aspects of the underlying MDP, we employ a combination of latent dynamics modelling and unsupervised goal-conditioned RL. When limited to 100k steps of interaction on Atari games (equivalent to two hours of human experience), our approach significantly surpasses prior work combining offline representation pretraining with task-specific finetuning, and compares favourably with other pretraining methods that require orders of magnitude more data. Our approach shows particular promise when combined with larger models as well as more diverse, task-aligned observational data, approaching human-level performance and data-efficiency on Atari in our best setting.
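One common way to implement latent dynamics modelling is to roll a transition model forward from the encoded current state and penalize disagreement between the predicted latents and the encodings of the actual future observations. The sketch below assumes a negative cosine-similarity loss and a hypothetical additive transition model; it illustrates the general technique and is not the paper's exact objective. The goal-conditioned RL component is not shown.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def cosine(a: Sequence[float], b: Sequence[float], eps: float = 1e-8) -> float:
    """Cosine similarity, with eps guarding against zero-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / max(norm, eps)


def rollout(
    z0: Vector, actions: List[Vector], transition: Callable[[Vector, Vector], Vector]
) -> List[Vector]:
    """Predict K future latents by applying the transition model K times."""
    preds, z = [], z0
    for a in actions:
        z = transition(z, a)
        preds.append(z)
    return preds


def latent_prediction_loss(predicted: List[Vector], targets: List[Vector]) -> float:
    """Mean negative cosine similarity between predicted and target latents."""
    return -sum(cosine(p, t) for p, t in zip(predicted, targets)) / len(predicted)


def additive_transition(z: Vector, a: Vector) -> Vector:
    """Hypothetical toy transition model: the action vector shifts the latent."""
    return [zi + ai for zi, ai in zip(z, a)]


if __name__ == "__main__":
    preds = rollout([1.0, 0.0], [[0.0, 1.0], [0.0, 1.0]], additive_transition)
    # Targets would come from encoding the real next observations.
    print(preds, latent_prediction_loss(preds, [[2.0, 2.0], [2.0, 4.0]]))
```

Using cosine similarity makes the loss insensitive to the scale of the latents, so only their direction has to match; this is a design choice of the sketch.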
TRAFFICVIS: Fighting Human Trafficking through Visualization
Catalina Vajiac
Andreas Olligschlaeger
Yifei Li
Pratheeksha Nair
Meng-Chieh Lee
Namyong Park
Duen Horng Chau
Christos Faloutsos
Law enforcement can detect human trafficking (HT) in online escort websites by analyzing suspicious clusters of connected ads. Given such clusters, how can we interactively visualize potential evidence for law enforcement and domain experts? We present TRAFFICVIS, which, to our knowledge, is the first interface for cluster-level HT detection and labeling. It builds on state-of-the-art HT clustering algorithms by incorporating metadata as a signal of organized and potentially suspicious activity. Also, domain experts can label clusters as HT, spam, and more, efficiently creating labeled datasets to enable further HT research. TRAFFICVIS has been built in close collaboration with domain experts, who estimate that TRAFFICVIS provides a median 36x speedup over manual labeling.
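As a simple illustration of "clusters of connected ads", the sketch below groups ads into connected components whenever they share a metadata value, using a union-find structure. The metadata strings and ad IDs are hypothetical, and this is a generic connected-components baseline, not the HT clustering algorithms that TRAFFICVIS builds on.

```python
from typing import Dict, List


def cluster_ads(ads: Dict[str, List[str]]) -> List[List[str]]:
    """Group ads that are linked, directly or transitively, by shared metadata."""
    parent = {ad: ad for ad in ads}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    first_ad_with: Dict[str, str] = {}  # metadata value -> first ad seen with it
    for ad, features in ads.items():
        for f in features:
            if f in first_ad_with:
                union(first_ad_with[f], ad)
            else:
                first_ad_with[f] = ad

    clusters: Dict[str, List[str]] = {}
    for ad in ads:
        clusters.setdefault(find(ad), []).append(ad)
    return sorted(sorted(c) for c in clusters.values())


if __name__ == "__main__":
    ads = {
        "ad1": ["phone:555-0101"],
        "ad2": ["phone:555-0101", "email:x@example.com"],
        "ad3": ["email:x@example.com"],
        "ad4": ["phone:555-0199"],
    }
    print(cluster_ads(ads))
```

Here ad1 and ad3 share no field directly but are joined through ad2, which is the kind of transitive link an analyst would then inspect visually.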
A relaxed technical assumption for posterior sampling-based reinforcement learning for control of unknown linear systems
Mukul Gagrani
Sagar Sudhakara
Ashutosh Nayyar
Yi Ouyang
We revisit the Thompson sampling algorithm for controlling an unknown linear quadratic (LQ) system, recently proposed by Ouyang et al. [1]. The regret bound of that algorithm was derived under a technical assumption on the induced norm of the closed-loop system. In this technical note, we show that by making a minor modification to the algorithm (in particular, ensuring that an episode does not end too soon), this technical assumption on the induced norm can be replaced by a milder assumption in terms of the spectral radius of the closed-loop system. The modified algorithm has the same Bayesian regret of Õ(√T), where T is the time horizon and the Õ(·) notation hides logarithmic terms in T.
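Why a spectral-radius assumption is milder: for any induced matrix norm, ρ(A) ≤ ‖A‖, so a condition such as ρ(A) < 1 admits matrices that violate ‖A‖ < 1 (the exact form of the conditions in the note may differ). The check below uses a hypothetical 2×2 closed-loop matrix whose spectral radius is 0.5 but whose induced 2-norm exceeds 1; its powers still decay to zero, as ρ(A) < 1 guarantees.

```python
import cmath
import math
from typing import List, Tuple

Matrix = List[List[float]]


def eig2(a: float, b: float, c: float, d: float) -> Tuple[complex, complex]:
    """Eigenvalues of [[a, b], [c, d]] from the characteristic polynomial."""
    tr, det = a + d, a * d - b * c
    disc = cmath.sqrt(tr * tr - 4 * det)
    return (tr + disc) / 2, (tr - disc) / 2


def spectral_radius(m: Matrix) -> float:
    (a, b), (c, d) = m
    return max(abs(lam) for lam in eig2(a, b, c, d))


def induced_2norm(m: Matrix) -> float:
    """Largest singular value: sqrt of the largest eigenvalue of M^T M."""
    (a, b), (c, d) = m
    p, q, r = a * a + c * c, a * b + c * d, b * b + d * d  # entries of M^T M
    return math.sqrt(max(lam.real for lam in eig2(p, q, q, r)))


def matmul(x: Matrix, y: Matrix) -> Matrix:
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def power(m: Matrix, k: int) -> Matrix:
    result = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(k):
        result = matmul(result, m)
    return result


A = [[0.5, 2.0], [0.0, 0.5]]  # hypothetical closed-loop matrix

if __name__ == "__main__":
    print(spectral_radius(A), induced_2norm(A), induced_2norm(power(A, 60)))
```

Here ρ(A) = 0.5 while ‖A‖₂ ≈ 2.12, so a norm-based condition would exclude this stable system but a spectral-radius condition would not.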
Rethinking Graph Transformers with Spectral Attention
Devin Kreuzer
William L. Hamilton
Vincent Létourneau
Prudencio Tossou
In recent years, the Transformer architecture has proven to be very successful in sequence processing, but its application to other data structures, such as graphs, has remained limited due to the difficulty of properly defining positions. Here, we present the
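A standard ingredient for defining "positions" on a graph is the spectrum of the graph Laplacian: its eigenvectors play a role analogous to sinusoidal position encodings on a sequence. The sketch below computes the combinatorial Laplacian L = D − A of a small undirected graph with NumPy and returns its eigenvalues together with the first k non-trivial eigenvectors as per-node encodings. It shows only this raw eigendecomposition step, under the assumption of a connected graph; it is not the learned positional encoding of the method presented in the paper.

```python
from typing import List, Tuple

import numpy as np


def laplacian_eigvecs(
    edges: List[Tuple[int, int]], n: int, k: int
) -> Tuple[np.ndarray, np.ndarray]:
    """Return all Laplacian eigenvalues and the k eigenvectors after the trivial one."""
    adj = np.zeros((n, n))
    for u, v in edges:
        adj[u, v] = adj[v, u] = 1.0
    lap = np.diag(adj.sum(axis=1)) - adj  # combinatorial Laplacian L = D - A
    eigvals, eigvecs = np.linalg.eigh(lap)  # symmetric input, ascending eigenvalues
    # For a connected graph, column 0 is the constant eigenvector (eigenvalue 0).
    # Row j of the returned slice is node j's k-dimensional positional encoding.
    return eigvals, eigvecs[:, 1:k + 1]


if __name__ == "__main__":
    cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
    vals, pe = laplacian_eigvecs(cycle, n=4, k=2)
    print(np.round(vals, 6))
    print(pe.shape)
```

Eigenvectors are defined only up to sign (and, for repeated eigenvalues such as the two 2s of the 4-cycle, up to rotation), which is one reason using them directly as positions is delicate.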
Routine Bandits: Minimizing Regret on Recurring Problems
Hassan Saber
Léo Saci
Odalric-Ambrym Maillard
Saliency is a Possible Red Herring When Diagnosing Poor Generalization
Joseph D Viviano
Becks Simpson
Francis Dutil
Joseph Paul Cohen
Poor generalization is one symptom of models that learn to predict target variables using spuriously correlated image features present only in the training distribution, instead of the true image features that denote a class. It is often thought that this can be diagnosed visually using attribution (aka saliency) maps. We study whether this assumption is correct. In some prediction tasks, such as those involving medical images, some images may come with masks drawn by a human expert, indicating a region of the image that contains information relevant to the prediction. We study multiple methods that take advantage of such auxiliary labels by training networks to ignore distracting features that may be found outside the region of interest. This mask information is used only during training and affects generalization accuracy depending on the severity of the shift between the training and test distributions. Surprisingly, while these methods improve generalization performance in the presence of a covariate shift, there is no strong correspondence between the correction of attribution towards the features a human expert has labelled as important and generalization performance. These results suggest that the root cause of poor generalization may not always be spatially defined, and raise questions about the utility of masks as 'attribution priors' as well as of saliency maps for explainable predictions.
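The "correspondence between attribution and expert masks" can be quantified in several ways. The sketch below shows two simple, assumed choices: the fraction of absolute attribution mass that lies inside the mask, and a sum of squared attributions outside the mask, similar in spirit to the penalty term used by "right for the right reasons"-style training. Both are illustrative; the abstract does not specify which measures or losses the paper uses.

```python
from typing import List

Grid = List[List[float]]


def attribution_in_mask(saliency: Grid, mask: List[List[int]]) -> float:
    """Fraction of absolute attribution mass that lies inside the expert mask."""
    inside = total = 0.0
    for s_row, m_row in zip(saliency, mask):
        for s, m in zip(s_row, m_row):
            total += abs(s)
            if m:
                inside += abs(s)
    return inside / total if total > 0 else 0.0


def outside_mask_penalty(saliency: Grid, mask: List[List[int]]) -> float:
    """Sum of squared attributions outside the mask (candidate training penalty)."""
    return sum(
        s * s
        for s_row, m_row in zip(saliency, mask)
        for s, m in zip(s_row, m_row)
        if not m
    )


if __name__ == "__main__":
    saliency = [[1.0, 0.0], [0.0, 3.0]]  # hypothetical 2x2 attribution map
    mask = [[1, 0], [0, 0]]  # expert marked only the top-left pixel
    print(attribution_in_mask(saliency, mask), outside_mask_penalty(saliency, mask))
```

The paper's finding is that driving a penalty like this down, and thereby moving attribution into the mask, does not reliably track gains in generalization.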
Scalable Change Point Detection for Dynamic Graphs
Real-world networks often evolve in complex ways over time. Understanding anomalies in dynamic networks is crucial for applications such as traffic accident detection, intrusion identification, and detection of ecosystem disturbances. In this work, we focus on the problem of change point detection in dynamic graphs, where the goal is to identify time steps at which the graph structure deviates significantly from the norm. Despite the empirical success of recent methods, building a change point detection method for real-world dynamic graphs, which often scale to millions of nodes, remains an open problem. To fill this gap, we propose LADdos, a scalable method for change point detection in dynamic graphs. LADdos brings together ideas from two recent works: LAD [10], an accurate change point detection method for graphs that detects changes in the full Laplacian spectrum of the graph at each timestamp, and the general framework of network density of states (DOS) [5], which models the distribution of the singular values through efficient approximation methods. In experiments with two common graph models, the Stochastic Block Model (SBM) and the Barabási-Albert (BA) model, we show that LADdos performs on par with LAD, the current state of the art, while being orders of magnitude faster. For instance, on a dynamic graph with a total of 21 million edges over 150 timestamps, LADdos achieves a 100x speedup over LAD.
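The scoring idea behind spectrum-based change point detection can be sketched as follows: represent each timestamp by a spectrum vector, summarize the recent window as a "typical" direction, and score the current timestamp by 1 minus its cosine similarity to that direction. This is a simplified illustration under stated assumptions: the typical direction here is the mean of the normalized window vectors (LAD instead uses a principal singular vector of the window), the spectra are taken as given, and the DOS approximation that makes LADdos scalable is not shown.

```python
import math
from typing import List, Sequence

Vector = List[float]


def normalize(v: Sequence[float]) -> Vector:
    """Scale to unit L2 norm (zero vectors are returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def change_scores(spectra: List[Vector], window: int) -> List[float]:
    """Anomaly score 1 - cos(current spectrum, typical recent spectrum) per timestamp."""
    scores = []
    for t in range(window, len(spectra)):
        context = [normalize(s) for s in spectra[t - window:t]]
        typical = normalize([sum(col) / window for col in zip(*context)])
        current = normalize(spectra[t])
        scores.append(1.0 - sum(a * b for a, b in zip(typical, current)))
    return scores


if __name__ == "__main__":
    # Hypothetical spectra: stable for five steps, then an abrupt structural change.
    spectra = [[1.0, 1.0, 0.0]] * 5 + [[0.0, 0.0, 1.0]]
    print(change_scores(spectra, window=3))
```

The first score is produced at t = window, so timestamps with no full context window are not scored.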