Factorizing Declarative and Procedural Knowledge in Structured, Dynamical Environments
Anirudh Goyal
Alex Lamb
Phanideep Gampa
Philippe Beaudoin
Charles Blundell
Sergey Levine
Michael Curtis Mozer
Fast and Slow Learning of Recurrent Independent Mechanisms
Nan Rosemary Ke
Anirudh Goyal
Bernhard Schölkopf
Decomposing knowledge into interchangeable pieces promises a generalization advantage when there are changes in distribution. A learning age… (voir plus)nt interacting with its environment is likely to be faced with situations requiring novel combinations of existing pieces of knowledge. We hypothesize that such a decomposition of knowledge is particularly relevant for being able to generalize in a systematic manner to out-of-distribution changes. To study these ideas, we propose a particular training framework in which we assume that the pieces of knowledge an agent needs and its reward function are stationary and can be re-used across tasks. An attention mechanism dynamically selects which modules can be adapted to the current task, and the parameters of the selected modules are allowed to change quickly as the learner is confronted with variations in what it experiences, while the parameters of the attention mechanisms act as stable, slowly changing, meta-parameters. We focus on pieces of knowledge captured by an ensemble of modules sparsely communicating with each other via a bottleneck of attention. We find that meta-learning the modular aspects of the proposed system greatly helps in achieving faster adaptation in a reinforcement learning setup involving navigation in a partially observed grid world with image-level input. We also find that reversing the role of parameters and meta-parameters does not work nearly as well, suggesting a particular role for fast adaptation of the dynamically selected modules.
Faults in deep reinforcement learning programs: a taxonomy and a detection approach
Amin Nikanjam
Mohammad Mehdi Morovati
Houssem Ben Braiek
Finite time analysis of temporal difference learning with linear function approximation: the tail averaged case
Prashanth L.A.
In this paper, we study the finite-time behaviour of temporal difference (TD) learning algorithms when combined with tail-averaging, and pr… (voir plus)esent instance dependent bounds on the parameter error of the tail-averaged TD iterate. Our error bounds hold in expectation as well as with high probability, exhibit a sharper rate of decay for the initial error (bias), and are comparable with existing bounds in the literature.
Flexible Option Learning
Flow Network based Generative Models for Non-Iterative Diverse Candidate Generation
Moksh J. Jain
Maksym Korablyov
This paper is about the problem of learning a stochastic policy for generating an object (like a molecular graph) from a sequence of actions… (voir plus), such that the probability of generating an object is proportional to a given positive reward for that object. Whereas standard return maximization tends to converge to a single return-maximizing sequence, there are cases where we would like to sample a diverse set of high-return solutions. These arise, for example, in black-box function optimization when few rounds are possible, each with large batches of queries, where the batches should be diverse, e.g., in the design of new molecules. One can also see this as a problem of approximately converting an energy function to a generative distribution. While MCMC methods can achieve that, they are expensive and generally only perform local exploration. Instead, training a generative policy amortizes the cost of search during training and yields to fast generation. Using insights from Temporal Difference learning, we propose GFlowNet, based on a view of the generative process as a flow network, making it possible to handle the tricky case where different trajectories can yield the same final state, e.g., there are many ways to sequentially add atoms to generate some molecular graph. We cast the set of trajectories as a flow and convert the flow consistency equations into a learning objective, akin to the casting of the Bellman equations into Temporal Difference methods. We prove that any global minimum of the proposed objectives yields a policy which samples from the desired distribution, and demonstrate the improved performance and diversity of GFlowNet on a simple domain where there are many modes to the reward function, and on a molecule synthesis task.
Guest Editorial Explainable AI: Towards Fairness, Accountability, Transparency and Trust in Healthcare
Arash Shaban-Nejad
Martin Michalowski
John S. Brownstein
Image Dehazing in Disproportionate Haze Distributions
Shih-Chia Huang
Da-Wei Jaw
Wenli Li
Zhihui Lu
Sy-Yen Kuo
Bo-Hao Chen
Thanisa Numnonda
Haze removal techniques employed to increase the visibility level of an image play an important role in many vision-based systems. Several t… (voir plus)raditional dark channel prior-based methods have been proposed to remove haze formation and thereby enhance the robustness of these systems. However, when the captured images contain disproportionate haze distributions, these methods usually fail to attain effective restoration in the restored image. Specifically, disproportionate haze distribution in an image means that the background region possesses heavy haze density and the foreground region possesses little haze density. This phenomenon usually occurs in a hazy image with a deep depth of field. In response, a novel hybrid transmission map-based haze removal method that specifically targets this situation is proposed in this work to achieve clear visibility restoration and effective information maintenance. Experimental results via both qualitative and quantitative evaluations demonstrate that the proposed method is capable of performing with higher efficacy when compared with other state-of-the-art methods, in respect to both background regions and foreground regions of restored test images captured in real-world environments.
Improving Long-Term Metrics in Recommendation Systems using Short-Horizon Offline RL
Paul Mineiro
Pavithra Srinath
Reza Sharifi Sedeh
Adith Swaminathan
We study session-based recommendation scenarios where we want to recommend items to users during sequential interactions to improve their lo… (voir plus)ng-term utility. Optimizing a long-term metric is challenging because the learning signal (whether the recommendations achieved their desired goals) is delayed and confounded by other user interactions with the system. Immediately measurable proxies such as clicks can lead to suboptimal recommendations due to misalignment with the long-term metric. Many works have applied episodic reinforcement learning (RL) techniques for session-based recommendation but these methods do not account for policy-induced drift in user intent across sessions. We develop a new batch RL algorithm called Short Horizon Policy Improvement (SHPI) that approximates policy-induced distribution shifts across sessions. By varying the horizon hyper-parameter in SHPI, we recover well-known policy improvement schemes in the RL literature. Empirical results on four recommendation tasks show that SHPI can outperform matrix factorization, offline bandits, and offline RL baselines. We also provide a stable and computationally efficient implementation using weighted regression oracles.
Improving Reproducibility in Machine Learning Research (A Report from the NeurIPS 2019 Reproducibility Program)
Philippe Vincent‐lamarre
Koustuv Sinha
Vincent Larivière
Alina Beygelzimer
Florence D'alche-buc
E. Fox
Inspecting the Factuality of Hallucinated Entities in Abstractive Summarization
Meng Cao
Yue Dong
State-of-the-art abstractive summarization systems often generate hallucinations ; i.e., content that is not directly inferable from the sou… (voir plus)rce text. Despite being assumed incorrect, many of the hallucinated contents are consistent with world knowledge (factual hallucinations). Including these factual hallucinations into a summary can be beneficial in providing additional background information. In this work, we propose a novel detection approach that separates factual from non-factual hallucinations of entities. Our method is based on an entity’s prior and posterior probabilities according to pre-trained and finetuned masked language models, respectively. Empirical re-sults suggest that our method vastly outperforms three strong baselines in both accuracy and F1 scores and has a strong correlation with human judgements on factuality classification tasks. Furthermore, our approach can provide insight into whether a particular hallucination is caused by the summarizer’s pre-training or fine-tuning step. 1
Inspecting the Factuality of Hallucinated Entities in Abstractive Summarization
Meng Cao
Yue Dong
State-of-the-art abstractive summarization systems often generate hallucinations ; i.e., content that is not directly inferable from the sou… (voir plus)rce text. Despite being assumed incorrect, many of the hallucinated contents are consistent with world knowledge (factual hallucinations). Including these factual hallucinations into a summary can be beneficial in providing additional background information. In this work, we propose a novel detection approach that separates factual from non-factual hallucinations of entities. Our method is based on an entity’s prior and posterior probabilities according to pre-trained and finetuned masked language models, respectively. Empirical re-sults suggest that our method vastly outperforms three strong baselines in both accuracy and F1 scores and has a strong correlation with human judgements on factuality classification tasks. Furthermore, our approach can provide insight into whether a particular hallucination is caused by the summarizer’s pre-training or fine-tuning step. 1