Publications

Factorizing Declarative and Procedural Knowledge in Structured, Dynamical Environments

Anirudh Goyal

Alex Lamb

Phanideep Gampa

Philippe Beaudoin

Charles Blundell

Sergey Levine

Yoshua Bengio

Michael Curtis Mozer

2021-01-01

ICLR (published)

openreview.net

Fast and Slow Learning of Recurrent Independent Mechanisms

Kanika Madan

Nan Rosemary Ke

Anirudh Goyal

Bernhard Schölkopf

Yoshua Bengio

Decomposing knowledge into interchangeable pieces promises a generalization advantage when there are changes in distribution. A learning agent interacting with its environment is likely to be faced with situations requiring novel combinations of existing pieces of knowledge. We hypothesize that such a decomposition of knowledge is particularly relevant for being able to generalize in a systematic manner to out-of-distribution changes. To study these ideas, we propose a particular training framework in which we assume that the pieces of knowledge an agent needs and its reward function are stationary and can be re-used across tasks. An attention mechanism dynamically selects which modules can be adapted to the current task, and the parameters of the selected modules are allowed to change quickly as the learner is confronted with variations in what it experiences, while the parameters of the attention mechanisms act as stable, slowly changing, meta-parameters. We focus on pieces of knowledge captured by an ensemble of modules sparsely communicating with each other via a bottleneck of attention. We find that meta-learning the modular aspects of the proposed system greatly helps in achieving faster adaptation in a reinforcement learning setup involving navigation in a partially observed grid world with image-level input. We also find that reversing the role of parameters and meta-parameters does not work nearly as well, suggesting a particular role for fast adaptation of the dynamically selected modules.

2021-01-01

ICLR (published)

openreview.net

Faults in deep reinforcement learning programs: a taxonomy and a detection approach

Amin Nikanjam

Mohammad Mehdi Morovati

Foutse Khomh

Houssem Ben Braiek

2021-01-01

arXiv (preprint)

doi.org

arxiv.org

Finite time analysis of temporal difference learning with linear function approximation: the tail averaged case

Gandharv Patil

Prashanth L.A.

Doina Precup

In this paper, we study the finite-time behaviour of temporal difference (TD) learning algorithms when combined with tail-averaging, and present instance-dependent bounds on the parameter error of the tail-averaged TD iterate. Our error bounds hold in expectation as well as with high probability, exhibit a sharper rate of decay for the initial error (bias), and are comparable with existing bounds in the literature.
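The abstract does not spell out the exact iteration analysed, so what follows is only a minimal sketch under common assumptions, not the paper's setting: linear TD(0) with a constant step size, feature vectors phi(s) given as plain Python lists, and a tail average that starts at a user-chosen index k. The function name `td0_tail_averaged` and the parameter `tail_start` are hypothetical. It returns both the last iterate and the average of the iterates theta_k, ..., theta_n.

```python
def td0_tail_averaged(transitions, features, step_size, gamma, tail_start):
    """Linear TD(0) that also returns the tail-averaged iterate (illustrative sketch).

    transitions: sequence of (state, reward, next_state) samples.
    features: dict mapping each state to its feature vector phi(s), a list of floats.
    tail_start: index k such that iterates theta_k, ..., theta_n are averaged.
    """
    num_features = len(next(iter(features.values())))
    theta = [0.0] * num_features  # theta_0
    tail_sum = [0.0] * num_features
    tail_count = 0
    for t, (state, reward, next_state) in enumerate(transitions):
        phi, phi_next = features[state], features[next_state]
        value = sum(w * f for w, f in zip(theta, phi))
        next_value = sum(w * f for w, f in zip(theta, phi_next))
        td_error = reward + gamma * next_value - value
        # Semi-gradient TD(0) step; theta becomes theta_{t+1}.
        theta = [w + step_size * td_error * f for w, f in zip(theta, phi)]
        if t + 1 >= tail_start:
            tail_sum = [s + w for s, w in zip(tail_sum, theta)]
            tail_count += 1
    if tail_count == 0:  # tail never reached: fall back to the last iterate
        return theta, list(theta)
    return theta, [s / tail_count for s in tail_sum]


if __name__ == "__main__":
    # Single self-looping state, reward 1, gamma 0.5: true value is 1 / (1 - 0.5) = 2.
    samples = [("s", 1.0, "s")] * 200
    last, tail_avg = td0_tail_averaged(samples, {"s": [1.0]}, step_size=0.5, gamma=0.5, tail_start=100)
    print(last, tail_avg)
```

In this noiseless toy case both estimates approach 2; tail averaging matters when rewards or transitions are noisy, since averaging the later iterates damps the fluctuations of the last one.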

Flexible Option Learning

Martin Klissarov

Doina Precup

Flow Network based Generative Models for Non-Iterative Diverse Candidate Generation

Emmanuel Bengio

Moksh J. Jain

Maksym Korablyov

Doina Precup

Yoshua Bengio

This paper is about the problem of learning a stochastic policy for generating an object (like a molecular graph) from a sequence of actions, such that the probability of generating an object is proportional to a given positive reward for that object. Whereas standard return maximization tends to converge to a single return-maximizing sequence, there are cases where we would like to sample a diverse set of high-return solutions. These arise, for example, in black-box function optimization when few rounds are possible, each with large batches of queries, where the batches should be diverse, e.g., in the design of new molecules. One can also see this as a problem of approximately converting an energy function to a generative distribution. While MCMC methods can achieve that, they are expensive and generally only perform local exploration. Instead, training a generative policy amortizes the cost of search during training and yields fast generation. Using insights from Temporal Difference learning, we propose GFlowNet, based on a view of the generative process as a flow network, making it possible to handle the tricky case where different trajectories can yield the same final state, e.g., there are many ways to sequentially add atoms to generate some molecular graph. We cast the set of trajectories as a flow and convert the flow consistency equations into a learning objective, akin to the casting of the Bellman equations into Temporal Difference methods. We prove that any global minimum of the proposed objectives yields a policy which samples from the desired distribution, and demonstrate the improved performance and diversity of GFlowNet on a simple domain where there are many modes to the reward function, and on a molecule synthesis task.

openreview.net
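To make the flow-consistency idea concrete, here is a small sketch on a hand-built DAG. It assumes that terminal states are leaves carrying the reward and that edge flows are given directly rather than predicted by a network. The loss is a squared difference between log inflow and log (outflow + reward) at each non-root state, in the spirit of the objective the abstract describes; the helper names and the `eps` smoothing constant are illustrative assumptions, not the paper's code. Terminal state `x` is reachable along two trajectories, which is the case the abstract highlights.

```python
import math
from collections import defaultdict


def flow_matching_loss(edge_flows, rewards, root, eps=1e-8):
    """Sum over non-root states of (log inflow - log(outflow + reward))^2.

    edge_flows: dict mapping (parent, child) -> non-negative flow F(parent -> child).
    rewards: dict mapping terminal (leaf) states to positive rewards R(x).
    """
    inflow, outflow = defaultdict(float), defaultdict(float)
    for (parent, child), flow in edge_flows.items():
        outflow[parent] += flow
        inflow[child] += flow
    loss = 0.0
    for state in set(inflow) | set(outflow):
        if state == root:
            continue
        out_total = outflow[state] + rewards.get(state, 0.0)
        loss += (math.log(eps + inflow[state]) - math.log(eps + out_total)) ** 2
    return loss


def terminal_probabilities(edge_flows, rewards, root):
    """Probability of ending in each terminal state under pi(c | p) = F(p -> c) / sum_c' F(p -> c')."""
    children, outflow = defaultdict(list), defaultdict(float)
    for (parent, child), flow in edge_flows.items():
        children[parent].append((child, flow))
        outflow[parent] += flow
    probs = defaultdict(float)

    def walk(state, prob):
        if state in rewards:  # terminal leaf: the trajectory ends here
            probs[state] += prob
            return
        for child, flow in children[state]:
            walk(child, prob * flow / outflow[state])

    walk(root, 1.0)  # enumerates every trajectory (fine for tiny DAGs)
    return dict(probs)


# Toy DAG: x is reachable via s0 -> a -> x and s0 -> b -> x.
rewards = {"x": 3.0, "y": 1.0}
edge_flows = {("s0", "a"): 1.0, ("s0", "b"): 3.0,
              ("a", "x"): 1.0, ("b", "x"): 2.0, ("b", "y"): 1.0}
print(flow_matching_loss(edge_flows, rewards, "s0"))
print(terminal_probabilities(edge_flows, rewards, "s0"))
```

With these consistent flows the loss is zero, and the flow-derived policy ends in `x` with probability 3/4 and in `y` with probability 1/4, matching the 3:1 reward ratio. Changing any single edge flow breaks consistency at that edge's child state and makes the loss positive.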

Guest Editorial Explainable AI: Towards Fairness, Accountability, Transparency and Trust in Healthcare

Arash Shaban-Nejad

Martin Michalowski

John S. Brownstein

David Buckeridge

2021-01-01

IEEE Journal of Biomedical and Health Informatics (published)

doi.org

Image Dehazing in Disproportionate Haze Distributions

Shih-Chia Huang

Da-Wei Jaw

Wenli Li

Zhihui Lu

Sy-Yen Kuo

Benjamin Fung

Bo-Hao Chen

Thanisa Numnonda

Haze removal techniques, which increase the visibility of an image, play an important role in many vision-based systems. Several traditional dark channel prior-based methods have been proposed to remove haze and thereby enhance the robustness of these systems. However, when the captured images contain disproportionate haze distributions, these methods usually fail to restore the image effectively. Specifically, a disproportionate haze distribution means that the background region of an image has a heavy haze density while the foreground region has little haze. This phenomenon usually occurs in hazy images with a deep depth of field. In response, this work proposes a novel hybrid transmission map-based haze removal method that specifically targets this situation, to achieve clear visibility restoration and effective information maintenance. Qualitative and quantitative experimental evaluations demonstrate that the proposed method performs with higher efficacy than other state-of-the-art methods in both the background and foreground regions of restored test images captured in real-world environments.

2021-01-01

IEEE Access (published)

doi.org
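For context on the baseline the abstract contrasts against, the sketch below implements the classic dark channel prior transmission estimate introduced by He, Sun and Tang. It is not the hybrid transmission map proposed in this paper. It assumes an RGB image given as nested lists of (r, g, b) tuples in [0, 1], a known atmospheric light A, and commonly used values omega = 0.95, a 15 x 15 patch (radius 7) and t_min = 0.1; all function names are illustrative.

```python
def dark_channel(image, patch_radius):
    """Per-pixel minimum over colour channels, then over a (2r+1) x (2r+1) patch clipped at borders."""
    height, width = len(image), len(image[0])
    channel_min = [[min(pixel) for pixel in row] for row in image]
    return [
        [
            min(
                channel_min[yy][xx]
                for yy in range(max(0, y - patch_radius), min(height, y + patch_radius + 1))
                for xx in range(max(0, x - patch_radius), min(width, x + patch_radius + 1))
            )
            for x in range(width)
        ]
        for y in range(height)
    ]


def estimate_transmission(image, atmospheric_light, patch_radius=7, omega=0.95):
    """t(x) = 1 - omega * dark_channel(I / A)(x)."""
    normalized = [
        [tuple(c / a for c, a in zip(pixel, atmospheric_light)) for pixel in row]
        for row in image
    ]
    return [[1.0 - omega * d for d in row] for row in dark_channel(normalized, patch_radius)]


def recover_radiance(image, transmission, atmospheric_light, t_min=0.1):
    """Invert I = J * t + A * (1 - t), clamping t from below to avoid amplifying noise."""
    return [
        [
            tuple((c - a) / max(t, t_min) + a for c, a in zip(pixel, atmospheric_light))
            for pixel, t in zip(row, t_row)
        ]
        for row, t_row in zip(image, transmission)
    ]
```

Bright, uniformly grey pixels have a large dark-channel value and therefore a low estimated transmission. A single global estimate of this kind treats heavily hazed background and lightly hazed foreground with the same prior, which is the situation the paper's hybrid transmission map is said to target.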

Improving Long-Term Metrics in Recommendation Systems using Short-Horizon Offline RL

Bogdan Mazoure

Paul Mineiro

Pavithra Srinath

Reza Sharifi Sedeh

Doina Precup

Adith Swaminathan

We study session-based recommendation scenarios where we want to recommend items to users during sequential interactions to improve their long-term utility. Optimizing a long-term metric is challenging because the learning signal (whether the recommendations achieved their desired goals) is delayed and confounded by other user interactions with the system. Immediately measurable proxies such as clicks can lead to suboptimal recommendations due to misalignment with the long-term metric. Many works have applied episodic reinforcement learning (RL) techniques to session-based recommendation, but these methods do not account for policy-induced drift in user intent across sessions. We develop a new batch RL algorithm called Short Horizon Policy Improvement (SHPI) that approximates policy-induced distribution shifts across sessions. By varying the horizon hyper-parameter in SHPI, we recover well-known policy improvement schemes in the RL literature. Empirical results on four recommendation tasks show that SHPI can outperform matrix factorization, offline bandits, and offline RL baselines. We also provide a stable and computationally efficient implementation using weighted regression oracles.

2021-01-01

arXiv.org (preprint)

dblp.uni-trier.de

Improving Reproducibility in Machine Learning Research (A Report from the NeurIPS 2019 Reproducibility Program)

Joelle Pineau

Philippe Vincent-Lamarre

Koustuv Sinha

Vincent Larivière

Alina Beygelzimer

Florence d'Alché-Buc

E. Fox

Hugo Larochelle

arxiv.org

Inspecting the Factuality of Hallucinated Entities in Abstractive Summarization

Meng Cao

Yue Dong

Jackie Cheung

State-of-the-art abstractive summarization systems often generate hallucinations, i.e., content that is not directly inferable from the source text. Despite being assumed incorrect, many of the hallucinated contents are consistent with world knowledge (factual hallucinations). Including these factual hallucinations in a summary can be beneficial by providing additional background information. In this work, we propose a novel detection approach that separates factual from non-factual hallucinations of entities. Our method is based on an entity's prior and posterior probabilities according to pre-trained and fine-tuned masked language models, respectively. Empirical results suggest that our method vastly outperforms three strong baselines in both accuracy and F1 score and correlates strongly with human judgements on factuality classification tasks. Furthermore, our approach can provide insight into whether a particular hallucination is caused by the summarizer's pre-training or fine-tuning step.

2021-01-01

arXiv.org (preprint)

dblp.uni-trier.de
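The abstract states only that detection relies on each entity's prior probability (from the pre-trained masked language model) and posterior probability (from the fine-tuned one). As a hedged illustration of turning that pair of numbers into a decision, the sketch below assumes both probabilities have already been computed and uses a k-nearest-neighbour classifier as a stand-in classifier. The labelled points are invented toy values, not data or results from the paper, and they encode one plausible intuition (also an assumption here): an entity that the pre-trained model already finds likely is more probably consistent with world knowledge.

```python
import math
from collections import Counter


def knn_classify(query, labelled_points, k=3):
    """Majority label among the k labelled (prior, posterior) points closest to `query`."""
    nearest = sorted(labelled_points, key=lambda item: math.dist(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical training points: ((prior, posterior), label). Invented for illustration only.
TOY_POINTS = [
    ((0.60, 0.80), "factual"),
    ((0.50, 0.90), "factual"),
    ((0.40, 0.70), "factual"),
    ((0.01, 0.60), "non-factual"),
    ((0.02, 0.80), "non-factual"),
    ((0.05, 0.40), "non-factual"),
]

print(knn_classify((0.45, 0.85), TOY_POINTS))  # high prior: nearest neighbours are "factual"
print(knn_classify((0.03, 0.70), TOY_POINTS))  # low prior: nearest neighbours are "non-factual"
```

A nearest-neighbour rule is used here only because it needs no training loop; any classifier over the two features would illustrate the same idea.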
