Susan Amin

Membership Inference Attacks Against Temporally Correlated Data in Deep Reinforcement Learning

Alexander Wong

While significant research advances have been made in the field of deep reinforcement learning, there have been no concrete adversarial atta… (voir plus)ck strategies in literature tailored for studying the vulnerability of deep reinforcement learning algorithms to membership inference attacks. In such attacking systems, the adversary targets the set of collected input data on which the deep reinforcement learning algorithm has been trained. To address this gap, we propose an adversarial attack framework designed for testing the vulnerability of a state-of-the-art deep reinforcement learning algorithm to a membership inference attack. In particular, we design a series of experiments to investigate the impact of temporal correlation, which naturally exists in reinforcement learning training data, on the probability of information leakage. Moreover, we compare the performance of \emph{collective} and \emph{individual} membership attacks against the deep reinforcement learning algorithm. Experimental results show that the proposed adversarial attack framework is surprisingly effective at inferring data with an accuracy exceeding

2022-12-31

IEEE Access (publié)

doi.org

arxiv.org

A Survey of Exploration Methods in Reinforcement Learning

Herke van Hoof

2021-08-31

ArXiv (prépublication)

arxiv.org

Locally Persistent Exploration in Continuous Control Tasks with Sparse Rewards

A major challenge in reinforcement learning is the design of exploration strategies, especially for environments with sparse reward structur… (voir plus)es and continuous state and action spaces. Intuitively, if the reinforcement signal is very scarce, the agent should rely on some form of short-term memory in order to cover its environment efficiently. We propose a new exploration method, based on two intuitions: (1) the choice of the next exploratory action should depend not only on the (Markovian) state of the environment, but also on the agent's trajectory so far, and (2) the agent should utilize a measure of spread in the state space to avoid getting stuck in a small region. Our method leverages concepts often used in statistical physics to provide explanations for the behavior of simplified (polymer) chains in order to generate persistent (locally self-avoiding) trajectories in state space. We discuss the theoretical properties of locally self-avoiding walks and their ability to provide a kind of short-term memory through a decaying temporal correlation within the trajectory. We provide empirical evaluations of our approach in a simulated 2D navigation task, as well as higher-dimensional MuJoCo continuous control locomotion tasks with sparse rewards.

2021-06-30

Proceedings of the 38th International Conference on Machine Learning (publié)

doi.org

proceedings.mlr.press

Where Did You Learn That From? Surprising Effectiveness of Membership Inference Attacks Against Temporally Correlated Data in Deep Reinforcement Learning

Alexander Wong

to

2020-12-31

arXiv.org (prépublication)

dblp.uni-trier.de