Publications

Prediction and Control in Continual Reinforcement Learning

Nishanth Anand

Temporal difference (TD) learning is often used to update the estimate of the value function which is used by RL agents to extract useful policies. In this paper, we focus on value function estimation in continual reinforcement learning. We propose to decompose the value function into two components which update at different timescales: a permanent value function, which holds general knowledge that persists over time, and a transient value function, which allows quick adaptation to new situations. We establish theoretical results showing that our approach is well suited for continual learning and draw connections to the complementary learning systems (CLS) theory from neuroscience. Empirically, this approach improves performance significantly on both prediction and control problems.

2023-09-20

NeurIPS.cc/2023/Conference (poster)

doi.org

openreview.net
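
The two-timescale decomposition described in the abstract above can be illustrated with a small tabular sketch. This is only one plausible reading of the abstract, not the paper's algorithm: the class name `TwoTimescaleValue`, the learning rates, and the `consolidate` rule (fold the transient estimate into the permanent one, then reset it) are all assumptions made for illustration.

```python
from collections import defaultdict


class TwoTimescaleValue:
    """Tabular value estimate split into a slow and a fast component.

    Hypothetical illustration: names and update rules are assumptions,
    not taken from the paper.
    """

    def __init__(self, fast_lr=0.5, slow_lr=0.1, gamma=0.9):
        self.permanent = defaultdict(float)  # slow, persistent knowledge
        self.transient = defaultdict(float)  # fast, task-specific correction
        self.fast_lr = fast_lr
        self.slow_lr = slow_lr
        self.gamma = gamma

    def value(self, state):
        # The overall estimate is the sum of both components.
        return self.permanent[state] + self.transient[state]

    def td_update(self, state, reward, next_state, done):
        # Standard TD(0) error on the combined estimate; only the
        # transient component absorbs it, so adaptation is quick.
        bootstrap = 0.0 if done else self.gamma * self.value(next_state)
        delta = reward + bootstrap - self.value(state)
        self.transient[state] += self.fast_lr * delta
        return delta

    def consolidate(self):
        # Called rarely (for example at a task boundary): move the permanent
        # component a small step toward the combined estimate, then reset
        # the transient component.
        for state in list(self.transient):
            self.permanent[state] += self.slow_lr * self.transient[state]
            self.transient[state] = 0.0
```

In this sketch `td_update` would run at every step, while `consolidate` would run far less often, which is what gives the two components their different timescales.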

Prioritizing Samples in Reinforcement Learning with Reducible Loss

Shivakanth Sujit

Somjit Nath

Pedro H.M. Braga

Samira Ebrahimi Kahou

Most reinforcement learning algorithms take advantage of an experience replay buffer to repeatedly train on samples the agent has observed in the past. Not all samples carry the same amount of significance and simply assigning equal importance to each of the samples is a naïve strategy. In this paper, we propose a method to prioritize samples based on how much we can learn from a sample. We define the learn-ability of a sample as the steady decrease of the training loss associated with this sample over time. We develop an algorithm to prioritize samples with high learn-ability, while assigning lower priority to those that are hard-to-learn, typically caused by noise or stochasticity. We empirically show that our method is more robust than random sampling and also better than just prioritizing with respect to the training loss, i.e. the temporal difference loss, which is used in prioritized experience replay.

2023-09-20

NeurIPS.cc/2023/Conference (poster)

doi.org

openreview.net
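
The idea of prioritizing by learn-ability rather than by raw loss can be sketched in a few lines. The scoring rule below (earliest recorded loss minus latest recorded loss, clipped at zero) is a simplifying assumption chosen for illustration and is not the paper's definition; the function names and the `eps` floor are hypothetical as well.

```python
import random


def learnability(loss_history):
    """Assumed proxy: how much a sample's loss has dropped so far."""
    if len(loss_history) < 2:
        return 0.0
    return max(0.0, loss_history[0] - loss_history[-1])


def sampling_probs(histories, eps=1e-3):
    """Turn per-sample scores into probabilities; eps keeps every sample reachable."""
    scores = [learnability(h) + eps for h in histories]
    total = sum(scores)
    return [s / total for s in scores]


def sample_batch(histories, batch_size, rng):
    """Draw sample indices (with replacement) in proportion to learn-ability."""
    probs = sampling_probs(histories)
    return rng.choices(range(len(histories)), weights=probs, k=batch_size)
```

Under this proxy a sample whose loss stays high because of noise gets a low score, whereas prioritizing by raw loss would keep replaying it.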

Retrieval-Augmented Multiple Instance Learning

Yufei Cui

Ziquan Liu

Yixin Chen

Yuchen Lu

Xinyue Yu

Xue Liu

Tei-Wei Kuo

Miguel R. D. Rodrigues

Chun Jason Xue

Antoni B. Chan

Multiple Instance Learning (MIL) is a crucial weakly supervised learning method applied across various domains, e.g., medical diagnosis based on whole slide images (WSIs). Recent advancements in MIL algorithms have yielded exceptional performance when the training and test data originate from the same domain, such as WSIs obtained from the same hospital. However, this paper reveals a performance deterioration of MIL models when tested on an out-of-domain test set, exemplified by WSIs sourced from a novel hospital. To address this challenge, this paper introduces the Retrieval-AugMented MIL (RAM-MIL) framework, which integrates Optimal Transport (OT) as the distance metric for nearest neighbor retrieval. The development of RAM-MIL is driven by two key insights. First, a theoretical discovery indicates that reducing the input's intrinsic dimension can minimize the approximation error in attention-based MIL. Second, previous studies highlight a link between input intrinsic dimension and the feature merging process with the retrieved data. Empirical evaluations conducted on WSI classification demonstrate that the proposed RAM-MIL framework achieves state-of-the-art performance in both in-domain scenarios, where the training and retrieval data are in the same domain, and more crucially, in out-of-domain scenarios, where the (unlabeled) retrieval data originates from a different domain. Furthermore, the use of the transportation matrix derived from OT renders the retrieval results interpretable at the instance level, in contrast to the vanilla

2023-09-20

NeurIPS.cc/2023/Conference (poster)

openreview.net

Reusable Slotwise Mechanisms

Trang Nguyen

Amin Mansouri

Kanika Madan

Khuong Nguyen

Nguyen Duy Khuong

Kartik Ahuja

Dianbo Liu

Yoshua Bengio

Agents with the ability to comprehend and reason about the dynamics of objects would be expected to exhibit improved robustness and generalization in novel scenarios. However, achieving this capability necessitates not only an effective scene representation but also an understanding of the mechanisms governing interactions among object subsets. Recent studies have made significant progress in representing scenes using object slots. In this work, we introduce Reusable Slotwise Mechanisms, or RSM, a framework that models object dynamics by leveraging communication among slots along with a modular architecture capable of dynamically selecting reusable mechanisms for predicting the future states of each object slot. Crucially, RSM leverages the Central Contextual Information (CCI), enabling selected mechanisms to access the remaining slots through a bottleneck, effectively allowing for modeling of higher order and complex interactions that might require a sparse subset of objects. Experimental results demonstrate the superior performance of RSM compared to state-of-the-art methods across various future prediction and related downstream tasks, including Visual Question Answering and action planning. Furthermore, we showcase RSM's Out-of-Distribution generalization ability to handle scenes in intricate scenarios.

2023-09-20

NeurIPS.cc/2023/Conference (poster)

doi.org

openreview.net

Small batch deep reinforcement learning

Johan Obando-Ceron

Bellemare Marc-Emmanuel

Pablo Samuel Castro

In value-based deep reinforcement learning with replay memories, the batch size parameter specifies how many transitions to sample for each gradient update. Although critical to the learning process, this value is typically not adjusted when proposing new algorithms. In this work we present a broad empirical study that suggests reducing the batch size can result in a number of significant performance gains; this is surprising, as the general tendency when training neural networks is towards larger batch sizes for improved performance. We complement our experimental findings with a set of empirical analyses towards better understanding this phenomenon.

2023-09-20

NeurIPS.cc/2023/Conference (poster)

doi.org

openreview.net
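
To make the role of the batch size concrete, here is a minimal replay memory sketch in which `batch_size` is the number of stored transitions drawn for one gradient update. The class name `ReplayBuffer`, its methods, and the uniform sampling scheme are illustrative assumptions, not the setup studied in the paper.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity memory of transitions with uniform sampling (hypothetical)."""

    def __init__(self, capacity, seed=0):
        # A bounded deque silently evicts the oldest transition when full.
        self.storage = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def add(self, transition):
        self.storage.append(transition)

    def sample(self, batch_size):
        # batch_size controls how many transitions feed one gradient update.
        if batch_size > len(self.storage):
            raise ValueError("not enough transitions stored")
        return self.rng.sample(list(self.storage), batch_size)
```

In a training loop built on this sketch, changing the batch size would mean changing only the argument passed to `sample`.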

Do SSL Models Have Déjà Vu? A Case of Unintended Memorization in Self-supervised Learning

Casey Meehan

Florian Bordes

Pascal Vincent

Kamalika Chaudhuri

Chuan Guo

Self-supervised learning (SSL) algorithms can produce useful image representations by learning to associate different parts of natural images with one another. However, when taken to the extreme, SSL models can unintendedly memorize specific parts in individual training samples rather than learning semantically meaningful associations. In this work, we perform a systematic study of the unintended memorization of image-specific information in SSL models -- which we refer to as déjà vu memorization. Concretely, we show that given the trained model and a crop of a training image containing only the background (e.g., water, sky, grass), it is possible to infer the foreground object with high accuracy or even visually reconstruct it. Furthermore, we show that déjà vu memorization is common to different SSL algorithms, is exacerbated by certain design choices, and cannot be detected by conventional techniques for evaluating representation quality. Our study of déjà vu memorization reveals previously unknown privacy risks in SSL models, as well as suggests potential practical mitigation strategies.

2023-09-20

NeurIPS.cc/2023/Conference (poster)

doi.org

openreview.net

Statistical Guarantees for Variational Autoencoders using PAC-Bayesian Theory

Sokhna Diarra Mbacke

Florence Clerc

Pascal Germain

2023-09-20

NeurIPS.cc/2023/Conference (spotlight)

doi.org

openreview.net

The Impact of Positional Encoding on Length Generalization in Transformers

Amirhossein Kazemnejad

Inkit Padhi

Karthikeyan Natesan Ramamurthy

Payel Das

Siva Reddy

Length generalization, the ability to generalize from small training context sizes to larger ones, is a critical challenge in the development of Transformer-based language models. Positional encoding (PE) has been identified as a major factor influencing length generalization, but the exact impact of different PE schemes on extrapolation in downstream tasks remains unclear. In this paper, we conduct a systematic empirical study comparing the length generalization performance of decoder-only Transformers with five different position encoding approaches including Absolute Position Embedding (APE), T5's Relative PE, ALiBi, and Rotary, in addition to Transformers without positional encoding (NoPE). Our evaluation encompasses a battery of reasoning and mathematical tasks. Our findings reveal that the most commonly used positional encoding methods, such as ALiBi, Rotary, and APE, are not well suited for length generalization in downstream tasks. More importantly, NoPE outperforms other explicit positional encoding methods while requiring no additional computation. We theoretically demonstrate that NoPE can represent both absolute and relative PEs, but when trained with SGD, it mostly resembles T5's relative PE attention patterns. Finally, we find that scratchpad is not always helpful to solve length generalization and its format highly impacts the model's performance. Overall, our work suggests that explicit position embeddings are not essential for decoder-only Transformers to generalize well to longer sequences.

2023-09-20

NeurIPS.cc/2023/Conference (poster)

doi.org

openreview.net
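
A small sketch may help show why a decoder-only model without explicit positional encoding is not blind to position: the causal mask alone makes each output depend on how many tokens precede it. The function below is a plain single-head causal attention written for illustration; its name and the list-of-lists interface are assumptions, and it is not the model used in the paper.

```python
import math


def causal_attention(queries, keys, values):
    """Single-head scaled dot-product attention with a causal mask.

    No positional encoding is added anywhere; position t only ever
    attends to positions 0..t.
    """
    dim = len(queries[0])
    outputs = []
    for t, q in enumerate(queries):
        # Scores over the visible prefix only (this is the causal mask).
        scores = [
            sum(a * b for a, b in zip(q, keys[j])) / math.sqrt(dim)
            for j in range(t + 1)
        ]
        # Numerically stable softmax.
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        norm = sum(weights)
        outputs.append([
            sum(w / norm * values[j][d] for j, w in enumerate(weights))
            for d in range(len(values[0]))
        ])
    return outputs
```

With all-zero queries and keys the attention weights are uniform, so the output at position t is the running mean of the first t + 1 values, a quantity that changes with position even though no positional signal was supplied.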

Thinker: Learning to Plan and Act

Stephen Chung

Ivan Anokhin

David Krueger

We propose the Thinker algorithm, a novel approach that enables reinforcement learning agents to autonomously interact with and utilize a learned world model. The Thinker algorithm wraps the environment with a world model and introduces new actions designed for interacting with the world model. These model-interaction actions enable agents to perform planning by proposing alternative plans to the world model before selecting a final action to execute in the environment. This approach eliminates the need for handcrafted planning algorithms by enabling the agent to learn how to plan autonomously and allows for easy interpretation of the agent's plan with visualization. We demonstrate the algorithm's effectiveness through experimental results in the game of Sokoban and the Atari 2600 benchmark, where the Thinker algorithm achieves state-of-the-art performance and competitive results, respectively. Visualizations of agents trained with the Thinker algorithm demonstrate that they have learned to plan effectively with the world model to select better actions. Thinker is the first work showing that an RL agent can learn to plan with a learned world model in complex environments.

2023-09-20

NeurIPS.cc/2023/Conference (poster)

doi.org

openreview.net

Towards Hybrid-grained Feature Interaction Selection for Deep Sparse Network

Fuyuan Lyu

Xing Tang

Dugang Liu

Chen Ma

Weihong Luo

Liang Chen

Xiuqiang He

Xue Liu

2023-09-20

NeurIPS.cc/2023/Conference (poster)

openreview.net

A Unified, Scalable Framework for Neural Population Decoding

Mehdi Azabou

Vinam Arora

Venkataramana Ganesh

Ximeng Mao

Santosh Nachimuthu

Michael J. Mendelson

Blake Richards

Matthew G. Perich

Guillaume Lajoie

Eva L. Dyer

Our ability to use deep learning approaches to decipher neural activity would likely benefit from greater scale, in terms of both model size and datasets. However, the integration of many neural recordings into one unified model is challenging, as each recording contains the activity of different neurons from different individual animals. In this paper, we introduce a training framework and architecture designed to model the population dynamics of neural activity across diverse, large-scale neural recordings. Our method first tokenizes individual spikes within the dataset to build an efficient representation of neural events that captures the fine temporal structure of neural activity. We then employ cross-attention and a PerceiverIO backbone to further construct a latent tokenization of neural population activities. Utilizing this architecture and training framework, we construct a large-scale multi-session model trained on large datasets from seven nonhuman primates, spanning over 158 different sessions of recording from over 27,373 neural units and over 100 hours of recordings. In a number of different tasks, we demonstrate that our pretrained model can be rapidly adapted to new, unseen sessions with unspecified neuron correspondence, enabling few-shot performance with minimal labels. This work presents a powerful new approach for building deep learning tools to analyze neural data and stakes out a clear path to training at scale.

2023-09-20

NeurIPS.cc/2023/Conference (poster)

doi.org

openreview.net

When Do Transformers Shine in RL? Decoupling Memory from Credit Assignment

Tianwei Ni

Michel Ma

Benjamin Eysenbach

Pierre-Luc Bacon

Reinforcement learning (RL) algorithms face two distinct challenges: learning effective representations of past and present observations, and determining how actions influence future returns. Both challenges involve modeling long-term dependencies. The Transformer architecture has been very successful at solving problems that involve long-term dependencies, including in the RL domain. However, the underlying reason for the strong performance of Transformer-based RL methods remains unclear: is it because they learn effective memory, or because they perform effective credit assignment? After introducing formal definitions of memory length and credit assignment length, we design simple configurable tasks to measure these distinct quantities. Our empirical results reveal that Transformers can enhance the memory capability of RL algorithms, scaling up to tasks that require memorizing observations

2023-09-20

NeurIPS.cc/2023/Conference (oral)

doi.org

openreview.net
