Publications

Learning Robust Representations for Transfer in Reinforcement Learning

Faisal Mohamed

Roger Creus Castanyer

Hongyao Tang

Zahra Sheikhbahaee

Learning transferable representations for deep reinforcement learning (RL) is challenging because of inherent non-stationarity, distribution shift, and unstable training dynamics. To be useful, a transferable representation must be robust to these factors. In this work, we introduce a new architecture and training strategy for learning robust representations for transfer learning in RL. We propose using multiple CNN encoders and training them not to specialize in different regions of the state space but instead to match each other's representations. We find that the learned representations transfer well across many Atari tasks, yielding better transfer performance and data efficiency than training from scratch.

2024-10-10

NeurIPS.cc/2024/Workshop/FITML (poster)

openreview.net
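
As a rough illustration of training encoders "to match each other's representations", the sketch below computes one plausible agreement penalty: the mean squared distance of each encoder's feature vector from the ensemble average for the same state. The function name and the exact form of the loss are assumptions; the abstract does not specify the objective.

```python
from typing import Sequence


def consensus_matching_loss(reps: Sequence[Sequence[float]]) -> float:
    """Hypothetical agreement penalty for an ensemble of encoders.

    reps[k] is the feature vector produced by encoder k for the same state.
    Returns the mean (over encoders) squared distance to the ensemble mean,
    which is zero exactly when all encoders produce the same features.
    """
    num_enc = len(reps)
    dim = len(reps[0])
    # Ensemble-average representation, one coordinate at a time.
    mean = [sum(r[i] for r in reps) / num_enc for i in range(dim)]
    # Squared distance of each encoder's features from the average.
    total = sum(sum((r[i] - mean[i]) ** 2 for i in range(dim)) for r in reps)
    return total / num_enc


if __name__ == "__main__":
    print(consensus_matching_loss([[1.0, 0.0], [3.0, 0.0]]))
```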

Learning Stochastic Rainbow Networks

Vivian White

Muawiz Sajjad Chaudhary

Guy Wolf

Guillaume Lajoie

Kameron Decker Harris

Random feature models are a popular approach for studying network learning: they can capture important behaviors while remaining simpler than traditional training. Guth et al. [2024] introduced “rainbow” networks, which model the distribution of trained weights as correlated random features conditioned on previous-layer activity. Sampling new weights from distributions fit to learned networks led to similar performance in entirely untrained networks, and the observed weight covariances were found to be low rank. This provided evidence that random feature models could be extended to some networks away from initialization, but White et al. [2024] failed to replicate these results in the deeper ResNet18 architecture. Here we ask whether the rainbow formulation can succeed in deeper networks by directly training a stochastic ensemble of random features, which we call stochastic rainbow networks. At every gradient descent iteration, new weights are sampled for all intermediate layers and features are aligned layer-wise. We find that: (1) this approach scales to deeper models, which outperform shallow networks at large widths; (2) ensembling multiple samples from the stochastic model is better than retraining the classifier head; and (3) a low-rank parameterization of the learnable weight covariances can approach the accuracy of full-rank networks. This offers further evidence for rainbow and other structured random feature networks as reduced models of deep learning.

2024-10-10

NeurIPS.cc/2024/Workshop/SciForDL (poster)

openreview.net
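
A minimal sketch of the low-rank covariance idea, under assumptions of our own (zero-mean Gaussian weights and a factor parameterization): each unit's weight vector is drawn as w = L z with z ~ N(0, I_k) and L a d × k matrix, so the sampled weights have covariance L Lᵀ of rank at most k. Writing the sample this way (the reparameterization trick) is what lets gradients reach L when fresh weights are drawn at every step. The function names are hypothetical, and the layer-wise feature alignment mentioned in the abstract is omitted.

```python
import random
from typing import List

Matrix = List[List[float]]


def sample_rainbow_weights(L: Matrix, n_units: int, rng: random.Random) -> Matrix:
    """Draw n_units weight vectors w = L z with z ~ N(0, I_k).

    L is a d x k factor (hypothetically the learnable parameter), so each
    sampled vector has covariance L L^T with rank at most k.
    """
    d, k = len(L), len(L[0])
    weights = []
    for _ in range(n_units):
        z = [rng.gauss(0.0, 1.0) for _ in range(k)]  # standard normal noise
        weights.append([sum(L[i][j] * z[j] for j in range(k)) for i in range(d)])
    return weights


def empirical_covariance(samples: Matrix) -> Matrix:
    """Second-moment matrix of the samples (they are zero-mean by construction)."""
    n, d = len(samples), len(samples[0])
    return [[sum(s[i] * s[j] for s in samples) / n for j in range(d)] for i in range(d)]


if __name__ == "__main__":
    L = [[1.0], [2.0], [0.0]]  # rank-1 factor, so L L^T = [[1,2,0],[2,4,0],[0,0,0]]
    W = sample_rainbow_weights(L, 20000, random.Random(0))
    for row in empirical_covariance(W):
        print([round(v, 2) for v in row])
```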

LIBS-Raman Multimodal Architecture for Automated Lunar Prospecting

Jérôme Pigeon

Foutse Khomh

Richard Boudreault

Ahmed Ashraf

Pooneh Maghoul

2024-10-10

Earth and Space 2024 (published)

doi.org

Object-Centric Temporal Consistency via Conditional Autoregressive Inductive Biases

Cristian Meo

Akihiro Nakano

Mircea Tudor Lică

Aniket Rajiv Didolkar

Masahiro Suzuki

Anirudh Goyal

Mengmi Zhang

Justin Dauwels

Yutaka Matsuo

Yoshua Bengio

2024-10-10

NeurIPS.cc/2024/Workshop/Compositional_Learning (poster)

doi.org

openreview.net

OC-CLIP: Object-centric binding in Contrastive Language-Image Pretraining

Rim Assouel

Pietro Astolfi

Florian Bordes

Michal Drozdzal

Adriana Romero Soriano

Recent advances in vision-language models (VLMs) have been driven by contrastive models like CLIP, which learn to associate visual information with corresponding text descriptions. However, these models have a limited understanding of complex compositional scenes involving multiple objects and their spatial relationships. To address these challenges, we propose a novel approach that departs from traditional data-centric methods, which enhance model performance with hard negative examples. Our work instead focuses on integrating sufficient inductive biases into pre-trained CLIP-like models to improve their compositional understanding without using additional data annotations. We introduce a binding module that connects a scene graph of the text with an induced graph-like representation of the image, enabling a structured similarity assessment. We also leverage relationships as text-conditioned visual constraints, capturing the interactions between objects and their contextual relationships more effectively. Our resulting model, OC-CLIP, not only improves CLIP's performance in multi-object compositional understanding but also paves the way for more accurate and efficient image-text matching in complex scenes.

2024-10-10

NeurIPS.cc/2024/Workshop/Compositional_Learning (poster)

openreview.net

Pseudo-Asynchronous Local SGD: Robust and Efficient Data-Parallel Training

Hiroki Naganuma

Xinzhi Zhang

Man-Chung Yue

Ioannis Mitliagkas

Russell J. Hewett

Philipp Andre Witte

Yin Tat Lee

Recent trends toward larger models and larger datasets require huge amounts of computational resources, making distributed deep learning essential. Data parallelism is a common approach to speed up training, but it often involves frequent communication between workers, which can become a bottleneck. In this work, we propose Pseudo-Asynchronous Local SGD (PALSGD) to improve the efficiency of data-parallel training. PALSGD is a novel extension of Local SGD (Stich, 2018), designed to further reduce communication frequency by introducing a pseudo-synchronization mechanism. PALSGD allows longer synchronization intervals than standard Local SGD. Despite the reduced communication frequency, the pseudo-synchronization approach maintains model consistency, yielding performance comparable to that achieved with more frequent synchronization. Furthermore, we provide a theoretical analysis of PALSGD, establishing its convergence and deriving its convergence rate. This analysis offers insights into the algorithm's behavior and performance guarantees. We evaluate PALSGD on CIFAR-10 with a CNN and on TinyStories with GPT-NEO. Our results show that PALSGD achieves better performance in less time than existing methods such as distributed data parallel (DDP), Local SGD, and DiLoCo (Douillard et al., 2023).

2024-10-10

NeurIPS.cc/2024/Workshop/OPT (published)

openreview.net
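
To make the communication trade-off concrete, here is a minimal sketch of plain Local SGD, the baseline that PALSGD extends (not PALSGD itself, whose pseudo-synchronization mechanism the abstract does not detail). Each worker takes `local_steps` SGD steps on its own toy quadratic loss 0.5 * (x - c_k)^2 before all workers average their parameters, so one communication round is spent per `local_steps` updates. The objective, the targets, and the function name are illustrative assumptions.

```python
import random
from typing import List


def local_sgd(targets: List[float], rounds: int, local_steps: int, lr: float,
              noise: float = 0.0, seed: int = 0) -> float:
    """Plain Local SGD on per-worker quadratics f_k(x) = 0.5 * (x - targets[k])**2.

    Each round: every worker starts from the shared model, runs `local_steps`
    (optionally noisy) SGD steps, then all workers synchronize by averaging.
    """
    rng = random.Random(seed)
    x = 0.0  # shared model after the last synchronization
    for _ in range(rounds):
        local_models = []
        for c in targets:
            xk = x
            for _ in range(local_steps):
                grad = (xk - c) + rng.gauss(0.0, noise)  # stochastic gradient of f_k
                xk -= lr * grad
            local_models.append(xk)
        x = sum(local_models) / len(local_models)  # the only communication step
    return x


if __name__ == "__main__":
    targets = [1.0, 2.0, 6.0]  # heterogeneous workers; global optimum is their mean, 3.0
    for h in (1, 4, 16):
        print(h, local_sgd(targets, rounds=50, local_steps=h, lr=0.1))
```

Because every worker's quadratic here has the same curvature, extra local steps only speed up progress per communication round without moving the fixed point. With heterogeneous workers in general, local updates can pull the averaged model away from the global optimum (often called client drift).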

Steering Masked Discrete Diffusion Models via Discrete Denoising Posterior Prediction

Jarrid Rector-Brooks

Mohsin Hasan

Zhangzhi Peng

Zachary Quinn

Cheng-Hao Liu

Sarthak Mittal

Nouha Dziri

Michael M. Bronstein

Yoshua Bengio

Pranam Chatterjee

Alexander Tong

Joey Bose

2024-10-10

ArXiv (preprint)

doi.org

arxiv.org

On the Implicit Relation Between Low-Rank Adaptation and Differential Privacy

Saber Malekmohammadi

Golnoosh Farnadi

A prominent approach in natural language processing involves large-scale pre-training on general-domain data followed by adaptation to specific tasks or domains. As models grow in size, fully fine-tuning all parameters becomes increasingly impractical. To address this, several low-rank adaptation methods for language models have been proposed, e.g., LoRA and FLoRA. These methods keep the pre-trained weights fixed and add trainable low-rank decomposition matrices, called adapters, to some layers of the transformer architecture. This significantly reduces the number of trainable parameters required for downstream tasks compared to full fine-tuning. In this work, we look at low-rank adaptation through the lens of data privacy. We show theoretically that the low-rank adaptation used in LoRA and FLoRA is equivalent to injecting random noise into the full fine-tuning batch gradients with respect to the adapter parameters, and we quantify the variance of the injected noise. By establishing a Berry-Esseen type bound on the total variation distance between the noise distribution and a Gaussian distribution with the same variance, we show that the dynamics of LoRA and FLoRA are very close to those of differentially private full fine-tuning of the adapters, which suggests that low-rank adaptation implicitly provides privacy with respect to the fine-tuning data. Finally, using the Johnson-Lindenstrauss lemma, we show that when augmented with gradient clipping, low-rank adaptation is almost equivalent to differentially private full fine-tuning of the adapters with a fixed noise scale.

2024-10-10

NeurIPS.cc/2024/Workshop/M3L (poster)

doi.org

openreview.net
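
The core mechanism can be checked numerically under simplifying assumptions of our own: treat A (the r × d_in down-projection) as frozen with i.i.d. N(0, 1/r) entries and train only B (standard LoRA trains both). One gradient step on B then changes the effective weight W0 + BA by -lr · G AᵀA, where G is the full fine-tuning gradient. Since E[AᵀA] = I under these assumptions, this is the full fine-tuning update plus a zero-mean random perturbation G(AᵀA - I), whose spread shrinks as r grows. The helper names are hypothetical, and the sketch does not reproduce the paper's variance formula or its Berry-Esseen bound.

```python
import random
from typing import List

Matrix = List[List[float]]


def matmul(X: Matrix, Y: Matrix) -> Matrix:
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(X: Matrix) -> Matrix:
    return [list(row) for row in zip(*X)]


def induced_update(G: Matrix, rank: int, rng: random.Random) -> Matrix:
    """Direction G A^T A applied to W0 + B A by one gradient step on B.

    G is dL/dW (d_out x d_in). A (rank x d_in) is assumed frozen with
    i.i.d. N(0, 1/rank) entries, so dL/dB = G A^T and the weight change
    is -lr * G A^T A.
    """
    d_in = len(G[0])
    std = (1.0 / rank) ** 0.5
    A = [[rng.gauss(0.0, std) for _ in range(d_in)] for _ in range(rank)]
    return matmul(G, matmul(transpose(A), A))


def mean_update(G: Matrix, rank: int, n_draws: int, seed: int = 0) -> Matrix:
    """Monte Carlo average of the induced update over independent draws of A."""
    rng = random.Random(seed)
    acc = [[0.0] * len(G[0]) for _ in G]
    for _ in range(n_draws):
        for i, row in enumerate(induced_update(G, rank, rng)):
            for j, v in enumerate(row):
                acc[i][j] += v / n_draws
    return acc


if __name__ == "__main__":
    G = [[1.0, -2.0, 0.5, 0.0], [0.0, 1.0, 1.0, -1.0]]
    print(mean_update(G, rank=2, n_draws=20000))  # close to G on average
```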

The Pitfalls of Memorization: When Memorization Hinders Generalization

Reza Bayat

Mohammad Pezeshki

Elvis Dohmatob

David Lopez-Paz

Pascal Vincent

Neural networks often learn simple explanations that fit the majority of the data while memorizing exceptions that deviate from these explanations. This leads to poor generalization when the learned explanations are spurious. In this work, we formalize…

2024-10-10

NeurIPS.cc/2024/Workshop/SciForDL (poster)

openreview.net

Tight Lower Bounds and Improved Convergence in Performative Prediction

Pedram Khorsandi

Rushil Gupta

Mehrnaz Mofakhami

Simon Lacoste-Julien

Gauthier Gidel

Performative prediction is a framework that accounts for shifts in the data distribution induced by the predictions of a model deployed in the real world. Ensuring rapid convergence to a stable solution, where the data distribution remains the same after model deployment, is crucial, especially in evolving environments. This paper extends the Repeated Risk Minimization (RRM) framework by utilizing historical datasets from previous retraining snapshots, yielding a class of algorithms that we call Affine Risk Minimizers and enabling convergence to a performatively stable point for a broader class of problems. We introduce a new upper bound for methods that use only the final iteration of the dataset and prove, for the first time, the tightness of both this new bound and the previously existing bounds within the same regime. We also prove that utilizing historical datasets can surpass the lower bound for last-iterate RRM, and we empirically observe faster convergence to the stable point on various performative prediction benchmarks. At the same time, we offer the first lower-bound analysis for RRM within the class of Affine Risk Minimizers, quantifying the potential improvements in convergence speed that other variants in our framework could achieve.

2024-10-10

NeurIPS.cc/2024/Workshop/OPT (published)

doi.org

openreview.net
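
The sketch below illustrates RRM and the idea of reusing a historical dataset on a toy problem of our own choosing (not from the paper). With squared loss, the deployed model θ induces a distribution D(θ) whose mean is mu + eps * θ, so the risk minimizer under D(θ) is exactly that mean, and the performatively stable point is θ* = mu / (1 - eps) for |eps| < 1. The two-snapshot variant, which retrains on the union of the last two equally sized datasets, is a hypothetical illustration of an affine combination of past data, not the paper's Affine Risk Minimizer.

```python
def risk_minimizer(theta: float, mu: float, eps: float) -> float:
    """Squared-loss minimizer under D(theta): the mean mu + eps * theta."""
    return mu + eps * theta


def rrm(mu: float, eps: float, theta0: float, steps: int) -> float:
    """Repeated Risk Minimization: retrain only on the latest induced data."""
    theta = theta0
    for _ in range(steps):
        theta = risk_minimizer(theta, mu, eps)
    return theta


def rrm_two_snapshots(mu: float, eps: float, theta0: float, steps: int) -> float:
    """Retrain on the union of the last two equal-size datasets; for squared
    loss the minimizer is the average of the two datasets' means."""
    prev, cur = theta0, risk_minimizer(theta0, mu, eps)  # first step is plain RRM
    for _ in range(steps - 1):
        prev, cur = cur, 0.5 * risk_minimizer(cur, mu, eps) + 0.5 * risk_minimizer(prev, mu, eps)
    return cur


if __name__ == "__main__":
    mu, eps = 1.0, -0.8
    stable = mu / (1.0 - eps)
    print(abs(rrm(mu, eps, 0.0, 30) - stable))
    print(abs(rrm_two_snapshots(mu, eps, 0.0, 30) - stable))
```

In this toy the equal-weight two-snapshot rule converges faster than plain RRM for eps = -0.8 (contraction factor about 0.63 per step versus 0.8) but more slowly for eps = 0.8 (about 0.86), so any benefit depends on how the combination weights match the problem.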
