Publications

SelfIE: Self-Interpretation of Large Language Model Embeddings

Haozhe Chen

Carl Vondrick

Chengzhi Mao

2024-07-08

Proceedings of the 41st International Conference on Machine Learning (published)

doi.org

openreview.net

Simple and Scalable Strategies to Continually Pre-train Large Language Models

Adam Ibrahim

Benjamin Thérien

Kshitij Gupta

Mats Leon Richter

Quentin Gregory Anthony

Timothee LESORT

Eugene Belilovsky

Irina Rish

2024-07-08

TMLR (accepted)

doi.org

openreview.net

Stealing part of a production language model

Nicholas Carlini

Daniel Paleka

Krishnamurthy Dj Dvijotham

Thomas Steinke

Jonathan Hayase

A. Feder Cooper

Katherine Lee

Matthew Jagielski

Milad Nasr

Arthur Conmy

Eric Wallace

David Rolnick

Florian Tramèr

We introduce the first model-stealing attack that extracts precise, nontrivial information from black-box production language models like … (see more)OpenAI's ChatGPT or Google's PaLM-2. Specifically, our attack recovers the embedding projection layer (up to symmetries) of a transformer model, given typical API access. For under \\

2024-07-08

Proceedings of the 41st International Conference on Machine Learning (published)

doi.org

openreview.net

Stochastic positional embeddings improve masked image modeling

Amir Bar

Florian Bordes

Assaf Shocher

Mahmoud Assran

Pascal Vincent

Nicolas Ballas

Trevor Darrell

Amir Globerson

Yann LeCun

2024-07-08

Proceedings of the 41st International Conference on Machine Learning (published)

proceedings.mlr.press

openreview.net

Stop Regressing: Training Value Functions via Classification for Scalable Deep RL

Jesse Farebrother

Jordi Orbay

Quan Vuong

Adrien Ali Taiga

Yevgen Chebotar

Ted Xiao

Alex Irpan

Sergey Levine

Pablo Samuel Castro

Aleksandra Faust

Aviral Kumar

Rishabh Agarwal

Value functions are an essential component in deep reinforcement learning (RL), that are typically trained via mean squared error regression… (see more) to match bootstrapped target values. However, scaling value-based RL methods to large networks has proven challenging. This difficulty is in stark contrast to supervised learning: by leveraging a cross-entropy classification loss, supervised methods have scaled reliably to massive networks. Observing this discrepancy, in this paper, we investigate whether the scalability of deep RL can also be improved simply by using classification in place of regression for training value functions. We show that training value functions with categorical cross-entropy significantly enhances performance and scalability across various domains, including single-task RL on Atari 2600 games, multi-task RL on Atari with large-scale ResNets, robotic manipulation with Q-transformers, playing Chess without search, and a language-agent Wordle task with high-capacity Transformers, achieving state-of-the-art results on these domains. Through careful analysis, we show that categorical cross-entropy mitigates issues inherent to value-based RL, such as noisy targets and non-stationarity. We argue that shifting to categorical cross-entropy for training value functions can substantially improve the scalability of deep RL at little-to-no cost.

2024-07-08

Proceedings of the 41st International Conference on Machine Learning (published)

doi.org

openreview.net

The Pitfalls and Promise of Conformal Inference Under Adversarial Attacks

Ziquan Liu

Yufei Cui

Yan Yan

Yi Xu

Xiangyang Ji

Xue (Steve) Liu

Antoni B. Chan

In safety-critical applications such as medical imaging and autonomous driving, where decisions have profound implications for patient healt… (see more)h and road safety, it is imperative to maintain both high adversarial robustness to protect against potential adversarial attacks and reliable uncertainty quantification in decision-making. With extensive research focused on enhancing adversarial robustness through various forms of adversarial training (AT), a notable knowledge gap remains concerning the uncertainty inherent in adversarially trained models. To address this gap, this study investigates the uncertainty of deep learning models by examining the performance of conformal prediction (CP) in the context of standard adversarial attacks within the adversarial defense community. It is first unveiled that existing CP methods do not produce informative prediction sets under the commonly used

2024-07-08

Proceedings of the 41st International Conference on Machine Learning (published)

proceedings.mlr.press

Think Before You Act: Decision Transformers with Working Memory

Jikun Kang

Romain Laroche

Xingdi Yuan

Adam Trischler

Xue (Steve) Liu

Jie Fu

Decision Transformer-based decision-making agents have shown the ability to generalize across multiple tasks. However, their performance rel… (see more)ies on massive data and computation. We argue that this inefficiency stems from the forgetting phenomenon, in which a model memorizes its behaviors in parameters throughout training. As a result, training on a new task may deteriorate the model’s performance on previous tasks. In contrast to LLMs’ implicit memory mechanism, the human brain utilizes distributed memory storage, which helps manage and organize multiple skills efficiently, mitigating the forgetting phenomenon. Inspired by this, we propose a working memory module to store, blend, and retrieve information for different downstream tasks. Evaluation results show that the proposed method improves training efficiency and generalization in Atari games and Meta-World object manipulation tasks. Moreover, we demonstrate that memory fine-tuning further enhances the adaptability of the proposed architecture.

2024-07-08

Proceedings of the 41st International Conference on Machine Learning (published)

proceedings.mlr.press

Towards Modular LLMs by Building and Reusing a Library of LoRAs

Oleksiy Ostapenko

Zhan Su

Edoardo Ponti

Laurent Charlin

Nicolas Le Roux

Matheus Pereira

Lucas Caccia

Alessandro Sordoni

2024-07-08

Proceedings of the 41st International Conference on Machine Learning (published)

doi.org

openreview.net

Do Transformer World Models Give Better Policy Gradients?

Michel Ma

Tianwei Ni

Clement Gehring

Pierluca D'Oro

Pierre-Luc Bacon

2024-07-08

Proceedings of the 41st International Conference on Machine Learning (published)

doi.org

openreview.net

Unsupervised Concept Discovery Mitigates Spurious Correlations

Md Rifat Arefin

Yan Zhang

Aristide Baratin

Francesco Locatello

Irina Rish

Dianbo Liu

Kenji Kawaguchi

2024-07-08

Proceedings of the 41st International Conference on Machine Learning (published)

doi.org

openreview.net

In value-based deep reinforcement learning, a pruned network is a good network

Johan Samir Obando Ceron

Aaron Courville

Pablo Samuel Castro

Recent work has shown that deep reinforcement learning agents have difficulty in effectively using their network parameters. We leverage pri… (see more)or insights into the advantages of sparse training techniques and demonstrate that gradual magnitude pruning enables {value-based} agents to maximize parameter effectiveness. This results in networks that yield dramatic performance improvements over traditional networks, using only a small fraction of the full network parameters. Our code is publicly available, see Appendix A for details.

2024-07-08

Proceedings of the 41st International Conference on Machine Learning (published)

proceedings.mlr.press

openreview.net

WebLINX: Real-World Website Navigation with Multi-Turn Dialogue

Xing Han Lu

Zdeněk Kasner

Siva Reddy

2024-07-08

Proceedings of the 41st International Conference on Machine Learning (published)

doi.org

openreview.net