Publications

Cognitive Models as Simulators: Using Cognitive Models to Tap into Implicit Human Feedback

Ardavan S. Nobandegani

Thomas Shultz

Irina Rish

2023-06-20

ICML.cc/2023/Workshop/ILHF (publié)

openreview.net

Constant Memory Attention Block

Leo Feng

Frederick Tung

Hossein Hajimirsadeghi

Yoshua Bengio

Mohamed Osama Ahmed

2023-06-20

ICML.cc/2023/Workshop/ES-FoMO (poster)

doi.org

openreview.net

Continual Pre-Training of Large Language Models: How to (re)warm your model?

Kshitij Gupta

Benjamin Thérien

Adam Ibrahim

Mats Leon Richter

Quentin Gregory Anthony

Eugene Belilovsky

Irina Rish

Timothee LESORT

Large language models (LLMs) are routinely pre-trained on billions of tokens, only to restart the process over again once new data becomes a… (voir plus)vailable. A much cheaper and more efficient solution would be to enable the continual pre-training of these models, i.e. updating pre-trained models with new data instead of re-training them from scratch. However, the distribution shift induced by novel data typically results in degraded performance on past data. Taking a step towards efficient continual pre-training, in this work, we examine the effect of different warm-up strategies. Our hypothesis is that the learning rate must be re-increased to improve compute efficiency when training on a new dataset. We study the warmup phase of models pre-trained on the Pile (upstream data, 300B tokens) as we continue to pre-train on SlimPajama (downstream data, 297B tokens), following a linear warmup and cosine decay schedule. We conduct all experiments on the Pythia 410M language model architecture and evaluate performance through validation perplexity. We experiment with different pre-training checkpoints, various maximum learning rates, and various warmup lengths. Our results show that while rewarming models first increases the loss on upstream and downstream data, in the longer run it improves the downstream performance, outperforming models trained from scratch

2023-06-20

ICML.cc/2023/Workshop/ES-FoMO (poster)

doi.org

openreview.net

Questions Are All You Need to Train a Dense Passage Retriever

Devendra Singh Sachan

Mike Lewis

Dani Yogatama

Luke Zettlemoyer

Joelle Pineau

Manzil Zaheer

We introduce ART, a new corpus-level autoencoding approach for training dense retrieval models that does not require any labeled training da… (voir plus)ta. Dense retrieval is a central challenge for open-domain tasks, such as Open QA, where state-of-the-art methods typically require large supervised datasets with custom hard-negative mining and denoising of positive examples. ART, in contrast, only requires access to unpaired inputs and outputs (e.g., questions and potential answer passages). It uses a new passage-retrieval autoencoding scheme, where (1) an input question is used to retrieve a set of evidence passages, and (2) the passages are then used to compute the probability of reconstructing the original question. Training for retrieval based on question reconstruction enables effective unsupervised learning of both passage and question encoders, which can be later incorporated into complete Open QA systems without any further finetuning. Extensive experiments demonstrate that ART obtains state-of-the-art results on multiple QA retrieval benchmarks with only generic initialization from a pre-trained language model, removing the need for labeled data and task-specific losses.1 Our code and model checkpoints are available at: https://github.com/DevSinghSachan/art.

2023-06-20

Transactions of the Association for Computational Linguistics (publié)

doi.org

arxiv.org

ROSA: Random Orthogonal Subspace Adaptation

Marawan Gamal

Guillaume Rabusseau

Aristides Milios

Siva Reddy

2023-06-20

ICML.cc/2023/Workshop/ES-FoMO (poster)

openreview.net

Strong gravitational lensing as a probe of dark matter

Simona Vegetti

Simon Birrer

Giulia Despali

C. Fassnacht

Daniel A. Gilman

Yashar Hezaveh

L.

Laurence Perreault-Levasseur

J. McKean

D. Powell

Conor M. O'riordan

G.

Vernardos

Dark matter structures within strong gravitational lens galaxies and along their line of sight leave a gravitational imprint on the multiple… (voir plus) images of lensed sources. Strong gravitational lensing provides, therefore, a key test of different dark matter models in a way that is independent of the baryonic content of matter structures on subgalactic scales. In this chapter, we describe how galaxy-scale strong gravitational lensing observations are sensitive to the physical nature of dark matter. We provide a historical perspective of the field, and review its current status. We discuss the challenges and advances in terms of data, treatment of systematic errors and theoretical predictions, that will enable one to deliver a stringent and robust test of different dark matter models in the near future. With the advent of the next generation of sky surveys, the number of known strong gravitational lens systems is expected to increase by several orders of magnitude. Coupled with high-resolution follow-up observations, these data will provide a key opportunity to constrain the properties of dark matter with strong gravitational lensing.

2023-06-20

ArXiv (prépublication)

arxiv.org

Towards Out-of-Distribution Adversarial Robustness

Adam Ibrahim

Charles Guille-Escuret

Adversarial robustness continues to be a major challenge for deep learning. A core issue is that robustness to one type of attack often fail… (voir plus)s to transfer to other attacks. While prior work establishes a theoretical trade-off in robustness against different

2023-06-20

ICML.cc/2023/Workshop/AdvML-Frontiers (publié)

doi.org

openreview.net

BatchGFN: Generative Flow Networks for Batch Active Learning

Shreshth A Malik

Salem Lahlou

Andrew Jesson

Moksh J. Jain

Nikolay Malkin

Tristan Deleu

Yoshua Bengio

Yarin Gal

We introduce BatchGFN—a novel approach for pool-based active learning that uses generative flow networks to sample sets of data points pro… (voir plus)portional to a batch reward. With an appropriate reward function to quantify the utility of acquiring a batch, such as the joint mutual information between the batch and the model parameters, BatchGFN is able to construct highly informative batches for active learning in a principled way. We show our approach enables sampling near-optimal utility batches at inference time with a single forward pass per point in the batch in toy regression problems. This alleviates the computational complexity of batch-aware algorithms and removes the need for greedy approximations to find maximizers for the batch reward. We also present early results for amortizing training across acquisition steps, which will enable scaling to real-world tasks.

2023-06-19

ICML.cc/2023/Workshop/SPIGM (poster)

doi.org

openreview.net

Benchmarking Bayesian Causal Discovery Methods for Downstream Treatment Effect Estimation

Chris Emezue

Alexandre Drouin

Tristan Deleu

Stefan Bauer

Yoshua Bengio

2023-06-19

ICML.cc/2023/Workshop/SPIGM (poster)

doi.org

openreview.net

CATS: A Computation-Aware Transaction Processing System with Proactive Unlocking

Bolun Zhu

Yu Hua

Ziyin Long

Xue (Steve) Liu

With the increasing complexity of network applications and high demands for QoS, transaction processing systems have received more attention… (voir plus)s due to salient features of simplicity and atomicity. Computation operations play an important role in transaction processing systems. However, conventional QoS-based mechanisms become inefficient due to the limited concurrent support upon computation operations, thus causing high time consumption in the critical path of concurrency control. In order to efficiently offer concurrent computations, we propose CATS, a Computation Aware Transaction processing System, to mitigate performance impacts caused by computation operations. CATS further leverages program semantics to defer the execution of transaction operations in the commit phase to alleviate unnecessary conflicts caused by computations. Extensive evaluation results demonstrate that CATS significantly outperforms state-of-the-art designs, including 2PL and OCC based transaction processing systems on high-contended and computation-intensive workloads. We have released the open-source codes in GitHub for public use.

2023-06-19

2023 IEEE/ACM 31st International Symposium on Quality of Service (IWQoS) (publié)

doi.org

CATS: A Computation-Aware Transaction Processing System with Proactive Unlocking

Bolun Zhu

Yu Hua

Ziyin Long

Xue (Steve) Liu

With the increasing complexity of network applications and high demands for QoS, transaction processing systems have received more attention… (voir plus)s due to salient features of simplicity and atomicity. Computation operations play an important role in transaction processing systems. However, conventional QoS-based mechanisms become inefficient due to the limited concurrent support upon computation operations, thus causing high time consumption in the critical path of concurrency control. In order to efficiently offer concurrent computations, we propose CATS, a Computation Aware Transaction processing System, to mitigate performance impacts caused by computation operations. CATS further leverages program semantics to defer the execution of transaction operations in the commit phase to alleviate unnecessary conflicts caused by computations. Extensive evaluation results demonstrate that CATS significantly outperforms state-of-the-art designs, including 2PL and OCC based transaction processing systems on high-contended and computation-intensive workloads. We have released the open-source codes in GitHub for public use.

2023-06-19

International Workshop on Quality of Service (published)

doi.org

Causal Discovery with Language Models as Imperfect Experts

Stephanie Long

Alexandre Piché

Valentina Zantedeschi

Tibor Schuster

Alexandre Drouin

Understanding the causal relationships that underlie a system is a fundamental prerequisite to accurate decision-making. In this work, we ex… (voir plus)plore how expert knowledge can be used to improve the data-driven identification of causal graphs, beyond Markov equivalence classes. In doing so, we consider a setting where we can query an expert about the orientation of causal relationships between variables, but where the expert may provide erroneous information. We propose strategies for amending such expert knowledge based on consistency properties, e.g., acyclicity and conditional independencies in the equivalence class. We then report a case study, on real data, where a large language model is used as an imperfect expert.

2023-06-19

ICML.cc/2023/Workshop/SPIGM (poster)

doi.org

openreview.net

La recherche en IA au service du monde réel

Boussole des politiques en IA

Vie étudiante et ressources

Publications

La recherche en IA au service du monde réel

Boussole des politiques en IA

Vie étudiante et ressources

Mots-clés populaires:

Publications