Publications

CeBed: A Benchmark for Deep Data-Driven OFDM Channel Estimation
Amal Feriani
Di Wu
Steve Liu
Goal-conditioned GFlowNets for Controllable Multi-Objective Molecular Design
Julien Roy
Emmanuel Bengio
In recent years, in-silico molecular design has received much attention from the machine learning community. When designing a new compound f… (see more)or pharmaceutical applications, there are usually multiple properties of such molecules that need to be optimised: binding energy to the target, synthesizability, toxicity, EC50, and so on. While previous approaches have employed a scalarization scheme to turn the multi-objective problem into a preference-conditioned single objective, it has been established that this kind of reduction may produce solutions that tend to slide towards the extreme points of the objective space when presented with a problem that exhibits a concave Pareto front. In this work we experiment with an alternative formulation of goal-conditioned molecular generation to obtain a more controllable conditional model that can uniformly explore solutions along the entire Pareto front.
Using modular connectome-based predictive modeling to reveal brain-behavior relationships of individual differences in working memory
Huayi Yang
Junjun Zhang
Zhenlan Jin
Ling Li
Accelerating exploration and representation learning with offline pre-training
Bogdan Mazoure
Jake Bruce
Rob Fergus
Ankit Anand
Sequential decision-making agents struggle with long horizon tasks, since solving them requires multi-step reasoning. Most reinforcement lea… (see more)rning (RL) algorithms address this challenge by improved credit assignment, introducing memory capability, altering the agent's intrinsic motivation (i.e. exploration) or its worldview (i.e. knowledge representation). Many of these components could be learned from offline data. In this work, we follow the hypothesis that exploration and representation learning can be improved by separately learning two different models from a single offline dataset. We show that learning a state representation using noise-contrastive estimation and a model of auxiliary reward separately from a single collection of human demonstrations can significantly improve the sample efficiency on the challenging NetHack benchmark. We also ablate various components of our experimental setting and highlight crucial insights.
Accelerating Generalized Random Forests with Fixed-Point Trees
David L. Fleischer
David A. Stephens
Cognitive Models as Simulators: Using Cognitive Models to Tap into Implicit Human Feedback
Ardavan S. Nobandegani
Thomas Shultz
Constant Memory Attention Block
Leo Feng
Frederick Tung
Hossein Hajimirsadeghi
Mohamed Osama Ahmed
Continual Pre-Training of Large Language Models: How to (re)warm your model?
Kshitij Gupta
Benjamin Thérien
Adam Ibrahim
Mats Leon Richter
Quentin Gregory Anthony
Timothee LESORT
Large language models (LLMs) are routinely pre-trained on billions of tokens, only to restart the process over again once new data becomes a… (see more)vailable. A much cheaper and more efficient solution would be to enable the continual pre-training of these models, i.e. updating pre-trained models with new data instead of re-training them from scratch. However, the distribution shift induced by novel data typically results in degraded performance on past data. Taking a step towards efficient continual pre-training, in this work, we examine the effect of different warm-up strategies. Our hypothesis is that the learning rate must be re-increased to improve compute efficiency when training on a new dataset. We study the warmup phase of models pre-trained on the Pile (upstream data, 300B tokens) as we continue to pre-train on SlimPajama (downstream data, 297B tokens), following a linear warmup and cosine decay schedule. We conduct all experiments on the Pythia 410M language model architecture and evaluate performance through validation perplexity. We experiment with different pre-training checkpoints, various maximum learning rates, and various warmup lengths. Our results show that while rewarming models first increases the loss on upstream and downstream data, in the longer run it improves the downstream performance, outperforming models trained from scratch
Questions Are All You Need to Train a Dense Passage Retriever
Devendra Singh Sachan
Mike Lewis
Dani Yogatama
Luke Zettlemoyer
Manzil Zaheer
We introduce ART, a new corpus-level autoencoding approach for training dense retrieval models that does not require any labeled training da… (see more)ta. Dense retrieval is a central challenge for open-domain tasks, such as Open QA, where state-of-the-art methods typically require large supervised datasets with custom hard-negative mining and denoising of positive examples. ART, in contrast, only requires access to unpaired inputs and outputs (e.g., questions and potential answer passages). It uses a new passage-retrieval autoencoding scheme, where (1) an input question is used to retrieve a set of evidence passages, and (2) the passages are then used to compute the probability of reconstructing the original question. Training for retrieval based on question reconstruction enables effective unsupervised learning of both passage and question encoders, which can be later incorporated into complete Open QA systems without any further finetuning. Extensive experiments demonstrate that ART obtains state-of-the-art results on multiple QA retrieval benchmarks with only generic initialization from a pre-trained language model, removing the need for labeled data and task-specific losses.1 Our code and model checkpoints are available at: https://github.com/DevSinghSachan/art.
ROSA: Random Orthogonal Subspace Adaptation
Marawan Gamal
Aristides Milios
Towards Out-of-Distribution Adversarial Robustness
Adam Ibrahim
Charles Guille-Escuret
Adversarial robustness continues to be a major challenge for deep learning. A core issue is that robustness to one type of attack often fail… (see more)s to transfer to other attacks. While prior work establishes a theoretical trade-off in robustness against different
BatchGFN: Generative Flow Networks for Batch Active Learning
Shreshth A Malik
Salem Lahlou
Andrew Jesson
Moksh J. Jain
Nikolay Malkin
Tristan Deleu
Yarin Gal
We introduce BatchGFN—a novel approach for pool-based active learning that uses generative flow networks to sample sets of data points pro… (see more)portional to a batch reward. With an appropriate reward function to quantify the utility of acquiring a batch, such as the joint mutual information between the batch and the model parameters, BatchGFN is able to construct highly informative batches for active learning in a principled way. We show our approach enables sampling near-optimal utility batches at inference time with a single forward pass per point in the batch in toy regression problems. This alleviates the computational complexity of batch-aware algorithms and removes the need for greedy approximations to find maximizers for the batch reward. We also present early results for amortizing training across acquisition steps, which will enable scaling to real-world tasks.