Publications

Harnessing small projectors and multiple views for efficient vision pretraining

2024-09-25

NeurIPS.cc/2024/Conference (poster)

openreview.net

How Molecules Impact Cells: Unlocking Contrastive PhenoMolecular Retrieval

Philip Fradkin

Puria Azadi Moghadam

Karush Suri

Frederik Wenkel

Ali Bashashati

Maciej Sypetkowski

Dominique Beaini

Predicting molecular impact on cellular function is a core challenge in therapeutic design. Phenomic experiments, designed to capture cellular morphology, utilize microscopy-based techniques and demonstrate a high-throughput solution for uncovering molecular impact on the cell. In this work, we learn a joint latent space between molecular structures and microscopy phenomic experiments, aligning paired samples with contrastive learning. Specifically, we study the problem of Contrastive PhenoMolecular Retrieval, which consists of zero-shot molecular structure identification conditioned on phenomic experiments. We assess challenges in multi-modal learning of phenomics and molecular modalities such as experimental batch effect, inactive molecule perturbations, and encoding perturbation concentration. We demonstrate improved multi-modal learner retrieval through (1) a uni-modal pre-trained phenomics model, (2) a novel inter-sample similarity-aware loss, and (3) models conditioned on a representation of molecular concentration. Following this recipe, we propose MolPhenix, a molecular phenomics model. MolPhenix leverages a pre-trained phenomics model to demonstrate significant performance gains across perturbation concentrations, molecular scaffolds, and activity thresholds. In particular, we demonstrate an 8.1x improvement in zero-shot molecular retrieval of active molecules over the previous state-of-the-art, reaching 77.33% in top-1% accuracy. These results open the door for machine learning to be applied in virtual phenomics screening, which can significantly benefit drug discovery applications.

2024-09-25

NeurIPS.cc/2024/Conference (poster)

doi.org

openreview.net

On improved Conditioning Mechanisms and Pre-training Strategies for Diffusion Models

Tariq Berrada

Pietro Astolfi

Melissa Hall

Reyhane Askari Hemmat

Yohann Benchetrit

Marton Havasi

Matthew J. Muckley

Karteek Alahari

Adriana Romero Soriano

Jakob Verbeek

Michal Drozdzal

Large-scale training of latent diffusion models (LDMs) has enabled unprecedented quality in image generation. However, large-scale end-to-end training of these models is computationally costly, and hence most research focuses either on finetuning pretrained models or experiments at smaller scales. In this work we aim to improve the training efficiency and performance of LDMs with the goal of scaling to larger datasets and higher resolutions. We focus our study on two points that are critical for good performance and efficient training: (i) the mechanisms used for semantic level (e.g., a text prompt or class name) and low-level (crop size, random flip, etc.) conditioning of the model, and (ii) pre-training strategies to transfer representations learned on smaller and lower-resolution datasets to larger ones. The main contributions of our work are the following: we present a systematic experimental study of these points, we propose a novel conditioning mechanism that disentangles semantic and low-level conditioning, and we obtain state-of-the-art performance on CC12M for text-to-image at 512 resolution.

2024-09-25

NeurIPS.cc/2024/Conference (poster)

doi.org

openreview.net

Improved off-policy training of diffusion samplers

We study the problem of training diffusion models to sample from a distribution with a given unnormalized density or energy function. We benchmark several diffusion-structured inference methods, including simulation-based variational approaches and off-policy methods (continuous generative flow networks). Our results shed light on the relative advantages of existing algorithms while bringing into question some claims from past work. We also propose a novel exploration strategy for off-policy methods, based on local search in the target space with the use of a replay buffer, and show that it improves the quality of samples on a variety of target distributions. Our code for the sampling methods and benchmarks studied is made public at [this link](https://github.com/GFNOrg/gfn-diffusion) as a base for future work on diffusion models for amortized inference.

2024-09-25

NeurIPS.cc/2024/Conference (poster)

openreview.net

Improving Context-Aware Preference Modeling for Language Models

Silviu Pitis

Ziang Xiao

Nicolas Le Roux

Alessandro Sordoni

While finetuning language models from pairwise preferences has proven remarkably effective, the underspecified nature of natural language presents critical challenges. Direct preference feedback is uninterpretable, difficult to provide where multidimensional criteria may apply, and often inconsistent, either because it is based on incomplete instructions or provided by diverse principals. To address these challenges, we consider the two-step preference modeling procedure that first resolves the under-specification by selecting a context, and then evaluates preference with respect to the chosen context. We decompose reward modeling error according to these two steps, which suggests that supervising context in addition to context-specific preference may be a viable approach to aligning models with diverse human preferences. For this to work, the ability of models to evaluate context-specific preference is critical. To this end, we contribute context-conditioned preference datasets and accompanying experiments that investigate the ability of language models to evaluate context-specific preference. Unlike past datasets, where context-specific preference is highly correlated with general preference, our "preference reversal" datasets disentangle context-specific and general preferences to isolate context-specific capabilities. We use our datasets to (1) show that existing preference models benefit from, but fail to fully consider, added context, (2) finetune a context-aware reward model with context-specific performance exceeding that of GPT-4 and Llama 3 70B, and (3) investigate the potential value of context-aware preference modeling.

2024-09-25

NeurIPS.cc/2024/Conference (poster)

doi.org

openreview.net

Improving Deep Reinforcement Learning by Reducing the Chain Effect of Value and Policy Churn

Hongyao Tang

Glen Berseth

Deep neural networks provide Reinforcement Learning (RL) with powerful function approximators to address large-scale decision-making problems. However, these approximators introduce challenges due to the non-stationary nature of RL training. One source of the challenges in RL is that output predictions can churn, leading to uncontrolled changes after each batch update for states not included in the batch. Although such a churn phenomenon exists in each step of network training, how churn occurs and impacts RL remains under-explored. In this work, we start by characterizing churn from the viewpoint of Generalized Policy Iteration with function approximation, and we discover a chain effect of churn that leads to a cycle where the churns in value estimation and policy improvement compound and bias the learning dynamics throughout the iteration. Further, we concretize the study and focus on the learning issues caused by the chain effect in different settings, including greedy action deviation in value-based methods, trust region violation in proximal policy optimization, and dual bias of policy value in actor-critic methods. We then propose a method to reduce the chain effect across different settings, called Churn Approximated ReductIoN (CHAIN), which can be easily plugged into most existing DRL algorithms. Our experiments demonstrate the effectiveness of our method in both reducing churn and improving learning performance across online and offline, value-based and policy-based RL settings, as well as a scaling setting.

2024-09-25

NeurIPS.cc/2024/Conference (poster)

doi.org

openreview.net

Interpreting Learned Feedback Patterns in Large Language Models

Luke Marks

Amir Abdullah

Clement Neo

Rauno Arike

David Scott Krueger

Philip Torr

Fazl Barez

2024-09-25

NeurIPS.cc/2024/Conference (poster)

openreview.net

Learning Successor Features the Simple Way

Christos Kaplanis

In Deep Reinforcement Learning (RL), it is a challenge to learn representations that do not exhibit catastrophic forgetting or interference in non-stationary environments. Successor Features (SFs) offer a potential solution to this challenge. However, canonical techniques for learning SFs from pixel-level observations often lead to representation collapse, wherein representations degenerate and fail to capture meaningful variations in the data. More recent methods for learning SFs can avoid representation collapse, but they often involve complex losses and multiple learning phases, reducing their efficiency. We introduce a novel, simple method for learning SFs directly from pixels. Our approach uses a combination of a Temporal-difference (TD) loss and a reward prediction loss, which together capture the basic mathematical definition of SFs. We show that our approach matches or outperforms existing SF learning techniques in 2D (Minigrid) and 3D (Miniworld) mazes and in Mujoco, for both single and continual learning scenarios. Our technique is also efficient, and can reach higher levels of performance in less time than other approaches. Our work provides a new, streamlined technique for learning SFs directly from pixel observations, with no pretraining required.

2024-09-25

NeurIPS.cc/2024/Conference (poster)

doi.org

openreview.net

Listenable Maps for Zero-Shot Audio Classifiers

Interpreting the decisions of deep learning models, including audio classifiers, is crucial for ensuring the transparency and trustworthiness of this technology. In this paper, we introduce LMAC-ZS (Listenable Maps for Audio Classifiers in the Zero-Shot context), which, to the best of our knowledge, is the first decoder-based post-hoc interpretation method for explaining the decisions of zero-shot audio classifiers. The proposed method utilizes a novel loss function that maximizes the faithfulness to the original similarity between a given text-and-audio pair. We provide an extensive evaluation using the Contrastive Language-Audio Pretraining (CLAP) model to showcase that our interpreter remains faithful to the decisions in a zero-shot classification context. Moreover, we qualitatively show that our method produces meaningful explanations that correlate well with different text prompts.

2024-09-25

NeurIPS.cc/2024/Conference (poster)

doi.org

openreview.net

Do LLMs Build World Representations? Probing Through the Lens of State Abstraction

Zichao Li

Yanshuai Cao

Jackie Cheung

2024-09-25

NeurIPS.cc/2024/Conference (poster)

openreview.net

Many-Shot In-Context Learning

Rishabh Agarwal

Avi Singh

Lei M Zhang

Bernd Bohnet

Stephanie C.Y. Chan

Luis Rosias

Biao Zhang

Ankesh Anand

Zaheer Abbas

Azade Nova

John D Co-Reyes

Eric Chu

Feryal Behbahani

Aleksandra Faust

Hugo Larochelle

Large language models (LLMs) excel at few-shot in-context learning (ICL), that is, learning from a few examples provided in context at inference, without any weight updates. Newly expanded context windows allow us to investigate ICL with hundreds or thousands of examples: the many-shot regime. Going from few-shot to many-shot, we observe significant performance gains across a wide variety of generative and discriminative tasks. While promising, many-shot ICL can be bottlenecked by the available amount of human-generated outputs. To mitigate this limitation, we explore two new settings: (1) "Reinforced ICL" that uses model-generated chain-of-thought rationales in place of human rationales, and (2) "Unsupervised ICL" where we remove rationales from the prompt altogether, and prompt the model only with domain-specific inputs. We find that both Reinforced and Unsupervised ICL can be quite effective in the many-shot regime, particularly on complex reasoning tasks. We demonstrate that, unlike few-shot learning, many-shot learning is effective at overriding pretraining biases, can learn high-dimensional functions with numerical inputs, and performs comparably to supervised fine-tuning. Finally, we reveal the limitations of next-token prediction loss as an indicator of downstream ICL performance.

2024-09-25

NeurIPS.cc/2024/Conference (spotlight)

doi.org

openreview.net

Metacognitive Capabilities of LLMs: An Exploration in Mathematical Problem Solving

Aniket Rajiv Didolkar

Anirudh Goyal

Nan Rosemary Ke

Siyuan Guo

Michal Valko

Timothy P Lillicrap

Danilo Jimenez Rezende