Publications

On improved Conditioning Mechanisms and Pre-training Strategies for Diffusion Models

Tariq Berrada

Pietro Astolfi

Melissa Hall

Reyhane Askari Hemmat

Yohann Benchetrit

Marton Havasi

Matthew J. Muckley

Karteek Alahari

Adriana Romero Soriano

Jakob Verbeek

Michal Drozdzal

Large-scale training of latent diffusion models (LDMs) has enabled unprecedented quality in image generation. However, large-scale end-to-e… (see more)nd training of these models is computationally costly, and hence most research focuses either on finetuning pretrained models or experiments at smaller scales. In this work we aim to improve the training efficiency and performance of LDMs with the goal of scaling to larger datasets and higher resolutions. We focus our study on two points that are critical for good performance and efficient training: (i) the mechanisms used for semantic level (\eg a text prompt, or class name) and low-level (crop size, random flip, \etc) conditioning of the model, and (ii) pre-training strategies to transfer representations learned on smaller and lower-resolution datasets to larger ones. The main contributions of our work are the following: we present systematic experimental study of these points, we propose a novel conditioning mechanism that disentangles semantic and low-level conditioning, we obtain state-of-the-art performance on CC12M for text-to-image at 512 resolution.

2024-09-25

NeurIPS.cc/2024/Conference (poster)

doi.org

openreview.net

Improved off-policy training of diffusion samplers

Marcin Sendera

Minsu Kim

Sarthak Mittal

Pablo Lemos

Luca Scimeca

Jarrid Rector-Brooks

Alexandre Adam

Yoshua Bengio

Nikolay Malkin

We study the problem of training diffusion models to sample from a distribution with a given unnormalized density or energy function. We ben… (see more)chmark several diffusion-structured inference methods, including simulation-based variational approaches and off-policy methods (continuous generative flow networks). Our results shed light on the relative advantages of existing algorithms while bringing into question some claims from past work. We also propose a novel exploration strategy for off-policy methods, based on local search in the target space with the use of a replay buffer, and show that it improves the quality of samples on a variety of target distributions. Our code for the sampling methods and benchmarks studied is made public at [this link](https://github.com/GFNOrg/gfn-diffusion) as a base for future work on diffusion models for amortized inference.

2024-09-25

NeurIPS.cc/2024/Conference (poster)

openreview.net

Improving Context-Aware Preference Modeling for Language Models

Silviu Pitis

Ziang Xiao

Nicolas Le Roux

Alessandro Sordoni

While finetuning language models from pairwise preferences has proven remarkably effective, the underspecified nature of natural language pr… (see more)esents critical challenges. Direct preference feedback is uninterpretable, difficult to provide where multidimensional criteria may apply, and often inconsistent, either because it is based on incomplete instructions or provided by diverse principals. To address these challenges, we consider the two-step preference modeling procedure that first resolves the under-specification by selecting a context, and then evaluates preference with respect to the chosen context. We decompose reward modeling error according to these two steps, which suggests that supervising context in addition to context-specific preference may be a viable approach to aligning models with diverse human preferences. For this to work, the ability of models to evaluate context-specific preference is critical. To this end, we contribute context-conditioned preference datasets and accompanying experiments that investigate the ability of language models to evaluate context-specific preference. Unlike past datasets, where context-specific preference is highly correlated with general preference, our "preference reversal" datasets disentangle context-specific and general preferences to isolate context-specific capabilities. We use our datasets to (1) show that existing preference models benefit from, but fail to fully consider, added context, (2) finetune a context-aware reward model with context-specific performance exceeding that of GPT-4 and Llama 3 70B, and (3) investigate the potential value of context-aware preference modeling.

2024-09-25

NeurIPS.cc/2024/Conference (poster)

doi.org

openreview.net

Improving Deep Reinforcement Learning by Reducing the Chain Effect of Value and Policy Churn

Hongyao Tang

Glen Berseth

Deep neural networks provide Reinforcement Learning (RL) powerful function approximators to address large-scale decision-making problems. Ho… (see more)wever, these approximators introduce challenges due to the non-stationary nature of RL training. One source of the challenges in RL is that output predictions can churn, leading to uncontrolled changes after each batch update for states not included in the batch. Although such a churn phenomenon exists in each step of network training, how churn occurs and impacts RL remains under-explored. In this work, we start by characterizing churn in a view of Generalized Policy Iteration with function approximation, and we discover a chain effect of churn that leads to a cycle where the churns in value estimation and policy improvement compound and bias the learning dynamics throughout the iteration. Further, we concretize the study and focus on the learning issues caused by the chain effect in different settings, including greedy action deviation in value-based methods, trust region violation in proximal policy optimization, and dual bias of policy value in actor-critic methods. We then propose a method to reduce the chain effect across different settings, called Churn Approximated ReductIoN (CHAIN), which can be easily plugged into most existing DRL algorithms. Our experiments demonstrate the effectiveness of our method in both reducing churn and improving learning performance across online and offline, value-based and policy-based RL settings, as well as a scaling setting.

2024-09-25

NeurIPS.cc/2024/Conference (poster)

doi.org

openreview.net

Interpreting Learned Feedback Patterns in Large Language Models

Luke Marks

Amir Abdullah

Clement Neo

Rauno Arike

David Scott Krueger

Philip Torr

Fazl Barez

2024-09-25

NeurIPS.cc/2024/Conference (poster)

openreview.net

Learning Successor Features the Simple Way

Raymond Chua

Arna Ghosh

Christos Kaplanis

Blake Richards

Doina Precup

In Deep Reinforcement Learning (RL), it is a challenge to learn representations that do not exhibit catastrophic forgetting or interference … (see more)in non-stationary environments. Successor Features (SFs) offer a potential solution to this challenge. However, canonical techniques for learning SFs from pixel-level observations often lead to representation collapse, wherein representations degenerate and fail to capture meaningful variations in the data. More recent methods for learning SFs can avoid representation collapse, but they often involve complex losses and multiple learning phases, reducing their efficiency. We introduce a novel, simple method for learning SFs directly from pixels. Our approach uses a combination of a Temporal-difference (TD) loss and a reward prediction loss, which together capture the basic mathematical definition of SFs. We show that our approach matches or outperforms existing SF learning techniques in both 2D (Minigrid), 3D (Miniworld) mazes and Mujoco, for both single and continual learning scenarios. As well, our technique is efficient, and can reach higher levels of performance in less time than other approaches. Our work provides a new, streamlined technique for learning SFs directly from pixel observations, with no pretraining required.

2024-09-25

NeurIPS.cc/2024/Conference (poster)

doi.org

openreview.net

Listenable Maps for Zero-Shot Audio Classifiers

Francesco Paissan

Luca Della Libera

Mirco Ravanelli

Cem Subakan

Interpreting the decisions of deep learning models, including audio classifiers, is crucial for ensuring the transparency and trustworthines… (see more)s of this technology. In this paper, we introduce LMAC-ZS (Listenable Maps for Audio Classifiers in the Zero-Shot context), which, to the best of our knowledge, is the first decoder-based post-hoc interpretation method for explaining the decisions of zero-shot audio classifiers. The proposed method utilizes a novel loss function that maximizes the faithfulness to the original similarity between a given text-and-audio pair. We provide an extensive evaluation using the Contrastive Language-Audio Pretraining (CLAP) model to showcase that our interpreter remains faithful to the decisions in a zero-shot classification context. Moreover, we qualitatively show that our method produces meaningful explanations that correlate well with different text prompts.

2024-09-25

NeurIPS.cc/2024/Conference (poster)

doi.org

openreview.net

Do LLMs Build World Representations? Probing Through the Lens of State Abstraction

Zichao Li

Yanshuai Cao

Jackie Cheung

2024-09-25

NeurIPS.cc/2024/Conference (poster)

openreview.net

Many-Shot In-Context Learning

Rishabh Agarwal

Avi Singh

Lei M Zhang

Bernd Bohnet

Stephanie C.Y. Chan

Luis Rosias

Biao Zhang

Ankesh Anand

Zaheer Abbas

Azade Nova

John D Co-Reyes

Eric Chu

Feryal Behbahani

Aleksandra Faust

Hugo Larochelle

Large language models (LLMs) excel at few-shot in-context learning (ICL) -- learning from a few examples provided in context at inference, w… (see more)ithout any weight updates. Newly expanded context windows allow us to investigate ICL with hundreds or thousands of examples – the many-shot regime. Going from few-shot to many-shot, we observe significant performance gains across a wide variety of generative and discriminative tasks. While promising, many-shot ICL can be bottlenecked by the available amount of human-generated outputs. To mitigate this limitation, we explore two new settings: (1) "Reinforced ICL" that uses model-generated chain-of-thought rationales in place of human rationales, and (2) "Unsupervised ICL" where we remove rationales from the prompt altogether, and prompts the model only with domain-specific inputs. We find that both Reinforced and Unsupervised ICL can be quite effective in the many-shot regime, particularly on complex reasoning tasks. We demonstrate that, unlike few-shot learning, many-shot learning is effective at overriding pretraining biases, can learn high-dimensional functions with numerical inputs, and performs comparably to supervised fine-tuning. Finally, we reveal the limitations of next-token prediction loss as an indicator of downstream ICL performance.

2024-09-25

NeurIPS.cc/2024/Conference (spotlight)

doi.org

openreview.net

Metacognitive Capabilities of LLMs: An Exploration in Mathematical Problem Solving

Aniket Rajiv Didolkar

Anirudh Goyal

Nan Rosemary Ke

Siyuan Guo

Michal Valko

Timothy P Lillicrap

Danilo Jimenez Rezende

Yoshua Bengio

Michael Curtis Mozer

Sanjeev Arora

2024-09-25

NeurIPS.cc/2024/Conference (poster)

doi.org

openreview.net

Multi-Scale Representation Learning for Protein Fitness Prediction

Zuobai Zhang

Pascal Notin

Yining Huang

Aurelie Lozano

Vijil Chenthamarakshan

Debora Susan Marks

Payel Das

Jian Tang

Designing novel functional proteins crucially depends on accurately modeling their fitness landscape. Given the limited availability of func… (see more)tional annotations from wet-lab experiments, previous methods have primarily relied on self-supervised models trained on vast, unlabeled protein sequence or structure datasets. While initial protein representation learning studies solely focused on either sequence or structural features, recent hybrid architectures have sought to merge these modalities to harness their respective strengths. However, these sequence-structure models have so far achieved only incremental improvements when compared to the leading sequence-only approaches, highlighting unresolved challenges effectively leveraging these modalities together. Moreover, the function of certain proteins is highly dependent on the granular aspects of their surface topology, which have been overlooked by prior models. To address these limitations, we introduce the Sequence-Structure-Surface Fitness (**S3F**) model — a novel multimodal representation learning framework that integrates protein features across several scales. Our approach combines sequence representations from a protein language model with Geometric Vector Perceptron networks encoding protein backbone and detailed surface topology. The proposed method achieves state-of-the-art fitness prediction on the ProteinGym benchmark encompassing 217 substitution deep mutational scanning assays, and provides insights into the determinants of protein function. Our code is at https://github.com/DeepGraphLearning/S3F.

2024-09-25

NeurIPS.cc/2024/Conference (poster)

openreview.net

Normalization and effective learning rates in reinforcement learning

Clare Lyle

Zeyu Zheng

Khimya Khetarpal

James Martens

Hado van Hasselt

Razvan Pascanu

Will Dabney

2024-09-25

NeurIPS.cc/2024/Conference (poster)

doi.org

openreview.net

AI Advantage

Mila AI Policy Fellowship

Strategic Priorities

AI Advantage

Mila AI Policy Fellowship

Publications

AI Advantage

Mila AI Policy Fellowship

Strategic Priorities

AI Advantage

Mila AI Policy Fellowship

Popular keywords:

Publications