Publications

Language Model Alignment with Elastic Reset

Samuel Lavoie

Finetuning language models with reinforcement learning (RL), e.g. from human feedback (HF), is a prominent method for alignment. But optimizing against a reward model can improve on reward while degrading performance in other areas, a phenomenon known as reward hacking, alignment tax, or language drift. First, we argue that commonly-used test metrics are insufficient and instead measure how different algorithms trade off between reward and drift. The standard method modifies the reward with a Kullback-Leibler (KL) penalty between the online and initial model. We propose Elastic Reset, a new algorithm that achieves higher reward with less drift without explicitly modifying the training objective. We periodically reset the online model to an exponentially moving average (EMA) of itself, then reset the EMA model to the initial model. Through the use of an EMA, our model recovers quickly after resets and achieves higher reward with less drift in the same number of steps. We demonstrate that fine-tuning language models with Elastic Reset leads to state-of-the-art performance on a small scale pivot-translation benchmark, outperforms all baselines in a medium-scale RLHF-like IMDB mock sentiment task and leads to a more performant and more aligned technical QA chatbot with LLaMA-7B. Code available at github.com/mnoukhov/elastic-reset.
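The reset schedule described in this abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the code from the linked repository: parameters are plain lists of floats, `train_step` stands in for one RL update against the reward model, and the names `elastic_reset_train`, `reset_every`, and `ema_decay` are hypothetical.

```python
from typing import Callable, List

Params = List[float]


def ema_update(ema: Params, online: Params, decay: float) -> Params:
    """Exponential moving average: ema <- decay * ema + (1 - decay) * online."""
    return [decay * e + (1.0 - decay) * o for e, o in zip(ema, online)]


def elastic_reset_train(
    initial: Params,
    train_step: Callable[[Params], Params],  # hypothetical: one RL update
    num_steps: int,
    reset_every: int,
    ema_decay: float = 0.99,
) -> Params:
    """Sketch of Elastic Reset: periodic online->EMA and EMA->initial resets."""
    online = list(initial)
    ema = list(initial)
    for step in range(1, num_steps + 1):
        online = train_step(online)  # optimize against the reward
        ema = ema_update(ema, online, ema_decay)  # EMA trails the online model
        if step % reset_every == 0:
            online = list(ema)  # reset the online model to its EMA
            ema = list(initial)  # reset the EMA model to the initial model
    return online
```

Because the EMA lags behind the online model, each reset pulls the online parameters partway back toward where training started, and restarting the EMA from the initial model keeps that anchor close to the pretrained weights.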

2023-09-20

NeurIPS.cc/2023/Conference (poster)

doi.org

openreview.net

Laughing Hyena Distillery: Extracting Compact Recurrences From Convolutions

Stefano Massaroli

Michael Poli

Daniel Y Fu

Hermann Kumbong

Rom Nishijima Parnichkun

Aman Timalsina

David W. Romero

Quinn McIntyre

Beidi Chen

Atri Rudra

Ce Zhang

Christopher Re

Stefano Ermon

Yoshua Bengio

Recent advances in attention-free sequence models rely on convolutions as alternatives to the attention operator at the core of Transformers. In particular, long convolution sequence models have achieved state-of-the-art performance in many domains, but incur a significant cost during auto-regressive inference workloads (naively requiring a full pass, or caching of activations, over the input sequence for each generated token), similarly to attention-based models. In this paper, we seek to enable

2023-09-20

NeurIPS.cc/2023/Conference (poster)

doi.org

openreview.net

Learning better with Dale's Law: A Spectral Perspective

Most recurrent neural networks (RNNs) do not include a fundamental constraint of real neural circuits: Dale’s Law, which implies that neurons must be excitatory (E) or inhibitory (I). Dale’s Law is generally absent from RNNs because simply partitioning a standard network’s units into E and I populations impairs learning. However, here we extend a recent feedforward bio-inspired EI network architecture, named Dale’s ANNs, to recurrent networks, and demonstrate that good performance is possible while respecting Dale’s Law. This raises the question: What makes some forms of EI network learn poorly and others learn well? And why does the simple approach of incorporating Dale’s Law impair learning? Historically the answer was thought to be the sign constraints on EI network parameters, and this was a motivation behind Dale’s ANNs. However, here we show that the spectral properties of the recurrent weight matrix at initialisation are more impactful on network performance than sign constraints. We find that simple EI partitioning results in a singular value distribution that is multimodal and dispersed, whereas standard RNNs have a unimodal, more clustered singular value distribution, as do recurrent Dale’s ANNs. We also show that the spectral properties and performance of partitioned EI networks are worse for small networks with fewer I units, and we present normalised SVD entropy as a measure of spectrum pathology that correlates with performance. Overall, this work sheds light on a long-standing mystery in neuroscience-inspired AI and computational neuroscience, paving the way for greater alignment between neural networks and biology.
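As a rough illustration of the spectrum measure mentioned above, the sketch below computes a normalised SVD entropy from a weight matrix's singular values. It assumes the common definition H = -(1/log N) Σ p_i log p_i with p_i = σ_i / Σ_j σ_j, which may differ in detail from the paper's; the function name is hypothetical. The singular values themselves could be obtained with, for example, `numpy.linalg.svd(W, compute_uv=False)`.

```python
import math
from typing import Sequence


def normalized_svd_entropy(singular_values: Sequence[float]) -> float:
    """Shannon entropy of the normalised singular values, divided by log N.

    Assumed definition: p_i = s_i / sum(s), H = -sum(p_i * log p_i) / log N.
    """
    n = len(singular_values)
    if n < 2:
        raise ValueError("need at least two singular values")
    total = sum(singular_values)
    # Zero singular values contribute nothing (p * log p -> 0 as p -> 0).
    probs = [s / total for s in singular_values if s > 0]
    entropy = -sum(p * math.log(p) for p in probs)
    return entropy / math.log(n)
```

Under this definition, values near 1 indicate singular values spread evenly across the spectrum, while values near 0 indicate a spectrum dominated by a few large singular values.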

2023-09-20

NeurIPS.cc/2023/Conference (poster)

doi.org

openreview.net

Learning Reliable Logical Rules with SATNet

Zhaoyu Li

Jinpei Guo

Yuhe Jiang

Xujie Si

2023-09-20

NeurIPS.cc/2023/Conference (poster)

doi.org

openreview.net

Let the Flows Tell: Solving Graph Combinatorial Optimization Problems with GFlowNets

Hanjun Dai

2023-09-20

NeurIPS.cc/2023/Conference (spotlight)

openreview.net

Lie Point Symmetry and Physics Informed Networks

Tara Akhound-Sadegh

Laurence Perreault-Levasseur

Johannes Brandstetter

Max Welling

Siamak Ravanbakhsh

Symmetries have been leveraged to improve the generalization of neural networks through different mechanisms from data augmentation to equivariant architectures. However, despite their potential, their integration into neural solvers for partial differential equations (PDEs) remains largely unexplored. We explore the integration of PDE symmetries, known as Lie point symmetries, in a major family of neural solvers known as physics-informed neural networks (PINNs). We propose a loss function that informs the network about Lie point symmetries in the same way that PINN models try to enforce the underlying PDE through a loss function. Intuitively, our symmetry loss ensures that the infinitesimal generators of the Lie group conserve the PDE solutions. Effectively, this means that once the network learns a solution, it also learns the neighbouring solutions generated by Lie point symmetries. Empirical evaluations indicate that the inductive bias introduced by the Lie point symmetries of the PDEs greatly boosts the sample efficiency of PINNs.

2023-09-20

NeurIPS.cc/2023/Conference (poster)

doi.org

openreview.net

Maximum State Entropy Exploration using Predecessor and Successor Representations

Animals have a developed ability to explore that aids them in important tasks such as locating food, exploring for shelter, and finding misplaced items. These exploration skills necessarily track where they have been so that they can plan for finding items with relative efficiency. Contemporary exploration algorithms often learn a less efficient exploration strategy because they either condition only on the current state or simply rely on making random open-loop exploratory moves. In this work, we propose

2023-09-20

NeurIPS.cc/2023/Conference (poster)

doi.org

openreview.net

Multi-Head Adapter Routing for Cross-Task Generalization

Lucas Caccia

Edoardo Ponti

Zhan Su

Matheus Pereira

Nicolas Le Roux

Alessandro Sordoni

Parameter-efficient fine-tuning (PEFT) for cross-task generalization consists of pre-training adapters on a multi-task training set before few-shot adaptation to test tasks. Polytropon [Ponti et al., 2023] (

2023-09-20

NeurIPS.cc/2023/Conference (poster)

doi.org

openreview.net

Neural Graph Generation from Graph Statistics

Kiarash Zahirnia

Yaochen Hu

Mark J. Coates

Oliver Schulte

2023-09-20

NeurIPS.cc/2023/Conference (poster)

openreview.net

Optimal Extragradient-Based Algorithms for Stochastic Variational Inequalities with Separable Structure

Angela Yuan

Chris Junchi Li

Gauthier Gidel

Michael Jordan

Quanquan Gu

Simon Shaolei Du

We consider the problem of solving stochastic monotone variational inequalities with a separable structure using a stochastic first-order oracle. Building on standard extragradient for variational inequalities, we propose a novel algorithm, stochastic accelerated gradient-extragradient (AG-EG), for strongly monotone variational inequalities (VIs). Our approach combines the strengths of extragradient and Nesterov acceleration. By showing that its iterates remain in a bounded domain and applying scheduled restarting, we prove that AG-EG has an optimal convergence rate for strongly monotone VIs. Furthermore, when specializing to the particular case of bilinearly coupled strongly-convex-strongly-concave saddle-point problems, including bilinear games, our algorithm achieves fine-grained convergence rates that match the respective lower bounds, with the stochasticity being characterized by an additive statistical error term that is optimal up to a constant prefactor.
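For readers unfamiliar with the base method, here is a sketch of plain deterministic extragradient, which this abstract builds on. It is not the paper's stochastic AG-EG (which adds Nesterov acceleration and scheduled restarting). It runs on the bilinear game min_x max_y xy, a simple instance of the bilinear games mentioned above; the step size, iteration count, and function names are illustrative assumptions.

```python
from typing import Callable, Tuple

Vec = Tuple[float, ...]


def extragradient(F: Callable[[Vec], Vec], z0: Vec, eta: float, num_iters: int) -> Vec:
    """Deterministic extragradient for a variational inequality with operator F."""
    z = z0
    for _ in range(num_iters):
        # Extrapolation: a trial step using the operator at the current point.
        z_half = tuple(zi - eta * gi for zi, gi in zip(z, F(z)))
        # Update: step from the current point using the operator at the trial point.
        z = tuple(zi - eta * gi for zi, gi in zip(z, F(z_half)))
    return z


def bilinear_operator(z: Vec) -> Vec:
    """Operator F(x, y) = (df/dx, -df/dy) for the game f(x, y) = x * y."""
    x, y = z
    return (y, -x)


if __name__ == "__main__":
    x, y = extragradient(bilinear_operator, (1.0, 1.0), eta=0.5, num_iters=100)
    print(x, y)  # both close to the saddle point (0, 0)
```

On this game, simultaneous gradient descent-ascent spirals away from the saddle point, whereas evaluating the operator at the extrapolated point lets extragradient contract toward (0, 0).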

2023-09-20

NeurIPS.cc/2023/Conference (poster)

openreview.net

Parallel-mentoring for Offline Model-based Optimization

Can (Sam) Chen

Christopher Beckham

Zixuan Liu

Xue Liu

Christopher Pal

We study offline model-based optimization to maximize a black-box objective function with a static dataset of designs and scores. These designs encompass a variety of domains, including materials, robots and DNA sequences. A common approach trains a proxy on the static dataset to approximate the black-box objective function and performs gradient ascent to obtain new designs. However, this often results in poor designs due to proxy inaccuracies on out-of-distribution designs. Recent studies indicate that: (a) gradient ascent with a mean ensemble of proxies generally outperforms simple gradient ascent, and (b) a trained proxy provides weak ranking supervision signals for design selection. Motivated by (a) and (b), we propose parallel-mentoring as an effective and novel method that facilitates mentoring among parallel proxies, creating a more robust ensemble to mitigate the out-of-distribution issue. We focus on the three-proxy case and our method consists of two modules. The first module, voting-based pairwise supervision, operates on three parallel proxies and captures their ranking supervision signals as pairwise comparison labels. These labels are combined through majority voting to generate consensus labels, which incorporate ranking supervision signals from all proxies and enable mutual mentoring. However, label noise arises due to possible incorrect consensus. To alleviate this, we introduce an adaptive soft-labeling module with soft-labels initialized as consensus labels. Based on bi-level optimization, this module fine-tunes proxies in the inner level and learns more accurate labels in the outer level to adaptively mentor proxies, resulting in a more robust ensemble. Experiments validate the effectiveness of our method. Our code is available here.
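The voting-based pairwise supervision step can be illustrated with a small sketch: each proxy's predicted scores are turned into pairwise comparison labels, and the three proxies' labels are combined by majority vote. This covers only the voting step, not adaptive soft-labeling; the function names and the tie handling (a tie counts as label 0) are assumptions for illustration.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Pair = Tuple[int, int]


def pairwise_labels(scores: Sequence[float]) -> Dict[Pair, int]:
    """Label (i, j) as 1 if the proxy scores design i above design j, else 0."""
    return {
        (i, j): int(scores[i] > scores[j])
        for i, j in combinations(range(len(scores)), 2)
    }


def consensus_labels(proxy_scores: List[Sequence[float]]) -> Dict[Pair, int]:
    """Majority vote over the pairwise labels produced by each proxy."""
    per_proxy = [pairwise_labels(scores) for scores in proxy_scores]
    consensus = {}
    for pair in per_proxy[0]:
        votes = sum(labels[pair] for labels in per_proxy)
        # A strict majority of proxies must agree that design i ranks above j.
        consensus[pair] = int(2 * votes > len(per_proxy))
    return consensus
```

With three proxies, a pair receives label 1 exactly when at least two proxies rank design i above design j, so no single proxy can dictate the consensus.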

2023-09-20

NeurIPS.cc/2023/Conference (poster)

doi.org

openreview.net

Policy Optimization in a Noisy Neighborhood: On Return Landscapes in Continuous Control

Bellemare Marc-Emmanuel

Deep reinforcement learning agents for continuous control are known to exhibit significant instability in their performance over time. In this work, we provide a fresh perspective on these behaviors by studying the return landscape: the mapping between a policy and a return. We find that popular algorithms traverse noisy neighborhoods of this landscape, in which a single update to the policy parameters leads to a wide range of returns. By taking a distributional view of these returns, we map the landscape, characterizing failure-prone regions of policy space and revealing a hidden dimension of policy quality. We show that the landscape exhibits surprising structure by finding simple paths in parameter space which improve the stability of a policy. To conclude, we develop a distribution-aware procedure which finds such paths, navigating away from noisy neighborhoods in order to improve the robustness of a policy. Taken together, our results provide new insight into the optimization, evaluation, and design of agents.

2023-09-20

NeurIPS.cc/2023/Conference (poster)

doi.org

openreview.net
