Nikolay Malkin

Learning Diverse Attacks on Large Language Models for Robust Red-Teaming and Safety Tuning

David Dobre

Juho Lee

Sung Ju Hwang

Red-teaming, or identifying prompts that elicit harmful responses, is a critical step in ensuring the safe and responsible deployment of lar… (voir plus)ge language models (LLMs). Developing effective protection against many modes of attack prompts requires discovering diverse attacks. Automated red-teaming typically uses reinforcement learning to fine-tune an attacker language model to generate prompts that elicit undesirable responses from a target LLM, as measured, for example, by an auxiliary toxicity classifier. We show that even with explicit regularization to favor novelty and diversity, existing approaches suffer from mode collapse or fail to generate effective attacks. As a flexible and probabilistically principled alternative, we propose to use GFlowNet fine-tuning, followed by a secondary smoothing phase, to train the attacker model to generate diverse and effective attack prompts. We find that the attacks generated by our method are effective against a wide range of target LLMs, both with and without safety tuning, and transfer well between target LLMs. Finally, we demonstrate that models safety-tuned using a dataset of red-teaming prompts generated by our method are robust to attacks from other RL-based red-teaming approaches.

2025-01-21

International Conference on Learning Representations (poster)

doi.org

openreview.net

PQMass: Probabilistic Assessment of the Quality of Generative Models Using Probability Mass Estimation

Pablo Lemos

Sammy Sharief

Nikolay Malkin

Salma Salhi

Laurence Perreault-Levasseur

Yashar Hezaveh

Connor Stone

We propose a likelihood-free method for comparing two distributions given samples from each, with the goal of assessing the quality of gener… (voir plus)ative models. The proposed approach, PQMass, provides a statistically rigorous method for assessing the performance of a single generative model or the comparison of multiple competing models. PQMass divides the sample space into non-overlapping regions and applies chi-squared tests to the number of data samples that fall within each region, giving a

2025-01-21

International Conference on Learning Representations (poster)

doi.org

openreview.net

Path-filtering in path-integral simulations of open quantum systems using GFlowNets

Jeremy Lackman-Mincoff

An important class of methods for modeling dynamics in open quantum systems is based on the well-known influence functional (IF) approach to… (voir plus) solving path-integral equations of motion. Within this paradigm, path-filtering schemes based on the removal of IF elements that fall below a certain threshold aim to reduce the effort needed to calculate and store the influence functional, making very challenging simulations possible. A filtering protocol of this type is considered acceptable as long as the simulation remains mathematically stable. This, however, does not guarantee that the approximated dynamics preserve the physics of the simulated process. In this paper, we explore the possibility of training Generative Flow Networks (GFlowNets) to produce filtering protocols while optimizing for mathematical stability and for physical accuracy. Trained using the trajectory balance objective, the model produces sets of paths to be added to a truncated initial set; it is rewarded if the combined set of paths gives rise to solutions in which the trace of the density matrix is conserved, the populations remain real, and the dynamics approach the exact reference. Using a simple two-level system coupled to a dissipative reservoir, we perform proof-of-concept simulations and demonstrate the elegant and surprising filtering solutions proposed by the GFlowNet.

2024-10-07

Journal of Chemical Physics (publié)

doi.org

Amortizing intractable inference in diffusion models for vision, language, and control

Diffusion models have emerged as effective distribution estimators in vision, language, and reinforcement learning, but their use as priors … (voir plus)in downstream tasks poses an intractable posterior inference problem. This paper studies amortized sampling of the posterior over data,

2024-09-24

Neural Information Processing Systems (poster)

doi.org

openreview.net

Improved off-policy training of diffusion samplers

We study the problem of training diffusion models to sample from a distribution with a given unnormalized density or energy function. We ben… (voir plus)chmark several diffusion-structured inference methods, including simulation-based variational approaches and off-policy methods (continuous generative flow networks). Our results shed light on the relative advantages of existing algorithms while bringing into question some claims from past work. We also propose a novel exploration strategy for off-policy methods, based on local search in the target space with the use of a replay buffer, and show that it improves the quality of samples on a variety of target distributions. Our code for the sampling methods and benchmarks studied is made public at https://github.com/GFNOrg/gfn-diffusion as a base for future work on diffusion models for amortized inference.

2024-09-24

Neural Information Processing Systems (poster)

doi.org

openreview.net

V-STaR: Training Verifiers for Self-Taught Reasoners

Xingdi Yuan

Common self-improvement approaches for large language models (LLMs), such as STaR, iteratively fine-tune LLMs on self-generated solutions to… (voir plus) improve their problem-solving ability. However, these approaches discard the large amounts of incorrect solutions generated during this process, potentially neglecting valuable information in such solutions. To address this shortcoming, we propose V-STaR that utilizes both the correct and incorrect solutions generated during the self-improvement process to train a verifier using DPO that judges correctness of model-generated solutions. This verifier is used at inference time to select one solution among many candidate solutions. Running V-STaR for multiple iterations results in progressively better reasoners and verifiers, delivering a 4% to 17% test accuracy improvement over existing self-improvement and verification approaches on common code generation and math reasoning benchmarks with LLaMA2 models.

2024-07-09

colmweb.org/COLM/2024/Conference (accepté)

doi.org

openreview.net

Improving Gradient-Guided Nested Sampling for Posterior Inference

Will Handley

Laurence Perreault-Levasseur

We present a performant, general-purpose gradient-guided nested sampling algorithm, …

2024-07-07

Proceedings of the 41st International Conference on Machine Learning (publié)

doi.org

proceedings.mlr.press

On Generalization for Generative Flow Networks

Generative Flow Networks (GFlowNets) have emerged as an innovative learning paradigm designed to address the challenge of sampling from an u… (voir plus)nnormalized probability distribution, called the reward function. This framework learns a policy on a constructed graph, which enables sampling from an approximation of the target probability distribution through successive steps of sampling from the learned policy. To achieve this, GFlowNets can be trained with various objectives, each of which can lead to the model s ultimate goal. The aspirational strength of GFlowNets lies in their potential to discern intricate patterns within the reward function and their capacity to generalize effectively to novel, unseen parts of the reward function. This paper attempts to formalize generalization in the context of GFlowNets, to link generalization with stability, and also to design experiments that assess the capacity of these models to uncover unseen parts of the reward function. The experiments will focus on length generalization meaning generalization to states that can be constructed only by longer trajectories than those seen in training.

2024-07-02

ArXiv (prépublication)

doi.org

arxiv.org

Iterated Denoising Energy Matching for Sampling from Boltzmann Densities

Avishek Joey Bose

Cheng-Hao Liu

Efficiently generating statistically independent samples from an unnormalized probability distribution, such as equilibrium samples of many-… (voir plus)body systems, is a foundational problem in science. In this paper, we propose Iterated Denoising Energy Matching (iDEM), an iterative algorithm that uses a novel stochastic score matching objective leveraging solely the energy function and its gradient -- and no data samples -- to train a diffusion-based sampler. Specifically, iDEM alternates between (I) sampling regions of high model density from a diffusion-based sampler and (II) using these samples in our stochastic matching objective to further improve the sampler. iDEM is scalable to high dimensions as the inner matching objective, is simulation-free, and requires no MCMC samples. Moreover, by leveraging the fast mode mixing behavior of diffusion, iDEM smooths out the energy landscape enabling efficient exploration and learning of an amortized sampler. We evaluate iDEM on a suite of tasks ranging from standard synthetic energy functions to invariant

2024-04-30

ICML.cc/2024/Conference (poster)

doi.org

proceedings.mlr.press

Discrete Probabilistic Inference as Control in Multi-path Environments

We consider the problem of sampling from a discrete and structured distribution as a sequential decision problem, where the objective is to … (voir plus)find a stochastic policy such that objects are sampled at the end of this sequential process proportionally to some predefined reward. While we could use maximum entropy Reinforcement Learning (MaxEnt RL) to solve this problem for some distributions, it has been shown that in general, the distribution over states induced by the optimal policy may be biased in cases where there are multiple ways to generate the same object. To address this issue, Generative Flow Networks (GFlowNets) learn a stochastic policy that samples objects proportionally to their reward by approximately enforcing a conservation of flows across the whole Markov Decision Process (MDP). In this paper, we extend recent methods correcting the reward in order to guarantee that the marginal distribution induced by the optimal MaxEnt RL policy is proportional to the original reward, regardless of the structure of the underlying MDP. We also prove that some flow-matching objectives found in the GFlowNet literature are in fact equivalent to well-established MaxEnt RL algorithms with a corrected reward. Finally, we study empirically the performance of multiple MaxEnt RL and GFlowNet algorithms on multiple problems involving sampling from discrete distributions.

2024-04-25

auai.org/UAI/2024/Conference (poster)

doi.org

proceedings.mlr.press

Improving and Generalizing Flow-Based Generative Models with Minibatch Optimal Transport

Yanlei Zhang

Continuous normalizing flows (CNFs) are an attractive generative modeling technique, but they have been held back by limitations in their si… (voir plus)mulation-based maximum likelihood training. We introduce the generalized \textit{conditional flow matching} (CFM) technique, a family of simulation-free training objectives for CNFs. CFM features a stable regression objective like that used to train the stochastic flow in diffusion models but enjoys the efficient inference of deterministic flow models. In contrast to both diffusion models and prior CNF training algorithms, CFM does not require the source distribution to be Gaussian or require evaluation of its density. A variant of our objective is optimal transport CFM (OT-CFM), which creates simpler flows that are more stable to train and lead to faster inference, as evaluated in our experiments. Furthermore, OT-CFM is the first method to compute dynamic OT in a simulation-free way. Training CNFs with CFM improves results on a variety of conditional and unconditional generation tasks, such as inferring single cell dynamics, unsupervised image translation, and Schrödinger bridge inference.

2024-03-10

TMLR (accepté)

openreview.net

Amortizing Intractable Inference in Large Language Models

Edward J. Hu

Autoregressive large language models (LLMs) compress knowledge from their training data through next-token conditional distributions. This l… (voir plus)imits tractable querying of this knowledge to start-to-end autoregressive sampling. However, many tasks of interest -- including sequence continuation, infilling, and other forms of constrained generation -- involve sampling from intractable posterior distributions. We address this limitation by using amortized Bayesian inference to sample from these intractable posteriors. Such amortization is algorithmically achieved by fine-tuning LLMs via diversity-seeking reinforcement learning algorithms: generative flow networks (GFlowNets). We empirically demonstrate that this distribution-matching paradigm of LLM fine-tuning can serve as an effective alternative to maximum-likelihood training and reward-maximizing policy optimization. As an important application, we interpret chain-of-thought reasoning as a latent variable modeling problem and demonstrate that our approach enables data-efficient adaptation of LLMs to tasks that require multi-step rationalization and tool use.

2024-01-15

ICLR.cc/2024/Conference (présentation orale)

doi.org

openreview.net

Mila Techaide 2026

Propulsion d'entrepreneurs scientifiques

Avantage IA : productivité dans la fonction publique

Nikolay Malkin

Billets de blogue

Publications

Mila Techaide 2026

Propulsion d'entrepreneurs scientifiques

Avantage IA : productivité dans la fonction publique

Mots-clés populaires:

Nikolay Malkin

Billets de blogue

Publications