Kirill Neklyudov

Core Academic Member
Assistant Professor, Université de Montréal, Mathematics and Statistics
Research Topics
Deep Learning
Dynamical Systems
Generative Models
Molecular Modeling
Probabilistic Models

Current Students

Independent visiting researcher - University of British Columbia
PhD - Université de Montréal
PhD - Université de Montréal
PhD - Université de Montréal
Principal supervisor :
Independent visiting researcher - Helmholtz Zentrum München
PhD - Université de Montréal
Principal supervisor :
PhD - Université de Montréal
Independent visiting researcher - Université de Montréal
Postdoctorate - Université de Montréal
Independent visiting researcher - University of Oxford
PhD - Polytechnique Montréal
Co-supervisor :
PhD - Université de Montréal

Publications

Discrete Feynman-Kac Correctors
Viktor Ohanesian
Artem Gazizov
Alán Aspuru-Guzik
Roberto Bondesan
Kirill Neklyudov
Discrete diffusion models have recently emerged as a promising alternative to the autoregressive approach for generating discrete sequences. Sample generation via gradual denoising or demasking processes allows them to capture hierarchical non-sequential interdependencies in the data. These custom processes, however, do not assume a flexible control over the distribution of generated samples. We propose Discrete Feynman-Kac Correctors, a framework that allows for controlling the generated distribution of discrete masked diffusion models at inference time. We derive Sequential Monte Carlo (SMC) algorithms that, given a trained discrete diffusion model, control the temperature of the sampled distribution (i.e. perform annealing), sample from the product of marginals of several diffusion processes (e.g. differently conditioned processes), and sample from the product of the marginal with an external reward function, producing likely samples from the target distribution that also have high reward. Notably, our framework does not require any training of additional models or fine-tuning of the original model. We illustrate the utility of our framework in several applications including: efficient sampling from the annealed Boltzmann distribution of the Ising model, improving the performance of language models for code generation and amortized learning, as well as reward-tilted protein sequence generation.
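As a rough illustration of the reward-tilting idea in this abstract, the sketch below draws discrete sequences from a stand-in base distribution and resamples them with weights exp(r(x)), which targets the product p(x) exp(r(x)). It is a single importance-resampling step, not the paper's SMC scheme along the denoising trajectory. The base sampler (independent fair bits), the reward (number of ones), and the function name `reward_tilted_resample` are all assumptions made for the example.

```python
import math
import random


def reward_tilted_resample(samples, reward, n_out, rng):
    """Resample `samples` with weights exp(reward(x)), targeting p(x) * exp(r(x))."""
    log_w = [reward(x) for x in samples]
    m = max(log_w)  # subtract the max before exponentiating for numerical stability
    weights = [math.exp(lw - m) for lw in log_w]
    return rng.choices(samples, weights=weights, k=n_out)


rng = random.Random(0)

# Stand-in for a trained discrete model (assumption): length-4 sequences of fair bits.
base = [tuple(rng.randint(0, 1) for _ in range(4)) for _ in range(20000)]

# Hypothetical reward (assumption): the number of ones in the sequence.
tilted = reward_tilted_resample(base, lambda x: float(sum(x)), 20000, rng)

# Under the tilted target each bit is 1 with probability e / (1 + e).
mean_ones = sum(sum(x) for x in tilted) / len(tilted)
expected_ones = 4 * math.e / (1 + math.e)
```

For this toy target the tilted distribution is known in closed form, so `mean_ones` can be compared against `expected_ones` as a sanity check.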
Foundations of Diffusion Models in General State Spaces: A Self-Contained Introduction
Vincent Pauline
Kirill Neklyudov
Wavefunction Flows: Efficient Quantum Simulation of Continuous Flow Models
David Layden
Ryan Sweke
Vojtěch Havlíček
Anirban Chowdhury
Kirill Neklyudov
Feynman-Kac Correctors in Diffusion: Annealing, Guidance, and Product of Experts
Viktor Ohanesian
Roberto Bondesan
Alán Aspuru-Guzik
Arnaud Doucet
Rob Brekelmans
Kirill Neklyudov
While score-based generative models are the model of choice across diverse domains, there are limited tools available for controlling inference-time behavior in a principled manner, e.g. for composing multiple pretrained models. Existing classifier-free guidance methods use a simple heuristic to mix conditional and unconditional scores to approximately sample from conditional distributions. However, such methods do not approximate the intermediate distributions, necessitating additional 'corrector' steps. In this work, we provide an efficient and principled method for sampling from a sequence of annealed, geometric-averaged, or product distributions derived from pretrained score-based models. We derive a weighted simulation scheme which we call Feynman-Kac Correctors (FKCs) based on the celebrated Feynman-Kac formula by carefully accounting for terms in the appropriate partial differential equations (PDEs). To simulate these PDEs, we propose Sequential Monte Carlo (SMC) resampling algorithms that leverage inference-time scaling to improve sampling quality. We empirically demonstrate the utility of our methods by proposing amortized sampling via inference-time temperature annealing, improving multi-objective molecule generation using pretrained models, and improving classifier-free guidance for text-to-image generation. Our code is available at https://github.com/martaskrt/fkc-diffusion.
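The weighting-and-resampling step behind temperature annealing can be sketched on a toy problem. The example below is not the FKC scheme itself (there is no diffusion process or PDE here): it takes samples from a standard Gaussian p, reweights them by p(x)^(beta - 1), and applies systematic resampling so the surviving particles approximate p^beta. The helper names `systematic_resample` and `anneal`, the Gaussian target, and beta = 2 are assumptions chosen so the answer is known (variance 1/beta).

```python
import math
import random


def systematic_resample(particles, weights, rng):
    """Systematic resampling: one uniform offset, then n evenly spaced pointers."""
    n = len(particles)
    step = sum(weights) / n
    u = rng.random() * step
    out, cum, i = [], weights[0], 0
    for k in range(n):
        pointer = u + k * step
        # Advance to the particle whose cumulative weight covers the pointer.
        while cum < pointer and i < n - 1:
            i += 1
            cum += weights[i]
        out.append(particles[i])
    return out


def anneal(particles, log_p, beta, rng):
    """Reweight samples of p by p(x)^(beta - 1), then resample to target p^beta."""
    log_w = [(beta - 1.0) * log_p(x) for x in particles]
    m = max(log_w)  # stabilise the exponentials
    weights = [math.exp(lw - m) for lw in log_w]
    return systematic_resample(particles, weights, rng)


rng = random.Random(0)
xs = [rng.gauss(0.0, 1.0) for _ in range(20000)]  # samples from p = N(0, 1)

# Unnormalised log-density of N(0, 1); annealing with beta = 2 targets N(0, 1/2).
cold = anneal(xs, lambda x: -0.5 * x * x, beta=2.0, rng=rng)
second_moment = sum(x * x for x in cold) / len(cold)
```

Here `second_moment` should land close to 0.5, the variance of the annealed Gaussian.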
Amortized Sampling with Transferable Normalizing Flows
Charlie B. Tan
Leon Klein
Saifuddin Syed
Michael M. Bronstein
Kirill Neklyudov
Efficient equilibrium sampling of molecular conformations remains a core challenge in computational chemistry and statistical inference. Classical approaches such as molecular dynamics or Markov chain Monte Carlo inherently lack amortization; the computational cost of sampling must be paid in full for each system of interest. The widespread success of generative models has inspired interest towards overcoming this limitation through learning sampling algorithms. Despite performing competitively with conventional methods when trained on a single system, learned samplers have so far demonstrated limited ability to transfer across systems. We demonstrate that deep learning enables the design of scalable and transferable samplers by introducing Prose, a 285 million parameter all-atom transferable normalizing flow trained on a corpus of peptide molecular dynamics trajectories up to 8 residues in length. Prose draws zero-shot uncorrelated proposal samples for arbitrary peptide systems, achieving the previously intractable transferability across sequence length, whilst retaining the efficient likelihood evaluation of normalizing flows. Through extensive empirical evaluation we demonstrate the efficacy of Prose as a proposal for a variety of sampling algorithms, finding a simple importance sampling-based finetuning procedure to achieve competitive performance to established methods such as sequential Monte Carlo. We open-source the Prose codebase, model weights, and training dataset, to further stimulate research into amortized sampling methods and finetuning objectives.
Progressive Inference-Time Annealing of Diffusion Models for Sampling from Boltzmann Densities
Avishek Joey Bose
Valentin De Bortoli
Arnaud Doucet
Michael M. Bronstein
Kirill Neklyudov
Sampling efficiently from a target unnormalized probability density remains a core challenge, with relevance across countless high-impact scientific applications. A promising approach towards this challenge is the design of amortized samplers that borrow key ideas, such as probability path design, from state-of-the-art generative diffusion models. However, all existing diffusion-based samplers remain unable to draw samples from distributions at the scale of even simple molecular systems. In this paper, we propose Progressive Inference-Time Annealing (PITA), a novel framework to learn diffusion-based samplers that combines two complementary interpolation techniques: I.) Annealing of the Boltzmann distribution and II.) Diffusion smoothing. PITA trains a sequence of diffusion models from high to low temperatures by sequentially training each model at progressively higher temperatures, leveraging engineered easy access to samples of the temperature-annealed target density. In the subsequent step, PITA enables simulating the trained diffusion model to procure training samples at a lower temperature for the next diffusion model through inference-time annealing using a novel Feynman-Kac PDE combined with Sequential Monte Carlo. Empirically, PITA enables, for the first time, equilibrium sampling of N-body particle systems, Alanine Dipeptide, and tripeptides in Cartesian coordinates with dramatically lower energy function evaluations. Code available at: https://github.com/taraak/pita
Self-Refining Training for Amortized Density Functional Theory
Cristian Gabellini
Hatem Helal
Kirill Neklyudov
Density Functional Theory (DFT) allows for predicting all the chemical and physical properties of molecular systems from first principles by finding an approximate solution to the many-body Schrödinger equation. However, the cost of these predictions becomes infeasible when increasing the scale of the energy evaluations, e.g., when calculating the ground-state energy for simulating molecular dynamics. Recent works have demonstrated that, for substantially large datasets of molecular conformations, Deep Learning-based models can predict the outputs of the classical DFT solvers by amortizing the corresponding optimization problems. In this paper, we propose a novel method that reduces the dependency of amortized DFT solvers on large pre-collected datasets by introducing a self-refining training strategy. Namely, we propose an efficient method that simultaneously trains a deep-learning model to predict the DFT outputs and samples molecular conformations that are used as training data for the model. We derive our method as a minimization of the variational upper bound on the KL-divergence measuring the discrepancy between the generated samples and the target Boltzmann distribution defined by the ground state energy. To demonstrate the utility of the proposed scheme, we perform an extensive empirical study comparing it with the models trained on the pre-collected datasets. Finally, we open-source our implementation of the proposed algorithm, optimized with asynchronous training and sampling stages, which enables simultaneous sampling and training. Code is available at https://github.com/majhas/self-refining-dft.
Meta Flow Matching: Integrating Vector Fields on the Wasserstein Manifold
Xi Zhang
Brandon Amos
Leo J. Lee
Kirill Neklyudov
Numerous biological and physical processes can be modeled as systems of interacting entities evolving continuously over time, e.g. the dynamics of communicating cells or physical particles. Learning the dynamics of such systems is essential for predicting the temporal evolution of populations across novel samples and unseen environments. Flow-based models allow for learning these dynamics at the population level - they model the evolution of the entire distribution of samples. However, current flow-based models are limited to a single initial population and a set of predefined conditions which describe different dynamics. We argue that multiple processes in natural sciences have to be represented as vector fields on the Wasserstein manifold of probability densities. That is, the change of the population at any moment in time depends on the population itself due to the interactions between samples. In particular, this is crucial for personalized medicine where the development of diseases and their respective treatment response depend on the microenvironment of cells specific to each patient. We propose Meta Flow Matching (MFM), a practical approach to integrate along these vector fields on the Wasserstein manifold by amortizing the flow model over the initial populations. Namely, we embed the population of samples using a Graph Neural Network (GNN) and use these embeddings to train a Flow Matching model. This gives MFM the ability to generalize over the initial distributions, unlike previously proposed methods. We demonstrate the ability of MFM to improve the prediction of individual treatment responses on a large-scale multi-patient single-cell drug screen dataset.
The Superposition of Diffusion Models Using the Itô Density Estimator
Avishek Joey Bose
Kirill Neklyudov
The Cambrian explosion of easily accessible pre-trained diffusion models suggests a demand for methods that combine multiple different pre-trained diffusion models without incurring the significant computational burden of re-training a larger combined model. In this paper, we cast the problem of combining multiple pre-trained diffusion models at the generation stage under a novel proposed framework termed superposition. Theoretically, we derive superposition from rigorous first principles stemming from the celebrated continuity equation and design two novel algorithms tailor-made for combining diffusion models in SuperDiff. SuperDiff leverages a new scalable Itô density estimator for the log likelihood of the diffusion SDE which incurs no additional overhead compared to the well-known Hutchinson's estimator needed for divergence calculations. We demonstrate that SuperDiff is scalable to large pre-trained diffusion models as superposition is performed solely through composition during inference, and also enjoys painless implementation as it combines different pre-trained vector fields through an automated re-weighting scheme. Notably, we show that SuperDiff is efficient during inference time, and mimics traditional composition operators such as the logical OR and the logical AND. We empirically demonstrate the utility of using SuperDiff for generating more diverse images on CIFAR-10, more faithful prompt conditioned image editing using Stable Diffusion, as well as improved conditional molecule generation and unconditional de novo structure design of proteins. https://github.com/necludov/super-diffusion
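For readers unfamiliar with the baseline this abstract compares against, here is a minimal sketch of Hutchinson's trace estimator. It illustrates the divergence estimator that the abstract mentions, not the Itô density estimator introduced in the paper. The linear vector field f(x) = Ax, the matrix `A`, and the helper names are assumptions chosen so that the exact divergence, tr(A), is available for comparison.

```python
import random


def hutchinson_trace(matvec, dim, n_probes, rng):
    """Estimate tr(A) as the average of v^T (A v) over random Rademacher probes v."""
    total = 0.0
    for _ in range(n_probes):
        v = [rng.choice((-1.0, 1.0)) for _ in range(dim)]
        av = matvec(v)  # only matrix-vector products are needed, never A itself
        total += sum(vi * avi for vi, avi in zip(v, av))
    return total / n_probes


# Hypothetical Jacobian of a linear vector field f(x) = A x; its divergence is tr(A).
A = [[2.0, 0.5, 0.0],
     [0.1, -1.0, 0.3],
     [0.0, 0.2, 0.5]]


def matvec(v):
    """Multiply the fixed matrix A by the vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


rng = random.Random(0)
estimate = hutchinson_trace(matvec, dim=3, n_probes=5000, rng=rng)
exact = sum(A[i][i] for i in range(3))
```

In a neural vector field the `matvec` call would be replaced by a Jacobian-vector product, so each probe costs roughly one extra backward or forward pass.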
Scaling Deep Learning Solutions for Transition Path Sampling
Michael Plainer
Yuanqi Du
Rob Brekelmans
Carla P. Gomes
Kirill Neklyudov
Transition path sampling (TPS) is an important method for studying rare events, such as they happen in chemical reactions or protein folding. These events occur so infrequently that traditional simulations are often impractical, and even recent machine-learning approaches struggle to address this issue for larger systems. In this paper, we propose using modern deep learning techniques to improve the scalability of TPS methods significantly. We highlight the need for better evaluations in the existing literature and start by formulating TPS as a sampling problem over an unnormalized target density and introduce relevant evaluation metrics to assess the effectiveness of TPS solutions from this perspective. To develop a scalable approach, we explore several design choices, including a problem-informed neural network architecture, simulated annealing, the integration of prior knowledge into the sampling process, and attention mechanisms. Finally, we conduct a comprehensive empirical study and compare these design choices with other recently developed deep-learning methods for rare event sampling.
Diffusion Models as Constrained Samplers for Optimization with Unknown Constraints
Lingkai Kong
Yuanqi Du
Wenhao Mu
Kirill Neklyudov
Valentin De Bortoli
Haorui Wang
Dongxia Wu
Aaron Ferber
Yi-An Ma
Carla P. Gomes
Chao Zhang
Addressing real-world optimization problems becomes particularly challenging when analytic objective functions or constraints are unavailable. While numerous studies have addressed the issue of unknown objectives, limited research has focused on scenarios where feasibility constraints are not given explicitly. Overlooking these constraints can lead to spurious solutions that are unrealistic in practice. To deal with such unknown constraints, we propose to perform optimization within the data manifold using diffusion models. To constrain the optimization process to the data manifold, we reformulate the original optimization problem as a sampling problem from the product of the Boltzmann distribution defined by the objective function and the data distribution learned by the diffusion model. Depending on the differentiability of the objective function, we propose two different sampling methods. For differentiable objectives, we propose a two-stage framework that begins with a guided diffusion process for warm-up, followed by a Langevin dynamics stage for further correction. For non-differentiable objectives, we propose an iterative importance sampling strategy using the diffusion model as the proposal distribution. Comprehensive experiments on a synthetic dataset, six real-world black-box optimization datasets, and a multi-objective molecule optimization dataset show that our method achieves better or comparable performance with previous state-of-the-art baselines.