GFlowNets and variational inference
Nikolay Malkin
Salem Lahlou
Tristan Deleu
Xu Ji
Edward J Hu
Katie E Everett
Dinghuai Zhang
This paper builds bridges between two families of probabilistic algorithms: (hierarchical) variational inference (VI), which is typically used to model distributions over continuous spaces, and generative flow networks (GFlowNets), which have been used for distributions over discrete structures such as graphs. We demonstrate that, in certain cases, VI algorithms are equivalent to special cases of GFlowNets in the sense of equality of expected gradients of their learning objectives. We then point out the differences between the two families and show how these differences emerge experimentally. Notably, GFlowNets, which borrow ideas from reinforcement learning, are more amenable than VI to off-policy training without the cost of high gradient variance induced by importance sampling. We argue that this property of GFlowNets can provide advantages for capturing diversity in multimodal target distributions.
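To make the off-policy point concrete, here is a minimal sketch of trajectory balance, one commonly used GFlowNet objective. The abstract does not name a specific objective, so this choice is an assumption, and the function name and argument layout are hypothetical. The squared residual depends only on the learned log-partition estimate, the forward and backward log-probabilities along a complete trajectory, and the terminal reward. It contains no term involving the policy that generated the trajectory, which is why such a loss can be evaluated on samples from an exploration policy without importance weights.

```python
import numpy as np


def trajectory_balance_loss(log_Z, log_pf, log_pb, log_reward):
    """Squared trajectory-balance residual for one complete trajectory (hypothetical helper).

    log_Z:      learned estimate of the log partition function
    log_pf:     log P_F(s_{t+1} | s_t) for each forward step of the trajectory
    log_pb:     log P_B(s_t | s_{t+1}) for each corresponding backward step
    log_reward: log R(x) of the terminal object x
    """
    residual = log_Z + np.sum(log_pf) - log_reward - np.sum(log_pb)
    return float(residual ** 2)


# Toy check: from the root, choose one of two leaves with probability 0.5 each.
# Both leaves have reward 1, so the true partition function is Z = 2.
# The trajectory may come from any exploration policy: only P_F, P_B, Z and R
# enter the loss, so no importance weight is needed.
print(trajectory_balance_loss(np.log(2.0), [np.log(0.5)], [0.0], 0.0))  # balanced
print(trajectory_balance_loss(0.0, [np.log(0.5)], [0.0], 0.0))          # Z misestimated
```

When the forward policy and the log-partition estimate are consistent with the rewards, the residual is zero for every trajectory; a misestimated Z leaves a nonzero residual that gradient descent on log_Z would correct.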
GFlowNets for AI-Driven Scientific Discovery
Moksh J. Jain
Tristan Deleu
Jason Hartford
Cheng-Hao Liu
Alex Hernandez-Garcia
Tackling the most pressing problems for humanity, such as the climate crisis and the threat of global pandemics, requires accelerating the pace of scientific discovery. While science has traditionally relied...
How gradient estimator variance and bias impact learning in neural networks
Arna Ghosh
Yuhan Helena Liu
Konrad Paul Kording
There is growing interest in understanding how real brains may approximate gradients and how gradients can be used to train neuromorphic chips. However, neither real brains nor neuromorphic chips can perfectly follow the loss gradient, so parameter updates would necessarily use gradient estimators that have some variance and/or bias. Therefore, there is a need to better understand how variance and bias in gradient estimators impact learning, depending on network and task properties. Here, we show that variance and bias can impair learning on the training data, but some degree of variance and bias in a gradient estimator can be beneficial for generalization. We find that the ideal amount of variance and bias in a gradient estimator depends on several properties of the network and task: the size and activity sparsity of the network, the norm of the gradient, and the curvature of the loss landscape. As such, whether considering biologically plausible learning algorithms or algorithms for training neuromorphic chips, researchers can analyze these properties to determine whether their approximation to gradient descent will be effective for learning given their network and task properties.
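The following toy sketch, which is not the paper's experimental setup, illustrates only the training-loss side of the argument on the quadratic loss 0.5·||w||². The gradient estimator adds zero-mean Gaussian noise (variance) and a constant offset (bias) to the true gradient. With neither, gradient descent converges to the minimum; with noise it keeps fluctuating above the minimum, and with bias it settles at a shifted point. The function name noisy_gd and its parameters are hypothetical, and the sketch does not model the generalization benefit the abstract reports.

```python
import numpy as np


def noisy_gd(noise_std, bias, steps=200, lr=0.1, dim=10, seed=0):
    """Gradient descent on L(w) = 0.5 * ||w||^2 with a noisy, biased gradient estimate.

    Returns the final training loss.
    """
    rng = np.random.default_rng(seed)
    w = np.ones(dim)
    for _ in range(steps):
        true_grad = w  # exact gradient of 0.5 * ||w||^2
        estimate = true_grad + bias + noise_std * rng.standard_normal(dim)
        w = w - lr * estimate
    return 0.5 * float(w @ w)


for noise_std, bias in [(0.0, 0.0), (0.5, 0.0), (0.0, 0.2)]:
    print(f"noise={noise_std}, bias={bias}: final loss {noisy_gd(noise_std, bias):.4f}")
```

With a constant bias b in every coordinate, the update w ← (1 − lr)·w − lr·b has fixed point w = −b, so the training loss stalls at 0.5·dim·b² rather than reaching zero.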
ImageNet-X: Understanding Model Mistakes with Factor of Variation Annotations
Badr Youbi Idrissi
Diane Bouchacourt
Randall Balestriero
Ivan Evtimov
Caner Hazirbas
Nicolas Ballas
Michal Drozdzal
David Lopez-Paz
Mark Ibrahim
Deep learning vision systems are widely deployed across applications where reliability is critical. However, even today's best models can fail to recognize an object when its pose, lighting, or background varies. While existing benchmarks surface examples challenging for models, they do not explain why such mistakes arise. To address this need, we introduce ImageNet-X: a set of sixteen human annotations of factors such as pose, background, or lighting for the entire ImageNet-1k validation set as well as a random subset of 12k training images. Equipped with ImageNet-X, we investigate 2,200 current recognition models and study the types of mistakes as a function of each model’s (1) architecture, e.g. transformer vs. convolutional, (2) learning paradigm, e.g. supervised vs. self-supervised, and (3) training procedures, e.g., data augmentation. Regardless of these choices, we find models have consistent failure modes across ImageNet-X categories. We also find that while data augmentation can improve robustness to certain factors, it induces spill-over effects on other factors. For example, color-jitter augmentation improves robustness to color and brightness, but surprisingly hurts robustness to pose. Together, these insights suggest that, to advance the robustness of modern vision models, future research should focus on collecting additional data and understanding data augmentation schemes. Along with these insights, we release a toolkit based on ImageNet-X to spur further study into the mistakes image recognition systems make.
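One simple way to summarize per-image factor annotations like these is to compare, for each factor, the error rate on images annotated with that factor against the overall error rate: a ratio above 1 means the factor is over-represented among mistakes. The sketch below does this with NumPy. The ratio definition, the function name factor_error_ratios, and the toy data are illustrative assumptions, not the API of the released ImageNet-X toolkit.

```python
import numpy as np


def factor_error_ratios(factors, correct, names):
    """Per-factor error rate divided by the overall error rate (hypothetical helper).

    factors: (n_images, n_factors) array, True where an image is annotated with a factor
    correct: (n_images,) array, True where the model's prediction was correct
    names:   factor names, one per column of `factors`
    """
    factors = np.asarray(factors, dtype=bool)
    errors = ~np.asarray(correct, dtype=bool)
    overall = errors.mean()
    if overall == 0:
        raise ValueError("model made no errors; ratios are undefined")
    ratios = {}
    for j, name in enumerate(names):
        mask = factors[:, j]
        if mask.any():  # skip factors that never occur
            ratios[name] = float(errors[mask].mean() / overall)
    return ratios


# Toy data: four images, two annotated "pose" and two annotated "background".
names = ["pose", "background"]
factors = [[1, 0], [1, 0], [0, 1], [0, 1]]
correct = [False, False, True, False]
print(factor_error_ratios(factors, correct, names))
```

In this toy example every "pose" image is misclassified while the overall error rate is 3/4, so "pose" gets a ratio of 4/3 and "background" gets 2/3.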
Improving and generalizing flow-based generative models with minibatch optimal transport
Alexander Tong
Nikolay Malkin
Guillaume Huguet
Yanlei Zhang
Jarrid Rector-Brooks
Kilian FATRAS
Continuous normalizing flows (CNFs) are an attractive generative modeling technique, but they have been held back by limitations in their simulation-based maximum likelihood training. We introduce the generalized conditional flow matching (CFM) technique, a family of simulation-free training objectives for CNFs. CFM features a stable regression objective like that used to train the stochastic flow in diffusion models but enjoys the efficient inference of deterministic flow models. In contrast to both diffusion models and prior CNF training algorithms, CFM does not require the source distribution to be Gaussian or require evaluation of its density. A variant of our objective is optimal transport CFM (OT-CFM), which creates simpler flows that are more stable to train and lead to faster inference, as evaluated in our experiments. Furthermore, we show that when the true OT plan is available, our OT-CFM method approximates dynamic OT. Training CNFs with CFM improves results on a variety of conditional and unconditional generation tasks, such as inferring single cell dynamics, unsupervised image translation, and Schrödinger bridge inference.
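The regression objective described above can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the authors' released code. The conditional path is taken to be a straight line from a source sample x0 to a target sample x1 with optional Gaussian width sigma, so the regression target is the constant velocity x1 − x0. The OT variant re-pairs each minibatch with an exact assignment on squared Euclidean cost, which for equal-size, uniformly weighted batches is an optimal transport plan. The helper names ot_pairing and cfm_loss, and the use of SciPy's linear_sum_assignment, are choices made for this illustration.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def ot_pairing(x0, x1):
    """Re-order a minibatch so that (x0[i], x1[i]) follows an exact OT plan.

    For equal-size batches with uniform weights, the OT plan under squared
    Euclidean cost is a permutation, found here by linear assignment.
    """
    cost = ((x0[:, None, :] - x1[None, :, :]) ** 2).sum(axis=-1)
    rows, cols = linear_sum_assignment(cost)
    return x0[rows], x1[cols]


def cfm_loss(v, x0, x1, rng, sigma=0.0, use_ot=False):
    """Monte Carlo estimate of a conditional flow matching loss.

    v(t, x) is the model vector field; t has shape (batch, 1), x has shape (batch, dim).
    """
    if use_ot:
        x0, x1 = ot_pairing(x0, x1)
    t = rng.uniform(size=(x0.shape[0], 1))
    mu_t = (1 - t) * x0 + t * x1                       # point on the straight path
    x_t = mu_t + sigma * rng.standard_normal(x0.shape)  # optional Gaussian width
    u_t = x1 - x0                                       # conditional target velocity
    return float(np.mean(((v(t, x_t) - u_t) ** 2).sum(axis=-1)))


# Usage: the target batch is the source batch shifted by a constant, but scrambled.
rng = np.random.default_rng(0)
x0 = rng.standard_normal((64, 2))
shift = np.array([3.0, -1.0])
x1 = (x0 + shift)[rng.permutation(64)]


def v(t, x):
    """A vector field that is exactly right for the shifted (OT-paired) data."""
    return np.broadcast_to(shift, x.shape)


print(cfm_loss(v, x0, x1, rng), cfm_loss(v, x0, x1, rng, use_ot=True))
```

With the scrambled independent pairing, the per-pair targets x1 − x0 vary wildly and even the ideal constant field incurs a large loss; after minibatch OT re-pairing the targets all equal the shift, which is the "simpler flow" intuition behind OT-CFM.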
Latent Bottlenecked Attentive Neural Processes
Leo Feng
Hossein Hajimirsadeghi
Mohamed Osama Ahmed
Neural Processes (NPs) are popular methods in meta-learning that can estimate predictive uncertainty on target datapoints by conditioning on a context dataset. The previous state-of-the-art method, Transformer Neural Processes (TNPs), achieves strong performance but requires quadratic computation with respect to the number of context datapoints, significantly limiting its scalability. Conversely, existing sub-quadratic NP variants perform significantly worse than TNPs. Tackling this issue, we propose Latent Bottlenecked Attentive Neural Processes (LBANPs), a new computationally efficient sub-quadratic NP variant that has a querying computational complexity independent of the number of context datapoints. The model encodes the context dataset into a constant number of latent vectors on which self-attention is performed. When making predictions, the model retrieves higher-order information from the context dataset via multiple cross-attention mechanisms on the latent vectors. We empirically show that LBANPs achieve results competitive with the state-of-the-art on meta-regression, image completion, and contextual multi-armed bandits. We demonstrate that LBANPs can trade off computational cost and performance according to the number of latent vectors. Finally, we show LBANPs can scale beyond existing attention-based NP variants to larger dataset settings.
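The bottleneck idea can be illustrated with plain single-head dot-product attention in NumPy. This is a deliberately simplified sketch, not the LBANP architecture: it omits learned projections, multiple heads, feed-forward layers and normalization, and the function names are hypothetical. What it does show is the cost structure described above: the N context points are summarized once into L latent vectors (cost proportional to L·N, plus L² for latent self-attention), and each query then attends only to those L vectors, so per-query cost does not grow with N.

```python
import numpy as np


def attention(q, k, v):
    """Single-head scaled dot-product attention: softmax(q k^T / sqrt(d)) v."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v


def encode_context(latents, context):
    """Summarize (N, d) context into (L, d) latents: cross-attention, then self-attention."""
    z = latents + attention(latents, context, context)  # O(L * N)
    z = z + attention(z, z, z)                           # O(L^2), independent of N
    return z


def query(z, targets):
    """Each (M, d) target attends only to the (L, d) latents: O(M * L), independent of N."""
    return attention(targets, z, z)


rng = np.random.default_rng(0)
d, L = 16, 8
latents = rng.standard_normal((L, d))     # stand-in for learned latent vectors
context = rng.standard_normal((1000, d))  # N = 1000 embedded context points
targets = rng.standard_normal((5, d))     # M = 5 embedded target points
z = encode_context(latents, context)      # shape (8, 16)
out = query(z, targets)                   # shape (5, 16)
```

Increasing L gives the queries a richer summary of the context at higher cost, which mirrors the cost/performance trade-off in the number of latent vectors mentioned above.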
Latent State Marginalization as a Low-cost Approach for Improving Exploration
Dinghuai Zhang
Qinqing Zheng
Amy Zhang
Ricky T. Q. Chen
Learning From FM Communications: Toward Accurate, Efficient, All-Terrain Vehicle Localization
X. Chen
Qiao Xiang
L. Kong
Huisan Xu
Xuemei Liu
Vehicle localization service is a fundamental component of intelligent transportation systems. The widely used satellite navigation systems perform poorly in urban areas because the lines of sight to satellites are blocked by complex terrain characteristics, e.g., buildings, elevated streets and interchanges. In this paper, we design RadioLoc, a novel system achieving accurate, efficient, all-terrain vehicle localization with two key design points. First, RadioLoc harvests the frequency modulation (FM) signal, which has higher availability than satellite signals in complex terrains, as the signal source for localization. Second, RadioLoc integrates modern machine learning techniques into the processing of FM signals to efficiently learn accurate vehicle localization in all-terrain environments. We validate the feasibility of FM-based vehicle localization and identify the corresponding challenges and practical issues via field tests (e.g., signal distortion, signal inconsistency and limited in-vehicle radio bandwidth), and develop a series of advanced techniques in RadioLoc to address them, including adaptive batching, frequency sweeping, a novel multipath delay spread filter, a reconstructive PCA denoiser and a tailored FM feature extractor. We then develop a generic, modular localization module in RadioLoc, and design different learning-based 3D position identification algorithms for this module. We implement a prototype of RadioLoc and perform extensive field experiments to evaluate its efficiency and efficacy. Results show that (1) RadioLoc achieves a real-time localization latency of less than 100 milliseconds; (2) RadioLoc achieves a worst-case localization accuracy of 99.6% even in an underground parking lot; and (3) the horizontal error of RadioLoc is only one sixth of that of a dedicated GPS device even when the vehicle is moving at high speed (i.e., 80 km/h) in a complex highway scenario.
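The abstract lists a reconstructive PCA denoiser among RadioLoc's components without giving details. As a generic illustration only, not RadioLoc's actual design, PCA reconstruction denoising projects each signal vector onto the top-k principal directions of the data and reconstructs it from that projection, discarding the remaining components. When the clean signals lie near a low-dimensional subspace, this removes most of the noise. The function name pca_denoise and the synthetic low-rank data are assumptions.

```python
import numpy as np


def pca_denoise(X, k):
    """Project each row of X onto the top-k principal components and reconstruct it."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of Vt are principal directions, sorted by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:k]
    return Xc @ P.T @ P + mean


rng = np.random.default_rng(0)
clean = rng.standard_normal((200, 2)) @ rng.standard_normal((2, 20))  # rank-2 "signals"
noisy = clean + 0.1 * rng.standard_normal(clean.shape)
denoised = pca_denoise(noisy, k=2)
print(np.mean((noisy - clean) ** 2), np.mean((denoised - clean) ** 2))
```

Keeping only k = 2 of 20 directions discards roughly 18/20 of the isotropic noise energy here; in practice k would have to be chosen from the data, for example from the singular value spectrum.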
Learning From FM Communications: Toward Accurate, Efficient, All-Terrain Vehicle Localization
X. T. Chen
Qiao Xiang
Linghe Kong
Huisan Xu
Vehicle localization service is a fundamental component of intelligent transportation systems. The widely used satellite navigation systems perform poorly in urban areas because the lines of sight to satellites are blocked by complex terrain characteristics, e.g., buildings, elevated streets and interchanges. In this paper, we design RadioLoc, a novel system achieving accurate, efficient, all-terrain vehicle localization with two key design points. First, RadioLoc harvests the frequency modulation (FM) signal, which has higher availability than satellite signals in complex terrains, as the signal source for localization. Second, RadioLoc integrates modern machine learning techniques into the processing of FM signals to efficiently learn accurate vehicle localization in all-terrain environments. We validate the feasibility of FM-based vehicle localization and identify the corresponding challenges and practical issues via field tests (e.g., signal distortion, signal inconsistency and limited in-vehicle radio bandwidth), and develop a series of advanced techniques in RadioLoc to address them, including adaptive batching, frequency sweeping, a novel multipath delay spread filter, a reconstructive PCA denoiser and a tailored FM feature extractor. We then develop a generic, modular localization module in RadioLoc, and design different learning-based 3D position identification algorithms for this module. We implement a prototype of RadioLoc and perform extensive field experiments to evaluate its efficiency and efficacy. Results show that (1) RadioLoc achieves a real-time localization latency of less than 100 milliseconds; (2) RadioLoc achieves a worst-case localization accuracy of 99.6% even in an underground parking lot; and (3) the horizontal error of RadioLoc is only one sixth of that of a dedicated GPS device even when the vehicle is moving at high speed (i.e., 80 km/h) in a complex highway scenario.