We use cookies to analyze the browsing and usage of our website and to personalize your experience. You can disable these technologies at any time, but this may limit certain functionalities of the site. Read our Privacy Policy for more information.
Setting cookies
You can enable and disable the types of cookies you wish to accept. However certain choices you make could affect the services offered on our sites (e.g. suggestions, personalised ads, etc.).
Essential cookies
These cookies are necessary for the operation of the site and cannot be deactivated. (Still active)
Analytics cookies
Do you accept the use of cookies to measure the audience of our sites?
Multimedia Player
Do you accept the use of cookies to display and allow you to watch the video content hosted by our partners (YouTube, etc.)?
The Generative Flow Network is a probabilistic framework where an agent learns a stochastic policy for object generation, such that the prob… (see more)ability of generating an object is proportional to a given reward function. Its effectiveness has been shown in discovering high-quality and diverse solutions, compared to reward-maximizing reinforcement learning-based methods. Nonetheless, GFlowNets only learn from rewards of the terminal states, which can limit its applicability. Indeed, intermediate rewards play a critical role in learning, for example from intrinsic motivation to provide intermediate feedback even in particularly challenging sparse reward tasks. Inspired by this, we propose Generative Augmented Flow Networks (GAFlowNets), a novel learning framework to incorporate intermediate rewards into GFlowNets. We specify intermediate rewards by intrinsic motivation to tackle the exploration problem in sparse reward environments. GAFlowNets can leverage edge-based and state-based intrinsic rewards in a joint way to improve exploration. Based on extensive experiments on the GridWorld task, we demonstrate the effectiveness and efficiency of GAFlowNet in terms of convergence, performance, and diversity of solutions. We further show that GAFlowNet is scalable to a more complex and large-scale molecule generation domain, where it achieves consistent and significant performance improvement.
This paper builds bridges between two families of probabilistic algorithms: (hierarchical) variational inference (VI), which is typically us… (see more)ed to model distributions over continuous spaces, and generative flow networks (GFlowNets), which have been used for distributions over discrete structures such as graphs. We demonstrate that, in certain cases, VI algorithms are equivalent to special cases of GFlowNets in the sense of equality of expected gradients of their learning objectives. We then point out the differences between the two families and show how these differences emerge experimentally. Notably, GFlowNets, which borrow ideas from reinforcement learning, are more amenable than VI to off-policy training without the cost of high gradient variance induced by importance sampling. We argue that this property of GFlowNets can provide advantages for capturing diversity in multimodal target distributions.
There is growing interest in understanding how real brains may approximate gradients and how gradients can be used to train neuromorphic chi… (see more)ps. However, neither real brains nor neuromorphic chips can perfectly follow the loss gradient, so parameter updates would necessarily use gradient estimators that have some variance and/or bias. Therefore, there is a need to understand better how variance and bias in gradient estimators impact learning dependent on network and task properties. Here, we show that variance and bias can impair learning on the training data, but some degree of variance and bias in a gradient estimator can be beneficial for generalization. We find that the ideal amount of variance and bias in a gradient estimator are dependent on several properties of the network and task: the size and activity sparsity of the network, the norm of the gradient, and the curvature of the loss landscape. As such, whether considering biologically-plausible learning algorithms or algorithms for training neuromorphic chips, researchers can analyze these properties to determine whether their approximation to gradient descent will be effective for learning given their network and task properties.
Deep learning vision systems are widely deployed across applications where reliability is critical. However, even today's best models can fa… (see more)il to recognize an object when its pose, lighting, or background varies. While existing benchmarks surface examples challenging for models, they do not explain why such mistakes arise. To address this need, we introduce ImageNet-X—a set of sixteen human annotations of factors such as pose, background, or lighting the entire ImageNet-1k validation set as well as a random subset of 12k training images. Equipped with ImageNet-X, we investigate 2,200 current recognition models and study the types of mistakes as a function of model’s (1) architecture, e.g. transformer vs. convolutional, (2) learning paradigm, e.g. supervised vs. self-supervised, and (3) training procedures, e.g., data augmentation. Regardless of these choices, we find models have consistent failure modes across ImageNet-X categories. We also find that while data augmentation can improve robustness to certain factors, they induce spill-over effects to other factors. For example, color-jitter augmentation improves robustness to color and brightness, but surprisingly hurts robustness to pose. Together, these insights suggest to advance the robustness of modern vision models, future research should focus on collecting additional data and understanding data augmentation schemes. Along with these insights, we release a toolkit based on ImageNet-X to spur further study into the mistakes image recognition systems make.
Neural Processes (NPs) are popular methods in meta-learning that can estimate predictive uncertainty on target datapoints by conditioning on… (see more) a context dataset. Previous state-of-the-art method Transformer Neural Processes (TNPs) achieve strong performance but require quadratic computation with respect to the number of context datapoints, significantly limiting its scalability. Conversely, existing sub-quadratic NP variants perform significantly worse than that of TNPs. Tackling this issue, we propose Latent Bottlenecked Attentive Neural Processes (LBANPs), a new computationally efficient sub-quadratic NP variant, that has a querying computational complexity independent of the number of context datapoints. The model encodes the context dataset into a constant number of latent vectors on which self-attention is performed. When making predictions, the model retrieves higher-order information from the context dataset via multiple cross-attention mechanisms on the latent vectors. We empirically show that LBANPs achieve results competitive with the state-of-the-art on meta-regression, image completion, and contextual multi-armed bandits. We demonstrate that LBANPs can trade-off the computational cost and performance according to the number of latent vectors. Finally, we show LBANPs can scale beyond existing attention-based NP variants to larger dataset settings.
Learning From FM Communications: Toward Accurate, Efficient, All-Terrain Vehicle Localization
X. Chen
Qiao Xiang
L. Kong
Huisan Xu
Xuemei Liu
Vehicle localization service is a fundamental component of intelligent transportation systems. The widely used satellite navigation systems … (see more)perform poorly in urban areas because the lines of sight to satellites are blocked by complex terrain characteristics, e.g., buildings, elevated streets and interchanges. In this paper, we design RadioLoc, a novel system achieving accurate, efficient, all-terrain vehicle localization with two key design points. First, RadioLoc harvests the frequency modulation (FM) signal, which has higher availability than satellite signal in complex terrains, as the signal source for localization. Second, RadioLoc integrates modern machine learning techniques into the processing of FM signals to efficiently learn the accurate vehicle localization in all-terrain environments. We validate the feasibility of FM-based vehicle localization and corresponding challenges and practical issues via field tests (e.g., signal distortion, signal inconsistency and limited in- vehicle radio bandwidth), and develop a series of advanced techniques in RadioLoc to address them, including adaptive batching, frequency sweeping, a novel multipath delay spread filter, a reconstructive PCA denoiser and a tailored FM feature extractor. We then develop a generic, modular localization module in RadioLoc, and design different learning-based 3D position identification algorithms for this module. We implement a prototype of RadioLoc and perform extensive field experiments to evaluate its efficiency and efficacy. Results show that (1) RadioLoc achieves a real-time localization latency of less than 100 milliseconds; (2) RadioLoc achieves a worst-case localization accuracy of 99.6% even in an underground parking lot, and (3) the horizontal error of RadioLoc is only one sixth of a dedicated GPS device even when the vehicle is moving at a high-speed (i.e., 80 km/h) in a complex highway scenario.
Vehicle localization service is a fundamental component of intelligent transportation systems. The widely used satellite navigation systems … (see more)perform poorly in urban areas because the lines of sight to satellites are blocked by complex terrain characteristics, e.g., buildings, elevated streets and interchanges. In this paper, we design RadioLoc, a novel system achieving accurate, efficient, all-terrain vehicle localization with two key design points. First, RadioLoc harvests the frequency modulation (FM) signal, which has higher availability than satellite signal in complex terrains, as the signal source for localization. Second, RadioLoc integrates modern machine learning techniques into the processing of FM signals to efficiently learn the accurate vehicle localization in all-terrain environments. We validate the feasibility of FM-based vehicle localization and corresponding challenges and practical issues via field tests (e.g., signal distortion, signal inconsistency and limited in- vehicle radio bandwidth), and develop a series of advanced techniques in RadioLoc to address them, including adaptive batching, frequency sweeping, a novel multipath delay spread filter, a reconstructive PCA denoiser and a tailored FM feature extractor. We then develop a generic, modular localization module in RadioLoc, and design different learning-based 3D position identification algorithms for this module. We implement a prototype of RadioLoc and perform extensive field experiments to evaluate its efficiency and efficacy. Results show that (1) RadioLoc achieves a real-time localization latency of less than 100 milliseconds; (2) RadioLoc achieves a worst-case localization accuracy of 99.6% even in an underground parking lot, and (3) the horizontal error of RadioLoc is only one sixth of a dedicated GPS device even when the vehicle is moving at a high-speed (i.e., 80 km/h) in a complex highway scenario.
This paper studies learning on text-attributed graphs (TAGs), where each node is associated with a text description. An ideal solution for s… (see more)uch a problem would be integrating both the text and graph structure information with large language models and graph neural networks (GNNs). However, the problem becomes very challenging when graphs are large due to the high computational complexity brought by training large language models and GNNs together. In this paper, we propose an efficient and effective solution to learning on large text-attributed graphs by fusing graph structure and language learning with a variational Expectation-Maximization (EM) framework, called GLEM. Instead of simultaneously training large language models and GNNs on big graphs, GLEM proposes to alternatively update the two modules in the E-step and M-step. Such a procedure allows training the two modules separately while simultaneously allowing the two modules to interact and mutually enhance each other. Extensive experiments on multiple data sets demonstrate the efficiency and effectiveness of the proposed approach.
We consider the task of generating realistic 3D shapes, which is useful for a variety of applications such as automatic scene generation and… (see more) physical simulation. Compared to other 3D representations like voxels and point clouds, meshes are more desirable in practice, because (1) they enable easy and arbitrary manipulation of shapes for relighting and simulation, and (2) they can fully leverage the power of modern graphics pipelines which are mostly optimized for meshes. Previous scalable methods for generating meshes typically rely on sub-optimal post-processing, and they tend to produce overly-smooth or noisy surfaces without fine-grained geometric details. To overcome these shortcomings, we take advantage of the graph structure of meshes and use a simple yet very effective generative modeling method to generate 3D meshes. Specifically, we represent meshes with deformable tetrahedral grids, and then train a diffusion model on this direct parameterization. We demonstrate the effectiveness of our model on multiple generative tasks.
Modern machine learning research relies on relatively few carefully curated datasets. Even in these datasets, and typically in `untidy' or r… (see more)aw data, practitioners are faced with significant issues of data quality and diversity which can be prohibitively labor intensive to address. Existing methods for dealing with these challenges tend to make strong assumptions about the particular issues at play, and often require a priori knowledge or metadata such as domain labels. Our work is orthogonal to these methods: we instead focus on providing a unified and efficient framework for Metadata Archaeology -- uncovering and inferring metadata of examples in a dataset. We curate different subsets of data that might exist in a dataset (e.g. mislabeled, atypical, or out-of-distribution examples) using simple transformations, and leverage differences in learning dynamics between these probe suites to infer metadata of interest. Our method is on par with far more sophisticated mitigation methods across different tasks: identifying and correcting mislabeled examples, classifying minority-group samples, prioritizing points relevant for training and enabling scalable human auditing of relevant examples.
Molecular representation pretraining is critical in various applications for drug and material discovery due to the limited number of labele… (see more)d molecules, and most existing work focuses on pretraining on 2D molecular graphs. However, the power of pretraining on 3D geometric structures has been less explored. This is owing to the difficulty of finding a sufficient proxy task that can empower the pretraining to effectively extract essential features from the geometric structures. Motivated by the dynamic nature of 3D molecules, where the continuous motion of a molecule in the 3D Euclidean space forms a smooth potential energy surface, we propose GeoSSL, a 3D coordinate denoising pretraining framework to model such an energy landscape. Further by leveraging an SE(3)-invariant score matching method, we propose GeoSSL-DDM in which the coordinate denoising proxy task is effectively boiled down to denoising the pairwise atomic distances in a molecule. Our comprehensive experiments confirm the effectiveness and robustness of our proposed method.