Publications

Any2Policy: Learning Visuomotor Policy with Any-Modality

Yichen Zhu

Zhicai Ou

Feifei Feng

Humans can communicate and observe media with different modalities, such as texts, sounds, and images. For robots to be more generalizable embodied agents, they should be capable of following instructions and perceiving the world with adaptation to diverse modalities. Current robotic learning methodologies often focus on single-modal task specification and observation, thereby limiting their ability to process rich multi-modal information. Addressing this limitation, we present an end-to-end general-purpose multi-modal system named Any-to-Policy Embodied Agents. This system empowers robots to handle tasks using various modalities, whether in combinations like text-image, audio-image, text-point cloud, or in isolation. Our innovative approach involves training a versatile modality network that adapts to various inputs and connects with policy networks for effective control. Because of the lack of existing multi-modal robotics datasets for evaluation, we assembled a comprehensive real-world dataset encompassing 30 robotic tasks. Each task in this dataset is richly annotated across multiple modalities, providing a robust foundation for assessment. We conducted extensive validation of our proposed unified modality embodied agent using several simulation benchmarks, including Franka Kitchen, Meta-World, and Maniskill2, as well as in our real-world settings. Our experiments showcase the promising capability of building embodied agents that can adapt to diverse multi-modal in a unified framework.

2024-09-24

NeurIPS.cc/2024/Conference (poster)

openreview.net

Balancing Context Length and Mixing Times for Reinforcement Learning at Scale

Sarath Chandar

Khimya Khetarpal

Janarthanan Rajendran

Matthew Riemer

Due to the recent remarkable advances in artificial intelligence, researchers have begun to consider challenging learning problems such as learning to generalize behavior from large offline datasets or learning online in non-Markovian environments. Meanwhile, recent advances in both of these areas have increasingly relied on conditioning policies on large context lengths. A natural question is if there is a limit to the performance benefits of increasing the context length if the computation needed is available. In this work, we establish a novel theoretical result that links the context length of a policy to the time needed to reliably evaluate its performance (i.e., its mixing time) in large scale partially observable reinforcement learning environments that exhibit latent sub-task structure. This analysis underscores a key tradeoff: when we extend the context length, our policy can more effectively model non-Markovian dependencies, but this comes at the cost of potentially slower policy evaluation and as a result slower downstream learning. Moreover, our empirical results highlight the relevance of this analysis when leveraging Transformer based neural networks. This perspective will become increasingly pertinent as the field scales towards larger and more realistic environments, opening up a number of potential future directions for improving the way we design learning agents.

2024-09-24

NeurIPS.cc/2024/Conference (poster)

doi.org

openreview.net

Code Repair with LLMs gives an Exploration-Exploitation Tradeoff

Hao Tang

Keya Hu

Jin Peng Zhou

Si Cheng Zhong

Wei-Long Zheng

Xujie Si

Kevin Ellis

2024-09-24

NeurIPS.cc/2024/Conference (poster)

openreview.net

Conformal Inverse Optimization

Bo Lin

Erick Delage

Timothy Chan

Inverse optimization has been increasingly used to estimate unknown parameters in an optimization model based on decision data. We show that such a point estimation is insufficient in a prescriptive setting where the estimated parameters are used to prescribe new decisions. The prescribed decisions may be low-quality and misaligned with human intuition and thus are unlikely to be adopted. To tackle this challenge, we propose conformal inverse optimization, which seeks to learn an uncertainty set for the unknown parameters and then solve a robust optimization model to prescribe new decisions. Under mild assumptions, we show that our method enjoys provable guarantees on solution quality, as evaluated using both the ground-truth parameters and the decision maker's perception of the unknown parameters. Our method demonstrates strong empirical performance compared to classic inverse optimization.

2024-09-24

NeurIPS.cc/2024/Conference (poster)

openreview.net

Density-based User Representation using Gaussian Process Regression for Multi-interest Personalized Retrieval

Haolun Wu

Ofer Meshi

Masrour Zoghi

Fernando Diaz

Xue Liu

Craig Boutilier

Maryam Karimzadehgan

Accurate modeling of the diverse and dynamic interests of users remains a significant challenge in the design of personalized recommender systems. Existing user modeling methods, like single-point and multi-point representations, have limitations w.r.t. accuracy, diversity, and adaptability. To overcome these deficiencies, we introduce density-based user representations (DURs), a novel method that leverages Gaussian process regression (GPR) for effective multi-interest recommendation and retrieval. Our approach, GPR4DUR, exploits DURs to capture user interest variability without manual tuning, incorporates uncertainty-awareness, and scales well to large numbers of users. Experiments using real-world offline datasets confirm the adaptability and efficiency of GPR4DUR, while online experiments with simulated users demonstrate its ability to address the exploration-exploitation trade-off by effectively utilizing model uncertainty.

2024-09-24

NeurIPS.cc/2024/Conference (poster)

doi.org

openreview.net

Detecting Brittle Decisions for Free: Leveraging Margin Consistency in Deep Robust Classifiers

Frédéric Precioso

Despite extensive research on adversarial training strategies to improve robustness, the decisions of even the most robust deep learning models can still be quite sensitive to imperceptible perturbations, creating serious risks when deploying them for high-stakes real-world applications. While detecting such cases may be critical, evaluating a model's vulnerability at a per-instance level using adversarial attacks is computationally too intensive and unsuitable for real-time deployment scenarios. The input space margin is the exact score to detect non-robust samples and is intractable for deep neural networks. This paper introduces the concept of margin consistency -- a property that links the input space margins and the logit margins in robust models -- for efficient detection of vulnerable samples. First, we establish that margin consistency is a necessary and sufficient condition to use a model's logit margin as a score for identifying non-robust samples. Next, through comprehensive empirical analysis of various robustly trained models on CIFAR10 and CIFAR100 datasets, we show that they indicate high margin consistency with a strong correlation between their input space margins and the logit margins. Then, we show that we can effectively and confidently use the logit margin to detect brittle decisions with such models. Finally, we address cases where the model is not sufficiently margin-consistent by learning a pseudo-margin from the feature representation. Our findings highlight the potential of leveraging deep representations to assess adversarial vulnerability in deployment scenarios efficiently.

2024-09-24

NeurIPS.cc/2024/Conference (poster)

doi.org

openreview.net

EDT: An Efficient Diffusion Transformer Framework Inspired by Human-like Sketching

Xinwang Chen

Ning Liu

Yichen Zhu

Feifei Feng

Jian Tang

Transformer-based Diffusion Probabilistic Models (DPMs) have shown more potential than CNN-based DPMs, yet their extensive computational requirements hinder widespread practical applications. To reduce the computation budget of transformer-based DPMs, this work proposes the Efficient Diffusion Transformer (EDT) framework. This framework includes a lightweight-design diffusion model architecture, and a training-free Attention Modulation Matrix and its alternation arrangement in EDT inspired by human-like sketching. Additionally, we propose a token relation-enhanced masking training strategy tailored explicitly for EDT to augment its token relation learning capability. Our extensive experiments demonstrate the efficacy of EDT. The EDT framework reduces training and inference costs and surpasses existing transformer-based diffusion models in image synthesis performance, thereby achieving a significant overall enhancement. With lower FID, EDT-S, EDT-B, and EDT-XL attained speed-ups of 3.93x, 2.84x, and 1.92x respectively in the training phase, and 2.29x, 2.29x, and 2.22x respectively in inference, compared to the corresponding sizes of MDTv2. Our code is available at https://github.com/xinwangChen/EDT.

2024-09-24

NeurIPS.cc/2024/Conference (poster)

openreview.net

Efficient Adversarial Training in LLMs with Continuous Attacks

Stephan Günnemann

Large language models (LLMs) are vulnerable to adversarial attacks that can bypass their safety guardrails. In many domains, adversarial training has proven to be one of the most promising methods to reliably improve robustness against such attacks. Yet, in the context of LLMs, current methods for adversarial training are hindered by the high computational costs required to perform discrete adversarial attacks at each training iteration. We address this problem by instead calculating adversarial attacks in the continuous embedding space of the LLM, which is orders of magnitude more efficient. We propose a fast adversarial training algorithm (C-AdvUL) composed of two losses: the first makes the model robust on continuous embedding attacks computed on an adversarial behaviour dataset; the second ensures the usefulness of the final model by fine-tuning on utility data. Moreover, we introduce C-AdvIPO, an adversarial variant of IPO that does not require utility data for adversarially robust alignment. Our empirical evaluation on five models from different families (Gemma, Phi3, Mistral, Zephyr, Llama2) and at different scales (2B, 3.8B, 7B) shows that both algorithms substantially enhance LLM robustness against discrete attacks (GCG, AutoDAN, PAIR), while maintaining utility. Our results demonstrate that robustness to continuous perturbations can extrapolate to discrete threat models. Thereby, we present a path toward scalable adversarial training algorithms for robustly aligning LLMs.

2024-09-24

NeurIPS.cc/2024/Conference (spotlight)

doi.org

openreview.net

Efficient Leverage Score Sampling for Tensor Train Decomposition

Vivek Bharadwaj

Beheshteh T. Rakhshan

Osman Asif Malik

Guillaume Rabusseau

Tensor Train (TT) decomposition is widely used in the machine learning and quantum physics communities as a popular tool to efficiently compress high-dimensional tensor data. In this paper, we propose an efficient algorithm to accelerate computing the TT decomposition with the Alternating Least Squares (ALS) algorithm relying on exact leverage scores sampling. For this purpose, we propose a data structure that allows us to efficiently sample from the tensor with time complexity logarithmic in the tensor size. Our contribution specifically leverages the canonical form of the TT decomposition. By maintaining the canonical form through each iteration of ALS, we can efficiently compute (and sample from) the leverage scores, thus achieving significant speed-up in solving each sketched least-square problem. Experiments on synthetic and real data on dense and sparse tensors demonstrate that our method outperforms SVD-based and ALS-based algorithms.

2024-09-24

NeurIPS.cc/2024/Conference (poster)

doi.org

openreview.net

ET-Flow: Equivariant Flow-Matching for Molecular Conformer Generation

Majdi Hassan

Nikhil Shenoy

Jungyoon Lee

Hannes Stärk

Stephan Thaler

Dominique Beaini

Predicting low-energy molecular conformations given a molecular graph is an important but challenging task in computational drug discovery. Existing state-of-the-art approaches either resort to large scale transformer-based models that diffuse over conformer fields, or use computationally expensive methods to generate initial structures and diffuse over torsion angles. In this work, we introduce Equivariant Transformer Flow (ET-Flow). We showcase that a well-designed flow matching approach with equivariance and harmonic prior alleviates the need for complex internal geometry calculations and large architectures, contrary to the prevailing methods in the field. Our approach results in a straightforward and scalable method that directly operates on all-atom coordinates with minimal assumptions. With the advantages of equivariance and flow matching, ET-Flow significantly increases the precision and physical validity of the generated conformers, while being a lighter model and faster at inference. Code is available at https://github.com/shenoynikhil/ETFlow.

2024-09-24

NeurIPS.cc/2024/Conference (poster)

openreview.net

A Generative Model of Symmetry Transformations

James Urquhart Allingham

Bruno Mlodozeniec

Shreyas Padhy

Javier Antoran

David M. Krueger

Richard E. Turner

Eric Nalisnick

José Miguel Hernández-Lobato

Correctly capturing the symmetry transformations of data can lead to efficient models with strong generalization capabilities, though methods incorporating symmetries often require prior knowledge. While recent advancements have been made in learning those symmetries directly from the dataset, most of this work has focused on the discriminative setting. In this paper, we take inspiration from group theoretic ideas to construct a generative model that explicitly aims to capture the data's approximate symmetries. This results in a model that, given a prespecified broad set of possible symmetries, learns to what extent, if at all, those symmetries are actually present. Our model can be seen as a generative process for data augmentation. We provide a simple algorithm for learning our generative model and empirically demonstrate its ability to capture symmetries under affine and color transformations, in an interpretable way. Combining our symmetry model with standard generative models results in higher marginal test-log-likelihoods and improved data efficiency.

2024-09-24

NeurIPS.cc/2024/Conference (poster)

doi.org

openreview.net

Geometry of naturalistic object representations in recurrent neural network models of working memory

Xiaoxuan Lei

Takuya Ito

Pouya Bashivan

2024-09-24

NeurIPS.cc/2024/Conference (poster)

doi.org

openreview.net