Publications

4+3 Phases of Compute-Optimal Neural Scaling Laws

Elliot Paquette

Courtney Paquette

Lechao Xiao

Jeffrey Pennington

2024-09-25

NeurIPS.cc/2024/Conference (spotlight)

openreview.net

Action Gaps and Advantages in Continuous-Time Distributional Reinforcement Learning

Harley Wiltzer

Marc Gendron-Bellemare

David Meger

Patrick Shafto

Yash Jhaveri

2024-09-25

NeurIPS.cc/2024/Conference (poster)

doi.org

openreview.net

Adaptive Exploration for Data-Efficient General Value Function Evaluations

Arushi Jain

Josiah P. Hanna

Doina Precup

General Value Functions (GVFs) (Sutton et al., 2011) are an established way to represent predictive knowledge in reinforcement learning. Each GVF computes the expected return for a given policy, based on a unique pseudo-reward. Multiple GVFs can be estimated in parallel using off-policy learning from a single stream of data, often sourced from a fixed behavior policy or pre-collected dataset. This leaves an open question: how can the behavior policy be chosen for data-efficient GVF learning? To address this gap, we propose GVFExplorer, which aims at learning a behavior policy that efficiently gathers data for evaluating multiple GVFs in parallel. This behavior policy selects actions in proportion to the total variance in the return across all GVFs, reducing the number of environmental interactions. To enable accurate variance estimation, we use a recently proposed temporal-difference-style variance estimator. We prove that each behavior policy update reduces the mean squared error in the summed predictions over all GVFs. We empirically demonstrate our method's performance in both tabular representations and nonlinear function approximation.

2024-09-25

NeurIPS.cc/2024/Conference (poster)

doi.org

openreview.net

Amortizing intractable inference in diffusion models for vision, language, and control

Moksh J. Jain

Minsu Kim

Diffusion models have emerged as effective distribution estimators in vision, language, and reinforcement learning, but their use as priors in downstream tasks poses an intractable posterior inference problem. This paper studies amortized sampling of the posterior over data,

2024-09-25

NeurIPS.cc/2024/Conference (poster)

doi.org

openreview.net

Any2Policy: Learning Visuomotor Policy with Any-Modality

Yichen Zhu

Zhicai Ou

Feifei Feng

Jian Tang

Humans can communicate and observe media with different modalities, such as texts, sounds, and images. For robots to be more generalizable embodied agents, they should be capable of following instructions and perceiving the world with adaptation to diverse modalities. Current robotic learning methodologies often focus on single-modal task specification and observation, thereby limiting their ability to process rich multi-modal information. Addressing this limitation, we present an end-to-end general-purpose multi-modal system named Any-to-Policy Embodied Agents. This system empowers robots to handle tasks using various modalities, whether in combinations like text-image, audio-image, text-point cloud, or in isolation. Our innovative approach involves training a versatile modality network that adapts to various inputs and connects with policy networks for effective control. Because of the lack of existing multi-modal robotics datasets for evaluation, we assembled a comprehensive real-world dataset encompassing 30 robotic tasks. Each task in this dataset is richly annotated across multiple modalities, providing a robust foundation for assessment. We conducted extensive validation of our proposed unified modality embodied agent using several simulation benchmarks, including Franka Kitchen, Meta-World, and Maniskill2, as well as in our real-world settings. Our experiments showcase the promising capability of building embodied agents that can adapt to diverse multi-modal inputs in a unified framework.

2024-09-25

NeurIPS.cc/2024/Conference (poster)

openreview.net

Balancing Context Length and Mixing Times for Reinforcement Learning at Scale

Matthew D Riemer

Khimya Khetarpal

Janarthanan Rajendran

Sarath Chandar

2024-09-25

NeurIPS.cc/2024/Conference (poster)

openreview.net

Cell ontology guided transcriptome foundation model

Manqi Zhou

Boyu Han

Transcriptome foundation models (TFMs) hold great promise for deciphering the transcriptomic language that dictates diverse cell functions by self-supervised learning on large-scale single-cell gene expression data, and ultimately unraveling the complex mechanisms of human diseases. However, current TFMs treat cells as independent samples and ignore the taxonomic relationships between cell types, which are available in cell ontology graphs. We argue that effectively leveraging this ontology information during the TFM pre-training can improve learning biologically meaningful gene co-expression patterns while preserving TFM as a general purpose foundation model for downstream zero-shot and fine-tuning tasks. To this end, we present **s**ingle **c**ell, **Cell**-**o**ntology guided TFM (scCello). We introduce cell-type coherence loss and ontology alignment loss, which are minimized along with the masked gene expression prediction loss during the pre-training. The novel loss components guide scCello to learn the cell-type-specific representation and the structural relation between cell types from the cell ontology graph, respectively. We pre-trained scCello on 22 million cells from the CellxGene database, leveraging their cell-type labels mapped to the cell ontology graph from the Open Biological and Biomedical Ontology Foundry. Our TFM demonstrates competitive generalization and transferability performance over the existing TFMs on biologically important tasks including identifying novel cell types of unseen cells, prediction of cell-type-specific marker genes, and cancer drug responses. Source code and model weights are available at https://github.com/DeepGraphLearning/scCello.

2024-09-25

NeurIPS.cc/2024/Conference (spotlight)

openreview.net

Code Repair with LLMs gives an Exploration-Exploitation Tradeoff

Hao Tang

Keya Hu

Jin Peng Zhou

Si Cheng Zhong

Wei-Long Zheng

Xujie Si

Kevin Ellis

2024-09-25

NeurIPS.cc/2024/Conference (poster)

openreview.net

Conformal Inverse Optimization

Bo Lin

Erick Delage

Timothy Chan

Inverse optimization has been increasingly used to estimate unknown parameters in an optimization model based on decision data. We show that such a point estimation is insufficient in a prescriptive setting where the estimated parameters are used to prescribe new decisions. The prescribed decisions may be low-quality and misaligned with human intuition and thus are unlikely to be adopted. To tackle this challenge, we propose conformal inverse optimization, which seeks to learn an uncertainty set for the unknown parameters and then solve a robust optimization model to prescribe new decisions. Under mild assumptions, we show that our method enjoys provable guarantees on solution quality, as evaluated using both the ground-truth parameters and the decision maker's perception of the unknown parameters. Our method demonstrates strong empirical performance compared to classic inverse optimization.

2024-09-25

NeurIPS.cc/2024/Conference (poster)

openreview.net

Density-based User Representation using Gaussian Process Regression for Multi-interest Personalized Retrieval

Haolun Wu

Ofer Meshi

Masrour Zoghi

Fernando Diaz

Xue (Steve) Liu

Craig Boutilier

Maryam Karimzadehgan

2024-09-25

NeurIPS.cc/2024/Conference (poster)

openreview.net

Detecting Brittle Decisions for Free: Leveraging Margin Consistency in Deep Robust Classifiers

Jonas Ngnawe

Sabyasachi Sahoo

Yann Batiste Pequignot

Frederic Precioso

Christian Gagné

2024-09-25

NeurIPS.cc/2024/Conference (poster)

doi.org

openreview.net

EDT: An Efficient Diffusion Transformer Framework Inspired by Human-like Sketching

Xinwang Chen

Ning Liu

Yichen Zhu

Feifei Feng

Jian Tang

Transformer-based Diffusion Probabilistic Models (DPMs) have shown more potential than CNN-based DPMs, yet their extensive computational requirements hinder widespread practical applications. To reduce the computation budget of transformer-based DPMs, this work proposes the Efficient Diffusion Transformer (EDT) framework. This framework includes a lightweight-design diffusion model architecture, and a training-free Attention Modulation Matrix and its alternation arrangement in EDT inspired by human-like sketching. Additionally, we propose a token relation-enhanced masking training strategy tailored explicitly for EDT to augment its token relation learning capability. Our extensive experiments demonstrate the efficacy of EDT. The EDT framework reduces training and inference costs and surpasses existing transformer-based diffusion models in image synthesis performance, thereby achieving a significant overall enhancement. With lower FID, EDT-S, EDT-B, and EDT-XL attained speed-ups of 3.93x, 2.84x, and 1.92x respectively in the training phase, and 2.29x, 2.29x, and 2.22x respectively in inference, compared to the corresponding sizes of MDTv2. Our code is available at https://github.com/xinwangChen/EDT.

2024-09-25

NeurIPS.cc/2024/Conference (poster)

openreview.net
