Publications

Weighted-Norm Bounds on Model Approximation in MDPs with Unbounded Per-Step Cost

Ashutosh Nayyar

Yi Ouyang

We consider the problem of designing a control policy for an infinite-horizon discounted cost Markov Decision Process (MDP) …

2023-12-12

IEEE Conference on Decision and Control (published)

doi.org

Additive Decoders for Latent Variables Identification and Cartesian-Product Extrapolation

We tackle the problems of latent variables identification and "out-of-support" image generation in representation learning. We show that both are possible for a class of decoders that we call additive, which are reminiscent of decoders used for object-centric representation learning (OCRL) and well suited for images that can be decomposed as a sum of object-specific images. We provide conditions under which exactly solving the reconstruction problem using an additive decoder is guaranteed to identify the blocks of latent variables up to permutation and block-wise invertible transformations. This guarantee relies only on very weak assumptions about the distribution of the latent factors, which might present statistical dependencies and have an almost arbitrarily shaped support. Our result provides a new setting where nonlinear independent component analysis (ICA) is possible and adds to our theoretical understanding of OCRL methods. We also show theoretically that additive decoders can generate novel images by recombining observed factors of variations in novel ways, an ability we refer to as Cartesian-product extrapolation. We show empirically that additivity is crucial for both identifiability and extrapolation on simulated data.

2023-12-11

Neural Information Processing Systems (Accept (oral))

doi.org

openreview.net

A*Net: A Scalable Path-based Reasoning Approach for Knowledge Graphs

Ming Zhang

Maxime Gazeau

Jian Tang

Reasoning on large-scale knowledge graphs has long been dominated by embedding methods. While path-based methods possess the inductive capacity that embeddings lack, their scalability is limited by the exponential number of paths. Here we present A*Net, a scalable path-based method for knowledge graph reasoning. Inspired by the A* algorithm for shortest path problems, our A*Net learns a priority function to select important nodes and edges at each iteration, to reduce time and memory footprint for both training and inference. The ratio of selected nodes and edges can be specified to trade off between performance and efficiency. Experiments on both transductive and inductive knowledge graph reasoning benchmarks show that A*Net achieves competitive performance with existing state-of-the-art path-based methods, while visiting merely 10% of nodes and 10% of edges at each iteration. On a million-scale dataset ogbl-wikikg2, A*Net not only achieves a new state-of-the-art result, but also converges faster than embedding methods. A*Net is the first path-based method for knowledge graph reasoning at such scale.

2023-12-11

Neural Information Processing Systems (Accept (poster))

doi.org

openreview.net

DiffPack: A Torsional Diffusion Model for Autoregressive Protein Side-Chain Packing

Yangtian Zhang

Zuobai Zhang

Bozitao Zhong

Sanchit Misra

Jian Tang

Proteins play a critical role in carrying out biological functions, and their 3D structures are essential in determining their functions. Accurately predicting the conformation of protein side-chains given their backbones is important for applications in protein structure prediction, design and protein-protein interactions. Traditional methods are computationally intensive and have limited accuracy, while existing machine learning methods treat the problem as a regression task and overlook the restrictions imposed by the constant covalent bond lengths and angles. In this work, we present DiffPack, a torsional diffusion model that learns the joint distribution of side-chain torsional angles, the only degrees of freedom in side-chain packing, by diffusing and denoising on the torsional space. To avoid issues arising from simultaneous perturbation of all four torsional angles, we propose autoregressively generating the four torsional angles from …

2023-12-11

Neural Information Processing Systems (Accept (poster))

doi.org

openreview.net

Equivariant Adaptation of Large Pretrained Models

Arnab Kumar Mondal

Siba Smarak Panigrahi

Sékou-Oumar Kaba

Sai Rajeswar

Siamak Ravanbakhsh

Equivariant networks are specifically designed to ensure consistent behavior with respect to a set of input transformations, leading to higher sample efficiency and more accurate and robust predictions. However, redesigning each component of prevalent deep neural network architectures to achieve chosen equivariance is a difficult problem and can result in a computationally expensive network during both training and inference. A recently proposed alternative towards equivariance that removes the architectural constraints is to use a simple canonicalization network that transforms the input to a canonical form before feeding it to an unconstrained prediction network. We show here that this approach can effectively be used to make a large pretrained network equivariant. However, we observe that the produced canonical orientations can be misaligned with those of the training distribution, hindering performance. Using dataset-dependent priors to inform the canonicalization function, we are able to make large pretrained models equivariant while maintaining their performance. This significantly improves the robustness of these models to deterministic transformations of the data, such as rotations. We believe this equivariant adaptation of large pretrained models can help their domain-specific applications with known symmetry priors.

2023-12-11

Neural Information Processing Systems (Accept (poster))

doi.org

openreview.net

Guiding The Last Layer in Federated Learning with Pre-Trained Models

Lucas Caccia

Federated Learning (FL) is an emerging paradigm that allows a model to be trained across a number of participants without sharing data. Recent works have begun to consider the effects of using pre-trained models as an initialization point for existing FL algorithms; however, these approaches ignore the vast body of efficient transfer learning literature from the centralized learning setting. Here we revisit the problem of FL from a pre-trained model considered in prior work and expand it to a set of computer vision transfer learning problems. We first observe that simply fitting a linear classification head can be efficient and effective in many cases. We then show that in the FL setting, fitting a classifier using the Nearest Class Means (NCM) can be done exactly and orders of magnitude more efficiently than existing proposals, while obtaining strong performance. Finally, we demonstrate that using a two-phase approach of obtaining the classifier and then fine-tuning the model can yield rapid convergence and improved generalization in the federated setting. We demonstrate the potential our method has to reduce communication and compute costs while achieving better model performance.

2023-12-11

Neural Information Processing Systems (Accept (poster))

doi.org

openreview.net

A Hitchhiker's Guide to Geometric GNNs for 3D Atomic Systems

Alexandre AGM Duval

Simon V. Mathis

Chaitanya K. Joshi

Victor Schmidt

Santiago Miret

Fragkiskos D. Malliaros

Taco Cohen

Pietro Liò

Yoshua Bengio

Michael M. Bronstein

2023-12-11

ArXiv (preprint)

doi.org

arxiv.org

Prioritizing Samples in Reinforcement Learning with Reducible Loss

Shivakanth Sujit

Somjit Nath

Pedro H.M. Braga

Samira Ebrahimi Kahou

Most reinforcement learning algorithms take advantage of an experience replay buffer to repeatedly train on samples the agent has observed in the past. Not all samples carry the same amount of significance and simply assigning equal importance to each of the samples is a naïve strategy. In this paper, we propose a method to prioritize samples based on how much we can learn from a sample. We define the learn-ability of a sample as the steady decrease of the training loss associated with this sample over time. We develop an algorithm to prioritize samples with high learn-ability, while assigning lower priority to those that are hard-to-learn, typically caused by noise or stochasticity. We empirically show that our method is more robust than random sampling and also better than just prioritizing with respect to the training loss, i.e. the temporal difference loss, which is used in prioritized experience replay.

2023-12-11

Neural Information Processing Systems (Accept (poster))

doi.org

openreview.net

A Unified, Scalable Framework for Neural Population Decoding

Mehdi Azabou

Vinam Arora

Venkataramana Ganesh

Ximeng Mao

Santosh Nachimuthu

Michael J. Mendelson

Blake Richards

Matthew G. Perich

Guillaume Lajoie

Eva L. Dyer

Our ability to use deep learning approaches to decipher neural activity would likely benefit from greater scale, in terms of both model size and datasets. However, the integration of many neural recordings into one unified model is challenging, as each recording contains the activity of different neurons from different individual animals. In this paper, we introduce a training framework and architecture designed to model the population dynamics of neural activity across diverse, large-scale neural recordings. Our method first tokenizes individual spikes within the dataset to build an efficient representation of neural events that captures the fine temporal structure of neural activity. We then employ cross-attention and a PerceiverIO backbone to further construct a latent tokenization of neural population activities. Utilizing this architecture and training framework, we construct a large-scale multi-session model trained on large datasets from seven nonhuman primates, spanning over 158 different sessions of recording from over 27,373 neural units and over 100 hours of recordings. In a number of different tasks, we demonstrate that our pretrained model can be rapidly adapted to new, unseen sessions with unspecified neuron correspondence, enabling few-shot performance with minimal labels. This work presents a powerful new approach for building deep learning tools to analyze neural data and stakes out a clear path to training at scale.

2023-12-11

Neural Information Processing Systems (Accept (poster))

doi.org

openreview.net

Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models

Avi Singh

John D Co-Reyes

Rishabh Agarwal

Ankesh Anand

Piyush Patil

Xavier Garcia

Peter J. Liu

James Harrison

Jaehoon Lee

Kelvin Xu

Aaron T Parisi

Abhishek Kumar

A. Alemi

Alex Rizkowsky

Azade Nova

Ben Adlam

Bernd Bohnet

Hanie Sedghi

Gamaleldin Fathy Elsayed

Igor Mordatch

Isabelle Simpson

Izzeddin Gur

Jasper Snoek

Jeffrey Pennington

Jiri Hron

Kathleen Kenealy

Kevin Swersky

Kshiteej Mahajan

Laura Culp

Lechao Xiao

Maxwell Bileschi

Noah Constant

Roman Novak

Rosanne Liu

Tris Brian Warkentin

Yundi Qian

Ethan Dyer

Behnam Neyshabur

Jascha Sohl-Dickstein

Yamini Bansal

Noah Fiedel

Fine-tuning language models (LMs) on human-generated data remains a prevalent practice. However, the performance of such models is often limited by the quantity and diversity of high-quality human data. In this paper, we explore whether we can go beyond human data on tasks where we have access to scalar feedback, for example, on math problems where one can verify correctness. To do so, we investigate a simple self-training method based on expectation-maximization, which we call ReST …

2023-12-10

ArXiv (preprint)

doi.org

arxiv.org

Efficient Graphics Representation with Differentiable Indirection

Sayantan Datta

Carl Marshall

Zhao Dong

Zhengqin Li

D. Nowrouzezahrai

We introduce differentiable indirection – a novel learned primitive that employs differentiable multi-scale lookup tables as an effective substitute for traditional compute and data operations across the graphics pipeline. We demonstrate its flexibility on a number of graphics tasks, i.e., geometric and image representation, texture mapping, shading, and radiance field representation. In all cases, differentiable indirection seamlessly integrates into existing architectures, trains rapidly, and yields both versatile and efficient results.

2023-12-10

SIGGRAPH Asia 2023 Conference Papers (published)

doi.org

arxiv.org

Explorable Mesh Deformation Subspaces from Unstructured 3D Generative Models

Arman Maesumi

Paul Guerrero

Noam Aigerman

Vladimir Kim

Matthew Fisher

Siddhartha Chaudhuri

Daniel Ritchie

2023-12-10

SIGGRAPH Asia 2023 Conference Papers (published)

doi.org

arxiv.org
