Publications

CKGConv: General Graph Convolution with Continuous Kernels

Liheng Ma

Soumyasundar Pal

Yitian Zhang

Jiaming Zhou

Yingxue Zhang

The existing definitions of graph convolution, either from spatial or spectral perspectives, are inflexible and not unified. Defining a general convolution operator in the graph domain is challenging due to the lack of canonical coordinates, the presence of irregular structures, and the properties of graph symmetries. In this work, we propose a novel graph convolution framework by parameterizing the kernels as continuous functions of pseudo-coordinates derived via graph positional encoding. We name this Continuous Kernel Graph Convolution (CKGConv). Theoretically, we demonstrate that CKGConv is flexible and expressive. CKGConv encompasses many existing graph convolutions, and exhibits the same expressiveness as graph transformers in terms of distinguishing non-isomorphic graphs. Empirically, we show that CKGConv-based Networks outperform existing graph convolutional networks and perform comparably to the best graph transformers across a variety of graph datasets.

2024-05-01

ICML.cc/2024/Conference (poster)

doi.org

openreview.net

Code as Reward: Empowering Reinforcement Learning with VLMs

David Venuto

Mohammad Sami Nur Islam

Martin Klissarov

Doina Precup

Sherry Yang

Ankit Anand

2024-05-01

ICML.cc/2024/Conference (spotlight)

doi.org

openreview.net

A Distributional Analogue to the Successor Representation

Harley Wiltzer

Jesse Farebrother

Arthur Gretton

Yunhao Tang

Andre Barreto

Will Dabney

Marc Gendron-Bellemare

Mark Rowland

This paper contributes a new approach for distributional reinforcement learning which elucidates a clean separation of transition structure and reward in the learning process. Analogous to how the successor representation (SR) describes the expected consequences of behaving according to a given policy, our distributional successor measure (SM) describes the distributional consequences of this behaviour. We formulate the distributional SM as a distribution over distributions and provide theory connecting it with distributional and model-based reinforcement learning. Moreover, we propose an algorithm that learns the distributional SM from data by minimizing a two-level maximum mean discrepancy. Key to our method are a number of algorithmic techniques that are independently valuable for learning generative models of state. As an illustration of the usefulness of the distributional SM, we show that it enables zero-shot risk-sensitive policy evaluation in a way that was not previously possible.

2024-05-01

ICML.cc/2024/Conference (spotlight)

doi.org

openreview.net

Experts Don't Cheat: Learning What You Don't Know By Predicting Pairs

Daniel D. Johnson

Danny Tarlow

David Duvenaud

Chris J. Maddison

Identifying how much a model …

2024-05-01

ICML.cc/2024/Conference (poster)

doi.org

openreview.net

Faithfulness Measurable Masked Language Models

Andreas Madsen

Siva Reddy

Sarath Chandar Anbil Parthipan

2024-05-01

ICML.cc/2024/Conference (spotlight)

doi.org

openreview.net

Graph Positional and Structural Encoder

Semih Cantürk

Renming Liu

Olivier Lapointe-Gagné

Vincent Létourneau

Guy Wolf

Dominique Beaini

Ladislav Rampášek

Positional and structural encodings (PSE) enable better identifiability of nodes within a graph, as in general graphs lack a canonical node ordering. This renders PSEs essential tools for empowering modern GNNs, and in particular graph Transformers. However, designing PSEs that work optimally for a variety of graph prediction tasks is a challenging and unsolved problem. Here, we present the graph positional and structural encoder (GPSE), a first-ever attempt to train a graph encoder that captures rich PSE representations for augmenting any GNN. GPSE can effectively learn a common latent representation for multiple PSEs, and is highly transferable. The encoder trained on a particular graph dataset can be used effectively on datasets drawn from significantly different distributions and even modalities. We show that across a wide range of benchmarks, GPSE-enhanced models can significantly improve the performance in certain tasks, while performing on par with those that employ explicitly computed PSEs in other cases. Our results pave the way for the development of large pre-trained models for extracting graph positional and structural information and highlight their potential as a viable alternative to explicitly computed PSEs as well as to existing self-supervised pre-training approaches.

2024-05-01

ICML.cc/2024/Conference (poster)

doi.org

openreview.net

Harmony in Diversity: Merging Neural Networks with Canonical Correlation Analysis

Stefan Horoi

Albert Manuel Orozco Camacho

Eugene Belilovsky

Guy Wolf

Ensembling multiple models enhances predictive performance by utilizing the varied learned features of the different models but incurs significant computational and storage costs. Model fusion, which combines parameters from multiple models into one, aims to mitigate these costs but faces practical challenges due to the complex, non-convex nature of neural network loss landscapes, where learned minima are often separated by high loss barriers. Recent works have explored using permutations to align network features, reducing the loss barrier in parameter space. However, permutations are restrictive since they assume a one-to-one mapping between the different models' neurons exists. We propose a new model merging algorithm, CCA Merge, which is based on Canonical Correlation Analysis and aims to maximize the correlations between linear combinations of the model features. We show that our method of aligning models leads to better performances than past methods when averaging models trained on the same, or differing data splits. We also extend this analysis into the harder many models setting where more than 2 models are merged, and we find that CCA Merge works significantly better in this setting than past methods.

2024-05-01

ICML.cc/2024/Conference (poster)

openreview.net

An improved column-generation-based matheuristic for learning classification trees

Krunal Kishor Patel

Guy Desaulniers

Andrea Lodi

2024-05-01

Computers & Operations Research (published)

doi.org

arxiv.org

Information Complexity of Stochastic Convex Optimization: Applications to Generalization and Memorization

Idan Attias

Gintare Karolina Dziugaite

Mahdi Haghifam

Roi Livni

Daniel M. Roy

In this work, we investigate the interplay between memorization and learning in the context of stochastic convex optimization (SCO). We define memorization via the information a learning algorithm reveals about its training data points. We then quantify this information using the framework of conditional mutual information (CMI) proposed by Steinke and Zakynthinou (2020). Our main result is a precise characterization of the tradeoff between the accuracy of a learning algorithm and its CMI, answering an open question posed by Livni (2023). We show that, in the

2024-05-01

ICML.cc/2024/Conference (oral presentation)

doi.org

openreview.net

Interacting Diffusion Processes for Event Sequence Forecasting

Mai Zeng

Florence Regol

Mark Coates

Neural Temporal Point Processes (TPPs) have emerged as the primary framework for predicting sequences of events that occur at irregular time intervals, but their sequential nature can hamper performance for long-horizon forecasts. To address this, we introduce a novel approach that incorporates a diffusion generative model. The model facilitates sequence-to-sequence prediction, allowing multi-step predictions based on historical event sequences. In contrast to previous approaches, our model directly learns the joint probability distribution of types and inter-arrival times for multiple events. The model is composed of two diffusion processes, one for the time intervals and one for the event types. These processes interact through their respective denoising functions, which can take as input intermediate representations from both processes, allowing the model to learn complex interactions. We demonstrate that our proposal outperforms state-of-the-art baselines for long-horizon forecasting of TPPs.

2024-05-01

ICML.cc/2024/Conference (poster)

doi.org

openreview.net

Iterated Denoising Energy Matching for Sampling from Boltzmann Densities

Tara Akhound-Sadegh

Jarrid Rector-Brooks

Joey Bose

Sarthak Mittal

Pablo Lemos

Cheng-Hao Liu

Marcin Sendera

Siamak Ravanbakhsh

Gauthier Gidel

Yoshua Bengio

Nikolay Malkin

Alexander Tong

Efficiently generating statistically independent samples from an unnormalized probability distribution, such as equilibrium samples of many-body systems, is a foundational problem in science. In this paper, we propose Iterated Denoising Energy Matching (iDEM), an iterative algorithm that uses a novel stochastic score matching objective leveraging solely the energy function and its gradient, and no data samples, to train a diffusion-based sampler. Specifically, iDEM alternates between (I) sampling regions of high model density from a diffusion-based sampler and (II) using these samples in our stochastic matching objective to further improve the sampler. iDEM is scalable to high dimensions as the inner matching objective is simulation-free and requires no MCMC samples. Moreover, by leveraging the fast mode mixing behavior of diffusion, iDEM smooths out the energy landscape, enabling efficient exploration and learning of an amortized sampler. We evaluate iDEM on a suite of tasks ranging from standard synthetic energy functions to invariant

2024-05-01

ICML.cc/2024/Conference (poster)

doi.org

openreview.net

Layerwise Proximal Replay: A Proximal Point Method for Online Continual Learning

Jinsoo Yoo

Yunpeng Liu

Frank Wood

Geoff Pleiss

2024-05-01

ICML.cc/2024/Conference (poster)

doi.org