Publications

CKGConv: General Graph Convolution with Continuous Kernels

Liheng Ma

Soumyasundar Pal

Yitian Zhang

Jiaming Zhou

Yingxue Zhang

The existing definitions of graph convolution, either from spatial or spectral perspectives, are inflexible and not unified. Defining a general convolution operator in the graph domain is challenging due to the lack of canonical coordinates, the presence of irregular structures, and the properties of graph symmetries. In this work, we propose a novel and general graph convolution framework by parameterizing the kernels as continuous functions of pseudo-coordinates derived via graph positional encoding. We name this Continuous Kernel Graph Convolution (CKGConv). Theoretically, we demonstrate that CKGConv is flexible and expressive. CKGConv encompasses many existing graph convolutions, and exhibits a stronger expressiveness, as powerful as graph transformers in terms of distinguishing non-isomorphic graphs. Empirically, we show that CKGConv-based Networks outperform existing graph convolutional networks and perform comparably to the best graph transformers across a variety of graph datasets. The code and models are publicly available at https://github.com/networkslab/CKGConv.

2024-07-08

Proceedings of the 41st International Conference on Machine Learning (published)

doi.org

openreview.net

On PI Controllers for Updating Lagrange Multipliers in Constrained Optimization

Motahareh Sohrabi

Juan Ramirez

Tianyue H. Zhang

Simon Lacoste-Julien

Jose Gallego-Posada

Constrained optimization offers a powerful framework to prescribe desired behaviors in neural network models. Typically, constrained problems are solved via their min-max Lagrangian formulations, which exhibit unstable oscillatory dynamics when optimized using gradient descent-ascent. The adoption of constrained optimization techniques in the machine learning community is currently limited by the lack of reliable, general-purpose update schemes for the Lagrange multipliers. This paper proposes the νPI algorithm and contributes an optimization perspective on Lagrange multiplier updates based on PI controllers, extending the work of Stooke, Achiam and Abbeel (2020). We provide theoretical and empirical insights explaining the inability of momentum methods to address the shortcomings of gradient descent-ascent, and contrast this with the empirical success of our proposed νPI controller. Moreover, we prove that νPI generalizes popular momentum methods for single-objective minimization. Our experiments demonstrate that νPI reliably stabilizes the multiplier dynamics and its hyperparameters enjoy robust and predictable behavior.

2024-07-08

Proceedings of the 41st International Conference on Machine Learning (published)

proceedings.mlr.press

openreview.net
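
To make the idea of a PI-controlled Lagrange multiplier (from the abstract of "On PI Controllers for Updating Lagrange Multipliers in Constrained Optimization") more concrete, here is a minimal sketch of a textbook proportional-integral update on a toy problem. It is not the paper's νPI algorithm: the toy objective, the function name `solve_with_pi_multiplier`, and the gains `kp`, `ki` and step size `lr` are all illustrative assumptions.

```python
def solve_with_pi_multiplier(steps=2000, lr=0.1, kp=1.0, ki=0.1):
    """Toy problem (assumed for illustration): minimize x**2 s.t. 1 - x <= 0.

    The constrained optimum is x = 1 with multiplier lam = 2.
    """
    x = 0.0         # primal variable
    integral = 0.0  # running sum of constraint violations (the "I" term)
    lam = 0.0       # Lagrange multiplier
    for _ in range(steps):
        violation = 1.0 - x  # g(x); positive while the constraint is violated
        integral += violation
        # Generic PI update: the proportional term reacts to the current
        # violation, the integral term accumulates past violations.
        # The max(...) projects the multiplier onto lam >= 0.
        lam = max(0.0, kp * violation + ki * integral)
        # Gradient descent step on the Lagrangian x**2 + lam * (1 - x).
        x -= lr * (2.0 * x - lam)
    return x, lam


if __name__ == "__main__":
    x, lam = solve_with_pi_multiplier()
    print(f"x = {x:.4f}, lambda = {lam:.4f}")
```

With `kp = 0` the update keeps only the integral term, which matches plain gradient ascent on the multiplier with step size `ki` whenever the projection is inactive; the proportional term is what adds damping in this sketch.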

EiG-Search: Generating Edge-Induced Subgraphs for GNN Explanation in Linear Time

Shengyao Lu

Bang Liu

Keith G Mills

Jiao He

Di Niu

2024-07-08

Proceedings of the 41st International Conference on Machine Learning (published)

proceedings.mlr.press

openreview.net

Estimating Unknown Population Sizes Using the Hypergeometric Distribution

Liam Hodgson

Danilo Bzdok

The multivariate hypergeometric distribution describes sampling without replacement from a discrete population of elements divided into multiple categories. Addressing a gap in the literature, we tackle the challenge of estimating discrete distributions when both the total population size and the category sizes are unknown. Here, we propose a novel solution using the hypergeometric likelihood to solve this estimation problem, even in the presence of severe under-sampling. Our approach accounts for a data generating process where the ground-truth is a mixture of distributions conditional on a continuous latent variable, as seen in collaborative filtering, using the variational autoencoder framework. Empirical data simulation demonstrates that our method outperforms other likelihood functions used to model count data, both in terms of accuracy of population size estimate and learning an informative latent space. We showcase our method’s versatility through applications in NLP, by inferring and estimating the complexity of latent vocabularies in reading passage excerpts, and in biology, by accurately recovering the true number of gene transcripts from sparse single-cell genomics data.

2024-07-08

Proceedings of the 41st International Conference on Machine Learning (published)

proceedings.mlr.press
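
As a small illustration of the likelihood named in the abstract of "Estimating Unknown Population Sizes Using the Hypergeometric Distribution", the sketch below evaluates the standard multivariate hypergeometric log-probability for a sample drawn without replacement. It shows only the textbook formula, not the paper's variational autoencoder method; the function name `multivariate_hypergeom_logpmf` and the example numbers are assumptions made for illustration.

```python
from math import comb, exp, log


def multivariate_hypergeom_logpmf(counts, category_sizes):
    """Log-probability of drawing `counts` items per category without
    replacement from a population with `category_sizes` items per category.

    Formula: prod_i C(K_i, k_i) / C(N, n), with N = sum(K_i), n = sum(k_i).
    """
    if len(counts) != len(category_sizes):
        raise ValueError("counts and category_sizes must have equal length")
    # A draw that exceeds a category's size is impossible.
    if any(k < 0 or k > K for k, K in zip(counts, category_sizes)):
        return float("-inf")
    n = sum(counts)
    N = sum(category_sizes)
    log_numerator = sum(log(comb(K, k)) for k, K in zip(counts, category_sizes))
    return log_numerator - log(comb(N, n))


if __name__ == "__main__":
    # One item from each of two categories of size 5: 25/45.
    print(exp(multivariate_hypergeom_logpmf([1, 1], [5, 5])))
    # Compare two hypothetical candidate populations for an observed draw.
    print(multivariate_hypergeom_logpmf([2, 0], [8, 2]))
    print(multivariate_hypergeom_logpmf([2, 0], [5, 5]))
```

Comparing the log-probability of the same observed counts under different candidate category sizes, as in the last two calls, is the basic operation a likelihood-based size estimator would build on.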

Experts Don't Cheat: Learning What You Don't Know By Predicting Pairs

Daniel D. Johnson

Danny Tarlow

David Duvenaud

Chris J. Maddison

Identifying how much a model …

2024-07-08

Proceedings of the 41st International Conference on Machine Learning (published)

doi.org

openreview.net

Graph Positional and Structural Encoder

Renming Liu

Semih Cantürk

Olivier Lapointe-Gagné

Vincent Létourneau

Guy Wolf

Dominique Beaini

Ladislav Rampášek

Positional and structural encodings (PSE) enable better identifiability of nodes within a graph, as in general graphs lack a canonical node ordering. This renders PSEs essential tools for empowering modern GNNs, and in particular graph Transformers. However, designing PSEs that work optimally for a variety of graph prediction tasks is a challenging and unsolved problem. Here, we present the graph positional and structural encoder (GPSE), a first-ever attempt to train a graph encoder that captures rich PSE representations for augmenting any GNN. GPSE can effectively learn a common latent representation for multiple PSEs, and is highly transferable. The encoder trained on a particular graph dataset can be used effectively on datasets drawn from significantly different distributions and even modalities. We show that across a wide range of benchmarks, GPSE-enhanced models can significantly improve the performance in certain tasks, while performing on par with those that employ explicitly computed PSEs in other cases. Our results pave the way for the development of large pre-trained models for extracting graph positional and structural information and highlight their potential as a viable alternative to explicitly computed PSEs as well as to existing self-supervised pre-training approaches.

2024-07-08

Proceedings of the 41st International Conference on Machine Learning (published)

doi.org

openreview.net

Harmony in Diversity: Merging Neural Networks with Canonical Correlation Analysis

Stefan Horoi

Albert Manuel Orozco Camacho

Eugene Belilovsky

Guy Wolf

2024-07-08

Proceedings of the 41st International Conference on Machine Learning (published)

doi.org

openreview.net

Implicit meta-learning may lead language models to trust more reliable sources

Dmitrii Krasheninnikov

Egor Krasheninnikov

Bruno Mlodozeniec

Tegan Maharaj

David Scott Krueger

We demonstrate that large language models (LLMs) may learn indicators of document usefulness and modulate their updates accordingly. We introduce random strings ("tags") as indicators of usefulness in a synthetic fine-tuning dataset. Fine-tuning on this dataset leads to **implicit meta-learning (IML)**: in further fine-tuning, the model updates to make more use of text that is tagged as useful. We perform a thorough empirical investigation of this phenomenon, finding (among other things) that (i) it occurs in both pretrained LLMs and those trained from scratch, as well as on a vision task, and (ii) larger models and smaller batch sizes tend to give more IML. We also use probing to examine how IML changes the way models store knowledge in their parameters. Finally, we reflect on what our results might imply about the capabilities, risks, and controllability of future AI systems.

2024-07-08

Proceedings of the 41st International Conference on Machine Learning (published)

proceedings.mlr.press

openreview.net

Improving Gradient-Guided Nested Sampling for Posterior Inference

Pablo Lemos

Nikolay Malkin

Will Handley

Yoshua Bengio

Yashar Hezaveh

Laurence Perreault-Levasseur

We present a performant, general-purpose gradient-guided nested sampling (GGNS) algorithm, combining the state of the art in differentiable programming, Hamiltonian slice sampling, clustering, mode separation, dynamic nested sampling, and parallelization. This unique combination allows GGNS to scale well with dimensionality and perform competitively on a variety of synthetic and real-world problems. We also show the potential of combining nested sampling with generative flow networks to obtain large amounts of high-quality samples from the posterior distribution. This combination leads to faster mode discovery and more accurate estimates of the partition function.

2024-07-08

Proceedings of the 41st International Conference on Machine Learning (published)

doi.org

openreview.net

Information Complexity of Stochastic Convex Optimization: Applications to Generalization, Memorization, and Tracing

Idan Attias

Gintare Karolina Dziugaite

Mahdi Haghifam

Roi Livni

Daniel M. Roy

In this work, we investigate the interplay between memorization and learning in the context of stochastic convex optimization (SCO). We define memorization via the information a learning algorithm reveals about its training data points. We then quantify this information using the framework of conditional mutual information (CMI) proposed by Steinke and Zakynthinou (2020). Our main result is a precise characterization of the tradeoff between the accuracy of a learning algorithm and its CMI, answering an open question posed by Livni (2023). We show that, in the …

2024-07-08

Proceedings of the 41st International Conference on Machine Learning (published)

proceedings.mlr.press

InsightBench: Evaluating Business Analytics Agents Through Multi-Step Insight Generation

Gaurav Sahu

Abhay Puri

Juan A. Rodriguez

Alexandre Drouin

Perouz Taslakian

Valentina Zantedeschi

Alexandre Lacoste

David Vazquez

Nicolas Chapados

Chris Pal

Sai Rajeswar

Issam Hadj Laradji

2024-07-08

arXiv (preprint)

doi.org

arxiv.org

Interacting Diffusion Processes for Event Sequence Forecasting

Mai Zeng

Florence Regol

Mark Coates

Neural Temporal Point Processes (TPPs) have emerged as the primary framework for predicting sequences of events that occur at irregular time intervals, but their sequential nature can hamper performance for long-horizon forecasts. To address this, we introduce a novel approach that incorporates a diffusion generative model. The model facilitates sequence-to-sequence prediction, allowing multi-step predictions based on historical event sequences. In contrast to previous approaches, our model directly learns the joint probability distribution of types and inter-arrival times for multiple events. The model is composed of two diffusion processes, one for the time intervals and one for the event types. These processes interact through their respective denoising functions, which can take as input intermediate representations from both processes, allowing the model to learn complex interactions. We demonstrate that our proposal outperforms state-of-the-art baselines for long-horizon forecasting of TPPs.

2024-07-08

Proceedings of the 41st International Conference on Machine Learning (published)

doi.org

openreview.net
