Publications

Autoformalizing Euclidean Geometry

Logan Murphy

Kaiyu Yang

Jialiang Sun

Zhaoyu Li

Animashree Anandkumar

Autoformalization involves automatically translating informal math into formal theorems and proofs that are machine-verifiable. Euclidean ge… (see more)ometry provides an interesting and controllable domain for studying autoformalization. In this paper, we introduce a neuro-symbolic framework for autoformalizing Euclidean geometry, which combines domain knowledge, SMT solvers, and large language models (LLMs). One challenge in Euclidean geometry is that informal proofs rely on diagrams, leaving gaps in texts that are hard to formalize. To address this issue, we use theorem provers to fill in such diagrammatic information automatically, so that the LLM only needs to autoformalize the explicit textual steps, making it easier for the model. We also provide automatic semantic evaluation for autoformalized theorem statements. We construct LeanEuclid, an autoformalization benchmark consisting of problems from Euclid’s Elements and the UniGeo dataset formalized in the Lean proof assistant. Experiments with GPT-4 and GPT-4V show the capability and limitations of state-of-the-art LLMs on autoformalizing geometry problems. The data and code are available at https://github.com/loganrjmurphy/LeanEuclid.

2024-07-07

Proceedings of the 41st International Conference on Machine Learning (published)

doi.org

proceedings.mlr.press

CKGConv: General Graph Convolution with Continuous Kernels

Liheng Ma

Soumyasundar Pal

Yitian Zhang

Jiaming Zhou

Yingxue Zhang

Mark J. Coates

The existing definitions of graph convolution, either from spatial or spectral perspectives, are inflexible and not unified. Defining a gene… (see more)ral convolution operator in the graph domain is challenging due to the lack of canonical coordinates, the presence of irregular structures, and the properties of graph symmetries. In this work, we propose a novel and general graph convolution framework by parameterizing the kernels as continuous functions of pseudo-coordinates derived via graph positional encoding. We name this Continuous Kernel Graph Convolution (CKGConv). Theoretically, we demonstrate that CKGConv is flexible and expressive. CKGConv encompasses many existing graph convolutions, and exhibits a stronger expressiveness, as powerful as graph transformers in terms of distinguishing non-isomorphic graphs. Empirically, we show that CKGConv-based Networks outperform existing graph convolutional networks and perform comparably to the best graph transformers across a variety of graph datasets. The code and models are publicly available at https://github.com/networkslab/CKGConv.

2024-07-07

Proceedings of the 41st International Conference on Machine Learning (published)

doi.org

proceedings.mlr.press

A Computational Framework for Solving Wasserstein Lagrangian Flows

Kirill Neklyudov

Rob Brekelmans

Alexander Tong

Lazar Atanackovic

Qiang Liu

Alireza Makhzani

The dynamical formulation of the optimal transport can be extended through various choices of the underlying geometry (kinetic energy), and … (see more)the regularization of density paths (potential energy). These combinations yield different variational problems (Lagrangians), encompassing many variations of the optimal transport problem such as the Schrödinger bridge, unbalanced optimal transport, and optimal transport with physical constraints, among others. In general, the optimal density path is unknown, and solving these variational problems can be computationally challenging. We propose a novel deep learning based framework approaching all of these problems from a unified perspective. Leveraging the dual formulation of the Lagrangians, our method does not require simulating or backpropagating through the trajectories of the learned dynamics, and does not need access to optimal couplings. We showcase the versatility of the proposed framework by outperforming previous approaches for the single-cell trajectory inference, where incorporating prior knowledge into the dynamics is crucial for correct predictions.

2024-07-07

Proceedings of the 41st International Conference on Machine Learning (published)

doi.org

proceedings.mlr.press

Consistent Adversarially Robust Linear Classification: Non-Parametric Setting

Elvis Dopgima Dohmatob

For binary classification in …

2024-07-07

Proceedings of the 41st International Conference on Machine Learning (published)

proceedings.mlr.press

On PI Controllers for Updating Lagrange Multipliers in Constrained Optimization

Motahareh Sohrabi

Juan Ramirez

Tianyue H. Zhang

Simon Lacoste-Julien

Jose Gallego-Posada

Constrained optimization offers a powerful framework to prescribe desired behaviors in neural network models. Typically, constrained problem… (see more)s are solved via their min-max Lagrangian formulations, which exhibit unstable oscillatory dynamics when optimized using gradient descent-ascent. The adoption of constrained optimization techniques in the machine learning community is currently limited by the lack of reliable, general-purpose update schemes for the Lagrange multipliers. This paper proposes the

2024-07-07

Proceedings of the 41st International Conference on Machine Learning (published)

doi.org

proceedings.mlr.press

Discovering Environments with XRM

Mohammad Pezeshki

Diane Bouchacourt

Mark Ibrahim

Nicolas Ballas

P Vincent

David Lopez-Paz

Environment annotations are essential for the success of many out-of-distribution (OOD) generalization methods. Unfortunately, these are cos… (see more)tly to obtain and often limited by human annotators’ biases. To achieve robust generalization, it is essential to develop algorithms for automatic environment discovery within datasets. Current proposals, which divide examples based on their training error, suffer from one fundamental problem. These methods introduce hyper-parameters and early-stopping criteria, which require a validation set with human-annotated environments, the very information subject to discovery. In this paper, we propose Cross-Risk Minimization (XRM) to address this issue. XRM trains twin networks, each learning from one random half of the training data, while imitating confident held-out mistakes made by its sibling. XRM provides a recipe for hyper-parameter tuning, does not require early-stopping, and can discover environments for all training and validation data. Algorithms built on top of XRM environments achieve oracle worst-group-accuracy, addressing a long-standing challenge in OOD generalization.

2024-07-07

Proceedings of the 41st International Conference on Machine Learning (published)

proceedings.mlr.press

Don't be so Negative! Score-based Generative Modeling with Oracle-assisted Guidance

Saeid Naderiparizi

Xiaoxuan Liang

Berend Zwartsenberg

Setareh Cohan

Frank N. Wood

2024-07-07

Proceedings of the 41st International Conference on Machine Learning (published)

doi.org

proceedings.mlr.press

EiG-Search: Generating Edge-Induced Subgraphs for GNN Explanation in Linear Time

Shengyao Lu

Bang Liu

Keith G Mills

Jiao He

Di Niu

2024-07-07

Proceedings of the 41st International Conference on Machine Learning (published)

proceedings.mlr.press

Estimating Unknown Population Sizes Using the Hypergeometric Distribution

Liam Hodgson

Danilo Bzdok

The multivariate hypergeometric distribution describes sampling without replacement from a discrete population of elements divided into mult… (see more)iple categories. Addressing a gap in the literature, we tackle the challenge of estimating discrete distributions when both the total population size and the sizes of its constituent categories are unknown. Here, we propose a novel solution using the hypergeometric likelihood to solve this estimation challenge, even in the presence of severe under-sampling. We develop our approach to account for a data generating process where the ground-truth is a mixture of distributions conditional on a continuous latent variable, such as with collaborative filtering, using the variational autoencoder framework. Empirical data simulation demonstrates that our method outperforms other likelihood functions used to model count data, both in terms of accuracy of population size estimate and in its ability to learn an informative latent space. We demonstrate our method's versatility through applications in NLP, by inferring and estimating the complexity of latent vocabularies in text excerpts, and in biology, by accurately recovering the true number of gene transcripts from sparse single-cell genomics data.

2024-07-07

Proceedings of the 41st International Conference on Machine Learning (published)

doi.org

proceedings.mlr.press

Experts Don't Cheat: Learning What You Don't Know By Predicting Pairs

Daniel D. Johnson

Daniel Tarlow

David Duvenaud

Chris J. Maddison

Identifying how much a model …

2024-07-07

Proceedings of the 41st International Conference on Machine Learning (published)

doi.org

proceedings.mlr.press

Graph Positional and Structural Encoder

Renming Liu

Semih Cantürk

Olivier Lapointe-Gagné

Positional and structural encodings (PSE) enable better identifiability of nodes within a graph, rendering them essential tools for empoweri… (see more)ng modern GNNs, and in particular graph Transformers. However, designing PSEs that work optimally for all graph prediction tasks is a challenging and unsolved problem. Here, we present the Graph Positional and Structural Encoder (GPSE), the first-ever graph encoder designed to capture rich PSE representations for augmenting any GNN. GPSE learns an efficient common latent representation for multiple PSEs, and is highly transferable: The encoder trained on a particular graph dataset can be used effectively on datasets drawn from markedly different distributions and modalities. We show that across a wide range of benchmarks, GPSE-enhanced models can significantly outperform those that employ explicitly computed PSEs, and at least match their performance in others. Our results pave the way for the development of foundational pre-trained graph encoders for extracting positional and structural information, and highlight their potential as a more powerful and efficient alternative to explicitly computed PSEs and existing self-supervised pre-training approaches. Our framework and pre-trained models are publicly available at https://github.com/G-Taxonomy-Workgroup/GPSE. For convenience, GPSE has also been integrated into the PyG library to facilitate downstream applications.

2024-07-07

Proceedings of the 41st International Conference on Machine Learning (published)

doi.org

proceedings.mlr.press

Harmony in Diversity: Merging Neural Networks with Canonical Correlation Analysis

Stefan Horoi

Albert Manuel Orozco Camacho

Eugene Belilovsky

Guy Wolf

Combining the predictions of multiple trained models through ensembling is generally a good way to improve accuracy by leveraging the differ… (see more)ent learned features of the models, however it comes with high computational and storage costs. Model fusion, the act of merging multiple models into one by combining their parameters reduces these costs but doesn't work as well in practice. Indeed, neural network loss landscapes are high-dimensional and non-convex and the minima found through learning are typically separated by high loss barriers. Numerous recent works have been focused on finding permutations matching one network features to the features of a second one, lowering the loss barrier on the linear path between them in parameter space. However, permutations are restrictive since they assume a one-to-one mapping between the different models' neurons exists. We propose a new model merging algorithm, CCA Merge, which is based on Canonical Correlation Analysis and aims to maximize the correlations between linear combinations of the model features. We show that our alignment method leads to better performances than past methods when averaging models trained on the same, or differing data splits. We also extend this analysis into the harder setting where more than 2 models are merged, and we find that CCA Merge works significantly better than past methods. Our code is publicly available at https://github.com/shoroi/align-n-merge

2024-07-07

Proceedings of the 41st International Conference on Machine Learning (published)

doi.org

proceedings.mlr.press

Mila on Udemy

Disinformation 2.0: When AI Blurs the Lines

AI Policy Fellowship Publications

Publications

Mila on Udemy

Disinformation 2.0: When AI Blurs the Lines

AI Policy Fellowship Publications

Popular keywords:

Publications