Publications

ChatGPT: What Every Pediatric Surgeon Should Know About Its Potential Uses and Pitfalls

Raquel González

Dan Poenaru

Russell Woo

A Francois Trappey

Stewart Carter

David Darcy

Ellen Encisco

Brian Gulack

Doug Miniati

Edzhem Tombash

Eunice Y. Huang

2024-05-01

Journal of Pediatric Surgery (published)

doi.org

CKGConv: General Graph Convolution with Continuous Kernels

Liheng Ma

Soumyasundar Pal

Yitian Zhang

Jiaming Zhou

Yingxue Zhang

Mark Coates

The existing definitions of graph convolution, either from spatial or spectral perspectives, are inflexible and not unified. Defining a general convolution operator in the graph domain is challenging due to the lack of canonical coordinates, the presence of irregular structures, and the properties of graph symmetries. In this work, we propose a novel graph convolution framework by parameterizing the kernels as continuous functions of pseudo-coordinates derived via graph positional encoding. We name this Continuous Kernel Graph Convolution (CKGConv). Theoretically, we demonstrate that CKGConv is flexible and expressive. CKGConv encompasses many existing graph convolutions, and exhibits the same expressiveness as graph transformers in terms of distinguishing non-isomorphic graphs. Empirically, we show that CKGConv-based Networks outperform existing graph convolutional networks and perform comparably to the best graph transformers across a variety of graph datasets.

2024-05-01

ICML.cc/2024/Conference (poster)

doi.org

openreview.net

Code as Reward: Empowering Reinforcement Learning with VLMs

David Venuto

Mohammad Sami Nur Islam

Martin Klissarov

Doina Precup

Sherry Yang

Ankit Anand

2024-05-01

ICML.cc/2024/Conference (spotlight)

doi.org

openreview.net

A Distributional Analogue to the Successor Representation

Harley Wiltzer

Jesse Farebrother

Arthur Gretton

Yunhao Tang

Andre Barreto

Will Dabney

Marc Gendron-Bellemare

Mark Rowland

This paper contributes a new approach for distributional reinforcement learning which elucidates a clean separation of transition structure and reward in the learning process. Analogous to how the successor representation (SR) describes the expected consequences of behaving according to a given policy, our distributional successor measure (SM) describes the distributional consequences of this behaviour. We formulate the distributional SM as a distribution over distributions and provide theory connecting it with distributional and model-based reinforcement learning. Moreover, we propose an algorithm that learns the distributional SM from data by minimizing a two-level maximum mean discrepancy. Key to our method are a number of algorithmic techniques that are independently valuable for learning generative models of state. As an illustration of the usefulness of the distributional SM, we show that it enables zero-shot risk-sensitive policy evaluation in a way that was not previously possible.

2024-05-01

ICML.cc/2024/Conference (spotlight)

doi.org

openreview.net

Experts Don't Cheat: Learning What You Don't Know By Predicting Pairs

Daniel D. Johnson

Danny Tarlow

David Duvenaud

Chris J. Maddison

Identifying how much a model …

2024-05-01

ICML.cc/2024/Conference (poster)

doi.org

openreview.net

Faithfulness Measurable Masked Language Models

Andreas Madsen

Siva Reddy

Sarath Chandar Anbil Parthipan

2024-05-01

ICML.cc/2024/Conference (spotlight)

doi.org

openreview.net

Graph Positional and Structural Encoder

Semih Cantürk

Renming Liu

Olivier Lapointe-Gagné

Vincent Létourneau

Guy Wolf

Dominique Beaini

Ladislav Rampášek

Positional and structural encodings (PSE) enable better identifiability of nodes within a graph, as in general graphs lack a canonical node ordering. This renders PSEs essential tools for empowering modern GNNs, and in particular graph Transformers. However, designing PSEs that work optimally for a variety of graph prediction tasks is a challenging and unsolved problem. Here, we present the graph positional and structural encoder (GPSE), a first-ever attempt to train a graph encoder that captures rich PSE representations for augmenting any GNN. GPSE can effectively learn a common latent representation for multiple PSEs, and is highly transferable. The encoder trained on a particular graph dataset can be used effectively on datasets drawn from significantly different distributions and even modalities. We show that across a wide range of benchmarks, GPSE-enhanced models can significantly improve the performance in certain tasks, while performing on par with those that employ explicitly computed PSEs in other cases. Our results pave the way for the development of large pre-trained models for extracting graph positional and structural information and highlight their potential as a viable alternative to explicitly computed PSEs as well as to existing self-supervised pre-training approaches.

2024-05-01

ICML.cc/2024/Conference (poster)

doi.org

openreview.net

Harmony in Diversity: Merging Neural Networks with Canonical Correlation Analysis

Stefan Horoi

Albert Manuel Orozco Camacho

Eugene Belilovsky

Guy Wolf

Ensembling multiple models enhances predictive performance by utilizing the varied learned features of the different models but incurs significant computational and storage costs. Model fusion, which combines parameters from multiple models into one, aims to mitigate these costs but faces practical challenges due to the complex, non-convex nature of neural network loss landscapes, where learned minima are often separated by high loss barriers. Recent works have explored using permutations to align network features, reducing the loss barrier in parameter space. However, permutations are restrictive since they assume a one-to-one mapping between the different models' neurons exists. We propose a new model merging algorithm, CCA Merge, which is based on Canonical Correlation Analysis and aims to maximize the correlations between linear combinations of the model features. We show that our method of aligning models leads to better performances than past methods when averaging models trained on the same, or differing data splits. We also extend this analysis into the harder many models setting where more than 2 models are merged, and we find that CCA Merge works significantly better in this setting than past methods.

2024-05-01

ICML.cc/2024/Conference (poster)

openreview.net

An improved column-generation-based matheuristic for learning classification trees

Krunal Kishor Patel

Guy Desaulniers

Andrea Lodi

2024-05-01

Computers & Operations Research (published)

doi.org

arxiv.org

Information Complexity of Stochastic Convex Optimization: Applications to Generalization and Memorization

Idan Attias

Gintare Karolina Dziugaite

Mahdi Haghifam

Roi Livni

Daniel M. Roy

In this work, we investigate the interplay between memorization and learning in the context of stochastic convex optimization (SCO). We define memorization via the information a learning algorithm reveals about its training data points. We then quantify this information using the framework of conditional mutual information (CMI) proposed by Steinke and Zakynthinou (2020). Our main result is a precise characterization of the tradeoff between the accuracy of a learning algorithm and its CMI, answering an open question posed by Livni (2023). We show that, in the

2024-05-01

ICML.cc/2024/Conference (oral)

doi.org

openreview.net

Interacting Diffusion Processes for Event Sequence Forecasting

Mai Zeng

Florence Regol

Mark Coates

Neural Temporal Point Processes (TPPs) have emerged as the primary framework for predicting sequences of events that occur at irregular time intervals, but their sequential nature can hamper performance for long-horizon forecasts. To address this, we introduce a novel approach that incorporates a diffusion generative model. The model facilitates sequence-to-sequence prediction, allowing multi-step predictions based on historical event sequences. In contrast to previous approaches, our model directly learns the joint probability distribution of types and inter-arrival times for multiple events. The model is composed of two diffusion processes, one for the time intervals and one for the event types. These processes interact through their respective denoising functions, which can take as input intermediate representations from both processes, allowing the model to learn complex interactions. We demonstrate that our proposal outperforms state-of-the-art baselines for long-horizon forecasting of TPPs.

2024-05-01

ICML.cc/2024/Conference (poster)

doi.org

openreview.net
