Publications

Managing AI Risks in an Era of Rapid Progress
Geoffrey Hinton
Andrew Yao
Dawn Song
Pieter Abbeel
Yuval Noah Harari
Trevor Darrell
Ya-Qin Zhang
Lan Xue
Shai Shalev-Shwartz
Gillian K. Hadfield
Jeff Clune
Frank Hutter
Atilim Güneş Baydin
Sheila McIlraith
Qiqi Gao
Ashwin Acharya
Anca Dragan … (see 5 more)
Philip Torr
Stuart Russell
Daniel Kahneman
Jan Brauner
Sören Mindermann
Assortment Optimization with Visibility Constraints
Théo Barré
Omar El Housni
Towards Modular LLMs by Building and Reusing a Library of LoRAs
Oleksiy Ostapenko
Zhan Su
Edoardo Ponti
Matheus Pereira
Lucas Caccia
The growing number of parameter-efficient adaptations of a base large language model (LLM) calls for studying whether we can reuse such trai… (see more)ned adapters to improve performance for new tasks. We study how to best build a library of adapters given multi-task data and devise techniques for both zero-shot and supervised task generalization through routing in such library. We benchmark existing approaches to build this library and introduce model-based clustering, MBC, a method that groups tasks based on the similarity of their adapter parameters, indirectly optimizing for transfer across the multi-task dataset. To re-use the library, we present a novel zero-shot routing mechanism, Arrow, which enables dynamic selection of the most relevant adapters for new inputs without the need for retraining. We experiment with several LLMs, such as Phi-2 and Mistral, on a wide array of held-out tasks, verifying that MBC-based adapters and Arrow routing lead to superior generalization to new tasks. We make steps towards creating modular, adaptable LLMs that can match or outperform traditional joint training.
GFETM: Genome Foundation-based Embedded Topic Model for scATAC-seq Modeling
Yimin Fan
Yu Li
What Mechanisms Does Knowledge Distillation Distill?
Cindy Wu
Ekdeep Singh Lubana
Bruno Mlodozeniec
Robert Kirk
Knowledge distillation is a commonly-used compression method in ML due to the popularity of increasingly large-scale models, but it is uncle… (see more)ar if all the information a teacher model contains is distilled into the smaller student model. We aim to formalize the concept of ‘knowledge’ to investigate how knowledge is transferred during distillation, focusing on shared invariant outputs to counterfactual changes of dataset latent variables (we call these latents mechanisms). We define a student model to be a good stand-in model for a teacher if it shares the teacher’s learned mechanisms, and find that Jacobian matching and contrastive representation learning are viable methods by which to train such models. While these methods do not result in perfect transfer of mechanisms, we show they often improve student fidelity or mitigate simplicity bias (as measured by the teacher-to-student KL divergence and accuracy on various out-of-distribution test datasets), especially on datasets with spurious statistical correlations.
TEMPLATES: Characterization of a Merger in the Dusty Lensing SPT0418-47 System
Jared Cathey
Anthony H. Gonzalez
Sidney Lower
Kedar A. Phadke
Justin Spilker
Manuel Aravena
Matthew Bayliss
Jack E. Birkin
Simon Birrer
Scott Chapman
Håkon Dahle
Christopher C. Hayward
Ryley Hill
Taylor A. Hutchison
Keunho J. Kim
Guillaume Mahler
Daniel P. Marrone
Desika Narayanan
Alexander Navarre … (see 7 more)
Cassie Reuter
Jane R Rigby
Keren Sharon
Manuel Solimano
Nikolaus Sulzenauer
Joaquin Vieira
David Vizgan
An AI-Resilient Text Rendering Technique for Reading and Skimming Documents
Ziwei Gu
Kenneth Li
Jonathan K. Kummerfeld
Elena L. Glassman
ChainForge: A Visual Toolkit for Prompt Engineering and LLM Hypothesis Testing
Chelse Swoopes
Priyan Vaithilingam
Martin Wattenberg
Elena L. Glassman
Evaluating outputs of large language models (LLMs) is challenging, requiring making -- and making sense of -- many responses. Yet tools that… (see more) go beyond basic prompting tend to require knowledge of programming APIs, focus on narrow domains, or are closed-source. We present ChainForge, an open-source visual toolkit for prompt engineering and on-demand hypothesis testing of text generation LLMs. ChainForge provides a graphical interface for comparison of responses across models and prompt variations. Our system was designed to support three tasks: model selection, prompt template design, and hypothesis testing (e.g., auditing). We released ChainForge early in its development and iterated on its design with academics and online users. Through in-lab and interview studies, we find that a range of people could use ChainForge to investigate hypotheses that matter to them, including in real-world settings. We identify three modes of prompt engineering and LLM hypothesis testing: opportunistic exploration, limited evaluation, and iterative refinement.
295. Rare Variant Genetic Architecture of the Human Cortical MRI Phenotypes in General Population
Kuldeep Kumar
Sayeh Kazem
Zhijie Liao
Jakub Kopal
Guillaume Huguet
Thomas Renne
Martineau Jean-Louis
Zhe Xie
Zohra Saci
Laura Almasy
David C. Glahn
Tomas Paus
Carrie Bearden
Paul Thompson
Richard A.I. Bethlehem
Varun Warrier
Sébastien Jacquemont
All-in-one simulation-based inference
Manuel Gloeckler
Michael Deistler
Christian Dietrich Weilbach
Jakob H. Macke
ChatGPT: What Every Pediatric Surgeon Should Know About Its Potential Uses and Pitfalls
Raquel González
Russell Woo
A Francois Trappey
Stewart Carter
David Darcy
Ellen Encisco
Brian Gulack
Doug Miniati
Edzhem Tombash
Eunice Y. Huang
CKGConv: General Graph Convolution with Continuous Kernels
Liheng Ma
Soumyasundar Pal
Yitian Zhang
Jiaming Zhou
Yingxue Zhang
The existing definitions of graph convolution, either from spatial or spectral perspectives, are inflexible and not unified. Defining a gene… (see more)ral convolution operator in the graph domain is challenging due to the lack of canonical coordinates, the presence of irregular structures, and the properties of graph symmetries. In this work, we propose a novel graph convolution framework by parameterizing the kernels as continuous functions of pseudo-coordinates derived via graph positional encoding. We name this Continuous Kernel Graph Convolution (CKGConv). Theoretically, we demonstrate that CKGConv is flexible and expressive. CKGConv encompasses many existing graph convolutions, and exhibits the same expressiveness as graph transformers in terms of distinguishing non-isomorphic graphs. Empirically, we show that CKGConv-based Networks outperform existing graph convolutional networks and perform comparably to the best graph transformers across a variety of graph datasets.