Publications

Stress-Testing Capability Elicitation With Password-Locked Models

Ryan Greenblatt

Fabien Roger

Dmitrii Krasheninnikov

David Scott Krueger

2024-09-25

NeurIPS.cc/2024/Conference (poster)

doi.org

openreview.net

Textoshop: Interactions Inspired by Drawing Software to Facilitate Text Editing

Damien Masson

Young-Ho Kim

Fanny Chevalier

2024-09-25

ArXiv (preprint)

doi.org

arxiv.org

Textoshop: Interactions Inspired by Drawing Software to Facilitate Text Editing

Damien Masson

Young-Ho Kim

Fanny Chevalier

We explore how interactions inspired by drawing software can help edit text. Making an analogy between visual and text editing, we consider … (see more)words as pixels, sentences as regions, and tones as colours. For instance, direct manipulations move, shorten, expand, and reorder text; tools change number, tense, and grammar; colours map to tones explored along three dimensions in a tone picker; and layers help organize and version text. This analogy also leads to new workflows, such as boolean operations on text fragments to construct more elaborated text. A study shows participants were more successful at editing text and preferred using the proposed interface over existing solutions. Broadly, our work highlights the potential of interaction analogies to rethink existing workflows, while capitalizing on familiar features.

2024-09-25

ArXiv (preprint)

doi.org

arxiv.org

The Factorization Curse: Which Tokens You Predict Underlie the Reversal Curse and More

Ouail Kitouni

Niklas Nolte

Adina Williams

Michael Rabbat

Diane Bouchacourt

Mark Ibrahim

2024-09-25

NeurIPS.cc/2024/Conference (poster)

openreview.net

The High Line: Exact Risk and Learning Rate Curves of Stochastic Adaptive Learning Rate Algorithms

Elizabeth Collins-Woodfin

Inbar Seroussi

Begoña García Malaxechebarría

Andrew Mackenzie

Elliot Paquette

Courtney Paquette

2024-09-25

NeurIPS.cc/2024/Conference (poster)

openreview.net

On the Scalability of Certified Adversarial Robustness with Generated Data

Thomas Altstidl

David Dobre

Arthur Kosmala

Bjoern Eskofier

Gauthier Gidel

Leo Schwinn

Certified defenses against adversarial attacks offer formal guarantees on the robustness of a model, making them more reliable than empirica… (see more)l methods such as adversarial training, whose effectiveness is often later reduced by unseen attacks. Still, the limited certified robustness that is currently achievable has been a bottleneck for their practical adoption. Gowal et al. and Wang et al. have shown that generating additional training data using state-of-the-art diffusion models can considerably improve the robustness of adversarial training. In this work, we demonstrate that a similar approach can substantially improve deterministic certified defenses but also reveal notable differences in the scaling behavior between certified and empirical methods. In addition, we provide a list of recommendations to scale the robustness of certified training approaches. Our approach achieves state-of-the-art deterministic robustness certificates on CIFAR-10 for the

2024-09-25

NeurIPS.cc/2024/Conference (poster)

openreview.net

On the Scalability of GNNs for Molecular Graphs

Maciej Sypetkowski

Frederik Wenkel

Farimah Poursafaei

Nia Dickson

Karush Suri

Philip Fradkin

Dominique Beaini

Scaling deep learning models has been at the heart of recent revolutions in language modelling and image generation. Practitioners have obse… (see more)rved a strong relationship between model size, dataset size, and performance. However, structure-based architectures such as Graph Neural Networks (GNNs) are yet to show the benefits of scale mainly due to the lower efficiency of sparse operations, large data requirements, and lack of clarity about the effectiveness of various architectures. We address this drawback of GNNs by studying their scaling behavior. Specifically, we analyze message-passing networks, graph Transformers, and hybrid architectures on the largest public collection of 2D molecular graphs. For the first time, we observe that GNNs benefit tremendously from the increasing scale of depth, width, number of molecules, number of labels, and the diversity in the pretraining datasets, resulting in a 30.25% improvement when scaling to 1 billion parameters and 28.98% improvement when increasing size of dataset to eightfold. We further demonstrate strong finetuning scaling behavior on 38 tasks, outclassing previous large models. We hope that our work paves the way for an era where foundational GNNs drive pharmaceutical drug discovery.

2024-09-25

NeurIPS.cc/2024/Conference (poster)

doi.org

openreview.net

Towards a "Universal Translator" for Neural Dynamics at Single-Cell, Single-Spike Resolution

Yizi Zhang

Yanchen Wang

Donato M. Jiménez-Benetó

Zixuan Wang

Mehdi Azabou

Blake Richards

Renee Tung

Olivier Winter

International Brain Laboratory

Eva L Dyer

Liam Paninski

Cole Lincoln Hurwitz

2024-09-25

NeurIPS.cc/2024/Conference (poster)

openreview.net

Trajectory Flow Matching with Applications to Clinical Time Series Modelling

Xi Zhang

Yuan Pu

Yuki Kawamura

Andrew Loza

Yoshua Bengio

Dennis Shung

Alexander Tong

Modeling stochastic and irregularly sampled time series is a challenging problem found in a wide range of applications, especially in medici… (see more)ne. Neural stochastic differential equations (Neural SDEs) are an attractive modeling technique for this problem, which parameterize the drift and diffusion terms of an SDE with neural networks. However, current algorithms for training Neural SDEs require backpropagation through the SDE dynamics, greatly limiting their scalability and stability. To address this, we propose **Trajectory Flow Matching** (TFM), which trains a Neural SDE in a *simulation-free* manner, bypassing backpropagation through the dynamics. TFM leverages the flow matching technique from generative modeling to model time series. In this work we first establish necessary conditions for TFM to learn time series data. Next, we present a reparameterization trick which improves training stability. Finally, we adapt TFM to the clinical time series setting, demonstrating improved performance on three clinical time series datasets both in terms of absolute performance and uncertainty prediction.

2024-09-25

NeurIPS.cc/2024/Conference (spotlight)

openreview.net

VisMin: Visual Minimal-Change Understanding

Fine-grained understanding of objects, attributes, and relationships between objects is crucial for visual-language models (VLMs). To evalua… (see more)te VLMs' fine-grained understanding, existing benchmarks primarily focus on evaluating VLMs' capability to distinguish between two very similar captions given an image. In this paper, our focus is on evaluating VLMs' capability to distinguish between two very similar images given a caption. To this end, we introduce a new, challenging benchmark termed Visual Minimal-Change Understanding (VisMin), which requires models to predict the correct image-caption match given two images and two captions. Importantly, the image pair (as well as the caption pair) contains minimal changes, i.e., between the two images (as well as between the two captions), only one aspect changes at a time from among the following possible types of changes: object, attribute, count, and spatial relation. These four types of minimal changes are specifically designed to test the models' understanding of objects, attributes of objects (such as color, material, shape), counts of objects, and spatial relationships between objects. To curate our benchmark, we built an automatic pipeline using large language models and diffusion models, followed by a rigorous 4-step verification process by human annotators. Empirical experiments reveal that current VLMs exhibit notable deficiencies in understanding spatial relationships and counting abilities. Furthermore, leveraging the automated nature of our data creation process, we generate a large-scale training dataset, which we use to finetune CLIP (a foundational VLM) and Idefics2 (a multimodal large language model). Our findings show that both these models benefit significantly from fine-tuning on this data, as evident by marked improvements in fine-grained understanding across a wide range of benchmarks. Additionally, such fine-tuning improves CLIP's general image-text alignment capabilities too. All resources including the benchmark, the training data, and the finetuned model checkpoints will be released.

2024-09-25

NeurIPS.cc/2024/Conference (poster)

doi.org

openreview.net

Wasserstein Distributionally Robust Optimization through the Lens of Structural Causal Models and Individual Fairness

Ahmad Reza Ehyaei

Golnoosh Farnadi

Samira Samadi

2024-09-25

NeurIPS.cc/2024/Conference (poster)

openreview.net