Simplifying Constraint Inference with Inverse Reinforcement Learning
Adriana Hugessen
Harley Wiltzer
Soft Prompt Threats: Attacking Safety Alignment and Unlearning in Open-Source LLMs through the Embedding Space
Leo Schwinn
David Dobre
Sophie Xhonneux
Stephan Günnemann
Current research in adversarial robustness of LLMs focuses on discrete input manipulations in the natural language space, which can be direc… (see more)tly transferred to closed-source models. However, this approach neglects the steady progression of open-source models. As open-source models advance in capability, ensuring their safety also becomes increasingly imperative. Yet, attacks tailored to open-source LLMs that exploit full model access remain largely unexplored. We address this research gap and propose the embedding space attack, which directly attacks the continuous embedding representation of input tokens. We find that embedding space attacks circumvent model alignments and trigger harmful behaviors more efficiently than discrete attacks or model fine-tuning. Furthermore, we present a novel threat model in the context of unlearning and show that embedding space attacks can extract supposedly deleted information from unlearned LLMs across multiple datasets and models. Our findings highlight embedding space attacks as an important threat model in open-source LLMs. Trigger Warning: the appendix contains LLM-generated text with violence and harassment.
Source-Free Domain Adaptation for YOLO Object Detection
Simon Varailhon
Masih Aminbeidokhti
Eric Granger
Source-free domain adaptation (SFDA) is a challenging problem in object detection, where a pre-trained source model is adapted to a new targ… (see more)et domain without using any source domain data for privacy and efficiency reasons. Most state-of-the-art SFDA methods for object detection have been proposed for Faster-RCNN, a detector that is known to have high computational complexity. This paper focuses on domain adaptation techniques for real-world vision systems, particularly for the YOLO family of single-shot detectors known for their fast baselines and practical applications. Our proposed SFDA method - Source-Free YOLO (SF-YOLO) - relies on a teacher-student framework in which the student receives images with a learned, target domain-specific augmentation, allowing the model to be trained with only unlabeled target data and without requiring feature alignment. A challenge with self-training using a mean-teacher architecture in the absence of labels is the rapid decline of accuracy due to noisy or drifting pseudo-labels. To address this issue, a teacher-to-student communication mechanism is introduced to help stabilize the training and reduce the reliance on annotated target data for model selection. Despite its simplicity, our approach is competitive with state-of-the-art detectors on several challenging benchmark datasets, even sometimes outperforming methods that use source data for adaptation.
Source-Free Domain Adaptation for YOLO Object Detection
Simon Varailhon
Masih Aminbeidokhti
Eric Granger
Source-free domain adaptation (SFDA) is a challenging problem in object detection, where a pre-trained source model is adapted to a new targ… (see more)et domain without using any source domain data for privacy and efficiency reasons. Most state-of-the-art SFDA methods for object detection have been proposed for Faster-RCNN, a detector that is known to have high computational complexity. This paper focuses on domain adaptation techniques for real-world vision systems, particularly for the YOLO family of single-shot detectors known for their fast baselines and practical applications. Our proposed SFDA method - Source-Free YOLO (SF-YOLO) - relies on a teacher-student framework in which the student receives images with a learned, target domain-specific augmentation, allowing the model to be trained with only unlabeled target data and without requiring feature alignment. A challenge with self-training using a mean-teacher architecture in the absence of labels is the rapid decline of accuracy due to noisy or drifting pseudo-labels. To address this issue, a teacher-to-student communication mechanism is introduced to help stabilize the training and reduce the reliance on annotated target data for model selection. Despite its simplicity, our approach is competitive with state-of-the-art detectors on several challenging benchmark datasets, even sometimes outperforming methods that use source data for adaptation.
Stress-Testing Capability Elicitation With Password-Locked Models
Ryan Greenblatt
Fabien Roger
Dmitrii Krasheninnikov
Textoshop: Interactions Inspired by Drawing Software to Facilitate Text Editing
Young-Ho Kim
Fanny Chevalier
Textoshop: Interactions Inspired by Drawing Software to Facilitate Text Editing
Young-Ho Kim
Fanny Chevalier
We explore how interactions inspired by drawing software can help edit text. Making an analogy between visual and text editing, we consider … (see more)words as pixels, sentences as regions, and tones as colours. For instance, direct manipulations move, shorten, expand, and reorder text; tools change number, tense, and grammar; colours map to tones explored along three dimensions in a tone picker; and layers help organize and version text. This analogy also leads to new workflows, such as boolean operations on text fragments to construct more elaborated text. A study shows participants were more successful at editing text and preferred using the proposed interface over existing solutions. Broadly, our work highlights the potential of interaction analogies to rethink existing workflows, while capitalizing on familiar features.
The Factorization Curse: Which Tokens You Predict Underlie the Reversal Curse and More
Ouail Kitouni
Niklas Nolte
Adina Williams
Diane Bouchacourt
Mark Ibrahim
The High Line: Exact Risk and Learning Rate Curves of Stochastic Adaptive Learning Rate Algorithms
Elizabeth Collins-Woodfin
Inbar Seroussi
Begoña García Malaxechebarría
Andrew Mackenzie
Elliot Paquette
On the Scalability of Certified Adversarial Robustness with Generated Data
Thomas Altstidl
David Dobre
Arthur Kosmala
Bjoern Eskofier
Leo Schwinn
Certified defenses against adversarial attacks offer formal guarantees on the robustness of a model, making them more reliable than empirica… (see more)l methods such as adversarial training, whose effectiveness is often later reduced by unseen attacks. Still, the limited certified robustness that is currently achievable has been a bottleneck for their practical adoption. Gowal et al. and Wang et al. have shown that generating additional training data using state-of-the-art diffusion models can considerably improve the robustness of adversarial training. In this work, we demonstrate that a similar approach can substantially improve deterministic certified defenses but also reveal notable differences in the scaling behavior between certified and empirical methods. In addition, we provide a list of recommendations to scale the robustness of certified training approaches. Our approach achieves state-of-the-art deterministic robustness certificates on CIFAR-10 for the
On the Scalability of GNNs for Molecular Graphs
Maciej Sypetkowski
Frederik Wenkel
Farimah Poursafaei
Nia Dickson
Karush Suri
Philip Fradkin
Scaling deep learning models has been at the heart of recent revolutions in language modelling and image generation. Practitioners have obse… (see more)rved a strong relationship between model size, dataset size, and performance. However, structure-based architectures such as Graph Neural Networks (GNNs) are yet to show the benefits of scale mainly due to the lower efficiency of sparse operations, large data requirements, and lack of clarity about the effectiveness of various architectures. We address this drawback of GNNs by studying their scaling behavior. Specifically, we analyze message-passing networks, graph Transformers, and hybrid architectures on the largest public collection of 2D molecular graphs. For the first time, we observe that GNNs benefit tremendously from the increasing scale of depth, width, number of molecules, number of labels, and the diversity in the pretraining datasets, resulting in a 30.25% improvement when scaling to 1 billion parameters and 28.98% improvement when increasing size of dataset to eightfold. We further demonstrate strong finetuning scaling behavior on 38 tasks, outclassing previous large models. We hope that our work paves the way for an era where foundational GNNs drive pharmaceutical drug discovery.
Towards a "Universal Translator" for Neural Dynamics at Single-Cell, Single-Spike Resolution
Yizi Zhang
Yanchen Wang
Donato M. Jiménez-Benetó
Zixuan Wang
Mehdi Azabou
Renee Tung
Olivier Winter
International Brain Laboratory
Eva L Dyer
Liam Paninski
Cole Lincoln Hurwitz