
Adam M. Oberman

Associate Academic Member
Canada CIFAR AI Chair
Full Professor, McGill University, Department of Mathematics and Statistics
Research Topics
AI Safety
Deep Learning
Generative Models
Machine Learning Theory
Representation Learning

Biography

I am a professor at McGill University in the Department of Mathematics and Statistics. My research applies advanced mathematical techniques to deep learning. My primary areas of expertise include generative modelling, stochastic optimization methods, fairness and bias removal in computer vision, and generalization in reinforcement learning.

Before joining McGill in 2012, I held a tenured faculty position at Simon Fraser University and completed a postdoctoral fellowship at the University of Texas, Austin. I obtained my undergraduate education at the University of Toronto and pursued graduate studies at the University of Chicago. I have also held visiting positions at the University of California, Los Angeles (UCLA) and at the National Institute for Research in Digital Science and Technology (INRIA) in Paris.

My early research was in partial differential equations and scientific computing, where I made significant contributions to areas such as numerical optimal transportation, geometric PDEs and stochastic control problems.

I teach two comprehensive theory courses on machine learning, covering topics such as statistical learning theory and kernel theory.

For prospective graduate students interested in working with me, please apply to both Mila – Quebec Artificial Intelligence Institute and the Department of Mathematics and Statistics at McGill. Alternatively, applicants may consider co-supervision opportunities with advisors from the computer science program at McGill or Université de Montréal.

Current Students

Master's Research - McGill University
Postdoctorate - McGill University
Independent visiting researcher - University of Technology Sydney
PhD - McGill University
PhD - McGill University
PhD - Université de Montréal

Blog Posts

By Tiago Salvador, Stephanie Cairns and Vikram Voleti

Publications

Can Safety Fine-Tuning Be More Principled? Lessons Learned from Cybersecurity
David Williams-King
Linh Le
As LLMs develop increasingly advanced capabilities, there is an increased need to minimize the harm that could be caused to society by certain model outputs; hence, most LLMs have safety guardrails added, for example via fine-tuning. In this paper, we argue the position that current safety fine-tuning is very similar to a traditional cat-and-mouse game (or arms race) between attackers and defenders in cybersecurity. Model jailbreaks and attacks are patched with bandaids to target the specific attack mechanism, but many similar attack vectors might remain. When defenders are not proactively coming up with principled mechanisms, it becomes very easy for attackers to sidestep any new defenses. We show how current defenses are insufficient to prevent new adversarial jailbreak attacks, reward hacking, and loss of control problems. In order to learn from past mistakes in cybersecurity, we draw analogies with historical examples and develop lessons learned that can be applied to LLM safety. These arguments support the need for new and more principled approaches to designing safe models, which are architected for security from the beginning. We describe several such approaches from the AI literature.
Multi-Resolution Continuous Normalizing Flows
Vikram Voleti
Chris Finlay
Addressing Sample Inefficiency in Multi-View Representation Learning
Arna Ghosh
Kumar Krishna Agrawal
Shagun Sodhani
Harnessing small projectors and multiple views for efficient vision pretraining
Kumar Krishna Agrawal
Arna Ghosh
Shagun Sodhani
Recent progress in self-supervised (SSL) visual representation learning has led to the development of several different proposed frameworks that rely on augmentations of images but use different loss functions. However, there are few theoretically grounded principles to guide practice, so practical implementation of each SSL framework requires several heuristics to achieve competitive performance. In this work, we build on recent analytical results to design practical recommendations for competitive and efficient SSL that are grounded in theory. Specifically, recent theory tells us that existing SSL frameworks are minimizing the same idealized loss, which is to learn features that best match the data similarity kernel defined by the augmentations used. We show how this idealized loss can be reformulated to a functionally equivalent loss that is more efficient to compute. We study the implicit bias of using gradient descent to minimize our reformulated loss function and find that using a stronger orthogonalization constraint with a reduced projector dimensionality should yield good representations. Furthermore, the theory tells us that approximating the reformulated loss should be improved by increasing the number of augmentations, and as such using multiple augmentations should lead to improved convergence. We empirically verify our findings on CIFAR, STL and Imagenet datasets, wherein we demonstrate an improved linear readout performance when training a ResNet-backbone using our theoretically grounded recommendations. Remarkably, we also demonstrate that by leveraging these insights, we can reduce the pretraining dataset size by up to 2
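The recommendations summarized above (a small projector, a stronger orthogonalization constraint, and more than two augmented views) can be made concrete with a short sketch. The PyTorch-style snippet below is only an illustration of that style of objective under assumed interfaces for a generic backbone and projector; the squared-distance invariance term, the covariance-to-identity penalty and all constants are hypothetical stand-ins, not the paper's exact loss.

    import torch
    import torch.nn.functional as F

    def multiview_ssl_loss(backbone, projector, views, ortho_weight=1.0):
        # views: a list of >= 2 augmented batches of the same images, shape (B, C, H, W).
        # projector maps backbone features to a deliberately small dimension d (e.g. 64).
        zs = [F.normalize(projector(backbone(v)), dim=1) for v in views]

        # Invariance: projections of different views of the same image should agree.
        invariance, pairs = 0.0, 0
        for i in range(len(zs)):
            for j in range(i + 1, len(zs)):
                invariance = invariance + (zs[i] - zs[j]).pow(2).sum(dim=1).mean()
                pairs += 1
        invariance = invariance / pairs

        # Orthogonalization: push the covariance of the projections toward the
        # identity, discouraging dimensional collapse in the small projector space.
        z = torch.cat(zs, dim=0)
        z = z - z.mean(dim=0)
        cov = (z.T @ z) / (z.shape[0] - 1)
        ortho = (cov - torch.eye(cov.shape[0], device=cov.device)).pow(2).sum()

        return invariance + ortho_weight * ortho

Averaging the invariance term over all view pairs makes using more than two augmentations per image a drop-in change, which is the kind of modification the abstract argues should improve convergence.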
Deep PDE Solvers for Subgrid Modelling and Out-of-Distribution Generalization
Patrick Chatain
EuclidNets: An Alternative Operation for Efficient Inference of Deep Learning Models
Xinlin Li
Mariana Parazeres
Alireza Ghaffari
Masoud Asgharian
Vahid Nia
A Reproducible and Realistic Evaluation of Partial Domain Adaptation Methods
Tiago Salvador
Kilian FATRAS
Unsupervised Domain Adaptation (UDA) aims at classifying unlabeled target images leveraging source labeled ones. In the case of an extreme label shift scenario between the source and target domains, where we have extra source classes not present in the target domain, the UDA problem becomes a harder problem called Partial Domain Adaptation (PDA). While different methods have been developed to solve the PDA problem, most successful algorithms use model selection strategies that rely on target labels to find the best hyper-parameters and/or models along training. These strategies violate the main assumption in PDA: only unlabeled target domain samples are available. In addition, there are also experimental inconsistencies between developed methods - different architectures, hyper-parameter tuning, number of runs - yielding unfair comparisons. The main goal of this work is to provide a realistic evaluation of PDA methods under different model selection strategies and a consistent evaluation protocol. We evaluate 6 state-of-the-art PDA algorithms on 2 different real-world datasets using 7 different model selection strategies. Our two main findings are: (i) without target labels for model selection, the accuracy of the methods decreases up to 30 percentage points; (ii) only one method and model selection pair performs well on both datasets. Experiments were performed with our PyTorch framework, BenchmarkPDA, which we open source.
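The central experimental variable here is the model-selection strategy. As a hypothetical illustration (not code from BenchmarkPDA), the sketch below contrasts an oracle strategy that peeks at target labels with two label-free strategies of the kind the paper evaluates; the candidate interface (source_val_preds, target_preds, and so on) is assumed purely for the example.

    import numpy as np

    def select_model(candidates, strategy):
        # Each candidate is a trained model / hyper-parameter setting exposing
        # class-probability predictions on a held-out source split and on the
        # unlabeled target data.
        if strategy == "oracle":
            # Uses target labels: violates the PDA assumption, useful only as an upper bound.
            score = lambda c: (c.target_preds.argmax(1) == c.target_labels).mean()
        elif strategy == "source-val":
            # Label-free: accuracy on held-out labeled *source* data.
            score = lambda c: (c.source_val_preds.argmax(1) == c.source_val_labels).mean()
        elif strategy == "target-entropy":
            # Label-free: prefer confident (low-entropy) predictions on target data.
            def score(c):
                p = c.target_preds
                entropy = -(p * np.log(p + 1e-12)).sum(axis=1)
                return -entropy.mean()
        else:
            raise ValueError(f"unknown strategy: {strategy}")
        return max(candidates, key=score)

The accuracy gap the abstract reports (up to 30 percentage points) is the difference between the oracle branch and label-free branches such as the two shown here.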
A principled approach for generating adversarial images under non-smooth dissimilarity metrics
Aram-Alexandre Pooladian
Chris Finlay
Tim Hoheisel