Anirudh Goyal

Machine unlearning, which involves erasing knowledge about a forget set from a trained model, can prove to be costly and infeasible using existing techniques. We propose a low-compute unlearning technique based on a discrete representational bottleneck. We show that the proposed technique efficiently unlearns the forget set and incurs negligible damage to the model's performance on the rest of the dataset. We evaluate the proposed technique on the problem of class unlearning using four datasets: CIFAR-10, CIFAR-100, LACUNA-100 and ImageNet-1k. We compare the proposed technique to SCRUB, a state-of-the-art approach which uses knowledge distillation for unlearning. Across all four datasets, the proposed technique performs as well as, if not better than, SCRUB while incurring almost no computational cost.

2025-09-15

TMLR (accepted)

Jailbreak Distillation: Renewable Safety Benchmarking

Jingyu Zhang

Ahmed Elgohary

Xiawei Wang

A S M Iftekhar

Ahmed Magooda

Benjamin Van Durme

Daniel Khashabi

Kyle Jackson

Marah Ihab Abdin

Jyoti Aneja

Harkirat Singh Behl

Sébastien Bubeck

Ronen Eldan

S. Gunasekar

Michael Harrison

Russell J. Hewett

Mojan Javaheripi

Piero Kauffmann

James R. Lee

Yin Tat Lee

Yuanzhi Li

Weishung Liu

C. C. T. Mendes

Anh Nguyen

Eric Price

Gustavo de Rosa

Olli Saarikivi

Adil Salim

Tim Beyer

Sophie Xhonneux

Simon Geisler

Gauthier Gidel

Leo Schwinn

Stephan Günnemann

Blake Bullwinkel

Amanda Minnich

Shiven Chawla

Gary Lopez

Martin Pouliot

Whitney Maxwell

Patrick Chao

Edoardo Debenedetti

Alexander Robey

Maksym Andriushchenko

Francesco Croce

Vikash Sehwag

Edgar Dobriban

Nicolas Flammarion

George J. Pappas

Florian Tramèr

Hamed Hassani

Eric Wong

Zora Che

Stephen Casper

Robert Kirk

Anirudh Satheesh

Stewart Slocum

Lev E McKinney

Rohit Gandikota

Aidan Ewart

Domenic Rosati

Zichu Wu

Zikui Cai

Daya Guo

Dejian Yang

Haowei Zhang

Jun-Mei Song

Ruoyu Zhang

Runxin Xu

Qihao Zhu

Shirong Ma

Peiyi Wang

Xiaoling Bi

Xiaokang Zhang

Xingkai Yu

Yu Wu

Z. F. Wu

Zhibin Gou

Zhihong Shao

Zhuoshu Li

Ziyi Gao

A. Liu

Bing Xue

Bingxuan Wang

Bo WU

Bei Feng

Chenggang Lu

Chenggang Zhao

Chengqi Deng

Chenyu Zhang

C. Ruan

Damai Dai

Deli Chen

Dong-Li Ji

Erhang Li

Fangyun Lin

Fucong Dai

Fuli Luo

Guangbo Hao

Guanting Chen

Guowei Li

Han Bao

Hanwei Xu

Haocheng Wang

Honghui Ding

Huajian Xin

Huazuo Gao

Hui Qu

Hui Li

Jianzhong Guo

Jiashi Li

Jiawei Wang

Jingchang Chen

Jingyang Yuan

Junjie Qiu

Junlong Li

J. Cai

J. Ni

Jian Liang

Jin Chen

Kai Dong

Kai Hu

Kaige Gao

Kang Guan

Kexin Huang

Kuai Yu

Lean Wang

Lecong Zhang

Liang Zhao

Litong Wang

Liyue Zhang

Lei Xu

Leyi Xia

Mingchuan Zhang

Minghua Zhang

Min Tang

Meng Li

Miaojun Wang

Mingming Li

Ning Tian

Panpan Huang

Meng Wang

Qiancheng Wang

Qinyu Chen

Qiushi Du

Ruiqi Ge

Ruisong Zhang

Ruizhe Pan

Runji Wang

R. J. Chen

Rong Jin

Ruyi Chen

Shanghao Lu

Shangyan Zhou

Shanhuang Chen

Shengfeng Ye

Shiyu Wang

Shuiping Yu

Shunfeng Zhou

Shuting Pan

S. S. Li

Shuang Zhou

Shao-Ping Wu

Tao Yun

Tian Pei

Tianyu Sun

T. Wang

Wangding Zeng

Wanjia Zhao

Wen Liu

Wenfeng Liang

Wenjun Gao

Wen-Xuan Yu

Wentao Zhang

Wei Xiao

Wei An

Xiaodong Liu

Xiaohan Wang

Xiaokang Chen

Xiaotao Nie

Xin Cheng

Jian Li

Xinfeng Xie

Xingchao Liu

Xinyu Yang

Xinyuan Li

Xuecheng Su

Xuheng Lin

Xiangyu Jin

Xi-Cheng Shen

Xiaosha Chen

Xiaowen Sun

Xiaoxiang Wang

Xinnan Song

Xinyi Zhou

Xianzu Wang

Xinxia Shan

Y. K. Li

Y. Q. Wang

Y. X. Wei

Yang Zhang

Yan-Hong Xu

Yao Zhao

Yaofeng Sun

Yaohui Wang

Yi Yu

Yichao Zhang

Yifan Shi

Yi Xiong

Ying He

Yishi Piao

Yisong Wang

Yi Chern Tan

Yiyang Ma

Yiyuan Liu

Yongqiang Guo

Yuan Ou

Yuduan Wang

Yue Gong

Yuheng Zou

Yuzi He

Yunfan Xiong

Yuxiang Luo

Yuxiang You

Yu-mei You

Yuxuan Liu

Yuyang Zhou

Y. X. Zhu

Yanping Huang

Yaohui Li

Yang Li

Yi Zheng

Yunxiang Ma

Ying Tang

Yukun Zha

Yuting Yan

Z. Z. Ren

Zehui Ren

Zhangli Sha

Zhe Fu

Zhean Xu

Zhenda Xie

Zhengyan Zhang

Zhewen Hao

Zhicheng Ma

Zhigang Yan

Zhiyu Wu

Zihui Gu

Zijia Zhu

Zijun Liu

Zi-An Li

Ziwei Xie

Ziyang Song

Deep Ganguli

Liane Lovitt

Jackson Kernion

Amanda Askell

Yuntao Bai

Saurav Kadavath

Benjamin Mann

Ethan Perez

Nicholas Schiefer

Kamal Ndousse

Andy Jones

Sam Bowman

Anna Chen

Tom Conerly

Nova Dassarma

Dawn Drain

Nelson Elhage Sheer

Stanislav Fort

Zac Hatfield-Dodds

T. Henighan

Danny Hernandez

Tristan Hume

Josh Jacobson

Scott Johnston

Shauna Kravec

Catherine Olsson

Sam Ringer

Eli Tran-Johnson

Dario Amodei

Tom Brown

Nicholas Joseph

Sam McCandlish

Chris Olah

Jared Kaplan

Jack Clark

Aaron Grattafiori

Abhimanyu Dubey

Abhinav Jauhri

Abhinav Pandey

Abhishek Kadian

Ahmad Al-Dahle

Aiesha Letman

Akhil Mathur

Alan Schelten

Alex Vaughan

Amy Yang

Angela Fan

A. Hartshorn

Aobo Yang

Archi Mitra

Archie Sravankumar

Artem Korenev

Arthur Hinsvark

Arun Rao

Aston Zhang

Aurelien Rodriguez

Austen Gregerson

Ava Spataru

Baptiste Rozière

Bethany Biron

Binh Tang

Bobbie Chern

Charlotte Caucheteux

Chaya Nayak

Chloe Bi

Chris Marra

Chris McConnell

Christian Keller

Christophe Touret

Chunyang Wu

Corinne Wong

Cristian Cantón Ferrer

Cyrus Nikolaidis

Damien Allonsius

Daniel Song

Danielle Pintz

Danny Livshits

Danny Wyatt

David Esiobu

Dhruv Choudhary

Dhruv Mahajan

Diego Garcia-Olano

Diego Perino

Dieuwke Hupkes

Egor Lakomkin

Ehab A. AlBadawy

Elina Lobanova

Emily Dinan

Eric Michael Smith

Filip Radenovic

Francisco Guzmán

Frank Zhang

Gabriele Synnaeve

Gabrielle Lee

Georgia Lewis

G. Thattai

Graeme Nail

Gregoire Mialon

Guan Pang

Guillem Cucurell

Hailey Nguyen

Hannah Korevaar

Hu Xu

Hugo Touvron

Iliyan Zarov

Imanol Arrieta Ibarra

Isabel Kloumann

Ishan Misra

Ivan Evtimov

Jack Zhang

Jade Copet

Jaewon Lee

Jan Geffert

Jana Vranes

Jason Park

Jay Mahadeokar

Jeet Shah

Jelmer van der Linde

Jennifer Billock

Jenny Hong

Jenya Lee

Jeremy Fu

J. Fu

Jianfeng Chi

Jianyu Huang

Jiawen Liu

Jie Wang

Jiecao Yu

Joanna Bitton

Joe Spisak

Jongsoo Park

Joseph Rocca

J. Johnstun

Joshua Saxe

Junteng Jia

Kalyan Vasuden Alwala

Karthik Prasad

Kartikeya Upasani

Kate Plawiak

Keqian Li

Kenneth Heafield

Kevin R. Stone

Khalid El-Arini

Krithika Iyer

Kshitiz Malik

Kuenley Chiu

Kunal Bhalla

Kushal Lakhotia

Lauren Rantala-Yeary

Laurens van der Maaten

Lawrence Chen

Liang Tan

Liz Jenkins

Louis Martin

Lovish Madaan

Lubo Malo

Lukas Blecher

Lukas Landzaat

Luke de Oliveira

Madeline Muzzi

Mahesh Pasupuleti

Mannat Singh

Manohar Paluri

Marcin Kardas

Maria Tsimpoukelli

Mathew Oldham

Mathieu Rita

Maya Pavlova

Melanie Kambadur

Mike Lewis

Min Si

Mitesh Kumar Singh

Mona Hassan

Naman Goyal

Narjes Torabi

Nikolay Bashlykov

Nikolay Bogoychev

Niladri S. Chatterji

Ning Zhang

Olivier Duchenne

Onur Çelebi

Patrick Alrassy

Petar Pengwei Li

Peter Weng

Prajjwal Bhargava

Pratik Dubal

Praveen Krishnan

Punit Singh Koura

Puxin Xu

Qing He

Qingxiao Dong

Ragavan Srinivasan

Raj Ganapathy

Ramon Calderer

Ricardo Silveira Cabral

Robert Stojnic

Roberta Raileanu

Rohan Maheswari

Rohit Girdhar

Rohit Patel

Romain Sauvestre

Ronnie Polidoro

Roshan Sumbaly

Ross Taylor

Ruan Silva

Rui Hou

Rui Wang

S. Hosseini

Sahana Chennabasappa

Sanjay Singh

Sean Bell

Seohyun Sonia Kim

Sergey Edunov

Shaoliang Nie

Sharan Narang

Sharath Chandra Raparthy

Sheng Shen

Shengye Wan

Shruti Bhosale

Shun Zhang

Simon Vandenhende

Soumya Batra

Spencer Whitman

Sten Sootla

Stephane Collot

Suchin Gururangan

S. Borodinsky

Tamar Herman

Tara Fowler

Tarek Sheasha

Thomas Georgiou

Thomas Scialom

Tobias Speckbacher

Todor Mihaylov

Tong Xiao

Ujjwal Karn

Vedanuj Goswami

Vibhor Gupta

Vignesh Ramanathan

Viktor Kerkez

Vincent Gonguet

Virginie Do

Vish Vogeti

Vitor Albiero

Vladan Petrovic

Weiwei Chu

Wenhan Xiong

Wenyin Fu

2025-05-28

ArXiv (preprint)

On the Transfer of Object-Centric Representation Learning

Aniket Rajiv Didolkar

Andrii Zadaianchuk

Michael Curtis Mozer

Georg Martius

Maximilian Seitzer

The goal of object-centric representation learning is to decompose visual scenes into a structured representation that isolates the entities into individual vectors. Recent successes have shown that object-centric representation learning can be scaled to real-world scenes by utilizing features from pre-trained foundation models like DINO. However, so far, these object-centric methods have mostly been applied in-distribution, with models trained and evaluated on the same dataset. This is in contrast to the underlying foundation models, which have been shown to be applicable to a wide range of data and tasks. Thus, in this work, we answer the question of whether current real-world capable object-centric methods exhibit similar levels of transferability by introducing a benchmark comprising seven different synthetic and real-world datasets. We analyze the factors influencing performance under transfer and find that training on diverse real-world images improves generalization to unseen scenarios. Furthermore, inspired by the success of task-specific fine-tuning in foundation models, we introduce a novel fine-tuning strategy to adapt pre-trained vision encoders for the task of object discovery. We find that the proposed approach results in state-of-the-art performance for unsupervised object discovery, exhibiting strong zero-shot transfer to unseen datasets.

2025-01-22

ICLR.cc/2025/Conference (poster)

Object-Centric Temporal Consistency via Conditional Autoregressive Inductive Biases

Cristian Meo

Akihiro Nakano

Mircea Tudor Lică

Aniket Rajiv Didolkar

Masahiro Suzuki

Mengmi Zhang

Justin Dauwels

Yutaka Matsuo

2024-10-10

NeurIPS.cc/2024/Workshop/Compositional_Learning (poster)

AI-Assisted Generation of Difficult Math Questions

Vedant Shah

Dingli Yu

Kaifeng Lyu

Simon Park

Nan Rosemary Ke

Jiatong Yu

Yinghui He

Michael Curtis Mozer

James Lloyd McClelland

Sanjeev Arora

Current LLM training positions mathematical reasoning as a core capability. With publicly available sources fully tapped, there is unmet demand for diverse and challenging math questions. Relying solely on human experts is both time-consuming and costly, while LLM-generated questions often lack the requisite diversity and difficulty. We present a design framework that combines the strengths of LLMs with a human-in-the-loop approach to generate a diverse array of challenging math questions. We leverage LLM metacognition skills [Didolkar et al., 2024] of a strong LLM to extract core "skills" from existing math datasets. These skills serve as the basis for generating novel and difficult questions by prompting the LLM with random pairs of core skills. The use of two different skills within each question makes finding such questions an "out of distribution" task for both LLMs and humans. Our pipeline employs LLMs to iteratively generate and refine questions and solutions through multiturn prompting. Human annotators then verify and further refine the questions, with their efficiency enhanced via further LLM interactions. Applying this pipeline on skills extracted from the MATH dataset [Hendrycks et al., 2021] resulted in MATH

2024-10-09

NeurIPS.cc/2024/Workshop/MATH-AI (accepted)

Metacognitive Capabilities of LLMs: An Exploration in Mathematical Problem Solving

Aniket Rajiv Didolkar

Nan Rosemary Ke

Siyuan Guo

Michal Valko

Timothy P Lillicrap

Danilo Jimenez Rezende

Michael Curtis Mozer

Sanjeev Arora

2024-09-25

NeurIPS.cc/2024/Conference (poster)

Zero-Shot Object-Centric Representation Learning

Aniket Rajiv Didolkar

Andrii Zadaianchuk

Michael Curtis Mozer

Georg Martius

Maximilian Seitzer

The goal of object-centric representation learning is to decompose visual scenes into a structured representation that isolates the entities. Recent successes have shown that object-centric representation learning can be scaled to real-world scenes by utilizing pre-trained self-supervised features. However, so far, object-centric methods have mostly been applied in-distribution, with models trained and evaluated on the same dataset. This is in contrast to the wider trend in machine learning towards general-purpose models directly applicable to unseen data and tasks. Thus, in this work, we study current object-centric methods through the lens of zero-shot generalization by introducing a benchmark comprising eight different synthetic and real-world datasets. We analyze the factors influencing zero-shot performance and find that training on diverse real-world images improves transferability to unseen scenarios. Furthermore, inspired by the success of task-specific fine-tuning in foundation models, we introduce a novel fine-tuning strategy to adapt pre-trained vision encoders for the task of object discovery. We find that the proposed approach results in state-of-the-art performance for unsupervised object discovery, exhibiting strong zero-shot transfer to unseen datasets.

2024-08-17

ArXiv (preprint)
