Publications

Are we making progress in unlearning? Findings from the first NeurIPS unlearning competition

Eleni Triantafillou

Peter Kairouz

Fabian Pedregosa

Jamie Hayes

Meghdad Kurmanji

Kairan Zhao

Vincent Dumoulin

Julio C. S. Jacques Junior

Ioannis Mitliagkas

Jun Wan

Lisheng Sun-Hosoya

Sergio Escalera

Gintare Karolina Dziugaite

Peter Triantafillou

Isabelle Guyon

We present the findings of the first NeurIPS competition on unlearning, which sought to stimulate the development of novel algorithms and initiate discussions on formal and robust evaluation methodologies. The competition was highly successful: nearly 1,200 teams from across the world participated, and a wealth of novel, imaginative solutions with different characteristics were contributed. In this paper, we analyze top solutions and delve into discussions on benchmarking unlearning, which itself is a research problem. The evaluation methodology we developed for the competition measures forgetting quality according to a formal notion of unlearning, while incorporating model utility for a holistic evaluation. We analyze the effectiveness of different instantiations of this evaluation framework vis-a-vis the associated compute cost, and discuss implications for standardizing evaluation. We find that the ranking of leading methods remains stable under several variations of this framework, pointing to avenues for reducing the cost of evaluation. Overall, our findings indicate progress in unlearning, with top-performing competition entries surpassing existing algorithms under our evaluation framework. We analyze trade-offs made by different algorithms and strengths or weaknesses in terms of generalizability to new datasets, paving the way for advancing both benchmarking and algorithm development in this important area.

2024-06-13

ArXiv (preprint)

doi.org

arxiv.org

Exploring validation metrics for offline model-based optimisation with diffusion models

2024-06-13

TMLR (accepted)

openreview.net

Turns Out I'm Not Real: Towards Robust Detection of AI-Generated Videos

Qingyuan Liu

Pengyuan Shi

Yun-Yun Tsai

Chengzhi Mao

Junfeng Yang

2024-06-13

ArXiv (preprint)

doi.org

arxiv.org

Grounding Multimodal Large Language Models in Actions

Andrew Szot

Bogdan Mazoure

Harsh Agrawal

(Rex) Devon Hjelm

Zsolt Kira

Alexander T Toshev

Multimodal Large Language Models (MLLMs) have demonstrated a wide range of capabilities across many domains, including Embodied AI. In this work, we study how to best ground an MLLM into different embodiments and their associated action spaces, with the goal of leveraging the multimodal world knowledge of the MLLM. We first generalize a number of methods through a unified architecture and the lens of action space adapters. For continuous actions, we show that a learned tokenization allows for sufficient modeling precision, yielding the best performance on downstream tasks. For discrete actions, we demonstrate that semantically aligning these actions with the native output token space of the MLLM leads to the strongest performance. We arrive at these lessons via a thorough study of seven action space adapters on five different environments, encompassing over 114 embodied tasks.

2024-06-12

ArXiv (preprint)

doi.org

arxiv.org

PathOCL: Path-Based Prompt Augmentation for OCL Generation with GPT-4

Seif Abukhalaf

Mohammad Hamdaqa

Foutse Khomh

The rapid progress of AI-powered programming assistants, such as GitHub Copilot, has facilitated the development of software applications. These assistants rely on large language models (LLMs), which are foundation models (FMs) that support a wide range of tasks related to understanding and generating language. LLMs have demonstrated their ability to express UML model specifications using formal languages like the Object Constraint Language (OCL). However, the context size of the prompt is limited by the number of tokens an LLM can process. This limitation becomes significant as the size of UML class models increases. In this study, we introduce PathOCL, a novel path-based prompt augmentation technique designed to facilitate OCL generation. PathOCL addresses the limitations of LLMs, specifically their token processing limit and the challenges posed by large UML class models. PathOCL is based on the concept of chunking, which selectively augments the prompts with a subset of UML classes relevant to the English specification. Our findings demonstrate that PathOCL, compared to augmenting the complete UML class model (UML-Augmentation), generates a higher number of valid and correct OCL constraints using the GPT-4 model. Moreover, the average prompt size crafted using PathOCL significantly decreases when scaling the size of the UML class models.

2024-06-12

Proceedings of the 2024 IEEE/ACM First International Conference on AI Foundation Models and Software Engineering (published)

doi.org

arxiv.org

GLIMPSE: Pragmatically Informative Multi-Document Summarization for Scholarly Reviews

Scientific peer review is essential for the quality of academic publications. However, the increasing number of paper submissions to conferences has strained the reviewing process. This surge poses a burden on area chairs who have to carefully read an ever-growing volume of reviews and discern each reviewer's main arguments as part of their decision process. In this paper, we introduce GLIMPSE, a summarization method designed to offer a concise yet comprehensive overview of scholarly reviews. Unlike traditional consensus-based methods, GLIMPSE extracts both common and unique opinions from the reviews. We introduce novel uniqueness scores based on the Rational Speech Act framework to identify relevant sentences in the reviews. Our method aims to provide a pragmatic glimpse into all reviews, offering a balanced perspective on their opinions. Our experimental results with both automatic metrics and human evaluation show that GLIMPSE generates more discriminative summaries than baseline methods in terms of human evaluation while achieving comparable performance with these methods in terms of automatic metrics.

2024-06-11

ArXiv (preprint)

doi.org

arxiv.org

Global rewards in multi-agent deep reinforcement learning for autonomous mobility on demand systems

Heiko Hoppe

Tobias Enders

Quentin Cappart

Maximilian Schiffer

We study vehicle dispatching in autonomous mobility on demand (AMoD) systems, where a central operator assigns vehicles to customer requests or rejects these with the aim of maximizing its total profit. Recent approaches use multi-agent deep reinforcement learning (MADRL) to realize scalable yet performant algorithms, but train agents based on local rewards, which distorts the reward signal with respect to the system-wide profit, leading to lower performance. We therefore propose a novel global-rewards-based MADRL algorithm for vehicle dispatching in AMoD systems, which resolves existing goal conflicts between the trained agents and the operator by assigning rewards to agents leveraging a counterfactual baseline. Our algorithm shows statistically significant improvements across various settings on real-world data compared to state-of-the-art MADRL algorithms with local rewards. We further provide a structural analysis which shows that the utilization of global rewards can improve implicit vehicle balancing and demand forecasting abilities. An extended version of our paper, including an appendix, can be found at https://arxiv.org/abs/2312.08884. Our code is available at https://github.com/tumBAIS/GR-MADRL-AMoD.

2024-06-11

Proceedings of the 6th Annual Learning for Dynamics & Control Conference (published)

doi.org

arxiv.org

MAP: Low-compute Model Merging with Amortized Pareto Fronts via Quadratic Approximation

Zhiqi Bu

Huan He

Yonghui Wu

Jiang Bian

Yong Chen

Yoshua Bengio

Model merging has emerged as an effective approach to combine multiple single-task models into a multitask model. This process typically involves computing a weighted average of the model parameters without any additional training. Existing model-merging methods focus on enhancing average task accuracy. However, interference and conflicts between the objectives of different tasks can lead to trade-offs during the merging process. In real-world applications, a set of solutions with various trade-offs can be more informative, helping practitioners make decisions based on diverse preferences. In this paper, we introduce a novel and low-compute algorithm, Model Merging with Amortized Pareto Front (MAP). MAP efficiently identifies a Pareto set of scaling coefficients for merging multiple models, reflecting the trade-offs involved. It amortizes the substantial computational cost of evaluations needed to estimate the Pareto front by using quadratic approximation surrogate models derived from a pre-selected set of scaling coefficients. Experimental results on vision and natural language processing tasks demonstrate that MAP can accurately identify the Pareto front, providing practitioners with flexible solutions to balance competing task objectives. We also introduce Bayesian MAP for scenarios with a relatively low number of tasks and Nested MAP for situations with a high number of tasks, further reducing the computational cost of evaluation.

2024-06-11

ArXiv (preprint)

doi.org

arxiv.org

MINERS: Multilingual Language Models as Semantic Retrievers

Genta Indra Winata

Ruochen Zhang

David Ifeoluwa Adelani

Words have been represented in a high-dimensional vector space that encodes their semantic similarities, enabling downstream applications such as retrieving synonyms, antonyms, and relevant contexts. However, despite recent advances in multilingual language models (LMs), the effectiveness of these models' representations in semantic retrieval contexts has not been comprehensively explored. To fill this gap, this paper introduces MINERS, a benchmark designed to evaluate the ability of multilingual LMs in semantic retrieval tasks, including bitext mining and classification via retrieval-augmented contexts. We create a comprehensive framework to assess the robustness of LMs in retrieving samples across over 200 diverse languages, including extremely low-resource languages in challenging cross-lingual and code-switching settings. Our results demonstrate that solely retrieving semantically similar embeddings yields performance competitive with state-of-the-art approaches, without requiring any fine-tuning.

2024-06-11

ArXiv (preprint)

doi.org

arxiv.org

When is an Embedding Model More Promising than Another?

Ismail Ben Ayed

Embedders play a central role in machine learning, projecting any object into numerical representations that can, in turn, be leveraged to perform various downstream tasks. The evaluation of embedding models typically depends on domain-specific empirical approaches utilizing downstream tasks, primarily because of the lack of a standardized framework for comparison. However, acquiring adequately large and representative datasets for conducting these assessments is not always viable and can prove to be prohibitively expensive and time-consuming. In this paper, we present a unified approach to evaluate embedders. First, we establish theoretical foundations for comparing embedding models, drawing upon the concepts of sufficiency and informativeness. We then leverage these concepts to devise a tractable comparison criterion (information sufficiency), leading to a task-agnostic and self-supervised ranking procedure. We demonstrate experimentally that our approach aligns closely with the capability of embedding models to facilitate various downstream tasks in both natural language processing and molecular biology. This effectively offers practitioners a valuable tool for prioritizing model trials.

2024-06-11

ArXiv (preprint)

doi.org

arxiv.org
