Publications

Learning Diverse Attacks on Large Language Models for Robust Red-Teaming and Safety Tuning

Juho Lee

Sung Ju Hwang

Red-teaming, or identifying prompts that elicit harmful responses, is a critical step in ensuring the safe and responsible deployment of lar… (see more)ge language models (LLMs). Developing effective protection against many modes of attack prompts requires discovering diverse attacks. Automated red-teaming typically uses reinforcement learning to fine-tune an attacker language model to generate prompts that elicit undesirable responses from a target LLM, as measured, for example, by an auxiliary toxicity classifier. We show that even with explicit regularization to favor novelty and diversity, existing approaches suffer from mode collapse or fail to generate effective attacks. As a flexible and probabilistically principled alternative, we propose to use GFlowNet fine-tuning, followed by a secondary smoothing phase, to train the attacker model to generate diverse and effective attack prompts. We find that the attacks generated by our method are effective against a wide range of target LLMs, both with and without safety tuning, and transfer well between target LLMs. Finally, we demonstrate that models safety-tuned using a dataset of red-teaming prompts generated by our method are robust to attacks from other RL-based red-teaming approaches.

2025-01-21

International Conference on Learning Representations (poster)

doi.org

openreview.net

MaestroMotif: Skill Design From Artificial Intelligence Feedback

Martin Klissarov

Mikael Henaff

Roberta Raileanu

Marlos C. Machado

Describing skills in natural language has the potential to provide an accessible way to inject human knowledge about decision-making into an… (see more) AI system. We present MaestroMotif, a method for AI-assisted skill design, which yields high-performing and adaptable agents. MaestroMotif leverages the capabilities of Large Language Models (LLMs) to effectively create and reuse skills. It first uses an LLM's feedback to automatically design rewards corresponding to each skill, starting from their natural language description. Then, it employs an LLM's code generation abilities, together with reinforcement learning, for training the skills and combining them to implement complex behaviors specified in language. We evaluate MaestroMotif using a suite of complex tasks in the NetHack Learning Environment (NLE), demonstrating that it surpasses existing approaches in both performance and usability.

2025-01-21

ICLR.cc/2025/Conference (oral)

doi.org

openreview.net

MAP: Low-Compute Model Merging with Amortized Pareto Fronts via Quadratic Approximation

Zhiqi Bu

Huan He

Yonghui Wu

Jiang Bian

Yong Chen

Yoshua Bengio

Model merging has emerged as an effective approach to combine multiple single-task models into a multitask model. This process typically inv… (see more)olves computing a weighted average of the model parameters without any additional training. Existing model-merging methods focus on enhancing average task accuracy. However, interference and conflicts between the objectives of different tasks can lead to trade-offs during the merging process. In real-world applications, a set of solutions with various trade-offs can be more informative, helping practitioners make decisions based on diverse preferences. In this paper, we introduce a novel and low-compute algorithm, Model Merging with Amortized Pareto Front (MAP). MAP efficiently identifies a Pareto set of scaling coefficients for merging multiple models, reflecting the trade-offs involved. It amortizes the substantial computational cost of evaluations needed to estimate the Pareto front by using quadratic approximation surrogate models derived from a pre-selected set of scaling coefficients. Experimental results on vision and natural language processing tasks demonstrate that MAP can accurately identify the Pareto front, providing practitioners with flexible solutions to balance competing task objectives. We also introduce Bayesian MAP for scenarios with a relatively low number of tasks and Nested MAP for situations with a high number of tasks, further reducing the computational cost of evaluation.

2025-01-21

ICLR.cc/2025/Conference (poster)

doi.org

openreview.net

MatExpert: Decomposing Materials Discovery by Mimicking Human Experts

Qianggang Ding

Santiago Miret

Bang Liu

Material discovery is a critical research area with profound implications for various industries. In this work, we introduce MatExpert, a no… (see more)vel framework that leverages Large Language Models (LLMs) and contrastive learning to accelerate the discovery and design of new solid-state materials. Inspired by the workflow of human materials design experts, our approach integrates three key stages: retrieval, transition, and generation. First, in the retrieval stage, MatExpert identifies an existing material that closely matches the desired criteria. Second, in the transition stage, MatExpert outlines the necessary modifications to transform this material formulation to meet specific requirements outlined by the initial user query. Third, in the generation state, MatExpert performs detailed computations and structural generation to create new materials based on the provided information. Our experimental results demonstrate that MatExpert outperforms state-of-the-art methods in material generation tasks, achieving superior performance across various metrics including validity, distribution, and stability. As such, MatExpert represents a meaningful advancement in computational material discovery using langauge-based generative models.

2025-01-21

ICLR.cc/2025/Conference (poster)

doi.org

openreview.net

Meta Flow Matching: Integrating Vector Fields on the Wasserstein Manifold

Lazar Atanackovic

Xi Zhang

Brandon Amos

Mathieu Blanchette

Leo J. Lee

Yoshua Bengio

Alexander Tong

Kirill Neklyudov

Numerous biological and physical processes can be modeled as systems of interacting entities evolving continuously over time, e.g. the dynam… (see more)ics of communicating cells or physical particles. Learning the dynamics of such systems is essential for predicting the temporal evolution of populations across novel samples and unseen environments. Flow-based models allow for learning these dynamics at the population level - they model the evolution of the entire distribution of samples. However, current flow-based models are limited to a single initial population and a set of predefined conditions which describe different dynamics. We argue that multiple processes in natural sciences have to be represented as vector fields on the Wasserstein manifold of probability densities. That is, the change of the population at any moment in time depends on the population itself due to the interactions between samples. In particular, this is crucial for personalized medicine where the development of diseases and their respective treatment response depend on the microenvironment of cells specific to each patient. We propose Meta Flow Matching (MFM), a practical approach to integrate along these vector fields on the Wasserstein manifold by amortizing the flow model over the initial populations. Namely, we embed the population of samples using a Graph Neural Network (GNN) and use these embeddings to train a Flow Matching model. This gives MFM the ability to generalize over the initial distributions, unlike previously proposed methods. We demonstrate the ability of MFM to improve the prediction of individual treatment responses on a large-scale multi-patient single-cell drug screen dataset.

2025-01-21

International Conference on Learning Representations (poster)

doi.org

openreview.net

MMTEB: Massive Multilingual Text Embedding Benchmark

Kenneth Enevoldsen

Isaac Chung

Imene Kerboua

Márton Kardos

Ashwin Mathur

David Stap

Jay Gala

Wissam Siblini

Dominik Krzemiński

Genta Indra Winata

Saba Sturua

Saiteja Utpala

Mathieu Ciancone

Marion Schaeffer

Gabriel Sequeira

Diganta Misra

Shreeya Dhakal

Jonathan Rystrøm

Roman Solomatin

Ömer Çağatan … (see 66 more)

Akash Kundu

Martin Bernstorff

Shitao Xiao

Akshita Sukhlecha

Bhavish Pahwa

Rafał Poświata

Kranthi Kiran GV

Shawon Ashraf

Daniel Auras

Björn Plüster

Jan Philipp Harries

Loïc Magne

Isabelle Mohr

Mariya Hendriksen

Dawei Zhu

Hippolyte Gisserot-Boukhlef

Tom Aarsen

Jan Kostkan

Konrad Wojtasik

Taemin Lee

Marek Šuppa

Crystina Zhang

Roberta Rocca

Mohammed Hamdy

Andrianos Michail

John Yang

Manuel Faysse

Aleksei Vatolin

Nandan Thakur

Manan Dey

Dipam Vasani

Pranjal Chitale

Simone Tedeschi

Nguyen Tai

Artem Snegirev

Michael Günther

Mengzhou Xia

Weijia Shi

Xing Han Lu

Jordan Clive

Gayatri Krishnakumar

Anna Maksimova

Silvan Wehrli

Maria Tikhonova

Henil Panchal

Aleksandr Abramov

Malte Ostendorff

Zheng Liu

Simon Clematide

Lester James Miranda

Alena Fenogenova

Guangyu Song

Ruqiya Bin Safi

Wen-Ding Li

Alessia Borghini

Federico Cassano

Hongjin Su

Jimmy Lin

Howard Yen

Lasse Hansen

Sara Hooker

Chenghao Xiao

Vaibhav Adlakha

Orion Weller

Siva Reddy

Niklas Muennighoff

Text embeddings are typically evaluated on a limited set of tasks, which are constrained by language, domain, and task diversity. To address… (see more) these limitations and provide a more comprehensive evaluation, we introduce the Massive Multilingual Text Embedding Benchmark (MMTEB) - a large-scale, community-driven expansion of MTEB, covering over 500 quality-controlled evaluation tasks across 250+ languages. MMTEB includes a diverse set of challenging, novel tasks such as instruction following, long-document retrieval, and code retrieval, representing the largest multilingual collection of evaluation tasks for embedding models to date. Using this collection, we develop several highly multilingual benchmarks, which we use to evaluate a representative set of models. We find that while large language models (LLMs) with billions of parameters can achieve state-of-the-art performance on certain language subsets and task categories, the best-performing publicly available model is multilingual-e5-large-instruct with only 560 million parameters. To facilitate accessibility and reduce computational cost, we introduce a novel downsampling method based on inter-task correlation, ensuring a diverse selection while preserving relative model rankings. Furthermore, we optimize tasks such as retrieval by sampling hard negatives, creating smaller but effective splits. These optimizations allow us to introduce benchmarks that drastically reduce computational demands. For instance, our newly introduced zero-shot English benchmark maintains a ranking order similar to the full-scale version but at a fraction of the computational cost.

2025-01-21

ICLR.cc/2025/Conference (poster)

doi.org

openreview.net

MODL: Multilearner Online Deep Learning

Antonios Valkanas

Boris Oreshkin

Mark J. Coates

Online deep learning solves the problem of learning from streams of data, reconciling two opposing objectives: learn fast and learn deep. Ex… (see more)isting work focuses almost exclusively on exploring pure deep learning solutions, which are much better suited to handle the"deep"than the"fast"part of the online learning equation. In our work, we propose a different paradigm, based on a hybrid multilearner approach. First, we develop a fast online logistic regression learner. This learner does not rely on backpropagation. Instead, it uses closed form recursive updates of model parameters, handling the fast learning part of the online learning problem. We then analyze the existing online deep learning theory and show that the widespread ODL approach, currently operating at complexity

2025-01-21

aistats.org/AISTATS/2025/Conference (poster)

doi.org

proceedings.mlr.press

Multi-Agent Cooperation Through Learning-Aware Policy Gradients

Alexander Meulemans

Seijin Kobayashi

Johannes Von Oswald

Nino Scherrer

Eric Elmoznino

Blake A. Richards

Guillaume Lajoie

Blaise Agüera y Arcas

João Sacramento

Self-interested individuals often fail to cooperate, posing a fundamental challenge for multi-agent learning. How can we achieve cooperation… (see more) among self-interested, independent learning agents? Promising recent work has shown that in certain tasks cooperation can be established between learning-aware agents who model the learning dynamics of each other. Here, we present the first unbiased, higher-derivative-free policy gradient algorithm for learning-aware reinforcement learning, which takes into account that other agents are themselves learning through trial and error based on multiple noisy trials. We then leverage efficient sequence models to condition behavior on long observation histories that contain traces of the learning dynamics of other agents. Training long-context policies with our algorithm leads to cooperative behavior and high returns on standard social dilemmas, including a challenging environment where temporally-extended action coordination is required. Finally, we derive from the iterated prisoner's dilemma a novel explanation for how and when cooperation arises among self-interested learning-aware agents.

2025-01-21

ICLR.cc/2025/Conference (poster)

doi.org

openreview.net

Multi-Session, Multi-Task Neural Decoding from Distinct Cell-Types and Brain Regions

Mehdi Azabou

Krystal Xuejing Pan

Vinam Arora

Ian Knight

Eva L. Dyer

Blake Richards

Recent work has shown that scale is important for improved brain decoding, with more data leading to greater decoding accuracy. However, lar… (see more)ge-scale decoding across many different datasets is challenging because neural circuits are heterogeneous---each brain region contains a unique mix of cellular sub-types, and the responses to different stimuli are diverse across regions and sub-types. It is unknown whether it is possible to pre-train and transfer brain decoding models between distinct tasks, cellular sub-types, and brain regions. To address these questions, we developed a multi-task transformer architecture and trained it on the entirety of the Allen Institute's Brain Observatory dataset. This dataset contains responses from over 100,000 neurons in 6 areas of the brains of mice, observed with two-photon calcium imaging, recorded while the mice observed different types of visual stimuli. Our results demonstrate that transfer is indeed possible -combining data from different sources is beneficial for a number of downstream decoding tasks. As well, we can transfer the model between regions and sub-types, demonstrating that there is in fact common information in diverse circuits that can be extracted by an appropriately designed model. Interestingly, we found that the model's latent representations showed clear distinctions between different brain regions and cellular sub-types, even though it was never given any information about these distinctions. Altogether, our work demonstrates that training a large-scale neural decoding model on diverse data is possible, and this provides a means of studying the differences and similarities between heterogeneous neural circuits.

2025-01-21

ICLR.cc/2025/Conference (spotlight)

openreview.net

Neuroplastic Expansion in Deep Reinforcement Learning

Jiashun Liu

Johan Obando-Ceron

Aaron Courville

Ling Pan

The loss of plasticity in learning agents, analogous to the solidification of neural pathways in biological brains, significantly impedes le… (see more)arning and adaptation in reinforcement learning due to its non-stationary nature. To address this fundamental challenge, we propose a novel approach, {\it Neuroplastic Expansion} (NE), inspired by cortical expansion in cognitive science. NE maintains learnability and adaptability throughout the entire training process by dynamically growing the network from a smaller initial size to its full dimension. Our method is designed with three key components: (\textit{1}) elastic topology generation based on potential gradients, (\textit{2}) dormant neuron pruning to optimize network expressivity, and (\textit{3}) neuron consolidation via experience review to strike a balance in the plasticity-stability dilemma. Extensive experiments demonstrate that NE effectively mitigates plasticity loss and outperforms state-of-the-art methods across various tasks in MuJoCo and DeepMind Control Suite environments. NE enables more adaptive learning in complex, dynamic environments, which represents a crucial step towards transitioning deep reinforcement learning from static, one-time training paradigms to more flexible, continually adapting models.

2025-01-21

ICLR.cc/2025/Conference (poster)

doi.org

openreview.net

Non-Adversarial Inverse Reinforcement Learning via Successor Feature Matching

Sanjiban Choudhury

In inverse reinforcement learning (IRL), an agent seeks to replicate expert demonstrations through interactions with the environment. Tradit… (see more)ionally, IRL is treated as an adversarial game, where an adversary searches over reward models, and a learner optimizes the reward through repeated RL procedures. This game-solving approach is both computationally expensive and difficult to stabilize. In this work, we propose a novel approach to IRL by direct policy optimization: exploiting a linear factorization of the return as the inner product of successor features and a reward vector, we design an IRL algorithm by policy gradient descent on the gap between the learner and expert features. Our non-adversarial method does not require learning a reward function and can be solved seamlessly with existing actor-critic RL algorithms. Remarkably, our approach works in state-only settings without expert action labels, a setting which behavior cloning (BC) cannot solve. Empirical results demonstrate that our method learns from as few as a single expert demonstration and achieves improved performance on various control tasks.

2025-01-21

International Conference on Learning Representations (poster)

doi.org

openreview.net

Optimizing Return Distributions with Distributional Dynamic Programming

Bernardo Avila Pires

Mark Rowland

Diana Borsa

Zhaohan Daniel Guo

Khimya Khetarpal

Andre Barreto

David Abel

Remi Munos

Will Dabney

We introduce distributional dynamic programming (DP) methods for optimizing statistical functionals of the return distribution, with standar… (see more)d reinforcement learning as a special case. Previous distributional DP methods could optimize the same class of expected utilities as classic DP. To go beyond expected utilities, we combine distributional DP with stock augmentation, a technique previously introduced for classic DP in the context of risk-sensitive RL, where the MDP state is augmented with a statistic of the rewards obtained so far (since the first time step). We find that a number of recently studied problems can be formulated as stock-augmented return distribution optimization, and we show that we can use distributional DP to solve them. We analyze distributional value and policy iteration, with bounds and a study of what objectives these distributional DP methods can or cannot optimize. We describe a number of applications outlining how to use distributional DP to solve different stock-augmented return distribution optimization problems, for example maximizing conditional value-at-risk, and homeostatic regulation. To highlight the practical potential of stock-augmented return distribution optimization and distributional DP, we combine the core ideas of distributional value iteration with the deep RL agent DQN, and empirically evaluate it for solving instances of the applications discussed.

2025-01-21

ArXiv (preprint)

doi.org

arxiv.org

Mila on Udemy

Disinformation 2.0: When AI Blurs the Lines

AI Policy Fellowship Publications

Publications

Mila on Udemy

Disinformation 2.0: When AI Blurs the Lines

AI Policy Fellowship Publications

Popular keywords:

Publications