Publications

PANDAS: Improving Many-shot Jailbreaking via Positive Affirmation, Negative Demonstration, and Adaptive Sampling

Avery Ma

Yangchen Pan

Amir-massoud Farahmand

Many-shot jailbreaking circumvents the safety alignment of large language models by exploiting their ability to process long input sequences. To achieve this, the malicious target prompt is prefixed with hundreds of fabricated conversational turns between the user and the model. These fabricated exchanges are randomly sampled from a pool of malicious questions and responses, making it appear as though the model has already complied with harmful instructions. In this paper, we present PANDAS: a hybrid technique that improves many-shot jailbreaking by modifying these fabricated dialogues with positive affirmations, negative demonstrations, and an optimized adaptive sampling method tailored to the target prompt's topic. Extensive experiments on AdvBench and HarmBench, using state-of-the-art LLMs, demonstrate that PANDAS significantly outperforms baseline methods in long-context scenarios. Through an attention analysis, we provide insights on how long-context vulnerabilities are exploited and show how PANDAS further improves upon many-shot jailbreaking.

2025-02-04

ArXiv (preprint)

arxiv.org

Bridging Causality, Individual Fairness, and Adversarial Robustness in the Absence of Structural Causal Model

Ahmad Reza Ehyaei

Golnoosh Farnadi

Samira Samadi

Despite the essential need for comprehensive considerations in responsible AI, factors such as robustness, fairness, and causality are often studied in isolation. Adversarial perturbation, used to identify vulnerabilities in models, and individual fairness, aiming for equitable treatment of similar individuals, despite initial differences, both depend on metrics to generate comparable input data instances. Previous attempts to define such joint metrics often lack general assumptions about data and were unable to reflect counterfactual proximity. To address this, our paper introduces a causal fair metric formulated based on causal structures encompassing sensitive attributes and protected causal perturbation. To enhance the practicality of our metric, we propose metric learning as a method for metric estimation and deployment in real-world problems in the absence of structural causal models. We also demonstrate the applications of the causal fair metric in classifiers. Empirical evaluation of real-world and synthetic datasets illustrates the effectiveness of our proposed metric in achieving an accurate classifier with fairness, resilience to adversarial perturbations, and a nuanced understanding of causal relationships.

2025-02-03

TMLR (accepted)

openreview.net

Adapting Perioperative Care for Neurodivergent Children - A Scoping Review

Spandana Veeravalli

Maia Michaud

Judy Colton

Brenda Bourdeau

Samantha Sacks

Lindsay Hales

Elena Guadagno

Dan Poenaru

2025-02-01

Journal of Pediatric Surgery (published)

doi.org

AIoT Smart Home via Autonomous LLM Agents

Dmitriy Rivkin

Francois Hogan

Amal Feriani

Abhisek Konar

Adam Sigal

Xue (Steve) Liu

Gregory Dudek

The common-sense reasoning abilities and vast general knowledge of large language models (LLMs) make them a natural fit for interpreting user requests in a smart home assistant context. LLMs, however, lack specific knowledge about the user and their home, which limits their potential impact. Smart home agent with grounded execution (SAGE) overcomes these and other limitations by using a scheme in which a user request triggers an LLM-controlled sequence of discrete actions. These actions can be used to retrieve information, interact with the user, or manipulate device states. SAGE controls this process through a dynamically constructed tree of LLM prompts, which help it decide which action to take next, whether an action was successful, and when to terminate the process. The SAGE action set augments an LLM’s capabilities to support some of the most critical requirements for a smart home assistant. These include: flexible and scalable user preference management (“Is my team playing tonight?”), access to any smart device’s full functionality without device-specific code via API reading (“Turn down the screen brightness on my dryer”), persistent device state monitoring (“Remind me to throw out the milk when I open the fridge”), natural device references using only a photo of the room (“Turn on the lamp on the dresser”), and more. We introduce a benchmark of 50 new and challenging smart home tasks where SAGE achieves a 76% success rate, significantly outperforming existing LLM-enabled baselines (30% success rate).

2025-02-01

IEEE Internet of Things Journal (published)

doi.org

Application of deep reinforcement learning for intrusion detection in Internet of Things: A systematic review

Saeid Jamshidi

Amin Nikanjam

Kawser Wazed Nafi

Foutse Khomh

Rasoul Rasta

2025-02-01

Internet of Things (published)

doi.org

arxiv.org

Divergent responses to SARS-CoV-2 infection in bronchial epithelium with pre-existing respiratory diseases

Justine Oliva

Manon Ruffin

Claire Calmel

Aurélien Gibeaud

Andrés Pizzorno

Clémence Gaudin

Solenne Chardonnet

Viviane de Almeida Bastos

Manuel Rosa-Calatrava

Antoine Soulé

Amin Emad

Simon Rousseau

Harriet Corvol

Olivier Terrier

Loïc Guillot

2025-02-01

iScience (published)

doi.org

Isolating the impact of tissue heterogeneities in high dose rate brachytherapy treatment of the breast

Jules Faucher

Vincent Turgeon

Boris Bahoric

Shirin A. Enger

Peter G.F. Watson

2025-02-01

Physics and Imaging in Radiation Oncology (published)

doi.org

Position: Evaluating Generative AI Systems is a Social Science Measurement Challenge

Hanna Wallach

Meera Desai

A. Feder Cooper

Angelina Wang

Chad Atalla

Solon Barocas

Su Lin Blodgett

Alexandra Chouldechova

Emily Corvi

P. A. Dow

Jean Garcia-Gathright

Alexandra Olteanu

Nicholas Pangakis

Stefanie Reed

Emily Sheng

Dan Vann

Jennifer Wortman Vaughan

Matthew Vogel

Hannah Washington

Abigail Z. Jacobs

The measurement tasks involved in evaluating generative AI (GenAI) systems are especially difficult, leading to what has been described as "a tangle of sloppy tests [and] apples-to-oranges comparisons" (Roose, 2024). In this position paper, we argue that the ML community would benefit from learning from and drawing on the social sciences when developing and using measurement instruments for evaluating GenAI systems. Specifically, our position is that evaluating GenAI systems is a social science measurement challenge. We present a four-level framework, grounded in measurement theory from the social sciences, for measuring concepts related to the capabilities, behaviors, and impacts of GenAI. This framework has two important implications for designing and evaluating evaluations: First, it can broaden the expertise involved in evaluating GenAI systems by enabling stakeholders with different perspectives to participate in conceptual debates. Second, it brings rigor to both conceptual and operational debates by offering a set of lenses for interrogating the validity of measurement instruments and their resulting measurements.

2025-02-01

ArXiv (preprint)

doi.org

arxiv.org

A Scalable Architecture for Future Regenerative Satellite Payloads

Olfa Ben Yahia

Zineb Garroussi

Brunilde Sansò

Jean-François Frigon

Stéphane Martel

Antoine Lesage-Landry

Gunes Karabulut Kurt

This paper addresses the limitations of current satellite payload architectures, which are predominantly hardware-driven and lack the flexibility to adapt to increasing data demands and uneven traffic. To overcome these challenges, we present a novel architecture for future regenerative and programmable satellite payloads and utilize interconnected modem banks to promote higher scalability and flexibility. We formulate an optimization problem to efficiently manage traffic among these modem banks and balance the load. Additionally, we provide comparative numerical simulation results, considering end-to-end delay and packet loss analysis. The results illustrate that our proposed architecture maintains lower delays and packet loss even with higher traffic demands and smaller buffer sizes.

2025-02-01

IEEE Wireless Communications Letters (published)

doi.org

arxiv.org

The Harmonic Exponential Filter for Nonparametric Estimation on Motion Groups

Miguel Saavedra-Ruiz

Steven A. Parkison

Ria Arora

James Richard Forbes

Liam Paull

Bayesian estimation is a vital tool in robotics as it allows systems to update the robot state belief using incomplete information from noisy sensors. To render the state estimation problem tractable, many systems assume that the motion and measurement noise, as well as the state distribution, are unimodal and Gaussian. However, there are numerous scenarios and systems that do not comply with these assumptions. Existing nonparametric filters that are used to model multimodal distributions have drawbacks that limit their ability to represent a diverse set of distributions. This paper introduces a novel approach to nonparametric Bayesian filtering on motion groups, designed to handle multimodal distributions using harmonic exponential distributions. This approach leverages two key insights of harmonic exponential distributions: a) the product of two distributions can be expressed as the element-wise addition of their log-likelihood Fourier coefficients, and b) the convolution of two distributions can be efficiently computed as the tensor product of their Fourier coefficients. These observations enable the development of an efficient and asymptotically exact solution to the Bayes filter up to the band limit of a Fourier transform. We demonstrate our filter's performance compared with established nonparametric filtering methods across simulated and real-world localization tasks.

2025-02-01

IEEE Robotics and Automation Letters (published)

doi.org

arxiv.org
