Publications

Understanding (Un)Reliability of Steering Vectors in Language Models

Joschka Braun

Carsten Eickhoff

David M. Krueger

Seyed Ali Bahrainian

Dmitrii Krasheninnikov

Steering vectors are a lightweight method to control language model behavior by adding a learned bias to the activations at inference time. Although steering demonstrates promising performance, recent work shows that it can be unreliable or even counterproductive in some cases. This paper studies the influence of prompt types and the geometry of activation differences on steering reliability. First, we find that all seven prompt types used in our experiments produce a net positive steering effect, but exhibit high variance across samples, and often give an effect opposite of the desired one. No prompt type clearly outperforms the others, and yet the steering vectors resulting from the different prompt types often differ directionally (as measured by cosine similarity). Second, we show that higher cosine similarity between training set activation differences predicts more effective steering. Finally, we observe that datasets where positive and negative activations are better separated are more steerable. Our results suggest that vector steering is unreliable when the target behavior is not represented by a coherent direction.

2025-03-04

ICLR.cc/2025/Workshop/Bi-Align (poster)

UNLEARNING GEO-CULTURAL STEREOTYPES IN MULTILINGUAL LLMS

Alireza Dehghanpour Farashah

Aditi Khandelwal

Negar Rostamzadeh

Golnoosh Farnadi

As multilingual generative models become more widely used, most safety and fairness evaluation techniques still focus on English-language resources, while overlooking important cross-cultural factors. This limitation raises concerns about fairness and safety, particularly regarding geoculturally situated stereotypes that hinder the models’ global inclusivity. In this work, we present preliminary findings on the impact of stereotype unlearning across languages, specifically in English, French, and Hindi. Using an adapted version of the SeeGULL dataset, we analyze how unlearning stereotypes in one language influences other languages within multilingual large language models. Our study evaluates two model families, Llama-3.1-8B and Aya-Expanse-8B, to assess whether unlearning in one linguistic context transfers across languages, potentially mitigating or exacerbating biases in multilingual settings.

2025-03-04

ICLR.cc/2025/Workshop/BuildingTrust (accepted)

Augmented Conditioning is Enough for Effective Training Image Generation

Jiahui Chen

Amy Zhang

Adriana Romero

Image generation abilities of text-to-image diffusion models have significantly advanced, yielding highly photo-realistic images from descriptive text and increasing the viability of leveraging synthetic images to train computer vision models. To serve as effective training data, generated images must be highly realistic while also sufficiently diverse within the support of the target data distribution. Yet, state-of-the-art conditional image generation models have been primarily optimized for creative applications, prioritizing image realism and prompt adherence over conditional diversity. In this paper, we investigate how to improve the diversity of generated images with the goal of increasing their effectiveness to train downstream image classification models, without fine-tuning the image generation model. We find that conditioning the generation process on an augmented real image and text prompt produces generations that serve as effective synthetic datasets for downstream training. Conditioning on real training images contextualizes the generation process to produce images that are in-domain with the real image distribution, while data augmentations introduce visual diversity that improves the performance of the downstream classifier. We validate augmentation-conditioning on a total of five established long-tail and few-shot image classification benchmarks and show that leveraging augmentations to condition the generation process results in consistent improvements over the state-of-the-art on the long-tailed benchmark and remarkable gains in extreme few-shot regimes of the remaining four benchmarks. These results constitute an important step towards effectively leveraging synthetic data for downstream training.

2025-03-03

ICLR.cc/2025/Workshop/SynthData (published)

AI Automatons: AI Systems Intended to Imitate Humans

A.R. Olteanu

Solon Barocas

Su Lin Blodgett

Lisa Egede

Alicia DeVrio

Myra Cheng

There is a growing proliferation of AI systems designed to mimic people's behavior, work, abilities, likenesses, or humanness -- systems we dub AI automatons. Individuals, groups, or generic humans are being simulated to produce creative work in their styles, to respond to surveys in their places, to probe how they would use a new system before deployment, to provide users with assistance and companionship, and to anticipate their possible future behavior and interactions with others, just to name a few applications. The research, design, deployment, and availability of such AI systems have, however, also prompted growing concerns about a wide range of possible legal, ethical, and other social impacts. To both 1) facilitate productive discussions about whether, when, and how to design and deploy such systems, and 2) chart the current landscape of existing and prospective AI automatons, we need to tease apart determinant design axes and considerations that can aid our understanding of whether and how various design choices along these axes could mitigate -- or instead exacerbate -- potential adverse impacts that the development and use of AI automatons could give rise to. In this paper, through a synthesis of related literature and extensive examples of existing AI systems intended to mimic humans, we develop a conceptual framework to help foreground key axes of design variations and provide analytical scaffolding to foster greater recognition of the design choices available to developers, as well as the possible ethical implications these choices might have.

2025-03-03

ArXiv (preprint)

Considerations and recommendations from the ISMRM Diffusion Study Group for preclinical diffusion MRI: Part 2 - Ex vivo imaging: added value and acquisition

Kurt G. Schilling

Francesco Grussu

Andrada Ianus

Brian Hansen

Amy F. D. Howard

Rachel L. C. Barrett

Manisha Aggarwal

Stijn Michielse

Fatima Nasrallah

Warda Syeda

Nian Wang

Jelle Veraart

Alard Roebroeck

Andrew F. Bagdasarian

Cornelius Eichner

Farshid Sepehrband

Jan Zimmermann

Lucas Soustelle

Christien Bowman

Benjamin C. Tendler

Andreea Hertanu

Ben Jeurissen

Marleen Verhoye

Lucio Frydman

Yohan van de Looij

David Hike

Jeff F. Dunn

Karla Miller

Bennett A. Landman

Noam Shemesh

Adam Anderson

Emilie McKinnon

Shawna Farquharson

Flavio Dell'Acqua

Carlo Pierpaoli

Ivana Drobnjak

Alexander Leemans

Kevin D. Harkins

Maxime Descoteaux

Duan Xu

Hao Huang

Mathieu D. Santin

Samuel C. Grant

Andre Obenaus

Gene S. Kim

Dan Wu

Denis Le Bihan

Stephen J. Blackband

Luisa Ciobanu

Els Fieremans

Ruiliang Bai

Trygve B. Leergaard

Jiangyang Zhang

Tim B. Dyrby

G. Allan Johnson

Julien Cohen‐Adad

Matthew D. Budde

Ileana O. Jelescu

The value of preclinical diffusion MRI (dMRI) is substantial. While dMRI enables in vivo non-invasive characterization of tissue, ex vivo dMRI is increasingly used to probe tissue microstructure and brain connectivity. Ex vivo dMRI has several experimental advantages including higher signal-to-noise ratio and spatial resolution compared to in vivo studies, and enabling more advanced diffusion contrasts. Another major advantage of ex vivo dMRI is the direct comparison with histological data as a methodological validation. However, there are a number of considerations that must be made when performing ex vivo experiments. The steps from tissue preparation, image acquisition and processing, and interpretation of results are complex, with decisions that not only differ dramatically from in vivo imaging of small animals, but ultimately affect what questions can be answered using the data. This work represents "Part 2" of a 3-part series of recommendations and considerations for preclinical dMRI. We describe best practices for dMRI of ex vivo tissue, with a focus on the value that ex vivo imaging adds to the field of dMRI and considerations in ex vivo image acquisition. We give general considerations and foundational knowledge that must be considered when designing experiments. We describe differences in specimens and models and discuss why some may be more or less appropriate for different studies. We then give guidelines for ex vivo protocols, including tissue fixation, sample preparation, and MR scanning. In each section, we attempt to provide guidelines and recommendations, but also highlight areas for which no guidelines exist (and why), and where future work should lie. An overarching goal herein is to enhance the rigor and reproducibility of ex vivo dMRI acquisitions and analyses, and thereby advance biomedical knowledge.

2025-03-03

Magnetic Resonance in Medicine (published)

EarthView: A Large Scale Remote Sensing Dataset for Self-Supervision

Diego Velazquez

Pau Rodríguez

Sergio Alonso

Josep M. Gonfaus

Jordi Gonzalez

Gerardo Richarte

Javier Marin

Yoshua Bengio

Alexandre Lacoste

This paper presents EarthView, a comprehensive dataset specifically designed for self-supervision on remote sensing data, intended to enhance deep learning applications on Earth monitoring tasks. The dataset spans 15 tera pixels of global remote-sensing data, combining imagery from a diverse range of sources, including NEON, Sentinel, and a novel release of 1m spatial resolution data from Satellogic. Our dataset provides a wide spectrum of image data with varying resolutions, harnessed from different sensors and organized coherently into an accessible HuggingFace dataset in parquet format. This data spans five years, from 2017 to 2022. Accompanying the dataset, we introduce EarthMAE, a tailored Masked Autoencoder, developed to tackle the distinct challenges of remote sensing data. Trained in a self-supervised fashion, EarthMAE effectively processes different data modalities such as hyperspectral, multispectral, topographical data, segmentation maps, and temporal structure. This model helps us show that pre-training on Satellogic data improves performance on downstream tasks. While there is still a gap to fill in MAE for heterogeneous data, we regard this innovative combination of an expansive, diverse dataset and a versatile model adapted for self-supervised learning as a stride forward in deep learning for Earth monitoring.

2025-03-03

2025 IEEE/CVF Winter Conference on Applications of Computer Vision Workshops (WACVW) (published)

Hardware Synthesizable Exceptions using Continuations

Paul Teng

Christophe Dubach

2025-03-03

Proceedings of the 30th Asia and South Pacific Design Automation Conference (published)

LLM-Safety Evaluations Lack Robustness

Tim Beyer

Sophie Xhonneux

Simon Geisler

Gauthier Gidel

Leo Schwinn

Stephan Günnemann

In this paper, we argue that current safety alignment research efforts for large language models are hindered by many intertwined sources of noise, such as small datasets, methodological inconsistencies, and unreliable evaluation setups. This can, at times, make it impossible to evaluate and compare attacks and defenses fairly, thereby slowing progress. We systematically analyze the LLM safety evaluation pipeline, covering dataset curation, optimization strategies for automated red-teaming, response generation, and response evaluation using LLM judges. At each stage, we identify key issues and highlight their practical impact. We also propose a set of guidelines for reducing noise and bias in evaluations of future attack and defense papers. Lastly, we offer an opposing perspective, highlighting practical reasons for existing limitations. We believe that addressing the outlined problems in future research will improve the field's ability to generate easily comparable results and make measurable progress.

2025-03-03

ArXiv (preprint)

CrystalGym: A New Benchmark for Materials Discovery Using Reinforcement Learning

Prashant Govindarajan

Mathieu Reymond

Antoine Clavaud

Mariano Phielipp

Santiago Miret

A. Chandar

*In silico* design and optimization of new materials primarily relies on high-accuracy atomic simulators that perform density functional theory (DFT) calculations. While recent works showcase the strong potential of machine learning to accelerate the material design process, they mostly consist of generative approaches that do not use direct DFT signals as feedback to improve training and generation mainly due to DFT's high computational cost. To aid the adoption of direct DFT signals in the materials design loop through online reinforcement learning (RL), we propose **CrystalGym**, an open-source RL environment for crystalline material discovery. Using CrystalGym, we benchmark value- and policy-based reinforcement learning algorithms for designing various crystals conditioned on target properties. Concretely, we optimize for challenging properties like the band gap, bulk modulus, and density, which are directly calculated from DFT in the environment. While none of the algorithms we benchmark solve all CrystalGym tasks, our extensive experiments and ablations show different sample efficiencies and ease of convergence to optimality for different algorithms and environment settings. Our goal is for CrystalGym to serve as a test bed for reinforcement learning researchers and material scientists to address these real-world design problems with practical applications. Furthermore, we introduce a novel class of challenges for reinforcement learning methods dealing with time-consuming reward signals, paving the way for future interdisciplinary research for machine learning motivated by real-world applications.

2025-03-02

AI4MAT @ International Conference on Learning Representations (spotlight)

Development and Feasibility Study of HOPE Model for Prediction of Depression Among Older Adults Using Wi-Fi-based Motion Sensor Data: Machine Learning Study

Shayan Nejadshamsi

Vania Karami

Negar Ghourchian

Narges Armanfard

Howard Bergman

Roland Grad

Machelle Wilchesky

Vladimir Khanassov

Isabelle Vedel

Samira Abbasgholizadeh Rahimi

Depression, characterized by persistent sadness and loss of interest in daily activities, greatly reduces quality of life. Early detection is vital for effective treatment and intervention. While many studies use wearable devices to classify depression based on physical activity, these often rely on intrusive methods. Additionally, most depression classification studies involve large participant groups and use single-stage classifiers without explainability. This study aims to assess the feasibility of classifying depression using nonintrusive Wi-Fi–based motion sensor data using a novel machine learning model on a limited number of participants. We also conduct an explainability analysis to interpret the model’s predictions and identify key features associated with depression classification. In this study, we recruited adults aged 65 years and older through web-based and in-person methods, supported by a McGill University health care facility directory. Participants provided consent, and we collected 6 months of activity and sleep data via nonintrusive Wi-Fi–based sensors, along with Edmonton Frailty Scale and Geriatric Depression Scale data. For depression classification, we proposed a HOPE (Home-Based Older Adults’ Depression Prediction) machine learning model with feature selection, dimensionality reduction, and classification stages, evaluating various model combinations using accuracy, sensitivity, precision, and F1-score. Shapley additive explanations and local interpretable model-agnostic explanations were used to explain the model’s predictions. A total of 6 participants were enrolled in this study; however, 2 participants withdrew later due to internet connectivity issues. Among the 4 remaining participants, 3 participants were classified as not having depression, while 1 participant was identified as having depression.
The most accurate classification model, which combined sequential forward selection for feature selection, principal component analysis for dimensionality reduction, and a decision tree for classification, achieved an accuracy of 87.5%, sensitivity of 90%, and precision of 88.3%, effectively distinguishing individuals with and those without depression. The explainability analysis revealed that the most influential features in depression classification, in order of importance, were “average sleep duration,” “total number of sleep interruptions,” “percentage of nights with sleep interruptions,” “average duration of sleep interruptions,” and “Edmonton Frailty Scale.” The findings from this preliminary study demonstrate the feasibility of using Wi-Fi–based motion sensors for depression classification and highlight the effectiveness of our proposed HOPE machine learning model, even with a small sample size. These results suggest the potential for further research with a larger cohort for more comprehensive validation. Additionally, the nonintrusive data collection method and model architecture proposed in this study offer promising applications in remote health monitoring, particularly for older adults who may face challenges in using wearable devices. Furthermore, the importance of sleep patterns identified in our explainability analysis aligns with findings from previous research, emphasizing the need for more in-depth studies on the role of sleep in mental health, as suggested in the explainable machine learning study.

2025-03-02

JMIR Aging (published)

Interval Regression: A Comparative Study with Proposed Models

Tung L. Nguyen

Toby Dylan Hocking

Regression models are essential for a wide range of real-world applications. However, in practice, target values are not always precisely known; instead, they may be represented as intervals of acceptable values. This challenge has led to the development of Interval Regression models. In this study, we provide a comprehensive review of existing Interval Regression models and introduce alternative models for comparative analysis. Experiments are conducted on both real-world and synthetic datasets to offer a broad perspective on model performance. The results demonstrate that no single model is universally optimal, highlighting the importance of selecting the most suitable model for each specific scenario.

2025-03-02

ArXiv (preprint)