Publications
Scaling Deep Learning Solutions for Transition Path Sampling
Transition path sampling (TPS) is an important method for studying rare events, such as those that occur in chemical reactions or protein folding. These events occur so infrequently that traditional simulations are often impractical, and even recent machine-learning approaches struggle to address this issue for larger systems. In this paper, we propose using modern deep learning techniques to significantly improve the scalability of TPS methods. We highlight the need for better evaluations in the existing literature, formulate TPS as a sampling problem over an unnormalized target density, and introduce relevant evaluation metrics to assess the effectiveness of TPS solutions from this perspective. To develop a scalable approach, we explore several design choices, including a problem-informed neural network architecture, simulated annealing, the integration of prior knowledge into the sampling process, and attention mechanisms. Finally, we conduct a comprehensive empirical study and compare these design choices with other recently developed deep-learning methods for rare event sampling.
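As an illustration of sampling from an unnormalized target density (the framing adopted above), here is a minimal, generic Metropolis sketch; the one-dimensional bimodal density below is an invented stand-in, not the paper's path-space target:

```python
import math
import random

random.seed(0)

def unnorm_density(x):
    """Unnormalized bimodal target (an invented stand-in for a path-space density)."""
    return math.exp(-(x - 2.0) ** 2) + math.exp(-(x + 2.0) ** 2)

def metropolis(n_steps, step_size=1.0, x0=0.0):
    """Sample from the unnormalized density with a Metropolis random walk."""
    samples, x = [], x0
    for _ in range(n_steps):
        proposal = x + random.uniform(-step_size, step_size)
        # Accept with probability min(1, p(proposal) / p(x)); the unknown
        # normalizing constant cancels, so only density ratios are needed.
        if random.random() < unnorm_density(proposal) / unnorm_density(x):
            x = proposal
        samples.append(x)
    return samples

samples = metropolis(20_000)
# The chain should concentrate near the modes at x = -2 and x = +2.
```

Because only density ratios enter the acceptance test, this is exactly the setting where an unnormalized target suffices, which is what makes the sampling formulation of TPS attractive.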
Recent progress in large language models (LLMs) has focused on producing responses that meet human expectations and align with shared values - a process coined alignment. However, aligning LLMs remains challenging due to the inherent disconnect between the complexity of human values and the narrow nature of the technological approaches designed to address them. Current alignment methods often lead to misspecified objectives, reflecting the broader issue of incomplete contracts: the impracticality of specifying a contract between a model developer and the model that accounts for every scenario in LLM alignment. In this paper, we argue that improving LLM alignment requires incorporating insights from societal alignment frameworks, including social, economic, and contractual alignment, and discuss potential solutions drawn from these domains. Given the role of uncertainty within societal alignment frameworks, we then investigate how it manifests in LLM alignment. We end our discussion by offering an alternative view on LLM alignment, framing the underspecified nature of its objectives as an opportunity rather than a problem to be solved by perfecting their specification. Beyond technical improvements in LLM alignment, we discuss the need for participatory alignment interface designs.
2025-03-04
Bi-Align @ International Conference on Learning Representations (poster)
Understanding (Un)Reliability of Steering Vectors in Language Models
Joschka Braun
Carsten Eickhoff
David M. Krueger
Seyed Ali Bahrainian
Dmitrii Krasheninnikov
Steering vectors are a lightweight method to control language model behavior by adding a learned bias to the activations at inference time. Although steering demonstrates promising performance, recent work shows that it can be unreliable or even counterproductive in some cases. This paper studies the influence of prompt types and the geometry of activation differences on steering reliability. First, we find that all seven prompt types used in our experiments produce a net positive steering effect, but exhibit high variance across samples, and often give an effect opposite of the desired one. No prompt type clearly outperforms the others, and yet the steering vectors resulting from the different prompt types often differ directionally (as measured by cosine similarity). Second, we show that higher cosine similarity between training set activation differences predicts more effective steering. Finally, we observe that datasets where positive and negative activations are better separated are more steerable. Our results suggest that vector steering is unreliable when the target behavior is not represented by a coherent direction.
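The core mechanism, adding a learned bias to activations and comparing steering vectors by cosine similarity, can be sketched in a few lines; the list-based activations and toy numbers below are invented for illustration and do not come from the paper:

```python
import math

def mean_diff(pos_acts, neg_acts):
    """Steering vector: mean difference between positive- and negative-behavior
    activations (a toy, list-based stand-in for residual-stream activations)."""
    dim = len(pos_acts[0])
    pos_mean = [sum(a[i] for a in pos_acts) / len(pos_acts) for i in range(dim)]
    neg_mean = [sum(a[i] for a in neg_acts) / len(neg_acts) for i in range(dim)]
    return [p - n for p, n in zip(pos_mean, neg_mean)]

def steer(activation, vector, alpha=1.0):
    """Inference-time intervention: add the scaled steering vector."""
    return [a + alpha * v for a, v in zip(activation, vector)]

def cosine(u, v):
    """Cosine similarity, used above to compare directions of steering vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))

# Toy activations: the "behavior" lies along the first coordinate.
pos = [[1.0, 0.2], [0.8, -0.1]]
neg = [[-1.0, 0.1], [-0.9, 0.0]]
v = mean_diff(pos, neg)               # points along the first axis
steered = steer([0.0, 0.5], v, alpha=0.5)
```

In this toy setting the positive and negative activations are well separated, so the extracted direction is coherent; the paper's finding is precisely that steering degrades when real activations lack such a coherent direction.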
As multilingual generative models become more widely used, most safety and fairness evaluation techniques still focus on English-language resources, while overlooking important cross-cultural factors. This limitation raises concerns about fairness and safety, particularly regarding geoculturally situated stereotypes that hinder the models’ global inclusivity. In this work, we present preliminary findings on the impact of stereotype unlearning across languages, specifically in English, French, and Hindi. Using an adapted version of the SeeGULL dataset, we analyze how unlearning stereotypes in one language influences other languages within multilingual large language models. Our study evaluates two model families, Llama-3.1-8B and Aya-Expanse-8B, to assess whether unlearning in one linguistic context transfers across languages, potentially mitigating or exacerbating biases in multilingual settings.
Image generation abilities of text-to-image diffusion models have significantly advanced, yielding highly photo-realistic images from descriptive text and increasing the viability of leveraging synthetic images to train computer vision models. To serve as effective training data, generated images must be highly realistic while also sufficiently diverse within the support of the target data distribution. Yet, state-of-the-art conditional image generation models have been primarily optimized for creative applications, prioritizing image realism and prompt adherence over conditional diversity. In this paper, we investigate how to improve the diversity of generated images with the goal of increasing their effectiveness to train downstream image classification models, without fine-tuning the image generation model. We find that conditioning the generation process on an augmented real image and text prompt produces generations that serve as effective synthetic datasets for downstream training. Conditioning on real training images contextualizes the generation process to produce images that are in-domain with the real image distribution, while data augmentations introduce visual diversity that improves the performance of the downstream classifier. We validate augmentation-conditioning on a total of five established long-tail and few-shot image classification benchmarks and show that leveraging augmentations to condition the generation process results in consistent improvements over the state-of-the-art on the long-tailed benchmark and remarkable gains in extreme few-shot regimes of the remaining four benchmarks. These results constitute an important step towards effectively leveraging synthetic data for downstream training.
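A minimal sketch of the augmentation-conditioning idea: build (augmented real image, prompt) pairs that would condition a generator. The augmentations, toy image, and prompt below are invented for illustration, and the generative model itself is left out:

```python
import random

random.seed(0)

def hflip(img):
    """Horizontal flip of an image given as a list of rows."""
    return [list(reversed(row)) for row in img]

def random_crop(img, size):
    """Random square crop (resizing back to full size is omitted for brevity)."""
    h, w = len(img), len(img[0])
    top = random.randrange(h - size + 1)
    left = random.randrange(w - size + 1)
    return [row[left:left + size] for row in img[top:top + size]]

def augment(img):
    """Compose the simple augmentations above to inject visual diversity."""
    out = hflip(img) if random.random() < 0.5 else img
    return random_crop(out, size=2)

def conditioning_pairs(real_images, prompt, n_per_image=2):
    """Build (augmented image, prompt) pairs that would condition a generative
    model; calling the generator itself is left as a placeholder."""
    return [(augment(img), prompt) for img in real_images for _ in range(n_per_image)]

real = [[[1, 2, 3], [4, 5, 6], [7, 8, 9]]]   # one toy 3x3 "image"
pairs = conditioning_pairs(real, "a photo of a tabby cat")
```

The real image anchors the generation in-domain, while each randomized augmentation yields a different conditioning input, which is the source of the diversity discussed above.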
Considerations and recommendations from the ISMRM Diffusion Study Group for preclinical diffusion MRI: Part 2 - Ex vivo imaging: added value and acquisition
The value of preclinical diffusion MRI (dMRI) is substantial. While dMRI enables in vivo non-invasive characterization of tissue, ex vivo dMRI is increasingly used to probe tissue microstructure and brain connectivity. Ex vivo dMRI has several experimental advantages including higher signal-to-noise ratio and spatial resolution compared to in vivo studies, and enabling more advanced diffusion contrasts. Another major advantage of ex vivo dMRI is the direct comparison with histological data as a methodological validation. However, there are a number of considerations that must be made when performing ex vivo experiments. The steps from tissue preparation, image acquisition and processing, and interpretation of results are complex, with decisions that not only differ dramatically from in vivo imaging of small animals, but ultimately affect what questions can be answered using the data. This work represents "Part 2" of a 3-part series of recommendations and considerations for preclinical dMRI. We describe best practices for dMRI of ex vivo tissue, with a focus on the value that ex vivo imaging adds to the field of dMRI and considerations in ex vivo image acquisition. We give general considerations and foundational knowledge that must be considered when designing experiments. We describe differences in specimens and models and discuss why some may be more or less appropriate for different studies. We then give guidelines for ex vivo protocols, including tissue fixation, sample preparation, and MR scanning. In each section, we attempt to provide guidelines and recommendations, but also highlight areas for which no guidelines exist (and why), and where future work should lie. An overarching goal herein is to enhance the rigor and reproducibility of ex vivo dMRI acquisitions and analyses, and thereby advance biomedical knowledge.
This paper presents EarthView, a comprehensive dataset specifically designed for self-supervision on remote sensing data, intended to enhance deep learning applications on Earth monitoring tasks. The dataset spans 15 tera pixels of global remote-sensing data, combining imagery from a diverse range of sources, including NEON, Sentinel, and a novel release of 1m spatial resolution data from Satellogic. Our dataset provides a wide spectrum of image data with varying resolutions, harnessed from different sensors and organized coherently into an accessible HuggingFace dataset in parquet format. This data spans five years, from 2017 to 2022. Accompanying the dataset, we introduce EarthMAE, a tailored Masked Autoencoder, developed to tackle the distinct challenges of remote sensing data. Trained in a self-supervised fashion, EarthMAE effectively processes different data modalities such as hyperspectral, multispectral, topographical data, segmentation maps, and temporal structure. This model helps us show that pre-training on Satellogic data improves performance on downstream tasks. While there is still a gap to fill in MAE for heterogeneous data, we regard this innovative combination of an expansive, diverse dataset and a versatile model adapted for self-supervised learning as a stride forward in deep learning for Earth monitoring.
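The random patch masking at the heart of a masked autoencoder such as EarthMAE can be sketched generically; the 75% mask ratio is the common MAE default (an assumption here, not a detail from the abstract), and the toy patches are invented:

```python
import random

random.seed(0)

def mask_patches(patches, mask_ratio=0.75):
    """Split patch indices into visible and masked sets, as a masked
    autoencoder does before encoding only the visible patches and
    training a decoder to reconstruct the masked ones."""
    n = len(patches)
    n_masked = int(n * mask_ratio)
    idx = list(range(n))
    random.shuffle(idx)
    masked, visible = idx[:n_masked], idx[n_masked:]
    encoder_input = [patches[i] for i in visible]   # only ~25% of patches
    return visible, sorted(masked), encoder_input

# 16 toy "patches" (e.g. flattened tiles from one imaging modality)
patches = [[float(i)] for i in range(16)]
visible, masked, enc_in = mask_patches(patches)
```

Encoding only the small visible subset is what makes MAE pre-training cheap enough to scale to a dataset of this size.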
2025-03-03
2025 IEEE/CVF Winter Conference on Applications of Computer Vision Workshops (WACVW) (published)
*In silico* design and optimization of new materials primarily relies on high-accuracy atomic simulators that perform density functional theory (DFT) calculations. While recent works showcase the strong potential of machine learning to accelerate the material design process, they mostly consist of generative approaches that do not use direct DFT signals as feedback to improve training and generation mainly due to DFT's high computational cost. To aid the adoption of direct DFT signals in the materials design loop through online reinforcement learning (RL), we propose **CrystalGym**, an open-source RL environment for crystalline material discovery. Using CrystalGym, we benchmark value- and policy-based reinforcement learning algorithms for designing various crystals conditioned on target properties. Concretely, we optimize for challenging properties like the band gap, bulk modulus, and density, which are directly calculated from DFT in the environment. While none of the algorithms we benchmark solve all CrystalGym tasks, our extensive experiments and ablations show different sample efficiencies and ease of convergence to optimality for different algorithms and environment settings. Our goal is for CrystalGym to serve as a test bed for reinforcement learning researchers and material scientists to address these real-world design problems with practical applications. Furthermore, we introduce a novel class of challenges for reinforcement learning methods dealing with time-consuming reward signals, paving the way for future interdisciplinary research for machine learning motivated by real-world applications.
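The abstract does not specify CrystalGym's API; assuming a Gym-style reset/step interface, a minimal random-policy loop might look like the following, with a cheap quadratic surrogate standing in for the DFT-computed property:

```python
import random

random.seed(0)

class ToyCrystalEnv:
    """Gym-style toy environment (invented, not CrystalGym): the state is a
    single composition parameter and the stand-in 'band gap' is quadratic in it."""
    def __init__(self, target_gap=1.5):
        self.target = target_gap
        self.state = 0.0

    def reset(self):
        self.state = random.uniform(-2.0, 2.0)
        return self.state

    def step(self, action):
        self.state += action
        gap = self.state ** 2                 # cheap surrogate for a DFT property
        reward = -abs(gap - self.target)      # closer to the target gap is better
        done = abs(gap - self.target) < 1e-2
        return self.state, reward, done

env = ToyCrystalEnv()
best_reward = float("-inf")
for episode in range(200):
    env.reset()
    for _ in range(50):
        _, reward, done = env.step(random.uniform(-0.5, 0.5))
        best_reward = max(best_reward, reward)
        if done:
            break
```

In the real environment each reward would require a full DFT calculation, which is exactly the time-consuming-reward challenge the paper highlights; the surrogate here only makes the sketch runnable.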
Development and Feasibility Study of HOPE Model for Prediction of Depression Among Older Adults Using Wi-Fi-based Motion Sensor Data: Machine Learning Study
Depression, characterized by persistent sadness and loss of interest in daily activities, greatly reduces quality of life. Early detection is vital for effective treatment and intervention. While many studies use wearable devices to classify depression based on physical activity, these often rely on intrusive methods. Additionally, most depression classification studies involve large participant groups and use single-stage classifiers without explainability.
This study aims to assess the feasibility of classifying depression using nonintrusive Wi-Fi–based motion sensor data using a novel machine learning model on a limited number of participants. We also conduct an explainability analysis to interpret the model’s predictions and identify key features associated with depression classification.
In this study, we recruited adults aged 65 years and older through web-based and in-person methods, supported by a McGill University health care facility directory. Participants provided consent, and we collected 6 months of activity and sleep data via nonintrusive Wi-Fi–based sensors, along with Edmonton Frailty Scale and Geriatric Depression Scale data. For depression classification, we proposed a HOPE (Home-Based Older Adults’ Depression Prediction) machine learning model with feature selection, dimensionality reduction, and classification stages, evaluating various model combinations using accuracy, sensitivity, precision, and F1-score. Shapley additive explanations and local interpretable model-agnostic explanations were used to explain the model’s predictions.
A total of 6 participants were enrolled in this study; however, 2 participants withdrew later due to internet connectivity issues. Among the 4 remaining participants, 3 participants were classified as not having depression, while 1 participant was identified as having depression. The most accurate classification model, which combined sequential forward selection for feature selection, principal component analysis for dimensionality reduction, and a decision tree for classification, achieved an accuracy of 87.5%, sensitivity of 90%, and precision of 88.3%, effectively distinguishing individuals with and those without depression. The explainability analysis revealed that the most influential features in depression classification, in order of importance, were “average sleep duration,” “total number of sleep interruptions,” “percentage of nights with sleep interruptions,” “average duration of sleep interruptions,” and “Edmonton Frailty Scale.”
The findings from this preliminary study demonstrate the feasibility of using Wi-Fi–based motion sensors for depression classification and highlight the effectiveness of our proposed HOPE machine learning model, even with a small sample size. These results suggest the potential for further research with a larger cohort for more comprehensive validation. Additionally, the nonintrusive data collection method and model architecture proposed in this study offer promising applications in remote health monitoring, particularly for older adults who may face challenges in using wearable devices. Furthermore, the importance of sleep patterns identified in our explainability analysis aligns with findings from previous research, emphasizing the need for more in-depth studies on the role of sleep in mental health, as suggested in the explainable machine learning study.
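The staged structure of the HOPE model described above (feature selection followed by classification; the PCA stage is omitted for brevity) can be sketched generically. The greedy forward selection, single-feature threshold classifier, and toy data below are invented stand-ins, not the study's implementation:

```python
def accuracy(preds, labels):
    """Fraction of predictions matching the labels."""
    return sum(p == l for p, l in zip(preds, labels)) / len(labels)

def stump_predict(rows, feat, threshold):
    """Single-feature threshold classifier (a stand-in for the decision tree)."""
    return [1 if r[feat] > threshold else 0 for r in rows]

def forward_select(rows, labels, features, k=2):
    """Greedy sequential forward selection: repeatedly add the feature whose
    one-feature stump best predicts the labels."""
    selected = []
    for _ in range(k):
        best_feat, best_acc = None, -1.0
        for f in features:
            if f in selected:
                continue
            col = [r[f] for r in rows]
            thr = sum(col) / len(col)          # split at the feature's mean
            acc = accuracy(stump_predict(rows, f, thr), labels)
            if acc > best_acc:
                best_feat, best_acc = f, acc
        selected.append(best_feat)
    return selected

# Invented toy data: feature 0 separates the two classes; feature 1 is noise.
rows = [
    {0: 7.5, 1: 0.3}, {0: 7.1, 1: 0.9},   # class 1
    {0: 4.2, 1: 0.5}, {0: 4.8, 1: 0.1},   # class 0
]
labels = [1, 1, 0, 0]
chosen = forward_select(rows, labels, features=[0, 1], k=1)
```

With so few participants, a staged pipeline like this keeps the final classifier small, which is one plausible reason the study reports reasonable results despite its sample size.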