Publications
Scaling Deep Learning Solutions for Transition Path Sampling
Transition path sampling (TPS) is an important method for studying rare events, such as those that occur in chemical reactions or protein folding. These events occur so infrequently that traditional simulations are often impractical, and even recent machine-learning approaches struggle to address this issue for larger systems. In this paper, we propose using modern deep learning techniques to significantly improve the scalability of TPS methods. We highlight the need for better evaluations in the existing literature, formulate TPS as a sampling problem over an unnormalized target density, and introduce relevant evaluation metrics to assess the effectiveness of TPS solutions from this perspective. To develop a scalable approach, we explore several design choices, including a problem-informed neural network architecture, simulated annealing, the integration of prior knowledge into the sampling process, and attention mechanisms. Finally, we conduct a comprehensive empirical study and compare these design choices with other recently developed deep-learning methods for rare event sampling.
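As an illustration of sampling from an unnormalized target density (the framing adopted above), here is a minimal, generic Metropolis sketch; the one-dimensional bimodal density below is an invented stand-in, not the paper's path-space target:

```python
import math
import random

random.seed(0)

def unnorm_density(x):
    """Unnormalized bimodal target (an invented stand-in for a path-space density)."""
    return math.exp(-(x - 2.0) ** 2) + math.exp(-(x + 2.0) ** 2)

def metropolis(n_steps, step_size=1.0, x0=0.0):
    """Sample from the unnormalized density with a Metropolis random walk."""
    samples, x = [], x0
    for _ in range(n_steps):
        proposal = x + random.uniform(-step_size, step_size)
        # Accept with probability min(1, p(proposal) / p(x)); the unknown
        # normalizing constant cancels, so only density ratios are needed.
        if random.random() < unnorm_density(proposal) / unnorm_density(x):
            x = proposal
        samples.append(x)
    return samples

samples = metropolis(20_000)
# The chain should concentrate near the modes at x = -2 and x = +2.
```

Because only density ratios enter the acceptance test, this is exactly the setting where an unnormalized target suffices, which is what makes the sampling formulation of TPS attractive.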
Recent progress in large language models (LLMs) has focused on producing responses that meet human expectations and align with shared values - a process coined alignment. However, aligning LLMs remains challenging due to the inherent disconnect between the complexity of human values and the narrow nature of the technological approaches designed to address them. Current alignment methods often lead to misspecified objectives, reflecting the broader issue of incomplete contracts: the impracticality of specifying a contract between a model developer and the model that accounts for every scenario in LLM alignment. In this paper, we argue that improving LLM alignment requires incorporating insights from societal alignment frameworks, including social, economic, and contractual alignment, and discuss potential solutions drawn from these domains. Given the role of uncertainty within societal alignment frameworks, we then investigate how it manifests in LLM alignment. We end our discussion by offering an alternative view on LLM alignment, framing the underspecified nature of its objectives as an opportunity rather than a problem to be solved by perfecting their specification. Beyond technical improvements in LLM alignment, we discuss the need for participatory alignment interface designs.
2025-03-04
Bi-Align @ International Conference on Learning Representations (poster)
Understanding (Un)Reliability of Steering Vectors in Language Models
Joschka Braun
Carsten Eickhoff
David M. Krueger
Seyed Ali Bahrainian
Dmitrii Krasheninnikov
Steering vectors are a lightweight method to control language model behavior by adding a learned bias to the activations at inference time. Although steering demonstrates promising performance, recent work shows that it can be unreliable or even counterproductive in some cases. This paper studies the influence of prompt types and the geometry of activation differences on steering reliability. First, we find that all seven prompt types used in our experiments produce a net positive steering effect, but exhibit high variance across samples, and often give an effect opposite of the desired one. No prompt type clearly outperforms the others, and yet the steering vectors resulting from the different prompt types often differ directionally (as measured by cosine similarity). Second, we show that higher cosine similarity between training set activation differences predicts more effective steering. Finally, we observe that datasets where positive and negative activations are better separated are more steerable. Our results suggest that vector steering is unreliable when the target behavior is not represented by a coherent direction.
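The core mechanism, adding a learned bias to activations and comparing steering vectors by cosine similarity, can be sketched in a few lines; the list-based activations and toy numbers below are invented for illustration and do not come from the paper:

```python
import math

def mean_diff(pos_acts, neg_acts):
    """Steering vector: mean difference between positive- and negative-behavior
    activations (a toy, list-based stand-in for residual-stream activations)."""
    dim = len(pos_acts[0])
    pos_mean = [sum(a[i] for a in pos_acts) / len(pos_acts) for i in range(dim)]
    neg_mean = [sum(a[i] for a in neg_acts) / len(neg_acts) for i in range(dim)]
    return [p - n for p, n in zip(pos_mean, neg_mean)]

def steer(activation, vector, alpha=1.0):
    """Inference-time intervention: add the scaled steering vector."""
    return [a + alpha * v for a, v in zip(activation, vector)]

def cosine(u, v):
    """Cosine similarity, used above to compare directions of steering vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))

# Toy activations: the "behavior" lies along the first coordinate.
pos = [[1.0, 0.2], [0.8, -0.1]]
neg = [[-1.0, 0.1], [-0.9, 0.0]]
v = mean_diff(pos, neg)               # points along the first axis
steered = steer([0.0, 0.5], v, alpha=0.5)
```

In this toy setting the positive and negative activations are well separated, so the extracted direction is coherent; the paper's finding is precisely that steering degrades when real activations lack such a coherent direction.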
As multilingual generative models become more widely used, most safety and fairness evaluation techniques still focus on English-language resources, while overlooking important cross-cultural factors. This limitation raises concerns about fairness and safety, particularly regarding geoculturally situated stereotypes that hinder the models’ global inclusivity. In this work, we present preliminary findings on the impact of stereotype unlearning across languages, specifically in English, French, and Hindi. Using an adapted version of the SeeGULL dataset, we analyze how unlearning stereotypes in one language influences other languages within multilingual large language models. Our study evaluates two model families, Llama-3.1-8B and Aya-Expanse-8B, to assess whether unlearning in one linguistic context transfers across languages, potentially mitigating or exacerbating biases in multilingual settings.
Image generation abilities of text-to-image diffusion models have significantly advanced, yielding highly photo-realistic images from descriptive text and increasing the viability of leveraging synthetic images to train computer vision models. To serve as effective training data, generated images must be highly realistic while also sufficiently diverse within the support of the target data distribution. Yet, state-of-the-art conditional image generation models have been primarily optimized for creative applications, prioritizing image realism and prompt adherence over conditional diversity. In this paper, we investigate how to improve the diversity of generated images with the goal of increasing their effectiveness to train downstream image classification models, without fine-tuning the image generation model. We find that conditioning the generation process on an augmented real image and text prompt produces generations that serve as effective synthetic datasets for downstream training. Conditioning on real training images contextualizes the generation process to produce images that are in-domain with the real image distribution, while data augmentations introduce visual diversity that improves the performance of the downstream classifier. We validate augmentation-conditioning on a total of five established long-tail and few-shot image classification benchmarks and show that leveraging augmentations to condition the generation process results in consistent improvements over the state-of-the-art on the long-tailed benchmark and remarkable gains in extreme few-shot regimes of the remaining four benchmarks. These results constitute an important step towards effectively leveraging synthetic data for downstream training.
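A minimal sketch of the augmentation-conditioning idea: build (augmented real image, prompt) pairs that would condition a generator. The augmentations, toy image, and prompt below are invented for illustration, and the generative model itself is left out:

```python
import random

random.seed(0)

def hflip(img):
    """Horizontal flip of an image given as a list of rows."""
    return [list(reversed(row)) for row in img]

def random_crop(img, size):
    """Random square crop (resizing back to full size is omitted for brevity)."""
    h, w = len(img), len(img[0])
    top = random.randrange(h - size + 1)
    left = random.randrange(w - size + 1)
    return [row[left:left + size] for row in img[top:top + size]]

def augment(img):
    """Compose the simple augmentations above to inject visual diversity."""
    out = hflip(img) if random.random() < 0.5 else img
    return random_crop(out, size=2)

def conditioning_pairs(real_images, prompt, n_per_image=2):
    """Build (augmented image, prompt) pairs that would condition a generative
    model; calling the generator itself is left as a placeholder."""
    return [(augment(img), prompt) for img in real_images for _ in range(n_per_image)]

real = [[[1, 2, 3], [4, 5, 6], [7, 8, 9]]]   # one toy 3x3 "image"
pairs = conditioning_pairs(real, "a photo of a tabby cat")
```

The real image anchors the generation in-domain, while each randomized augmentation yields a different conditioning input, which is the source of the diversity discussed above.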
Considerations and recommendations from the ISMRM Diffusion Study Group for preclinical diffusion MRI: Part 2 - Ex vivo imaging: added value and acquisition
The value of preclinical diffusion MRI (dMRI) is substantial. While dMRI enables in vivo non-invasive characterization of tissue, ex vivo dMRI is increasingly used to probe tissue microstructure and brain connectivity. Ex vivo dMRI has several experimental advantages including higher signal-to-noise ratio and spatial resolution compared to in vivo studies, and enabling more advanced diffusion contrasts. Another major advantage of ex vivo dMRI is the direct comparison with histological data as a methodological validation. However, there are a number of considerations that must be made when performing ex vivo experiments. The steps from tissue preparation, image acquisition and processing, and interpretation of results are complex, with decisions that not only differ dramatically from in vivo imaging of small animals, but ultimately affect what questions can be answered using the data. This work represents "Part 2" of a 3-part series of recommendations and considerations for preclinical dMRI. We describe best practices for dMRI of ex vivo tissue, with a focus on the value that ex vivo imaging adds to the field of dMRI and considerations in ex vivo image acquisition. We give general considerations and foundational knowledge that must be considered when designing experiments. We describe differences in specimens and models and discuss why some may be more or less appropriate for different studies. We then give guidelines for ex vivo protocols, including tissue fixation, sample preparation, and MR scanning. In each section, we attempt to provide guidelines and recommendations, but also highlight areas for which no guidelines exist (and why), and where future work should lie. An overarching goal herein is to enhance the rigor and reproducibility of ex vivo dMRI acquisitions and analyses, and thereby advance biomedical knowledge.
This paper presents EarthView, a comprehensive dataset specifically designed for self-supervision on remote sensing data, intended to enhance deep learning applications on Earth monitoring tasks. The dataset spans 15 tera pixels of global remote-sensing data, combining imagery from a diverse range of sources, including NEON, Sentinel, and a novel release of 1m spatial resolution data from Satellogic. Our dataset provides a wide spectrum of image data with varying resolutions, harnessed from different sensors and organized coherently into an accessible HuggingFace dataset in parquet format. This data spans five years, from 2017 to 2022. Accompanying the dataset, we introduce EarthMAE, a tailored Masked Autoencoder, developed to tackle the distinct challenges of remote sensing data. Trained in a self-supervised fashion, EarthMAE effectively processes different data modalities such as hyperspectral, multispectral, topographical data, segmentation maps, and temporal structure. This model helps us show that pre-training on Satellogic data improves performance on downstream tasks. While there is still a gap to fill in MAE for heterogeneous data, we regard this innovative combination of an expansive, diverse dataset and a versatile model adapted for self-supervised learning as a stride forward in deep learning for Earth monitoring.
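The random patch masking at the heart of a masked autoencoder such as EarthMAE can be sketched generically; the 75% mask ratio is the common MAE default (an assumption here, not a detail from the abstract), and the toy patches are invented:

```python
import random

random.seed(0)

def mask_patches(patches, mask_ratio=0.75):
    """Split patch indices into visible and masked sets, as a masked
    autoencoder does before encoding only the visible patches and
    training a decoder to reconstruct the masked ones."""
    n = len(patches)
    n_masked = int(n * mask_ratio)
    idx = list(range(n))
    random.shuffle(idx)
    masked, visible = idx[:n_masked], idx[n_masked:]
    encoder_input = [patches[i] for i in visible]   # only ~25% of patches
    return visible, sorted(masked), encoder_input

# 16 toy "patches" (e.g. flattened tiles from one imaging modality)
patches = [[float(i)] for i in range(16)]
visible, masked, enc_in = mask_patches(patches)
```

Encoding only the small visible subset is what makes MAE pre-training cheap enough to scale to a dataset of this size.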
2025-03-03
2025 IEEE/CVF Winter Conference on Applications of Computer Vision Workshops (WACVW) (published)
*In silico* design and optimization of new materials primarily relies on high-accuracy atomic simulators that perform density functional theory (DFT) calculations. While recent works showcase the strong potential of machine learning to accelerate the material design process, they mostly consist of generative approaches that do not use direct DFT signals as feedback to improve training and generation mainly due to DFT's high computational cost. To aid the adoption of direct DFT signals in the materials design loop through online reinforcement learning (RL), we propose **CrystalGym**, an open-source RL environment for crystalline material discovery. Using CrystalGym, we benchmark value- and policy-based reinforcement learning algorithms for designing various crystals conditioned on target properties. Concretely, we optimize for challenging properties like the band gap, bulk modulus, and density, which are directly calculated from DFT in the environment. While none of the algorithms we benchmark solve all CrystalGym tasks, our extensive experiments and ablations show different sample efficiencies and ease of convergence to optimality for different algorithms and environment settings. Our goal is for CrystalGym to serve as a test bed for reinforcement learning researchers and material scientists to address these real-world design problems with practical applications. Furthermore, we introduce a novel class of challenges for reinforcement learning methods dealing with time-consuming reward signals, paving the way for future interdisciplinary research for machine learning motivated by real-world applications.
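The abstract does not specify CrystalGym's API; assuming a Gym-style reset/step interface, a minimal random-policy loop might look like the following, with a cheap quadratic surrogate standing in for the DFT-computed property:

```python
import random

random.seed(0)

class ToyCrystalEnv:
    """Gym-style toy environment (invented, not CrystalGym): the state is a
    single composition parameter and the stand-in 'band gap' is quadratic in it."""
    def __init__(self, target_gap=1.5):
        self.target = target_gap
        self.state = 0.0

    def reset(self):
        self.state = random.uniform(-2.0, 2.0)
        return self.state

    def step(self, action):
        self.state += action
        gap = self.state ** 2                 # cheap surrogate for a DFT property
        reward = -abs(gap - self.target)      # closer to the target gap is better
        done = abs(gap - self.target) < 1e-2
        return self.state, reward, done

env = ToyCrystalEnv()
best_reward = float("-inf")
for episode in range(200):
    env.reset()
    for _ in range(50):
        _, reward, done = env.step(random.uniform(-0.5, 0.5))
        best_reward = max(best_reward, reward)
        if done:
            break
```

In the real environment each reward would require a full DFT calculation, which is exactly the time-consuming-reward challenge the paper highlights; the surrogate here only makes the sketch runnable.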
Development and Feasibility Study of HOPE Model for Prediction of Depression Among Older Adults Using Wi-Fi-based Motion Sensor Data: Machine Learning Study
Depression, characterized by persistent sadness and loss of interest in daily activities, greatly reduces quality of life. Early detection is vital for effective treatment and intervention. While many studies use wearable devices to classify depression based on physical activity, these often rely on intrusive methods. Additionally, most depression classification studies involve large participant groups and use single-stage classifiers without explainability.
This study aims to assess the feasibility of classifying depression using nonintrusive Wi-Fi–based motion sensor data using a novel machine learning model on a limited number of participants. We also conduct an explainability analysis to interpret the model’s predictions and identify key features associated with depression classification.
In this study, we recruited adults aged 65 years and older through web-based and in-person methods, supported by a McGill University health care facility directory. Participants provided consent, and we collected 6 months of activity and sleep data via nonintrusive Wi-Fi–based sensors, along with Edmonton Frailty Scale and Geriatric Depression Scale data. For depression classification, we proposed a HOPE (Home-Based Older Adults’ Depression Prediction) machine learning model with feature selection, dimensionality reduction, and classification stages, evaluating various model combinations using accuracy, sensitivity, precision, and F1-score. Shapley additive explanations and local interpretable model-agnostic explanations were used to explain the model’s predictions.
A total of 6 participants were enrolled in this study; however, 2 participants withdrew later due to internet connectivity issues. Among the 4 remaining participants, 3 participants were classified as not having depression, while 1 participant was identified as having depression. The most accurate classification model, which combined sequential forward selection for feature selection, principal component analysis for dimensionality reduction, and a decision tree for classification, achieved an accuracy of 87.5%, sensitivity of 90%, and precision of 88.3%, effectively distinguishing individuals with and those without depression. The explainability analysis revealed that the most influential features in depression classification, in order of importance, were “average sleep duration,” “total number of sleep interruptions,” “percentage of nights with sleep interruptions,” “average duration of sleep interruptions,” and “Edmonton Frailty Scale.”
The findings from this preliminary study demonstrate the feasibility of using Wi-Fi–based motion sensors for depression classification and highlight the effectiveness of our proposed HOPE machine learning model, even with a small sample size. These results suggest the potential for further research with a larger cohort for more comprehensive validation. Additionally, the nonintrusive data collection method and model architecture proposed in this study offer promising applications in remote health monitoring, particularly for older adults who may face challenges in using wearable devices. Furthermore, the importance of sleep patterns identified in our explainability analysis aligns with findings from previous research, emphasizing the need for more in-depth studies on the role of sleep in mental health, as suggested in the explainable machine learning study.
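The staged structure of the HOPE model described above (feature selection followed by classification; the PCA stage is omitted for brevity) can be sketched generically. The greedy forward selection, single-feature threshold classifier, and toy data below are invented stand-ins, not the study's implementation:

```python
def accuracy(preds, labels):
    """Fraction of predictions matching the labels."""
    return sum(p == l for p, l in zip(preds, labels)) / len(labels)

def stump_predict(rows, feat, threshold):
    """Single-feature threshold classifier (a stand-in for the decision tree)."""
    return [1 if r[feat] > threshold else 0 for r in rows]

def forward_select(rows, labels, features, k=2):
    """Greedy sequential forward selection: repeatedly add the feature whose
    one-feature stump best predicts the labels."""
    selected = []
    for _ in range(k):
        best_feat, best_acc = None, -1.0
        for f in features:
            if f in selected:
                continue
            col = [r[f] for r in rows]
            thr = sum(col) / len(col)          # split at the feature's mean
            acc = accuracy(stump_predict(rows, f, thr), labels)
            if acc > best_acc:
                best_feat, best_acc = f, acc
        selected.append(best_feat)
    return selected

# Invented toy data: feature 0 separates the two classes; feature 1 is noise.
rows = [
    {0: 7.5, 1: 0.3}, {0: 7.1, 1: 0.9},   # class 1
    {0: 4.2, 1: 0.5}, {0: 4.8, 1: 0.1},   # class 0
]
labels = [1, 1, 0, 0]
chosen = forward_select(rows, labels, features=[0, 1], k=1)
```

With so few participants, a staged pipeline like this keeps the final classifier small, which is one plausible reason the study reports reasonable results despite its sample size.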