Publications
LLM2Vec-Gen: Generative Embeddings from Large Language Models
LLM-based text embedders typically encode the semantic content of their input. However, embedding tasks require mapping diverse inputs to similar outputs. Typically, this input-output gap is addressed by training embedding models with paired data using contrastive learning. In this work, we propose a novel self-supervised approach, LLM2Vec-Gen, which adopts a different paradigm: rather than encoding the input, we learn to represent the model's potential response. Specifically, we add trainable special tokens to the LLM's vocabulary, append them to the input, and optimize them to represent the LLM's response in a fixed-length sequence. Training is guided by the LLM's own completion for the query, along with an unsupervised embedding teacher that provides distillation targets. This formulation helps to bridge the input-output gap and transfers LLM capabilities such as safety alignment and reasoning to embedding tasks. Crucially, the LLM backbone remains frozen and training requires only unlabeled queries. LLM2Vec-Gen achieves state-of-the-art self-supervised performance on the Massive Text Embedding Benchmark (MTEB), improving by 9.3% over the best unsupervised embedding teacher. We also observe up to 43.2% reduction in harmful content retrieval and 29.3% improvement in reasoning capabilities for embedding tasks. Finally, the learned embeddings are interpretable and can be decoded into text to reveal their semantic content.
On evolutionary timescales, brain circuits adapt to support survival in each species' ecological niche. While some anatomical aspects of neural circuitry are conserved across species with distant evolutionary origins, each species also exhibits specific circuit adaptations that enable its behavioral repertoire. It remains unclear whether homologous brain regions leverage analogous neural computations as different species perform common behaviors such as reaching and manipulating objects. Here, we directly assessed conservation of neural computations using intracortical recordings from mouse, monkey, and human motor cortex (a homologous region across many mammals) during motor behaviors crucial for survival. We hypothesized that, despite their phylogenetic distance, rodents and primates produce movements through conserved neural computations implemented by motor cortical population dynamics. Remarkably, we found that movement-related neural dynamics were highly conserved across species, while variations in behavioral output were uniquely captured in neural trajectory geometries. Strikingly, neural dynamics during movement across species were more conserved than those across brain regions in the same human and between motor preparation and execution in the same monkeys. Lastly, through manipulation of neural network models trained to perform reaching movements, we reinforce that conservation of neural dynamics across species likely stems from shared circuit constraints. We thus assert that evolution maintains neural computations across phylogeny even as behavioral repertoires expand.
Soft mellowmax (SMM) recently emerged as an alternative operator in Q-learning, achieving impressive performance in games and scientific discovery tasks. Despite SMM's ability to achieve high returns and its enticing robustness, diversity, and sample efficiency characteristics, SMM has not yet been translated into a Monte Carlo tree search algorithm. To address this gap, a soft mellowmax-based Monte Carlo tree search algorithm, SMM-TS, is proposed and theoretically justified. It is empirically demonstrated that SMM-TS converges significantly faster than other tree search methods in synthetic environments, while maintaining competitive performance in games. The fast convergence of SMM-TS makes recursive self-improvement loops more scalable, while the stability gained via planning and the robustness of the operator make SMM-TS more practical for agents operating in uncertain and changing environments.
2026-03-04
RSI @ International Conference on Learning Representations (poster)
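The entry above names the soft mellowmax operator without defining it. The sketch below assumes the definitions commonly cited in the Q-learning literature: mellowmax as a log-mean-exp of the action values, and soft mellowmax as the same expression with the uniform average replaced by softmax weights. The parameter names `omega` and `alpha` and both function names are illustrative choices, and nothing here reproduces the SMM-TS tree search itself.

```python
import math


def mellowmax(values, omega):
    """Log-mean-exp of `values` with sharpness omega > 0 (assumed definition)."""
    m = max(values)  # subtract the max for numerical stability
    mean_exp = sum(math.exp(omega * (v - m)) for v in values) / len(values)
    return m + math.log(mean_exp) / omega


def soft_mellowmax(values, omega, alpha):
    """Mellowmax with softmax(alpha * values) weights instead of a uniform mean.

    Assumed definition; alpha = 0 gives uniform weights, i.e. plain mellowmax.
    """
    m = max(values)
    raw = [math.exp(alpha * (v - m)) for v in values]
    total = sum(raw)
    weighted = sum((w / total) * math.exp(omega * (v - m))
                   for w, v in zip(raw, values))
    return m + math.log(weighted) / omega


if __name__ == "__main__":
    q_values = [1.0, 2.0, 3.0]  # toy action values
    print(mellowmax(q_values, omega=1.0))
    print(soft_mellowmax(q_values, omega=1.0, alpha=2.0))
```

Under these definitions both operators return a value between the mean and the maximum of the inputs, and a larger `alpha` moves the soft variant closer to the maximum.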
Safety-aligned language models refuse harmful requests through learned refusal behaviors encoded in their internal representations. Recent activation-based jailbreaking methods circumvent these safety mechanisms by applying orthogonal projections to remove refusal directions, but these approaches treat refusal as a one-dimensional phenomenon and ignore the rich distributional structure of model activations. We introduce a principled framework based on optimal transport theory that transforms the entire distribution of harmful activations to match harmless ones. By combining PCA with closed-form Gaussian optimal transport, we achieve efficient computation in high-dimensional representation spaces while preserving essential geometric structure. Across six models (Llama-2, Llama-3.1, Qwen-2.5; 7B-32B parameters), our method achieves up to 11% higher attack success rates than state-of-the-art baselines while maintaining comparable perplexity, demonstrating superior preservation of model capabilities. Critically, we discover that layer-selective intervention (applying optimal transport to 1-2 carefully chosen layers at approximately 40-60% network depth) substantially outperforms full-network interventions, revealing that refusal mechanisms may be localized rather than distributed. Our analysis provides new insights into the geometric structure of safety representations and suggests that current alignment methods may be vulnerable to distributional attacks beyond simple direction removal.
Accurate prediction of ionic conductivity is critical for the design of high-performance solid-state electrolytes in next-generation batteries. We benchmark molecular dynamics (MD) approaches for computing ionic conductivity in 21 lithium solid electrolytes for which experimental ionic conductivity has been previously reported in the literature. Specifically, we compare simulations driven by density functional theory (DFT) and by universal machine-learning interatomic potentials (uMLIPs), namely a MACE foundation model. Our results suggest comparable performance between DFT and MACE, with MACE requiring only a fraction of the computational cost. The framework developed here is designed to enable systematic comparisons with additional uMLIPs and fine-tuned models in future work.
2026-03-01
AI4Mat @ International Conference on Learning Representations (poster)
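The entry above does not say how ionic conductivity is extracted from the MD trajectories. One common route, shown here only as an illustrative sketch and not necessarily the one used in that work, is to fit a diffusion coefficient to the mean squared displacement (MSD) and convert it with the Nernst-Einstein relation, sigma = n q^2 D / (k_B T). The function names and the toy numbers are assumptions.

```python
K_B = 1.380649e-23          # Boltzmann constant, J/K
E_CHARGE = 1.602176634e-19  # elementary charge, C


def diffusion_from_msd(times_s, msd_m2, dim=3):
    """Least-squares slope of MSD vs. time, divided by 2*dim (Einstein relation)."""
    n = len(times_s)
    t_mean = sum(times_s) / n
    m_mean = sum(msd_m2) / n
    num = sum((t - t_mean) * (m - m_mean) for t, m in zip(times_s, msd_m2))
    den = sum((t - t_mean) ** 2 for t in times_s)
    return num / den / (2 * dim)


def nernst_einstein_conductivity(number_density_m3, diffusion_m2_s,
                                 temperature_k, charge_number=1):
    """Conductivity in S/m from carrier density (1/m^3) and diffusivity (m^2/s)."""
    q = charge_number * E_CHARGE
    return number_density_m3 * q * q * diffusion_m2_s / (K_B * temperature_k)


if __name__ == "__main__":
    # Toy MSD data generated from D = 1e-12 m^2/s (hypothetical values).
    times = [0.0, 1e-9, 2e-9, 3e-9]
    msd = [6 * 1e-12 * t for t in times]
    d = diffusion_from_msd(times, msd)
    sigma = nernst_einstein_conductivity(1e28, d, 300.0)
    print(f"D = {d:.3e} m^2/s, sigma = {sigma:.4f} S/m")
```

The Nernst-Einstein relation neglects correlations between ion displacements, so it is usually treated as an approximation of the true conductivity.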
Diffusion language models (DLMs) have emerged as a promising alternative to autoregressive (AR) generation, yet their reliance on Transformer backbones limits inference efficiency due to quadratic attention or KV-cache overhead. We introduce DiffuMamba, a masked diffusion language model built on a bidirectional Mamba backbone that combines the diffusion objective with linear-time sequence modeling, and DiffuMamba-H, a hybrid variant with interleaved attention. Across scales up to 1.3B parameters, our models match Transformer-based diffusion in downstream performance while achieving up to 8.2× and 4.3× higher inference throughput, respectively, on long sequences. We further present a systematic analysis of inference efficiency across modern DLM variants, combining asymptotic complexity with empirical measurements. Notably, cache-efficient block diffusion with Mamba mixers emerges as the only strategy that scales linearly with sequence length and achieves the strongest performance across all baselines, suggesting a promising direction for future diffusion-based generation systems.
2026-03-01
MM_Intelligence @ International Conference on Learning Representations (poster)
Information retrieval is a core component of many intelligent systems as it enables conditioning of outputs on new and large-scale datasets. While effective, the standard practice of encoding data into high-dimensional representations for similarity search entails large memory and compute footprints, and also makes it hard to inspect the inner workings of the system. Hierarchical retrieval methods offer an interpretable alternative by organizing data at multiple granular levels, yet do not match the efficiency and performance of flat retrieval approaches. In this paper, we propose ReTreever, a tree-based method that makes hierarchical retrieval viable at scale by directly optimizing its structure for retrieval performance while naturally providing transparency through meaningful semantic groupings.
Our method offers the flexibility to balance cost and utility by indexing data using representations from any tree level. We show that ReTreever delivers strong coarse (intermediate levels) and fine representations (terminal level), while achieving the highest retrieval accuracy at the lowest latency among hierarchical methods. These results demonstrate that this family of techniques is viable in practical applications.
2026-03-01
Trustworthy AI @ International Conference on Learning Representations (published)
Sensory organization at the spinal segment level is commonly inferred from dermatomal maps that assume a fixed correspondence between cutaneous regions and spinal segments. However, based on the complexities of spinal neuroanatomy and neurophysiology, the distribution of sensory signals within the cord may be broader and less segment-specific than dermatomal maps suggest, leaving the segment-level localization of sensory-evoked activity in humans uncertain. Spinal cord functional magnetic resonance imaging (fMRI) is currently the only technique capable of noninvasively mapping sensory activity with high spatial resolution in the human spinal cord. However, its application remains technically challenging and is limited by the uncertainty in segmental localization. In this study, we leveraged recent advancements in spinal cord fMRI, including spinal nerve rootlet-based spatial normalization, to investigate how sensory information is represented and distributed within the human spinal cord during electrocutaneous stimulation of the third digit of the right hand (i.e., C7 dermatome). Forty healthy adults were scanned with electrocutaneous stimulation at four individualized intensities across multiple runs to quantify (i) the rostrocaudal distribution of sensory-evoked activity, (ii) intensity-dependent changes in detectability and localization, and (iii) the effect of normalization strategy on segmental localization. Across participants, stimulation produced activation localized in the lower cervical cord (e.g., C6-C8), with the most consistent segmental localization near C7. Stronger stimulation increased detectability and produced more consistent segmental localization across participants. Importantly, normalization that incorporated nerve rootlet landmarks sharpened localization and improved sensitivity relative to conventional intervertebral disc-based alignment.
This highlights the value of functionally relevant anatomical landmarks for group inference in the spinal cord. Responses were strongest in the initial run and attenuated with repetition, suggesting habituation or adaptation that can bias multi-run paradigms if unmodeled. Together, our results define practical acquisition and analysis conditions (e.g., stimulation strength, anatomical alignment strategy, and run structure) under which segment-level spinal sensory responses can be detected, thereby supporting more reliable future basic and translational studies of the human spinal cord, including pain mechanisms, sensory function, and spinal injury.
Optimization in deep learning has expanded beyond Euclidean methods to include entrywise sign updates (SignSGD) and spectral sign updates (SpecGD/Muon). While both can be viewed as steepest descent under non-Euclidean geometries (
2026-03-01
GRaM @ International Conference on Learning Representations (poster)
Objective: This study evaluates multiple machine learning approaches to predict metabolic syndrome (MetS) risk in the Quebec, Canada population. We further perform explainability analysis to interpret model predictions and identify key features driving risk classification. Methods and analysis: This study followed the Minimum Information about Clinical Artificial Intelligence Modeling (MI-CLAIM) guideline for reporting. We used cross-sectional data from the Canadian Community Health Survey (2015–2018) for the population living in the province of Quebec, which includes 42,279 participants. Partial sampling was used to obtain a balanced dataset for model development. We evaluated seven machine learning models for the defined classification task, including Logistic Regression, XGBoost, LightGBM, TabNet, NODE, 1D-CNN and Regularisation Cocktails. Performance was assessed using accuracy, precision, recall, F1-score, AUROC, and AUPRC, and interpretability was examined using SHAP to identify key predictors of MetS risk. Results: After partial sampling, 7,866 participants (4,856 high-risk and 3,010 low-risk MetS cases) were included in the machine learning analysis. XGBoost and NODE showed the strongest performance. XGBoost achieved the highest accuracy (80.4%) and AUROC (84.1%), while NODE achieved the highest precision (80.1%) and AUPRC (86.0%). Explainability analysis identified age, perceived health, and sex as the most important features contributing to MetS risk predictions. Conclusion: This study shows that machine learning can accurately predict MetS risk using self-reported health survey data from the Quebec population. Comparison of classical and deep learning approaches identified the optimal predictive model, and explainability analyses identified the most important features contributing to the risk predictions, which align with established clinical evidence.
These results support a machine learning–driven initial screening framework for population-level early identification of high-risk individuals, enabling targeted interventions and efficient allocation of healthcare resources.
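For readers less familiar with the threshold-based metrics listed in the entry above, the following minimal sketch computes accuracy, precision, recall, and F1-score from binary labels using their standard definitions. The function name and the toy labels are hypothetical and are not drawn from the study's data; AUROC and AUPRC are omitted because they require predicted scores rather than hard labels.

```python
def binary_metrics(y_true, y_pred):
    """Return accuracy, precision, recall, and F1 for 0/1 labels (1 = high risk)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    # Guard against empty denominators when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical labels for six participants.
    y_true = [1, 1, 1, 0, 0, 1]
    y_pred = [1, 0, 1, 0, 1, 1]
    print(binary_metrics(y_true, y_pred))
```

Reporting precision and recall alongside accuracy matters when the positive and negative classes differ in size, as they do in the sampled dataset described above.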
Robotic manipulators hold significant untapped potential for manufacturing industries, particularly when deployed in multi-robot configurations that can enhance resource utilization, increase throughput, and reduce costs. However, industrial manipulators typically operate in isolated one-robot, one-machine setups, limiting both utilization and scalability. Even mobile robot implementations generally rely on centralized architectures, creating vulnerability to single points of failure and requiring robust communication infrastructure. This paper introduces SMAPPO (Scalable Multi-Agent Proximal Policy Optimization), a scalable input-size invariant multi-agent reinforcement learning model for decentralized multi-robot management in industrial environments. MAPPO (Multi-Agent Proximal Policy Optimization) represents the current state-of-the-art approach. We optimized an existing simulator to handle complex multi-agent reinforcement learning scenarios and designed a new multi-machine tending scenario for evaluation. Our novel observation encoder enables SMAPPO to handle varying numbers of agents, machines, and storage areas with minimal or no retraining. Results demonstrate SMAPPO's superior performance compared to the state-of-the-art MAPPO across multiple conditions: full retraining (up to 61% improvement), curriculum learning (up to 45% increased productivity and up to 49% fewer collisions), zero-shot generalization to significantly different scale scenarios (up to 272% better performance without retraining), and adaptability under extremely low initial training (up to 100% increase in parts delivery).