Publications

Self-Refining Training for Amortized Density Functional Theory

Cristian Gabellini

Hatem Helal

Kirill Neklyudov

Density Functional Theory (DFT) allows for predicting all the chemical and physical properties of molecular systems from first principles by finding an approximate solution to the many-body Schrödinger equation. However, the cost of these predictions becomes infeasible when increasing the scale of the energy evaluations, e.g., when calculating the ground-state energy for simulating molecular dynamics. Recent works have demonstrated that, for substantially large datasets of molecular conformations, Deep Learning-based models can predict the outputs of the classical DFT solvers by amortizing the corresponding optimization problems. In this paper, we propose a novel method that reduces the dependency of amortized DFT solvers on large pre-collected datasets by introducing a self-refining training strategy. Namely, we propose an efficient method that simultaneously trains a deep-learning model to predict the DFT outputs and samples molecular conformations that are used as training data for the model. We derive our method as a minimization of the variational upper bound on the KL-divergence measuring the discrepancy between the generated samples and the target Boltzmann distribution defined by the ground state energy. To demonstrate the utility of the proposed scheme, we perform an extensive empirical study comparing it with the models trained on the pre-collected datasets. Finally, we open-source our implementation of the proposed algorithm, optimized with asynchronous training and sampling stages, which enables simultaneous sampling and training. Code is available at https://github.com/majhas/self-refining-dft.

2025-06-01

ArXiv (preprint)

doi.org

arxiv.org

Unpacking Softmax: How Temperature Drives Representation Collapse, Compression, and Generalization

Wojciech Masarczyk

Mateusz Ostaszewski

Tin Sum Cheng

Tomasz Trzciński

Aurélien Lucchi

Razvan Pascanu

The softmax function is a fundamental building block of deep neural networks, commonly used to define output distributions in classification tasks or attention weights in transformer architectures. Despite its widespread use and proven effectiveness, its influence on learning dynamics and learned representations remains poorly understood, limiting our ability to optimize model behavior. In this paper, we study the pivotal role of the softmax function in shaping the model's representation. We introduce the concept of rank deficit bias - a phenomenon in which softmax-based deep networks find solutions of rank much lower than the number of classes. This bias depends on the softmax function's logits norm, which is implicitly influenced by hyperparameters or directly modified by softmax temperature. Furthermore, we demonstrate how to exploit the softmax dynamics to learn compressed representations or to enhance their performance on out-of-distribution data. We validate our findings across diverse architectures and real-world datasets, highlighting the broad applicability of temperature tuning in improving model performance. Our work provides new insights into the mechanisms of softmax, enabling better control over representation learning in deep neural networks.

2025-06-01

ArXiv (preprint)

doi.org

arxiv.org

Advancing global antifungal development to combat invasive fungal infection

Xiu-Li Wang

Jun Ding

Koon Ho Wong

Chen Ding

Chang-Bin Chen

Wen-Juan Wu

Ningning Liu

2025-05-31

hLife (published)

doi.org

A flexible machine learning Mendelian randomization estimator applied to predict the safety and efficacy of sclerostin inhibition

Marc-André Legault

Jason Hartford

Benoit J. Arsenault

Archer Y. Yang

Joelle Pineau

2025-05-31

American Journal of Human Genetics (published)

doi.org

Geometry aware graph attention networks to explain single-cell chromatin state and gene expression

Gabriele Malagoli

Patrick Hanel

Anna Danese

Guy Wolf

Maria Colomé-Tatché

High-throughput measurements that profile the transcriptome or the epigenome of single-cells are becoming a common way to study cell identity. These data are high dimensional, sparse and non linear. Here we present SEAGALL (Single-cell Explainable Geometry-Aware Graph Attention Learning pipeLine), a hypothesis free method to extract biologically relevant features from single-cell experiments based on geometry regularised autoencoders (GRAE) and explainable graph attention networks (GAT). We use a GRAE to embed the data into a latent space preserving the data geometry and we construct a cell-to-cell graph computing distances in the GRAE bottleneck. Exploiting the attention mechanism to dynamically learn the relevant edges, we use GATs to classify the cells and we explain the predictions of the model with XAI methods to unravel the features which are driving cell identity beyond marker genes. We apply our method to data sets from scRNA-seq, scATAC-seq and scChIP-seq experiments. SEAGALL can extract cell type specific and stable signatures which not only differ from the ones found in classical linear approaches but are less biassed by coverage and high expression.

2025-05-31

bioRxiv (preprint)

doi.org

GNN-based Decentralized Perception in Multirobot Systems for Predicting Worker Actions

Ali Imran

Giovanni Beltrame

David St-Onge

In industrial environments, predicting human actions is essential for ensuring safe and effective collaboration between humans and robots. This paper introduces a perception framework that enables mobile robots to understand and share information about human actions in a decentralized way. The framework first allows each robot to build a spatial graph representing its surroundings, which it then shares with other robots. This shared spatial data is combined with temporal information to track human behavior over time. A swarm-inspired decision-making process is used to ensure all robots agree on a unified interpretation of the human's actions. Results show that adding more robots and incorporating longer time sequences improve prediction accuracy. Additionally, the consensus mechanism increases system resilience, making the multi-robot setup more reliable in dynamic industrial settings.

2025-05-31

IEEE Robotics and Automation Letters (published)

doi.org

arxiv.org

Impact de l'antibiothérapie par Daptomycine dans le traitement des bactériémies à Enterococcus faecium en réanimation : l'étude rétrospective multicentrique ENTERODAPTO.

S. Herbel

Guillaume Dumas

L. Chantelot

J. Massol

Q. Moyon

J. Ricard

E. Azoulay

C. Hauw-Berlemont

E. Maury

T. Urbina

2025-05-31

Médecine et Maladies Infectieuses Formation (published)

doi.org

STAMP: Differentiable Task and Motion Planning via Stein Variational Gradient Descent

Yewon Lee

Philip Huang

Yizhou Huang

Krishna Murthy

Andrew Zou Li

Fabian Damken

Eric Heiden

Kevin A. Smith

D. Nowrouzezahrai

Fabio Ramos

Florian Shkurti

Carnegie Mellon University

M. I. O. Technology

Technische Universität Darmstadt

Nvidia

M. University

University of Sydney

Planning for many manipulation tasks, such as using tools or assembling parts, often requires both symbolic and geometric reasoning. Task and Motion Planning (TAMP) algorithms typically solve these problems by conducting a tree search over high-level task sequences while checking for kinematic and dynamic feasibility. While performant, most existing algorithms are highly inefficient as their time complexity grows exponentially with the number of possible actions and objects. Additionally, they only find a single solution to problems in which many feasible plans may exist. To address these limitations, we propose a novel algorithm called Stein Task and Motion Planning (STAMP) that leverages parallelization and differentiable simulation to efficiently search for multiple diverse plans. STAMP relaxes discrete-and-continuous TAMP problems into continuous optimization problems that can be solved using variational inference. Our algorithm builds upon Stein Variational Gradient Descent, a gradient-based variational inference algorithm, and parallelized differentiable physics simulators on the GPU to efficiently obtain gradients for inference. Further, we employ imitation learning to introduce action abstractions that reduce the inference problem to lower dimensions. We demonstrate our method on two TAMP problems and empirically show that STAMP is able to: 1) produce multiple diverse plans in parallel; and 2) search for plans more efficiently compared to existing TAMP baselines.

2025-05-31

IEEE Robotics and Automation Letters (published)

doi.org

openreview.net

A systematic review of hyperscanning in clinical encounters

Lena Adel

Lisane Moses

Elisabeth Irvine

Kyle T Greenway

Guillaume Dumas

Michael Lifshitz

2025-05-31

Neuroscience and Biobehavioral Reviews (published)

doi.org

Graph Representation Learning for the Prediction of Medication Usage in the UK Biobank Based on Pharmacogenetic Variants

Bill Qi

Yannis Trakadis

2025-05-30

Bioengineering (published)

doi.org

Continual Learning in Vision-Language Models via Aligned Model Merging

Ghada Sokar

Gintare Karolina Dziugaite

Anurag Arnab

Ahmet Iscen

Pablo Samuel Castro

Cordelia Schmid

Continual learning is conventionally tackled through sequential fine-tuning, a process that, while enabling adaptation, inherently favors plasticity over the stability needed to retain prior knowledge. While existing approaches attempt to mitigate catastrophic forgetting, a bias towards recent tasks persists as they build upon this sequential nature. In this work we present a new perspective based on model merging to maintain stability while still retaining plasticity. Rather than just sequentially updating the model weights, we propose merging newly trained task parameters with previously learned ones, promoting a better balance. To maximize the effectiveness of the merging process, we propose a simple mechanism that promotes learning aligned weights with previous ones, thereby avoiding interference when merging. We evaluate this approach on large Vision-Language Models (VLMs), and demonstrate its effectiveness in reducing forgetting, increasing robustness to various task orders and similarities, and improving generalization.

2025-05-29

ArXiv (preprint)

doi.org

arxiv.org

Ctrl-Crash: Controllable Diffusion for Realistic Car Crashes

Ge Ya Luo

D. Nowrouzezahrai

Alexia Jolicoeur-Martineau

Christopher Pal

Video diffusion techniques have advanced significantly in recent years; however, they struggle to generate realistic imagery of car crashes due to the scarcity of accident events in most driving datasets. Improving traffic safety requires realistic and controllable accident simulations. To tackle the problem, we propose Ctrl-Crash, a controllable car crash video generation model that conditions on signals such as bounding boxes, crash types, and an initial image frame. Our approach enables counterfactual scenario generation where minor variations in input can lead to dramatically different crash outcomes. To support fine-grained control at inference time, we leverage classifier-free guidance with independently tunable scales for each conditioning signal. Ctrl-Crash achieves state-of-the-art performance across quantitative video quality metrics (e.g., FVD and JEDi) and qualitative measurements based on a human-evaluation of physical realism and video quality compared to prior diffusion-based methods.

2025-05-29

ArXiv (preprint)

doi.org

arxiv.org
