Guy Wolf

Biography

Guy Wolf is an associate professor in the Department of Mathematics and Statistics at Université de Montréal.

His research interests lie at the intersection of machine learning, data science and applied mathematics. He is particularly interested in data mining methods that use manifold learning and deep geometric learning, as well as applications for the exploratory analysis of biomedical data.

Wolf’s research focuses on exploratory data analysis and its applications in bioinformatics. His approaches are multidisciplinary, bringing together tools from machine learning, signal processing and applied mathematics. His recent work has combined diffusion geometries and deep learning to find emergent patterns, dynamics, and structure in big, high-dimensional data (e.g., in single-cell genomics and proteomics).

Current Students

Ria Arora

Master's Research - Université de Montréal

Co-supervisor:

Liam Paull

Adrien Aumon

PhD - Université de Montréal

Semih Cantürk

PhD - Université de Montréal

Joao Felipe Carneiro Barbosa Rocha

Collaborating researcher - Yale University

Co-supervisor:

Collaborating Alumni

Collaborating Alumni - Université de Montréal

Stefan Horoi

PhD - Université de Montréal

Will Hua

Collaborating Alumni - McGill University

Xiao Huang

Master's Research - Concordia University

Principal supervisor:

PhD - Université de Montréal

Paul Janson

PhD - Concordia University

Principal supervisor:

Research Intern - Université de Montréal

Vincent Létourneau

Collaborating Alumni - Université de Montréal

Myriam Lizotte

PhD - Université de Montréal

Philippe Martin

PhD - Université de Montréal

Co-supervisor:

Paul François

Paria Mehrbod

Master's Research - Concordia University

Principal supervisor:

Eugene Belilovsky

Lydia Mezrag

PhD - Université de Montréal

Kevin Moon

Independent visiting researcher

Sacha Morin

PhD - Université de Montréal

Co-supervisor:

Postdoctorate - Concordia University

Principal supervisor:

Shuang Ni

PhD - Université de Montréal

Albert Orozco Camacho

PhD - Concordia University

Principal supervisor:

Master's Research - Université de Montréal

Matthew Scicluna

PhD - Université de Montréal

Principal supervisor:

Master's Research - Université de Montréal

Pedro Vianna

Collaborating researcher - Université de Montréal

Co-supervisor:

PhD - Université de Montréal

Postdoctorate - Université de Montréal

Stephanie Zandee

Collaborating researcher - McGill University (assistant professor)

Blog Posts

Exploring the COVID-19 Interferon Paradox with Dimensionality Reduction and Clustering

February 19, 2025

Sacha Morin

Elsa Brunet-Ratnasingham

Guy Wolf

Publications

Random Forest Autoencoders for Guided Representation Learning

Kevin R. Moon

Jake S. Rhodes

Decades of research have produced robust methods for unsupervised data visualization, yet supervised visualization…

2025-10-22

logconference.io/LOG/2025/Conference (poster)

Low-dimensional embeddings of high-dimensional data

Cyril de Bodt

Alex Diaz-Papkovich

Michael Bleher

Kerstin Bunte

Corinna Coupette

Sebastian Damrich

Fred A. Hamprecht

Emőke-Ágnes Horvát

Dhruv Kohli

John A. Lee

Boudewijn P. F. Lelieveldt

Leland McInnes

Ian T. Nabney

Maximilian Noichl

Pavlin G. Poličar

Bastian Rieck

Gal Mishne …

Dmitry Kobak

Large collections of high-dimensional data have become nearly ubiquitous across many academic fields and application domains, ranging from biology to the humanities. Since working directly with high-dimensional data poses challenges, the demand for algorithms that create low-dimensional representations, or embeddings, for data visualization, exploration, and analysis is now greater than ever. In recent years, numerous embedding algorithms have been developed, and their usage has become widespread in research and industry. This surge of interest has resulted in a large and fragmented research field that faces technical challenges alongside fundamental debates, and it has left practitioners without clear guidance on how to effectively employ existing methods. Aiming to increase coherence and facilitate future work, in this review we provide a detailed and critical overview of recent developments, derive a list of best practices for creating and using low-dimensional embeddings, evaluate popular approaches on a variety of datasets, and discuss the remaining challenges and open problems in the field.

2025-08-21

arXiv (preprint)

arxiv.org

Low-dimensional embeddings of high-dimensional data

Cyril de Bodt

Alex Diaz-Papkovich

Michael Bleher

Kerstin Bunte

Corinna Coupette

Sebastian Damrich

Fred Hamprecht

Emőke-Ágnes Horvát

Dhruv Kohli

John A. Lee

Boudewijn P. F. Lelieveldt

Leland McInnes

Ian T. Nabney

Maximilian Noichl

Pavlin G. Poličar

Bastian Rieck

Gal Mishne …

Dmitry Kobak

2025-08-01

arXiv (published)

Towards a General Recipe for Combinatorial Optimization with Multi-Filter GNNs

Michael Perlmutter

2025-07-30

Proceedings of the Third Learning on Graphs Conference (published)

proceedings.mlr.press

Circuit Discovery Helps To Detect LLM Jailbreaking

Despite extensive safety alignment, large language models (LLMs) remain vulnerable to jailbreak attacks that bypass safeguards to elicit harmful content. While prior work attributes this vulnerability to safety training limitations, the internal mechanisms by which LLMs process adversarial prompts remain poorly understood. We present a mechanistic analysis of the jailbreaking behavior in a large-scale, safety-aligned LLM, focusing on LLaMA-2-7B-chat-hf. Leveraging edge attribution patching and subnetwork probing, we systematically identify computational circuits responsible for generating affirmative responses to jailbreak prompts. Ablating these circuits during the first token prediction can reduce attack success rates by up to 80%, demonstrating their critical role in safety bypass. Our analysis uncovers key attention heads and MLP pathways that mediate adversarial prompt exploitation, revealing how important tokens propagate through these components to override safety constraints. These findings advance the understanding of adversarial vulnerabilities in aligned LLMs and pave the way for targeted, interpretable defense mechanisms based on mechanistic interpretability.

2025-06-30

ICML.cc/2025/Workshop/R2-FM (poster)

Test Time Adaptation Using Adaptive Quantile Recalibration

2025-06-10

ICML.cc/2025/Workshop/PUT (poster)

RETRO SYNFLOW: Discrete Flow Matching for Accurate and Diverse Single-Step Retrosynthesis

Robin Yadav

Qi Yan

Joey Bose

Renjie Liao

A fundamental problem in organic chemistry is identifying and predicting the series of reactions that synthesize a desired target product molecule. Due to the combinatorial nature of the chemical search space, single-step reactant prediction (i.e., single-step retrosynthesis) remains challenging: even existing state-of-the-art template-free generative approaches struggle to produce an accurate yet diverse set of feasible reactions. In this paper, we model single-step retrosynthesis planning and introduce RETRO SYNFLOW (RSF), a discrete flow-matching framework that builds a Markov bridge between the prescribed target product molecule and the reactant molecule. In contrast to past approaches, RSF employs a reaction center identification step to produce intermediate structures known as synthons, which serve as a more informative source distribution for the discrete flow. To further enhance the diversity and feasibility of generated samples, we employ Feynman-Kac steering with Sequential Monte Carlo-based resampling to steer promising generations at inference, using a new reward oracle that relies on a forward-synthesis model. Empirically, we demonstrate RSF achieves

2025-06-04

arXiv (preprint)

arxiv.org

Geometry aware graph attention networks to explain single-cell chromatin state and gene expression

Gabriele Malagoli

Patrick Hanel

A. Danese

Maria Colomé-Tatché

2025-06-01

bioRxiv (preprint)

Neurospectrum: A Geometric and Topological Deep Learning Framework for Uncovering Spatiotemporal Signatures in Neural Activity

Dhananjay Bhaskar

Yanlei Zhang

Jessica Moore

Feng Gao

Bastian Rieck

Firas Khasawneh

Elizabeth Munch

J. Adam Noah

Helen Pushkarskaya

Christopher Pittenger

Valentina Greco

2025-05-08

bioRxiv (preprint)

Graph Neural Networks Meet Probabilistic Graphical Models: A Survey

Qian Zhang

2025-04-06

IEEE International Conference on Acoustics, Speech, and Signal Processing (published)

Unsupervised Test-Time Adaptation for Hepatic Steatosis Grading Using Ultrasound B-Mode Images

Michael Eickenberg

An Tang

Guy Cloutier

Ultrasound is considered a key modality for the clinical assessment of hepatic steatosis (i.e., fatty liver) due to its non-invasiveness and availability. Deep learning methods have attracted considerable interest in this field, as they can learn patterns from a collection of images and achieve clinically comparable levels of accuracy in steatosis grading. However, variations in patient populations, acquisition protocols, equipment, and operator expertise across clinical sites can introduce domain shifts that reduce model performance when applied outside the original training setting. In response, unsupervised domain adaptation techniques are being investigated to address these shifts, allowing models to generalize more effectively across diverse clinical environments. In this work, we propose a test-time batch normalization technique designed to handle domain shift, especially changes in label distribution, by adapting selected features of the batch normalization layers in a trained convolutional neural network model. This approach operates in an unsupervised manner, allowing robust adaptation to new distributions without access to labeled data. The method was evaluated on two abdominal ultrasound datasets collected at different institutions, assessing its capability to mitigate domain shift for hepatic steatosis classification. Compared to non-adapted models, the proposed method reduced the mean absolute error in steatosis grading by 37% and improved the area under the receiver operating characteristic curve for steatosis detection from 0.78 to 0.97. These findings demonstrate the potential of the proposed method to address domain shift in ultrasound-based hepatic steatosis diagnosis, minimizing the risks associated with deploying trained models in various clinical settings.

2025-03-26

IEEE Transactions on Ultrasonics, Ferroelectrics and Frequency Control (published)