Guy Wolf

Biography

Guy Wolf is an associate professor in the Department of Mathematics and Statistics at Université de Montréal. His research interests lie at the intersection of machine learning, data science, and applied mathematics. He is particularly interested in data mining methods that use manifold learning and deep geometric learning, as well as applications to the exploratory analysis of biomedical data.

His research focuses on exploratory data analysis, with applications in bioinformatics. His approaches are multidisciplinary, combining machine learning, signal processing, and applied mathematical tools. In particular, his recent work combines diffusion geometry and deep learning to find emerging patterns, dynamics, and structure in high-dimensional big data (for example, in single-cell genomics and proteomics).

Current Students

Ria Arora

Research Master's - UdeM

Co-supervisor:

PhD - UdeM

PhD - UdeM

semihcanturk00@gmail.com

Alumni collaborator

Enrique Fita Sanmartin

Alumni collaborator - UdeM

Kameron Harris

Research collaborator - Western Washington University (faculty; assistant professor)

Co-supervisor:

PhD - UdeM

Will Hua

Alumni collaborator - McGill

Xiaolong Huang

Research Master's - Concordia

Principal supervisor:

PhD - UdeM

Paul Janson

PhD - Concordia

Principal supervisor:

Charles-Etienne Joseph

Research Master's - UdeM

Principal supervisor:

Eugene Belilovsky

geraldin.nanfack@mila.quebec

M. Elyes Kanoun

Research intern - UdeM

Vincent Létourneau

Postdoctorate - UdeM

Myriam Lizotte

PhD - UdeM

Philippe Martin

PhD - UdeM

Co-supervisor:

Paul François

Paria Mehrbod

Research Master's - Concordia

Principal supervisor:

Eugene Belilovsky

Lydia Mezrag

PhD - UdeM

Sacha Morin

PhD - UdeM

Co-supervisor:

Postdoctorate - Concordia

Principal supervisor:

Eugene Belilovsky

Amine Natik

PhD - UdeM

Principal supervisor:

Guillaume Lajoie

Shuang Ni

PhD - UdeM

stephanie.zandee@mcgill.ca

Albert Orozco Camacho

PhD - Concordia

Principal supervisor:

Research Master's - UdeM

Matthew Scicluna

PhD - UdeM

Principal supervisor:

Research collaborator - UdeM

Co-supervisor:

PhD - UdeM

Research intern - Western Washington University

Principal supervisor:

Postdoctorate - UdeM

Research collaborator - McGill (assistant professor)

Analyzing the interferon paradox inherent in COVID-19 using dimensionality reduction and clustering

Blog posts

Graph and representation of working methodology, and graph of data on deaths 60 days after onset of symptoms.

February 19, 2025

by

Sacha Morin

Elsa Brunet-Ratnasingham

Guy Wolf

Publications

Effective Protein-Protein Interaction Exploration with PPIretrieval

Chenqing Hua

Connor W. Coley

Doina Precup

Shuangjia Zheng

Protein-protein interactions (PPIs) are crucial in regulating numerous cellular functions, including signal transduction, transportation, and immune defense. As the accuracy of multi-chain protein complex structure prediction improves, the challenge has shifted towards effectively navigating the vast complex universe to identify potential PPIs. Herein, we propose PPIretrieval, the first deep learning-based model for protein-protein interaction exploration, which leverages existing PPI data to effectively search for potential PPIs in an embedding space, capturing rich geometric and chemical information of protein surfaces. When provided with an unseen query protein with its associated binding site, PPIretrieval effectively identifies a potential binding partner along with its corresponding binding site in an embedding space, facilitating the formation of protein-protein complexes.

2024-02-06

ArXiv (preprint)

Gaining Biological Insights through Supervised Data Visualization

Jake S. Rhodes

Adrien Aumon

Sacha Morin

Marc Girard

Catherine Larochelle

Boaz Lahav

Elsa Brunet-Ratnasingham

Amélie Pagliuzza

Lorie Marchitto

Wei Zhang

Adele Cutler

F. Grand'Maison

Anhong Zhou

Andrés Finzi

Nicolas Chomont

Daniel E. Kaufmann

Stephanie Zandee

Alexandre Prat

Kevin R. Moon

Dimensionality reduction-based data visualization is pivotal in comprehending complex biological data. The most common methods, such as PHATE, t-SNE, and UMAP, are unsupervised and therefore reflect the dominant structure in the data, which may be independent of expert-provided labels. Here we introduce a supervised data visualization method called RF-PHATE, which integrates expert knowledge for further exploration of the data. RF-PHATE leverages random forests to capture intricate feature-label relationships. Extracting information from the forest, RF-PHATE generates low-dimensional visualizations that highlight relevant data relationships while disregarding extraneous features. This approach scales to large datasets and applies to classification and regression. We illustrate RF-PHATE’s prowess through three case studies. In a multiple sclerosis study using longitudinal clinical and imaging data, RF-PHATE unveils a sub-group of patients with non-benign relapsing-remitting Multiple Sclerosis, demonstrating its aptitude for time-series data. In the context of Raman spectral data, RF-PHATE effectively showcases the impact of antioxidants on diesel exhaust-exposed lung cells, highlighting its proficiency in noisy environments. Furthermore, RF-PHATE aligns established geometric structures with COVID-19 patient outcomes, enriching interpretability in a hierarchical manner. RF-PHATE bridges expert insights and visualizations, promising knowledge generation. Its adaptability, scalability, and noise tolerance underscore its potential for widespread adoption.

2024-01-21

bioRxiv (preprint)

Towards Foundational Models for Molecular Learning on Large-Scale Multi-Task Datasets

Dominique Beaini

Shenyang Huang

Joao Alex Cunha

Zhiyi Li

Gabriela Moisescu-Pareja

Oleksandr Dymov

Samuel Maddrell-Mander

Callum McLean

Frederik Wenkel

Luis Müller

Jama Hussein Mohamud

Ali Parviz

Michael Craig

Michał Koziarski

Jiarui Lu

Zhaocheng Zhu

Cristian Gabellini

Kerstin Klaser

Josef Dean

Cas Wognum

Maciej Sypetkowski

Guillaume Rabusseau

Reihaneh Rabbany

Jian Tang

Christopher Morris

Ioannis Koutis

Mirco Ravanelli

Prudencio Tossou

Hadrien Mary

Therence Bois

Andrew William Fitzgibbon

Blazej Banaszewski

Chad Martin

Dominic Masters

Recently, pre-trained foundation models have enabled significant advancements in multiple fields. In molecular machine learning, however, where datasets are often hand-curated, and hence typically small, the lack of datasets with labeled features, and codebases to manage those datasets, has hindered the development of foundation models. In this work, we present seven novel datasets categorized by size into three distinct categories: ToyMix, LargeMix and UltraLarge. These datasets push the boundaries in both the scale and the diversity of supervised labels for molecular learning. They cover nearly 100 million molecules and over 3000 sparsely defined tasks, totaling more than 13 billion individual labels of both quantum and biological nature. In comparison, our datasets contain 300 times more data points than the widely used OGB-LSC PCQM4Mv2 dataset, and 13 times more than the quantum-only QM1B dataset. In addition, to support the development of foundational models based on our proposed datasets, we present the Graphium graph machine learning library which simplifies the process of building and training molecular machine learning models for multi-task and multi-level molecular datasets. Finally, we present a range of baseline results as a starting point of multi-task and multi-level training on these datasets. Empirically, we observe that performance on low-resource biological datasets show improvement by also training on large amounts of quantum data. This indicates that there may be potential in multi-task and multi-level training of a foundation model and fine-tuning it to resource-constrained downstream tasks. The Graphium library is publicly available on Github and the dataset links are available in Part 1 and Part 2.

2024-01-16

ICLR.cc/2024/Conference (poster)

Assessing Neural Network Representations During Training Using Noise-Resilient Diffusion Spectral Entropy

Danqi Liao

Chen Liu

Benjamin W Christensen

Alexander Tong

Guillaume Huguet

Maximilian Nickel

Ian Adelstein

Smita Krishnaswamy

Entropy and mutual information in neural networks provide rich information on the learning process, but they have proven difficult to compute reliably in high dimensions. Indeed, in noisy and high-dimensional data, traditional estimates in ambient dimensions approach a fixed entropy and are prohibitively hard to compute. To address these issues, we leverage data geometry to access the underlying manifold and reliably compute these information-theoretic measures. Specifically, we define diffusion spectral entropy (DSE) in neural representations of a dataset as well as diffusion spectral mutual information (DSMI) between different variables representing data. First, we show that they form noise-resistant measures of intrinsic dimensionality and relationship strength in high-dimensional simulated data that outperform classic Shannon entropy, nonparametric estimation, and mutual information neural estimation (MINE). We then study the evolution of representations in classification networks with supervised learning, self-supervision, or overfitting. We observe that (1) DSE of neural representations increases during training; (2) DSMI with the class label increases during generalizable learning but stays stagnant during overfitting; (3) DSMI with the input signal shows differing trends: on MNIST it increases, while on CIFAR-10 and STL-10 it decreases. Finally, we show that DSE can be used to guide better network initialization and that DSMI can be used to predict downstream classification accuracy across 962 models on ImageNet.

2024-01-01

CISS (published)

Enhancing Supervised Visualization through Autoencoder and Random Forest Proximities for Out-of-Sample Extension

Shuang Ni

Adrien Aumon

Kevin R. Moon

Jake S. Rhodes

The value of supervised dimensionality reduction lies in its ability to uncover meaningful connections between data features and labels. Common dimensionality reduction methods embed a set of fixed, latent points, but are not capable of generalizing to an unseen test set. In this paper, we provide an out-of-sample extension method for the random forest-based supervised dimensionality reduction method, RF-PHATE, combining information learned from the random forest model with the function-learning capabilities of autoencoders. Through quantitative assessment of various autoencoder architectures, we identify that networks that reconstruct random forest proximities are more robust for the embedding extension problem. Furthermore, by leveraging proximity-based prototypes, we achieve a 40% reduction in training time without compromising extension quality. Our method does not require label information for out-of-sample points, thus serving as a semi-supervised method, and can achieve consistent quality using only 10% of the training data.

2024-01-01

MLSP (published)

Learnable Filters for Geometric Scattering Modules

Alexander Tong

Frederik Wenkel

Dhananjay Bhaskar

Kincaid MacDonald

Jackson Grady

Michael Perlmutter

Smita Krishnaswamy

2024-01-01

IEEE Transactions on Signal Processing (published)

Simulation-Free Schrödinger Bridges via Score and Flow Matching

Alexander Tong

Nikolay Malkin

Kilian FATRAS

Lazar Atanackovic

Yanlei Zhang

Guillaume Huguet

Yoshua Bengio

We present simulation-free score and flow matching ([SF]…

2024-01-01

AISTATS (published)

Spectral Temporal Contrastive Learning

Sacha Morin

Somjit Nath

Samira Ebrahimi Kahou

Learning useful data representations without requiring labels is a cornerstone of modern deep learning. Self-supervised learning methods, particularly contrastive learning (CL), have proven successful by leveraging data augmentations to define positive pairs. This success has prompted a number of theoretical studies to better understand CL and investigate theoretical bounds for downstream linear probing tasks. This work is concerned with the temporal contrastive learning (TCL) setting where the sequential structure of the data is used instead to define positive pairs, which is more commonly used in RL and robotics contexts. In this paper, we adapt recent work on Spectral CL to formulate Spectral Temporal Contrastive Learning (STCL). We discuss a population loss based on a state graph derived from a time-homogeneous reversible Markov chain with uniform stationary distribution. The STCL loss enables to connect the linear probing performance to the spectral properties of the graph, and can be estimated by considering previously observed data sequences as an ensemble of MCMC chains.

2023-12-01

ArXiv (preprint)

Inferring dynamic regulatory interaction graphs from time series data with perturbations

Dhananjay Bhaskar

Daniel Sumner Magruder

Edward De Brouwer

Matheo Morales

Aarthi Venkat

Frederik Wenkel

Smita Krishnaswamy

2023-11-18

logconference.io/LOG/2023/Conference (poster)