Shuang Ni

PhD - Université de Montréal
Supervisor
Research Topics
Computational Biology
Dimensionality Reduction Methods
Medical Machine Learning

Publications

Path-independent Flow Matching for Multi-parameter Generative Dynamics
Flow Matching is a powerful framework for learning transport maps between probability distributions. Yet its standard single-parameter formulation is not designed to capture multi-parameter variations where the resulting transport should be path-independent. Path independence is crucial because it ensures that transformations depend only on the initial and target distributions, not on the specific path. In this work, we introduce Path-independent Flow Matching (PiFM), a method for learning vector fields whose induced flows yield path-independent transport between distributions. We show that PiFM generalizes Flow Matching to higher-dimensional parameter domains while enforcing structural conditions that ensure consistency of composed transformations. In addition, we show that, under suitable assumptions, PiFM approximates the Wasserstein barycenter, linking the framework to a notion of distributional interpolation. To enable practical training, we propose a tractable, simulation-free objective that regresses onto multi-parameter conditional probability paths. We showcase empirically that PiFM outperforms other approaches on both synthetic and real-world data in interpolating path-independent trajectories and generating desired out-of-distribution samples.
Random Forest Autoencoders for Guided Representation Learning
Kevin R. Moon
Jake S. Rhodes
Extensive research has produced robust methods for unsupervised data visualization. Yet supervised visualization…
Measure Before You Look: Grounding Embeddings Through Manifold Metrics
Enhancing Supervised Visualization Through Autoencoder and Random Forest Proximities for Out-of-Sample Extension
Kevin R. Moon
Jake S. Rhodes
The value of supervised dimensionality reduction lies in its ability to uncover meaningful connections between data features and labels. Common dimensionality reduction methods embed a set of fixed, latent points, but are not capable of generalizing to an unseen test set. In this paper, we provide an out-of-sample extension method for the random forest-based supervised dimensionality reduction method, RF-PHATE, combining information learned from the random forest model with the function-learning capabilities of autoencoders. Through quantitative assessment of various autoencoder architectures, we identify that networks that reconstruct random forest proximities are more robust for the embedding extension problem. Furthermore, by leveraging proximity-based prototypes, we achieve a 40% reduction in training time without compromising extension quality. Our method does not require label information for out-of-sample points, thus serving as a semi-supervised method, and can achieve consistent quality using only 10% of the training data.