Publications

Sound and Modular Activity Analysis for Automatic Differentiation in MLIR
Mai Jacob Peng
William S. Moses
Oleksandr Zinenko
Wavefunction Flows: Efficient Quantum Simulation of Continuous Flow Models
David Layden
Ryan Sweke
Vojtěch Havlíček
Anirban Chowdhury
Flow models are a cornerstone of modern machine learning. They are generative models that progressively transform probability distributions according to learned dynamics. Specifically, they learn a continuous-time Markov process that efficiently maps samples from a simple source distribution into samples from a complex target distribution. We show that these models are naturally related to the Schrödinger equation, for an unusual Hamiltonian on continuous variables. Moreover, we prove that the dynamics generated by this Hamiltonian can be efficiently simulated on a quantum computer. Together, these results give a quantum algorithm for preparing coherent encodings (a.k.a., qsamples) for a vast family of probability distributions--namely, those expressible by flow models--by reducing the task to an existing classical learning problem, plus Hamiltonian simulation. For statistical problems defined by flow models, such as mean estimation and property testing, this enables the use of quantum algorithms tailored to qsamples, which may offer advantages over classical algorithms based only on samples from a flow model. More broadly, these results reveal a close connection between state-of-the-art machine learning models, such as flow matching and diffusion models, and one of the main expected capabilities of quantum computers: simulating quantum dynamics.
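For background only (this is the standard flow-model formulation, not the paper's specific construction): samples follow a learned ordinary differential equation, and the induced density obeys the continuity equation, one common instantiation of the continuous-time Markov process the abstract refers to.

\begin{align}
  \frac{\mathrm{d}x_t}{\mathrm{d}t} &= v_\theta(x_t, t), \qquad x_0 \sim p_{\mathrm{source}}, \\
  \frac{\partial \rho_t(x)}{\partial t} &= -\nabla \cdot \bigl(\rho_t(x)\, v_\theta(x, t)\bigr),
\end{align}

where v_\theta is the learned velocity field and \rho_t interpolates from the source density to the target. The paper's contribution, per the abstract, is relating this evolution to a Schrödinger equation with an unusual Hamiltonian and showing that the resulting dynamics admit efficient quantum simulation; that Hamiltonian is not reproduced here.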
Comparison of Speech Tasks in Human Expert and Machine Detection of Parkinson's Disease
Peter William VanHarn Plantinga
Roozbeh Sattari
Karine Marcotte
Carla Di Gironimo
Madeleine Sharp
Liziane Bouvier
Maiya Geddes
Ingrid Verduyckt
Étienne de Villers-Sidani
Denise Klein
The speech of people with Parkinson's Disease (PD) has been shown to hold important clues about the presence and progression of the disease. We investigate the factors on which human experts base judgments of the presence of disease in speech samples over five different speech tasks: phonations, sentence repetition, reading, recall, and picture description. We make comparisons by conducting listening tests to determine clinicians' accuracy at recognizing signs of PD from audio alone, and we conduct experiments with a machine learning system for detection based on Whisper. Across tasks, Whisper performs on par with or better than human experts when only audio is available, especially on challenging but important subgroups of the data: younger patients, mild cases, and female patients. Whisper's ability to recognize acoustic cues in difficult cases complements the multimodal and contextual strengths of human experts.
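The abstract does not detail the detection architecture beyond it being based on Whisper, so the following is only a hypothetical sketch of a common pattern: pool a Whisper encoder's hidden states and attach a small classification head. The checkpoint name, mean pooling, and linear head are illustrative assumptions, not the paper's setup.

# Hypothetical sketch of a Whisper-encoder classifier for binary PD detection.
# The checkpoint, pooling strategy, and head are illustrative assumptions,
# not the configuration used in the paper.
import torch
import torch.nn as nn
from transformers import WhisperFeatureExtractor, WhisperModel

class WhisperPDClassifier(nn.Module):
    def __init__(self, checkpoint="openai/whisper-base"):
        super().__init__()
        full_model = WhisperModel.from_pretrained(checkpoint)
        self.encoder = full_model.encoder
        self.head = nn.Linear(full_model.config.d_model, 2)  # PD vs. control logits

    def forward(self, input_features):
        hidden = self.encoder(input_features).last_hidden_state  # (batch, time, d_model)
        pooled = hidden.mean(dim=1)                              # mean-pool over time
        return self.head(pooled)

feature_extractor = WhisperFeatureExtractor.from_pretrained("openai/whisper-base")
model = WhisperPDClassifier()
# `waveform` would be a 16 kHz mono recording of one speech task:
# inputs = feature_extractor(waveform, sampling_rate=16000, return_tensors="pt")
# logits = model(inputs.input_features)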
High-Rate Mixout: Revisiting Mixout for Robust Domain Generalization
Masih Aminbeidokhti
Heitor Rapela Medeiros
Srikanth Muralidharan
Eric Granger
Ensembling fine-tuned models initialized from powerful pre-trained weights is a common strategy to improve robustness under distribution shifts, but it comes with substantial computational costs due to the need to train and store multiple models. Dropout offers a lightweight alternative by simulating ensembles through random neuron deactivation; however, when applied to pre-trained models, it tends to over-regularize and disrupt critical representations necessary for generalization. In this work, we investigate Mixout, a stochastic regularization technique that provides an alternative to Dropout for domain generalization. Rather than deactivating neurons, Mixout mitigates overfitting by probabilistically swapping a subset of fine-tuned weights with their pre-trained counterparts during training, thereby maintaining a balance between adaptation and retention of prior knowledge. Our study reveals that achieving strong performance with Mixout on domain generalization benchmarks requires a notably high masking probability of 0.9 for ViTs and 0.8 for ResNets. While this may seem like a simple adjustment, it yields two key advantages for domain generalization: (1) higher masking rates more strongly penalize deviations from the pre-trained parameters, promoting better generalization to unseen domains; and (2) high-rate masking substantially reduces computational overhead, cutting gradient computation by up to 45% and gradient memory usage by up to 90%. Experiments across five domain generalization benchmarks, PACS, VLCS, OfficeHome, TerraIncognita, and DomainNet, using ResNet and ViT architectures, show that our approach, High-rate Mixout, achieves out-of-domain accuracy comparable to ensemble-based methods while significantly reducing training costs.
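As a minimal sketch of the mechanism this abstract describes (not the authors' implementation), Mixout-style masking swaps each fine-tuned weight back to its pre-trained value with probability p during training and then rescales, following the original Mixout formulation, so the expected weight equals the fine-tuned one.

# Minimal sketch of Mixout-style weight masking (illustrative, not the paper's code).
import torch

def mixout(weight_finetuned, weight_pretrained, p=0.9, training=True):
    """Swap weights to their pre-trained values with probability p, then unbias."""
    if not training or p == 0.0:
        return weight_finetuned
    # A 1 in the mask means "use the pre-trained value for this entry".
    mask = torch.bernoulli(torch.full_like(weight_finetuned, p))
    mixed = mask * weight_pretrained + (1.0 - mask) * weight_finetuned
    # Rescale so that E[output] == weight_finetuned, as in the original Mixout.
    return (mixed - p * weight_pretrained) / (1.0 - p)

At p = 0.9, most entries fall back to the frozen pre-trained values on every step, which is the intuition behind the reported reductions in gradient computation and memory.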
Online HD-tRNS Over the Right Temporoparietal Junction Modulates Social Inference But Not Motor Coordination
Quentin Moreau
Vincent Chamberland
Lisane Moses
Gabriela Milanova
Revisiting Mixout: An Overlooked Path to Robust Finetuning
Masih Aminbeidokhti
Heitor Rapela Medeiros
Eric Granger
Finetuning vision foundation models often improves in-domain accuracy but comes at the cost of robustness under distribution shift. We revisit Mixout, a stochastic regularizer that intermittently replaces finetuned weights with their pretrained reference, through the lens of a single-run, weight-sharing implicit ensemble. This perspective reveals three key levers that govern robustness: the masking anchor, resampling frequency, and mask sparsity. Guided by this analysis, we introduce GMixout, which (i) replaces the fixed anchor with an exponential moving-average snapshot that adapts during training, and (ii) regulates the masking period via an explicit resampling-frequency hyperparameter. Our sparse-kernel implementation updates only a small fraction of parameters with no inference-time overhead, enabling training on consumer-grade GPUs. In experiments on benchmarks covering covariate shift, corruption, and class imbalance (ImageNet / ImageNet-LT, DomainNet, iWildCam, and CIFAR100-C), GMixout consistently improves in-domain accuracy beyond zero-shot performance while surpassing both Model Soups and strong parameter-efficient finetuning baselines under distribution shift.
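Based only on the description in this abstract (a hypothetical sketch; the class name, EMA decay, and exact update order are assumptions, not the authors' code), a GMixout-style state would keep an exponential moving-average anchor in place of the fixed pre-trained weights and resample its sparse mask every few steps.

# Hypothetical sketch of GMixout-style masking: an EMA anchor that tracks the
# fine-tuned weights, plus a mask resampled every `resample_every` steps.
import torch

class GMixoutState:
    def __init__(self, weight, p=0.9, ema_decay=0.999, resample_every=100):
        self.anchor = weight.detach().clone()   # EMA snapshot, initialized from current weights
        self.p = p
        self.ema_decay = ema_decay
        self.resample_every = resample_every
        self.step = 0
        self.mask = torch.bernoulli(torch.full_like(weight, p))

    def apply(self, weight):
        """Mix the current weight with the EMA anchor under the sparse mask."""
        self.step += 1
        if self.step % self.resample_every == 0:
            self.mask = torch.bernoulli(torch.full_like(weight, self.p))
        # Update the moving-average anchor toward the current fine-tuned weights.
        self.anchor.mul_(self.ema_decay).add_(weight.detach(), alpha=1.0 - self.ema_decay)
        return self.mask * self.anchor + (1.0 - self.mask) * weight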