Publications

Employing Machine Learning to Predict Medical Trainees’ Psychophysiological Responses and Self- and Socially-Shared Regulated Learning Strategies While Completing Medical Simulations
Matthew Moreno
Keerat Grewal
Jason M. Harley
Enhancing STED Microscopy via Fluorescence Lifetime Unmixing and Filtering in Two-Species SPLIT-STED
Andréanne Deschênes
Antoine Ollier
Marie Lafontaine
Albert Michaud-Gagnon
Jeffrey-Gabriel Steavan Santiague
Anthony Bilodeau
Paul De Koninck
A pattern-learning algorithm associates copy number variations with brain structure and behavioural variables in an adolescent population cohort
Kuldeep Kumar
Zohra Saci
Martineau Jean-Louis
Xiaoqian J. Chai
Tian Ge
B. T. Thomas Yeo
Paul M. Thompson
Carrie E. Bearden
Ole A. Andreassen
Sébastien Jacquemont
Our genetic makeup, together with environmental and social influences, shapes our brain's development. Yet the imaging-genetics field has struggled to integrate all these modalities to investigate the interplay between genetic blueprint, brain architecture, environment, human health and daily living skills. Here we interrogate the Adolescent Brain Cognitive Development (ABCD) cohort to outline the effects of rare high-effect genetic variants on brain architecture and their corresponding implications for cognitive, behavioural, psychosocial and socioeconomic traits. We design a holistic pattern-learning framework that quantitatively dissects the impacts of copy number variations (CNVs) on brain structure and 938 behavioural variables spanning 20 categories in 7,338 adolescents. Our results reveal associations between genetic alterations, higher-order brain networks and specific parameters of family wellbeing, including increased parental and child stress, anxiety and depression, as well as neighbourhood dynamics such as decreased safety. We thus find effects extending beyond the impairments of cognitive ability or language capacity that have been previously reported. Our investigation spotlights the interplay between genetic variation and subjective quality of life in adolescents and their families.
Quantification of head and neck cancer patients' anatomical changes during radiotherapy: Toward the prediction of replanning need
Odette Rios‐Ibacache
James Manalad
Kayla O'Sullivan‐Steben
Emily Poon
Luc Galarneau
Julia Khriguian
George Shenouda
J. Kildea
Abstract Background Head and neck cancer (HNC) patients undergoing radiotherapy (RT) may experience anatomical changes during treatment, which can compromise the validity of the initial treatment plan, necessitating replanning. However, ad hoc replanning disrupts clinical workflows and increases workload. Currently, no standardized method exists to quantify anatomical variation that necessitates replanning. Purpose This project aimed to create geometrical metrics to describe anatomical changes in HNC patients during RT. The usefulness of these metrics was evaluated by a univariate analysis and through machine learning (ML) models to predict the need for replanning. Methods A cohort of 150 HNC patients treated at the McGill University Health Centre was analyzed. Based on the shapes of the RT structures (body, PTV, mandible, neck, and submandibular contours), we developed 43 metrics and automatically calculated them through a Python pipeline that we called HNGeoNatomyX. Univariate analysis using linear regression was conducted to obtain the rate of change of each metric. We also obtained the relative variation of each metric between the pre-treatment and replanning-requested scans. Fraction-specific ML models (incorporating information available up to and including the specific fraction) for fractions 5, 10, and 15 were built using the metrics, clinical data, and feature selection techniques. Model performance was estimated with a repeated stratified 5-fold cross-validation resampling technique and the area under the receiver operating characteristic (ROC) curve (AUC). Results Univariate analysis showed that body- and neck-related metrics were most predictive of replanning need. Our best fraction-specific multivariate models for fractions 5, 10, and 15 yielded testing scores of 0.82, 0.70, and 0.79, respectively. Our models predicted the need for replanning early for 76% of the true positives.
Conclusions The created metrics have the potential to characterize and distinguish which patients will necessitate RT replanning. They show promise in guiding clinicians to evaluate RT replanning for HNC patients and streamline workflows.
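To make the evaluation protocol concrete, here is a minimal sketch of repeated stratified 5-fold cross-validation with ROC AUC scoring. It is our illustration, not the HNGeoNatomyX pipeline: the data are synthetic stand-ins for the geometric metrics, and the nearest-centroid scorer is a placeholder for the paper's ML models.

```python
import numpy as np

def roc_auc(y_true, scores):
    # Mann-Whitney U formulation of the ROC AUC: fraction of
    # (positive, negative) pairs ranked correctly, ties counting half
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

def stratified_folds(y, k, rng):
    # shuffle indices within each class, then deal them round-robin into
    # k folds so every fold keeps the class balance
    folds = [[] for _ in range(k)]
    for label in np.unique(y):
        for i, j in enumerate(rng.permutation(np.where(y == label)[0])):
            folds[i % k].append(j)
    return [np.array(f) for f in folds]

# synthetic stand-in for the metrics: the "replan" class drifts upward
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (60, 5)), rng.normal(1, 1, (60, 5))])
y = np.array([0] * 60 + [1] * 60)

aucs = []
for repeat in range(10):                            # repeated ...
    for test_idx in stratified_folds(y, 5, rng):    # ... stratified 5-fold CV
        train = np.setdiff1d(np.arange(len(y)), test_idx)
        # placeholder scorer: distance to class-0 centroid minus class-1 centroid
        c0 = X[train][y[train] == 0].mean(axis=0)
        c1 = X[train][y[train] == 1].mean(axis=0)
        s = (np.linalg.norm(X[test_idx] - c0, axis=1)
             - np.linalg.norm(X[test_idx] - c1, axis=1))
        aucs.append(roc_auc(y[test_idx], s))

print(f"mean AUC over {len(aucs)} resamples: {np.mean(aucs):.2f}")
```

Repeating the resampling (here 10 repeats of 5 folds) gives a more stable AUC estimate than a single split, which matters with only 150 patients.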
Adaptive Resolution Residual Networks — Generalizing Across Resolutions Easily and Efficiently
The majority of signal data captured in the real world uses numerous sensors with different resolutions. In practice, most deep learning architectures are fixed-resolution; they consider a single resolution at training and inference time. This is convenient to implement but fails to fully take advantage of the diverse signal data that exists. In contrast, other deep learning architectures are adaptive-resolution; they directly allow various resolutions to be processed at training and inference time. This provides computational adaptivity but either sacrifices robustness or compatibility with mainstream layers, which hinders their use. In this work, we introduce Adaptive Resolution Residual Networks (ARRNs) to surpass this tradeoff. We construct ARRNs from Laplacian residuals, which serve as generic adaptive-resolution adapters for fixed-resolution layers. We use smoothing filters within Laplacian residuals to linearly separate input signals over a series of resolution steps. We can thereby skip Laplacian residuals to cast high-resolution ARRNs into low-resolution ARRNs that are computationally cheaper yet numerically identical over low-resolution signals. We guarantee this result when Laplacian residuals are implemented with perfect smoothing kernels. We complement this novel component with Laplacian dropout, which randomly omits Laplacian residuals during training. This regularizes for robustness to a distribution of lower resolutions. This also regularizes for numerical errors that may occur when Laplacian residuals are implemented with approximate smoothing kernels. We provide a solid grounding for the advantageous properties of ARRNs through a theoretical analysis based on neural operators, and empirically show that ARRNs embrace the challenge posed by diverse resolutions with computational adaptivity, robustness, and compatibility with mainstream layers.
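The "linear separation over a series of resolution steps" can be illustrated with a 1D Laplacian-pyramid-style decomposition. This is our sketch of the underlying signal-processing identity, not the ARRN architecture itself; a simple moving average stands in for the paper's smoothing kernels (the exactness guarantee in the paper assumes perfect smoothing kernels).

```python
import numpy as np

def smooth(x, width=5):
    # moving-average low-pass filter: a crude stand-in for a smoothing kernel
    return np.convolve(x, np.ones(width) / width, mode="same")

def laplacian_bands(x, levels=3):
    # split a signal into band-pass residuals plus a final smooth remainder;
    # each residual corresponds to one resolution step
    bands = []
    current = x
    for _ in range(levels):
        low = smooth(current)
        bands.append(current - low)   # detail lost by one smoothing step
        current = low
    bands.append(current)             # coarsest remainder
    return bands

rng = np.random.default_rng(1)
x = rng.normal(size=256)
bands = laplacian_bands(x)

# the bands linearly separate the signal: their sum telescopes back to x
# exactly, so dropping the fine bands (analogous to skipping high-resolution
# Laplacian residuals) leaves a cheaper coarse approximation
reconstruction = np.sum(bands, axis=0)
print(np.allclose(reconstruction, x))
```

The telescoping sum (x - low1) + (low1 - low2) + ... + lowN reconstructs x exactly, which is why skipping residual branches can be numerically harmless on already-smooth inputs.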
Convergence of regularized agent-state based Q-learning in POMDPs
Matthieu Geist
In this paper, we present a framework to understand the convergence of commonly used Q-learning reinforcement learning algorithms in practice. Two salient features of such algorithms are: (i) the Q-table is recursively updated using an agent state (such as the state of a recurrent neural network) which is not a belief state or an information state and (ii) policy regularization is often used to encourage exploration and stabilize the learning algorithm. We investigate the simplest form of such Q-learning algorithms which we call regularized agent-state based Q-learning (RASQL) and show that it converges under mild technical conditions to the fixed point of an appropriately defined regularized MDP, which depends on the stationary distribution induced by the behavioral policy. We also show that a similar analysis continues to work for a variant of RASQL that learns periodic policies. We present numerical examples to illustrate that the empirical convergence behavior matches the proposed theoretical limit.
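The update being analyzed can be sketched in a few lines. This is our toy illustration, not the paper's code: the "agent state" is collapsed to the latest observation of a small random environment, and entropy regularization gives the soft (logsumexp) Bellman target typical of regularized Q-learning.

```python
import numpy as np

rng = np.random.default_rng(0)
n_obs, n_act = 3, 2
gamma, tau, alpha = 0.9, 0.5, 0.1   # discount, regularization temperature, step size

# toy environment; in a genuine POMDP the agent state (e.g. last observation)
# is neither a belief state nor an information state
P = rng.dirichlet(np.ones(n_obs), size=(n_obs, n_act))   # transition probabilities
R = rng.normal(size=(n_obs, n_act))                      # rewards

Q = np.zeros((n_obs, n_act))

def soft_value(q_row):
    # entropy-regularized state value: tau * logsumexp(Q / tau), stabilized
    m = q_row.max()
    return m + tau * np.log(np.exp((q_row - m) / tau).sum())

z = 0
for _ in range(20000):
    a = rng.integers(n_act)                 # fixed uniform behavioral policy
    r = R[z, a]
    z_next = rng.choice(n_obs, p=P[z, a])
    # RASQL-style stochastic approximation toward the regularized Bellman target
    Q[z, a] += alpha * (r + gamma * soft_value(Q[z_next]) - Q[z, a])
    z = z_next

print(Q.round(2))
```

The fixed point such iterates approach depends on the stationary distribution of the behavioral policy, which is exactly the dependence the paper's convergence result makes precise.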
Sample, Predict, then Proceed: Self-Verification Sampling for Tool Use of LLMs
Shangmin Guo
Omar Darwiche Domingues
Raphaël Avalos
Tool use in stateful environments presents unique challenges for large language models (LLMs), where existing test-time compute strategies r… (see more)elying on repeated trials in the environment are impractical. We propose dynamics modelling (DyMo), a method that augments LLMs with a state prediction capability alongside function calling during post-training. This enables LLMs to predict the future states of their actions through an internal environment model. On the Berkeley Function Calling Leaderboard V2, DyMo improves success rates and significantly reduces hallucinations. We further integrate the internal environment model into self-verification sampling (SVS), and show that this substantially improves pass^k over number of trials k, and allows the model to refuse unreliable outputs. Together, DyMo and SVS greatly enhance the effectiveness and reliability of LLMs for tool use. We believe this work charts a path towards scalable planning RL methods for LLM inference without repeatedly querying the oracle environment.
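For readers unfamiliar with the pass^k metric (all of k attempts must succeed, in contrast to pass@k's any-of-k), here is a small estimator sketch. This is our illustration under the common combinatorial definition, which we assume matches the paper's usage; the numbers are made up.

```python
from math import comb

def pass_hat_k(n, c, k):
    # Probability that k trials drawn without replacement from n recorded
    # trials (c of which succeeded) are ALL successes -- an unbiased
    # estimator of pass^k, the chance that every one of k attempts works.
    # (Compare pass@k, estimated as 1 - comb(n - c, k) / comb(n, k).)
    if c < k:
        return 0.0
    return comb(c, k) / comb(n, k)

# hypothetical run: 10 trials, 7 successes
print(round(pass_hat_k(10, 7, 3), 3))
```

Because every one of the k trials must pass, pass^k rewards reliability, which is why refusing unreliable outputs (as SVS does) can raise it.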
Brain Age Prediction: Deep Models Need a Hand to Generalize
Reza Rajabli
Mahdie Soltaninejad
Vladimir S. Fonov
D. Louis Collins
Predicting brain age from T1-weighted MRI is a promising marker for understanding brain aging and its associated conditions. While deep learning models have shown success in reducing the mean absolute error (MAE) of predicted brain age, concerns about robust and accurate generalization in new data limit their clinical applicability. The large number of trainable parameters, combined with limited medical imaging training data, contributes to this challenge, often resulting in a generalization gap where there is a significant discrepancy between model performance on training data versus unseen data. In this study, we assess a deep model, SFCN-reg, based on the VGG-16 architecture, and address the generalization gap through comprehensive preprocessing, extensive data augmentation, and model regularization. Using training data from the UK Biobank, we demonstrate substantial improvements in model performance. Specifically, our approach reduces the generalization MAE by 47% (from 5.25 to 2.79 years) in the Alzheimer's Disease Neuroimaging Initiative dataset and by 12% (from 4.35 to 3.75 years) in the Australian Imaging, Biomarker and Lifestyle dataset. Furthermore, we achieve up to 13% reduction in scan-rescan error (from 0.80 to 0.70 years) while enhancing the model's robustness to registration errors. Feature importance maps highlight anatomical regions used to predict age. These results highlight the critical role of high-quality preprocessing and robust training techniques in improving accuracy and narrowing the generalization gap, both necessary steps toward the clinical use of brain age prediction models. Our study makes valuable contributions to neuroimaging research by offering a potential pathway to improve the clinical applicability of deep learning models.
Longer scans boost prediction and cut costs in brain-wide association studies
Leon Qi Rong Ooi
Csaba Orban
Shaoshi Zhang
Thomas E. Nichols
Trevor Wei Kiat Tan
Ru Kong
Scott Marek
Nico U. F. Dosenbach
Timothy O. Laumann
Evan M. Gordon
Kwong Hsia Yap
Fang Ji
Joanna Su Xian Chong
Christopher Chen
Lijun An
Nicolai Franzmeier
Sebastian N. Roemer-Cassiano
Qingyu Hu
Jianxun Ren
Hesheng Liu
Sidhant Chopra
Carrisa V. Cocuzza
Justin T. Baker
Juan Helen Zhou
Simon B. Eickhoff
Avram J. Holmes
B. T. Thomas Yeo
Clifford R. Jack Jr
A pervasive dilemma in brain-wide association studies (BWAS) is whether to prioritize functional MRI (fMRI) scan time or sample size. We derive a theoretical model showing that individual-level phenotypic prediction accuracy increases with sample size and total scan duration (sample size × scan time per participant). The model explains empirical prediction accuracies extremely well across 76 phenotypes from nine resting-fMRI and task-fMRI datasets (R2 = 0.89), spanning a wide range of scanners, acquisitions, racial groups, disorders and ages. For scans ≤20 mins, prediction accuracy increases linearly with the logarithm of total scan duration, suggesting interchangeability of sample size and scan time. However, sample size is ultimately more important than scan time in determining prediction accuracy. Nevertheless, when accounting for overhead costs associated with each participant (e.g., recruitment costs), to boost prediction accuracy, longer scans can yield substantial cost savings over a larger sample size. To achieve high prediction performance, 10-min scans are highly cost-inefficient. In most scenarios, the optimal scan time is ≥20 mins. On average, 30-min scans are the most cost-effective, yielding 22% cost savings over 10-min scans. Overshooting is cheaper than undershooting the optimal scan time, so we recommend aiming for ≥30 mins. Compared with resting-state whole-brain BWAS, the most cost-effective scan time is shorter for task-fMRI and longer for subcortical-cortical BWAS. Standard power calculations maximize sample size at the expense of scan time. Our study demonstrates that optimizing both sample size and scan time can boost prediction power while cutting costs. Our empirically informed reference is available for future study planning: WEB_APPLICATION_LINK
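The cost argument can be made concrete with a back-of-the-envelope calculation. This is our sketch under the paper's ≤20-min interchangeability regime, where accuracy depends only on total scan duration (sample size × scan time); the dollar figures are invented assumptions, not the paper's numbers.

```python
# assumed costs (illustrative only)
overhead = 500.0       # per-participant overhead: recruitment, consent, etc.
per_min = 10.0         # scanner cost per minute of scanning
total_minutes = 60000  # fixed total scan duration needed for a target accuracy

def cost(scan_time):
    # holding total scan duration fixed, longer scans mean fewer participants,
    # so the per-participant overhead is paid fewer times
    n_participants = total_minutes / scan_time
    return n_participants * (overhead + per_min * scan_time)

for t in (10, 20, 30):
    print(f"{t}-min scans: ${cost(t):,.0f}")
```

Under these assumptions the scanner cost component is constant while the overhead component shrinks as 1/scan_time, which is the mechanism behind the paper's finding that 10-min scans are cost-inefficient. In reality the interchangeability breaks down for long scans (sample size ultimately matters more), which is why the paper's optimum sits near 30 mins rather than growing without bound.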
Exact risk curves of signSGD in High-Dimensions: quantifying preconditioning and noise-compression effects
Ke Liang Xiao
Atish Agarwala
In recent years, signSGD has garnered interest as both a practical optimizer and a simple model to understand adaptive optimizers like Adam. Though there is a general consensus that signSGD acts to precondition optimization and reshapes noise, quantitatively understanding these effects in theoretically solvable settings remains difficult. We present an analysis of signSGD in a high dimensional limit, and derive a limiting SDE and ODE to describe the risk. Using this framework we quantify four effects of signSGD: effective learning rate, noise compression, diagonal preconditioning, and gradient noise reshaping. Our analysis is consistent with experimental observations but moves beyond that by quantifying the dependence of these effects on the data and noise distributions. We conclude with a conjecture on how these results might be extended to Adam.
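The update itself, and the noise-compression effect, are easy to exhibit on a toy problem. This is our illustration, not the paper's high-dimensional analysis: an anisotropic quadratic with heavy-tailed (Cauchy, infinite-variance) gradient noise, the regime where bounding each step by the sign matters most.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 50
h = np.logspace(-2, 2, d)  # anisotropic curvatures; loss = 0.5 * sum(h_i * w_i^2)

def loss(w):
    return 0.5 * np.sum(h * w * w)

def grad(w):
    # true gradient plus heavy-tailed Cauchy noise of scale 0.1
    return h * w + 0.1 * rng.standard_cauchy(d)

lr = 1e-3
w_sgd, w_sign = np.ones(d), np.ones(d)
for _ in range(2000):
    w_sgd -= lr * grad(w_sgd)             # plain SGD: steps inherit the heavy tails
    w_sign -= lr * np.sign(grad(w_sign))  # signSGD: every step is bounded by lr

print(f"SGD loss: {loss(w_sgd):.2f}, signSGD loss: {loss(w_sign):.2f}, "
      f"init: {loss(np.ones(d)):.1f}")
```

Taking the sign discards the gradient's magnitude, so a single enormous noise draw moves signSGD by at most lr per coordinate (noise compression), and every coordinate moves at the same speed regardless of its curvature h_i (the diagonal-preconditioning effect).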
In-context learning and Occam's razor
A central goal of machine learning is generalization. While the No Free Lunch Theorem states that we cannot obtain theoretical guarantees for generalization without further assumptions, in practice we observe that simple models which explain the training data generalize best: a principle called Occam's razor. Despite the need for simple models, most current approaches in machine learning only minimize the training error, and at best indirectly promote simplicity through regularization or architecture design. Here, we draw a connection between Occam's razor and in-context learning: an emergent ability of certain sequence models like Transformers to learn at inference time from past observations in a sequence. In particular, we show that the next-token prediction loss used to train in-context learners is directly equivalent to a data compression technique called prequential coding, and that minimizing this loss amounts to jointly minimizing both the training error and the complexity of the model that was implicitly learned from context. Our theory and the empirical experiments we use to support it not only provide a normative account of in-context learning, but also elucidate the shortcomings of current in-context learning methods, suggesting ways in which they can be improved. We make our code available at https://github.com/3rdCore/PrequentialCode.
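Prequential coding is simple enough to demonstrate directly. In this sketch (our illustration, not the paper's experiments) a Laplace-smoothed Bernoulli model plays the role of the in-context learner: each bit is encoded with the model fit only on the prefix, and the gap to the hindsight-fitted model's code length is the "complexity of the model learned from context" term in the paper's decomposition.

```python
import numpy as np

rng = np.random.default_rng(0)
x = (rng.random(200) < 0.7).astype(int)   # Bernoulli(0.7) bit sequence

# prequential coding: predict each bit from the prefix only, paying
# -log2 p(bit | prefix) under a Laplace add-one estimator
preq_bits = 0.0
ones = 0
for t, bit in enumerate(x):
    p_one = (ones + 1) / (t + 2)          # estimate from the first t bits
    preq_bits += -np.log2(p_one if bit == 1 else 1 - p_one)
    ones += bit

# "training error" of the final model: code length under the hindsight MLE
p_mle = x.mean()
mle_bits = -np.sum(x * np.log2(p_mle) + (1 - x) * np.log2(1 - p_mle))

print(f"prequential: {preq_bits:.1f} bits, final-model NLL: {mle_bits:.1f} bits")
```

The prequential length always exceeds the hindsight code length, and the excess (roughly (1/2) log2 n bits for this one-parameter model) is the price of learning the parameter from context, i.e. the model-complexity term: the next-token loss of an in-context learner sums exactly this kind of code length.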