Publications

Improving Quality Control of MRI Images Using Synthetic Motion Data
C Bricout
K Cho
M Harms
O Pasternak
C Bearden
PD McGorry
RS Kahn
JM Kane
B Nelson
SW Woods
ME Shenton
S Bouix
S Ebrahimi Kahou
MRI quality control (QC) is challenging due to unbalanced and limited datasets, as well as subjective scoring, which hinder the development of reliable automated QC systems. To address these issues, we introduce an approach that pretrains a model on synthetically generated motion artifacts before applying transfer learning for QC classification. This method not only improves the accuracy in identifying poor-quality scans but also reduces training time and resource requirements compared to training from scratch. By leveraging synthetic data, we provide a more robust and resource-efficient solution for QC automation in MRI, paving the way for broader adoption in diverse research settings.
Logging Requirement for Continuous Auditing of Responsible Machine Learning-based Applications
Patrick Loic Foalem
Leuson Da Silva
Heng Li
Ettore Merlo
Machine learning (ML) is increasingly applied across industries to automate decision-making, but concerns about ethical and legal compliance remain due to limited transparency, fairness, and accountability. Monitoring through logging, a long-standing practice in traditional software, offers a potential means for auditing ML applications, as logs provide traceable records of system behavior useful for debugging, performance analysis, and continuous auditing. […] systematically auditing models for compliance or accountability. The findings underscore the need for enhanced logging practices and tooling that systematically integrate responsible AI metrics. Such practices would support the development of auditable, transparent, and ethically responsible ML systems, aligning with growing regulatory requirements and societal expectations. By highlighting specific deficiencies and opportunities, this work provides actionable guidance for both practitioners and tool developers seeking to strengthen the accountability and trustworthiness of ML applications.
Open Problems in Technical AI Governance
Anka Reuel
Benjamin Bucknall
Stephen Casper
Timothy Fist
Lisa Soder
Onni Aarne
Lewis Hammond
Lujain Ibrahim
Peter Wills
Markus Anderljung
Ben Garfinkel
Lennart Heim
Andrew Trask
Gabriel Mukobi
Rylan Schaeffer
Mauricio Baker
Sara Hooker
Irene Solaiman
Alexandra Luccioni
Nicolas Moës
Jeffrey Ladish
David Bau
Paul Bricman
Neel Guha
Jessica Newman
Tobin South
Alex Pentland
Sanmi Koyejo
Mykel Kochenderfer
Robert Trager
AI progress is creating a growing range of risks and opportunities, but it is often unclear how they should be navigated. In many cases, the barriers and uncertainties faced are at least partly technical. Technical AI governance, referring to technical analysis and tools for supporting the effective governance of AI, seeks to address such challenges. It can help to (a) identify areas where intervention is needed, (b) identify and assess the efficacy of potential governance actions, and (c) enhance governance options by designing mechanisms for enforcement, incentivization, or compliance. In this paper, we explain what technical AI governance is, why it is important, and present a taxonomy and incomplete catalog of its open problems. This paper is intended as a resource for technical researchers or research funders looking to contribute to AI governance.
Predicting College Enrollment for Low-Socioeconomic-Status Students Using Machine Learning Approaches
Surina He
Mehrdad Yousefpoori-Naeim
Ying Cui
BindGPT: A Scalable Framework for 3D Molecular Design via Language Modeling and Reinforcement Learning
Maksim Kuznetsov
Roman Schutski
Shayakhmetov Rim
Daniil Polykovskiy
A. Chandar
Alex Zhavoronkov
Generating novel active molecules for a given protein is an extremely challenging task for generative models that requires an understanding of the complex physical interactions between the molecule and its environment. In this paper, we present a novel generative model, BindGPT, which uses a conceptually simple but powerful approach to create 3D molecules within the protein's binding site. Our model produces molecular graphs and conformations jointly, eliminating the need for an extra graph reconstruction step. We pretrain BindGPT on a large-scale dataset and fine-tune it with reinforcement learning using scores from external simulation software. We demonstrate how a single pretrained language model can serve at the same time as a 3D molecular generative model, a conformer generator conditioned on the molecular graph, and a pocket-conditioned 3D molecule generator. Notably, the model does not make any representational equivariance assumptions about the domain of generation. We show how such a simple conceptual approach combined with pretraining and scaling can perform on par with or better than the current best specialized diffusion models, language models, and graph neural networks while being two orders of magnitude cheaper to sample.
Designing Ambiguity Sets for Distributionally Robust Optimization Using Structural Causal Optimal Transport
Ahmad-reza Ehyaei
Samira Samadi
Distributionally robust optimization tackles out-of-sample issues like overfitting and distribution shifts by adopting an adversarial approach over a range of possible data distributions, known as the ambiguity set. To balance conservatism and accuracy, these sets must include realistic probability distributions by leveraging information from the nominal distribution. Assuming that nominal distributions arise from a structural causal model with a directed acyclic graph
Engineering TCR-controlled fuzzy logic into CAR T cells enhances therapeutic specificity
Taisuke Kondo
François X.P. Bourassa
Sooraj Achar
MyLinh T. Duong
Anirvan Ghosh
Jérémy Biton
Grégoire Altan-Bonnet
Naomi Taylor
A Layer Selection Approach to Test Time Adaptation
Mostafa Elaraby
Yann Batiste Pequignot
Frédéric Precioso
Test Time Adaptation (TTA) addresses the problem of distribution shift by adapting a pretrained model to a new domain during inference. When faced with challenging shifts, most methods collapse and perform worse than the original pretrained model. In this paper, we find that not all layers are equally receptive to the adaptation, and the layers with the most misaligned gradients often cause performance degradation. To address this, we propose GALA, a novel layer selection criterion to identify the most beneficial updates to perform during test time adaptation. This criterion can also filter out unreliable samples with noisy gradients. Its simplicity allows seamless integration with existing TTA loss functions, thereby preventing degradation and focusing adaptation on the most trainable layers. This approach also helps to regularize adaptation to preserve the pretrained features, which are crucial for handling unseen domains. Through extensive experiments, we demonstrate that the proposed layer selection framework improves the performance of existing TTA approaches across multiple datasets, domain shifts, model architectures, and TTA losses.
StarVector: Generating Scalable Vector Graphics Code from Images and Text
Juan A. Rodriguez
Issam H. Laradji
Sai Rajeswar
Christopher Pal
Scalable Vector Graphics (SVGs) are vital for modern image rendering due to their scalability and versatility. Previous SVG generation methods have focused on curve-based vectorization, lacking semantic understanding, often producing artifacts, and struggling with SVG primitives beyond path curves. To address these issues, we introduce StarVector, a multimodal large language model for SVG generation. It performs image vectorization by understanding image semantics and using SVG primitives for compact, precise outputs. Unlike traditional methods, StarVector works directly in the SVG code space, leveraging visual understanding to apply accurate SVG primitives. To train StarVector, we create SVG-Stack, a diverse dataset of 2M samples that enables generalization across vectorization tasks and precise use of primitives like ellipses, polygons, and text. We address challenges in SVG evaluation, showing that pixel-based metrics like MSE fail to capture the unique qualities of vector graphics. We introduce SVG-Bench, a benchmark across 10 datasets and 3 tasks: Image-to-SVG, Text-to-SVG generation, and diagram generation. Using this setup, StarVector achieves state-of-the-art performance, producing more compact and semantically rich SVGs.
Genetic modulation of brain dynamics in neurodevelopmental disorders: the impact of copy number variations on resting-state EEG
Adrien E. E. Dubois
Elisabeth Audet-Duchesne
Inga Sophia Knoth
Charles-Olivier Martin
Khadije Jizi
Petra Tamer
Nadine Younis
Sébastien Jacquemont
Sarah Lippé
Research has shown that many copy number variations (CNVs) increase the risk of neurodevelopmental disorders (e.g., autism, ADHD, schizophrenia). However, little is known about the effects of CNVs on brain development and function. Resting-state electroencephalography (EEG) is a suitable method to study the disturbances of neuronal functioning in CNVs. We aimed to determine whether there are resting-state EEG signatures that are characteristic of children with pathogenic CNVs. EEG resting-state brain activity of 109 CNV carriers (66 deletion carriers, 43 duplication carriers) aged 3 to 17 years was recorded for 4 minutes. To better account for developmental variations, EEG indices (power spectral density and functional connectivity) were corrected with a normative model estimated from 256 Healthy Brain Network controls. Results showed a decreased exponent of the aperiodic activity and a reduced alpha peak frequency in CNV carriers. Additionally, the study showed altered periodic components and connectivity in several frequency bands. Deletion and duplication carriers exhibited a similar overall pattern of deviations in spectral and connectivity measures, although the significance and effect sizes relative to the control group varied across frequency bands. Deletion and duplication carriers can be differentiated by their periodic power in the gamma band and connectivity in the low alpha band, with duplication carriers showing more disrupted alterations than deletion carriers. The distinctive alterations in spectral patterns were found to be most prominent during adolescence. The results suggest that CNV carriers show electrophysiological alterations compared to neurotypical controls, regardless of the gene dosage effect and their affected genomic region. At the same time, while duplications and deletions share common electrophysiological alterations, each exhibits distinct brain alteration signatures that reflect gene dosage-specific effects.
Echoes in the Noise: Posterior Samples of Faint Galaxy Surface Brightness Profiles with Score-based Likelihoods and Priors
Connor Bottrell
Laurence Perreault-Levasseur
Examining the detailed structure of galaxy populations provides valuable insights into their formation and evolution mechanisms. Significant barriers to such analysis are the nontrivial noise properties of real astronomical images and the point-spread function, which blurs structure. Here we present a framework which combines recent advances in score-based likelihood characterization and diffusion model priors to perform a Bayesian analysis of image deconvolution. The method, when applied to minimally processed Hubble Space Telescope data, recovers structures which have otherwise only become visible in next-generation James Webb Space Telescope imaging.
InfoGain Wavelets: Furthering the Design of Diffusion Wavelets for Graph-Structured Data
David R. Johnson
Michael Perlmutter
Diffusion wavelets extract information from graph signals at different scales of resolution by utilizing graph diffusion operators raised to various powers, known as diffusion scales. Traditionally, the diffusion scales are chosen to be dyadic integers,