Publications

Causal single-cell RNA-seq simulation, in silico perturbation, and GRN inference benchmarking using GRouNdGAN-Toolkit
Development of a defacing algorithm to protect the privacy of head and neck cancer patients in publicly-accessible radiotherapy datasets
Kayla O'Sullivan‐Steben
Luc Galarneau
Pixels Under Pressure: Exploring Fine-Tuning Paradigms for Foundation Models in High-Resolution Medical Imaging
Zahra Tehrani Nasab
Advancements in diffusion-based foundation models have improved text-to-image generation, yet most efforts have been limited to low-resolution settings. As high-resolution image synthesis becomes increasingly essential for various applications, particularly in medical imaging domains, fine-tuning emerges as a crucial mechanism for adapting these powerful pre-trained models to task-specific requirements and data distributions. In this work, we present a systematic study, examining the impact of various fine-tuning techniques on image generation quality when scaling to high resolution 512x512 pixels. We benchmark a diverse set of fine-tuning methods, including full fine-tuning strategies and parameter-efficient fine-tuning (PEFT). We dissect how different fine-tuning methods influence key quality metrics, including Fréchet Inception Distance (FID), Vendi score, and prompt-image alignment. We also evaluate the utility of generated images in a downstream classification task under data-scarce conditions, demonstrating that specific fine-tuning strategies improve both generation fidelity and downstream performance when synthetic images are used for classifier training and evaluation on real images. Our code is accessible through the project website - https://tehraninasab.github.io/PixelUPressure/.
Field-level Comparison and Robustness Analysis of Cosmological N-body Simulations
Adrian E. Bayer
Francisco Villaescusa-Navarro
Sammy Sharief
Romain Teyssier
Lehman H. Garrison
Greg L. Bryan
Marco Gatti
Eli Visbal
Proceedings of the OHBM Open Science Room 2024
Selma Lugtmeijer
Ju-Chi Yu
Xiangzhen Kong
Janine D. Bijsterbosch
Elizabeth DuPre
Oscar Esteban
Ibrahim Faye
Seok-Jun Hong
Chuan-Peng Hu
Shella Keilholz
Chun-Chia Kung
Hyeong Hun Lee
Daniel Margulies
Cyril Pernet
Franco Pestilli
Jean-Baptiste Poline
Pradeep R. Raamana
Francesco Santini
Won Mok Shim
Paul M. Thompson
Chao-Gan Yan
Niall W. Duncan
Nikhil Bhagwat
Peter Fox
Ana Van Gulick
David N. Kennedy
Gorana Pobric
Neda Sadeghi
Nick Souter
Sandeep Panta
Isabelle van der Velpen
Tonya White
Sina Mansour L.
Qing Wang
Povilas Karvelis
Anibal S. Heinsfeld
Yu-Fang Yang
Hong Ji Kim
Nur Shahidatul Nabila Binti Ibrahim
Stefano Moia
Wei Zhang
Jessica Haigh
Rose-Marie Kouwenhoven
Terra Hyun Lee
Hurshitha Vasudevan
Yuping Yang
Subapriya Suppiah
Yi-Ju Lee
Nils Muhlert
MuSACo: Multimodal Subject-Specific Selection and Adaptation for Expression Recognition with Co-Training
Muhammad Osama Zeeshan
Natacha Gillet
Alessandro Lameiras Koerich
Francois Bremond
Eric Granger
Personalized expression recognition (ER) involves adapting a machine learning model to subject-specific data for improved recognition of expressions with considerable interpersonal variability. Subject-specific ER can benefit significantly from multi-source domain adaptation (MSDA) methods, where each domain corresponds to a specific subject, to improve model accuracy and robustness. Despite promising results, state-of-the-art MSDA approaches often overlook multimodal information or blend sources into a single domain, limiting subject diversity and failing to explicitly capture unique subject-specific characteristics. To address these limitations, we introduce MuSACo, a multi-modal subject-specific selection and adaptation method for ER based on co-training. It leverages complementary information across multiple modalities and multiple source domains for subject-specific adaptation. This makes MuSACo particularly relevant for affective computing applications in digital health, such as patient-specific assessment for stress or pain, where subject-level nuances are crucial. MuSACo selects source subjects relevant to the target and generates pseudo-labels using the dominant modality for class-aware learning, in conjunction with a class-agnostic loss to learn from less confident target samples. Finally, source features from each modality are aligned, while only confident target features are combined. Our experimental results on two challenging multimodal ER datasets, BioVid and StressID, show that MuSACo can outperform UDA (blending) and state-of-the-art MSDA methods.
Rethinking Safety in LLM Fine-tuning: An Optimization Perspective
Minseon Kim
Jin Myung Kwak
Lama Alssum
Bernard Ghanem
Philip Torr
Fazl Barez
Adel Bibi
Fine-tuning language models is commonly believed to inevitably harm their safety, i.e., refusing to respond to harmful user requests, even when using harmless datasets, thus requiring additional safety measures. We challenge this belief through systematic testing, showing that poor optimization choices, rather than inherent trade-offs, often cause safety problems, measured as harmful responses to adversarial prompts. By properly selecting key training hyper-parameters, e.g., learning rate, batch size, and gradient steps, we reduce unsafe model responses from 16% to approximately 5%, as measured by keyword matching, while maintaining utility performance. Based on this observation, we propose a simple exponential moving average (EMA) momentum technique in parameter space that preserves safety performance by creating a stable optimization path and retains the original pre-trained model's safety properties. Our experiments on the Llama families across multiple datasets (Dolly, Alpaca, ORCA) demonstrate that safety problems during fine-tuning can largely be avoided without specialized interventions, outperforming existing approaches that require additional safety data while offering practical guidelines for maintaining both model performance and safety during adaptation.
SandboxSocial: A Sandbox for Social Media Using Multimodal AI Agents
Gayatri Krishnakumar
Busra Tugce Gurbuz
Austin Welch
Hao Yu
Ethan Kosak-Hine
Tom Gibbs
Dan Zhao
The online information ecosystem enables influence campaigns of unprecedented scale and impact. We urgently need empirically grounded approaches to counter the growing threat of malicious campaigns, now amplified by generative AI. But developing defenses in real-world settings is impractical. Social system simulations with agents modelled using Large Language Models (LLMs) are a promising alternative approach and a growing area of research. However, existing simulators lack features needed to capture the complex information-sharing dynamics of platform-based social networks. To bridge this gap, we present SandboxSocial, a new simulator that includes several key innovations, mainly: (1) a virtual social media platform (modelled as Mastodon and mirrored in an actual Mastodon server) that enables a realistic setting in which agents interact; (2) an adapter that uses real-world user data to create more grounded agents and social media content; and (3) multi-modal capabilities that enable our agents to interact using both text and images, just as humans do on social media. We make the simulator more useful to researchers by providing measurement and analysis tools that track simulation dynamics and compute evaluation metrics to compare experimental results.
Veracity: An Open-Source AI Fact-Checking System
William Garneau
Manon Gruaz
Li Wei Wang
Sukanya Krishna
Luda Cohen
The proliferation of misinformation poses a significant threat to society, exacerbated by the capabilities of generative AI. This demo paper introduces Veracity, an open-source AI system designed to empower individuals to combat misinformation through transparent and accessible fact-checking. Veracity leverages the synergy between Large Language Models (LLMs) and web retrieval agents to analyze user-submitted claims and provide grounded veracity assessments with intuitive explanations. Key features include multilingual support, numerical scoring of claim veracity, and an interactive interface inspired by familiar messaging applications. This paper will showcase Veracity's ability to not only detect misinformation but also explain its reasoning, fostering media literacy and promoting a more informed society.