Publications

Manifold Filter-Combine Networks
Joyce Chew
Edward De Brouwer
Deanna Needell
Michael Perlmutter
To better understand manifold neural networks (MNNs), we introduce Manifold Filter-Combine Networks (MFCNs). Our filter-combine framework parallels the popular aggregate-combine paradigm for graph neural networks (GNNs) and naturally suggests many interesting families of MNNs which can be interpreted as manifold analogues of various popular GNNs. We propose a method for implementing MFCNs on high-dimensional point clouds that relies on approximating an underlying manifold by a sparse graph. We then prove that our method is consistent in the sense that it converges to a continuum limit as the number of data points tends to infinity, and we numerically demonstrate its effectiveness on real-world and synthetic data sets.
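As a rough illustration of the idea in the abstract above (approximate the manifold by a sparse graph, filter each channel, then combine channels), here is a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the unweighted k-nearest-neighbor graph, the unnormalized Laplacian, the single diffusion step `(I - tL)` used as the filter, and the hypothetical names `knn_graph`, `filter_signal`, and `filter_combine_layer`.

```python
import math

def knn_graph(points, k):
    """Symmetrized, unweighted k-nearest-neighbor graph as a list of neighbor sets."""
    n = len(points)
    nbrs = [set() for _ in range(n)]
    for i in range(n):
        order = sorted((j for j in range(n) if j != i),
                       key=lambda j: math.dist(points[i], points[j]))
        for j in order[:k]:
            nbrs[i].add(j)
            nbrs[j].add(i)
    return nbrs

def laplacian_apply(nbrs, f):
    """Apply the unnormalized graph Laplacian L = D - A to a signal f."""
    return [len(nbrs[i]) * f[i] - sum(f[j] for j in nbrs[i])
            for i in range(len(f))]

def filter_signal(nbrs, f, t):
    """Illustrative low-pass filter: one explicit diffusion step (I - tL) f."""
    lf = laplacian_apply(nbrs, f)
    return [fi - t * li for fi, li in zip(f, lf)]

def filter_combine_layer(nbrs, channels, weights, t):
    """Filter every input channel, mix channels linearly, then apply ReLU."""
    filtered = [filter_signal(nbrs, f, t) for f in channels]
    n = len(filtered[0])
    return [[max(0.0, sum(w * f[i] for w, f in zip(row, filtered)))
             for i in range(n)]
            for row in weights]  # one weight row per output channel

# Point cloud sampled from a circle (a 1-D manifold embedded in the plane).
n = 12
points = [(math.cos(2 * math.pi * i / n), math.sin(2 * math.pi * i / n))
          for i in range(n)]
nbrs = knn_graph(points, k=2)

# Two input channels: a constant signal and the x-coordinate.
channels = [[1.0] * n, [p[0] for p in points]]
weights = [[1.0, 0.0], [0.5, 0.5]]
out = filter_combine_layer(nbrs, channels, weights, t=0.1)
print(out[0][:3], out[1][:3])
```

The constant channel passes through the filter unchanged because it lies in the kernel of the Laplacian, which gives a quick sanity check for the graph construction.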
Single-nucleus chromatin accessibility profiling identifies cell types and functional variants contributing to major depression
Anjali Chawla
Laura M. Fiori
Wenmin Zang
Malosree Maitra
Jennie Yang
Dariusz Żurawek
Gabriella Frosi
Reza Rahimian
Haruka Mitsuhashi
Maria Antonietta Davoli
MA Davoli
Ryan Denniston
Gary Gang Chen
Volodymyr Yerko
Deborah Mash
Kiran Girdhar
Schahram Akbarian
Naguib Mechawar
Matthew Suderman
Yuemei Li
Corina Nagy
Gustavo Turecki
Understanding In-Context Learning of Linear Models in Transformers Through an Adversarial Lens
Usman Anwar
Johannes Von Oswald
Louis Kirsch
David M. Krueger
Spencer Frei
In this work, we make two contributions toward understanding in-context learning of linear models by transformers. First, we investigate the adversarial robustness of in-context learning in transformers to hijacking attacks, a type of adversarial attack in which the adversary's goal is to manipulate the prompt to force the transformer to generate a specific output. We show that both linear transformers and transformers with GPT-2 architectures are vulnerable to such hijacking attacks. However, adversarial robustness to such attacks can be significantly improved through adversarial training (done either at the pretraining or finetuning stage) and can generalize to stronger attack models. Our second main contribution is a comparative analysis of adversarial vulnerabilities across transformer models and other algorithms for learning linear models. This reveals two novel findings. First, adversarial attacks transfer poorly between larger transformer models trained from different seeds despite achieving similar in-distribution performance. This suggests that transformers of the same architecture trained according to the same recipe may implement different in-context learning algorithms for the same task. Second, we observe that attacks do not transfer well between classical learning algorithms for linear models (single-step gradient descent and ordinary least squares) and transformers. This suggests that there could be qualitative differences between the in-context learning algorithms that transformers implement and these traditional algorithms.
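To make the two classical baselines named in the abstract above concrete, the following sketch fits a 2-D linear model from in-context examples with ordinary least squares and with a single gradient-descent step, then measures how far each prediction moves when one in-context label is perturbed. The perturbation (adding 10 to the last label), the learning rate, the data sizes, and all function names are illustrative assumptions; this is not the hijacking attack or experimental setup from the paper.

```python
import random

def ols_2d(xs, ys):
    """Ordinary least squares for y ~ w.x with 2-D inputs and no intercept."""
    # Solve the normal equations (X^T X) w = X^T y with the 2x2 closed form.
    a = sum(x[0] * x[0] for x in xs)
    b = sum(x[0] * x[1] for x in xs)
    d = sum(x[1] * x[1] for x in xs)
    g0 = sum(x[0] * y for x, y in zip(xs, ys))
    g1 = sum(x[1] * y for x, y in zip(xs, ys))
    det = a * d - b * b
    return [(d * g0 - b * g1) / det, (a * g1 - b * g0) / det]

def one_step_gd(xs, ys, lr):
    """One gradient step on the mean squared error, starting from w = 0."""
    n = len(xs)
    # The gradient of (1/2n) * sum (w.x - y)^2 at w = 0 is -(1/n) X^T y.
    return [lr * sum(x[j] * y for x, y in zip(xs, ys)) / n for j in range(2)]

def predict(w, x):
    return w[0] * x[0] + w[1] * x[1]

rng = random.Random(0)
w_true = [1.5, -0.5]
xs = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(20)]
ys = [predict(w_true, x) for x in xs]  # noiseless labels
x_query = [0.3, 0.7]

# Illustrative perturbation (an assumption): shift the last in-context label.
ys_adv = ys[:-1] + [ys[-1] + 10.0]

shift_ols = predict(ols_2d(xs, ys_adv), x_query) - predict(ols_2d(xs, ys), x_query)
shift_gd = (predict(one_step_gd(xs, ys_adv, 1.0), x_query)
            - predict(one_step_gd(xs, ys, 1.0), x_query))
print("OLS prediction shift:", shift_ols)
print("One-step GD prediction shift:", shift_gd)
```

The same perturbation generally moves the two predictors by different amounts, which gives a toy picture of why an attack tuned against one learning algorithm need not carry over to another.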
CellForge: Agentic Design of Virtual Cell Models
Xiangru Tang
Zhuoyun Yu
Jiapeng Chen
Yan Cui
Yanjun Shao
Weixu Wang
Fang Wu
Yuchen Zhuang
Wenqi Shi
Zhi Huang
Arman Cohan
Xihong Lin
Fabian Theis
Mark B. Gerstein
Virtual cell modeling represents an emerging frontier at the intersection of artificial intelligence and biology, aiming to quantitatively predict quantities such as responses to diverse perturbations. However, autonomously building computational models for virtual cells is challenging due to the complexity of biological systems, the heterogeneity of data modalities, and the need for domain-specific expertise across multiple disciplines. Here, we introduce CellForge, an agentic system that leverages a multi-agent framework to transform presented biological datasets and research objectives directly into optimized computational models for virtual cells. More specifically, given only raw single-cell multi-omics data and task descriptions as input, CellForge outputs both an optimized model architecture and executable code for training virtual cell models and running inference. The framework integrates three core modules: Task Analysis, for characterizing the presented dataset and retrieving relevant literature; Method Design, where specialized agents collaboratively develop optimized modeling strategies; and Experiment Execution, for automated code generation. The agents in the Design module are separated into experts with differing perspectives and a central moderator, and they collaboratively exchange solutions until they reach a reasonable consensus. We demonstrate CellForge's capabilities in single-cell perturbation prediction, using six diverse datasets that encompass gene knockouts, drug treatments, and cytokine stimulations across multiple modalities. CellForge consistently outperforms task-specific state-of-the-art methods. Overall, CellForge demonstrates how iterative interaction between LLM agents with differing perspectives provides better solutions than directly addressing a modeling challenge. Our code is publicly available at https://github.com/gersteinlab/CellForge.
Infrared Object Detection with Ultra Small ConvNets: Is ImageNet Pretraining Still Useful?
Srikanth Muralidharan
Heitor Rapela Medeiros
Masih Aminbeidokhti
Eric Granger
Many real-world applications require recognition models that are robust to different operational conditions and modalities but at the same time run on small embedded devices with limited hardware. While pre-training is known to be very beneficial for the accuracy and robustness of normal-size models, its effect is unclear for small models that can be deployed on embedded and edge devices. In this work, we investigate the effect of ImageNet pretraining on increasingly small backbone architectures (ultra-small models, with
Responsible AI Day
Ebrahim Bagheri
Faezeh Ensan
Calvin Hillis
Robin Cohen
Benjamin C. M. Fung
Sébastien Gambs
This special day event on Responsible Artificial Intelligence (AI) brings together researchers, practitioners, and policymakers to explore how data mining and machine learning systems can be designed to align with ethical principles, societal values, and human well-being. As AI technologies increasingly influence decisions in healthcare, finance, governance, and social systems, there is a critical need to develop frameworks that embed fairness, accountability, and privacy directly into the foundations of knowledge discovery. This full-day event will feature a mix of invited talks, interactive debates, expert panels, and peer-reviewed research presentations, all focused on the practical integration of ethical design into data-driven systems. The Responsible AI Day builds on the success of Canada's NSERC CREATE Program on Responsible AI, an interdisciplinary initiative training the next generation of AI researchers across computer science, law, bioethics, public health, and media studies. Topics will span scalable AI governance, privacy-preserving computation, algorithmic bias mitigation, and the socio-legal tensions emerging in generative AI. By positioning responsible AI as a sociotechnical challenge, this special day aligns with KDD's mission of advancing data science that is not only technically robust but also socially conscious.
Temporal Graph Learning Workshop
Daniele Zambon
Andrea Cini
Michael M. Bronstein
A Unified Solution to Diverse Heterogeneities in One-Shot Federated Learning
Yiliao Song
Atul Sajjanhar
Yong Xiang
Wei Zhou
Xiaohui Tao
Yan Li
One-Shot Federated Learning (OSFL) restricts communication between the server and clients to a single round, significantly reducing communication costs and minimizing privacy leakage risks compared to traditional Federated Learning (FL), which requires multiple rounds of communication. However, existing OSFL frameworks remain vulnerable to distributional heterogeneity, as they primarily focus on model heterogeneity while neglecting data heterogeneity. To bridge this gap, we propose FedHydra, a unified, data-free OSFL framework designed to effectively address both model and data heterogeneity. Unlike existing OSFL approaches, FedHydra introduces a novel two-stage learning mechanism. Specifically, it incorporates model stratification and heterogeneity-aware stratified aggregation to mitigate the challenges posed by both model and data heterogeneity. With this design, the data and model heterogeneity issues are simultaneously monitored from different aspects during learning. Consequently, FedHydra can effectively mitigate both issues by minimizing their inherent conflicts. We compared FedHydra with five SOTA baselines on four benchmark datasets. Experimental results show that our method outperforms the previous OSFL methods in both homogeneous and heterogeneous settings. The code is available at https://github.com/Jun-B0518/FedHydra.
Inhibition of epithelial cell YAP-TEAD/LOX signaling attenuates pulmonary fibrosis in preclinical models
Darcy Elizabeth Wagner
Hani N. Alsafadi
Nilay Mitash
Aurelien Justet
Qianjiang Hu
Ricardo Pineda
Claudia Staab-Weijnitz
Martina Korfei
Nika Gvazava
Kristin Wannemo
Ugochi Onwuka
Molly Mozurak
Adriana Estrada-Bernal
Juan Cala Garcia
Katrin Mutze
Rita Costa
Deniz Bölükbas
John Stegmayr
Wioletta Skronska-Wasek
Stephan Klee
Chiharu Ota
Hoeke A. Baarsma
Jingtao Wang
John Sembrat
Anne Hilgendorff
Andreas Günther
Rachel Chambers
Ivan O Rosas
Stijn de Langhe
Naftali Kaminski
Mareike Lehmann
Oliver Eickelberg
Melanie Königshoff
Idiopathic pulmonary fibrosis (IPF) is a progressive and lethal disease characterized by excessive extracellular matrix deposition. Current IPF therapies slow disease progression but do not stop or reverse it. The (myo)fibroblasts are thought to be the main cellular contributors to excessive extracellular matrix production in IPF. Here we show that fibrotic alveolar type II cells regulate production and crosslinking of extracellular matrix via the co-transcriptional activator YAP. YAP leads to increased expression of lysyl oxidase (LOX) and subsequent LOX-mediated crosslinking by fibrotic alveolar type II cells. Pharmacological YAP inhibition via verteporfin reverses fibrotic alveolar type II cell reprogramming and LOX expression in experimental lung fibrosis in vivo and in human fibrotic tissue ex vivo. We thus identify YAP-TEAD/LOX inhibition in alveolar type II cells as a promising potential therapy for IPF patients.
Efficient Deep Reinforcement Learning-Based Supplementary Damping Control with a Coordinated RMS Training and EMT Testing Scheme
Tao Xue
Mingxuan Zhao
Ilhan Kocar
Mohsen Ghafouri
Siqi Bu
Ziqing Zhu
Towards an Interpretable Machine Learning Model for Predicting Antimicrobial Resistance
Mohamed Mediouni
Abdoulaye Banire Diallo
Zero-Shot Anomaly Detection with Dual-Branch Prompt Learning
S Ebrahimi Kahou
Zero-shot anomaly detection (ZSAD) enables identifying and localizing defects in unseen categories by relying solely on generalizable features rather than requiring any labeled examples of anomalies. However, existing ZSAD methods, whether using fixed or learned prompts, struggle under domain shifts because their training data are derived from limited training domains and fail to generalize to new distributions. In this paper, we introduce PILOT, a framework designed to overcome these challenges through two key innovations: (1) a novel dual-branch prompt learning mechanism that dynamically integrates a pool of learnable prompts with structured semantic attributes, enabling the model to adaptively weight the most relevant anomaly cues for each input image; and (2) a label-free test-time adaptation strategy that updates the learnable prompt parameters using high-confidence pseudo-labels from unlabeled test data. Extensive experiments on 13 industrial and medical benchmarks demonstrate that PILOT achieves state-of-the-art performance in both anomaly detection and localization under domain shift.