TRAIL : IA responsable pour les professionnels et les leaders
Apprenez à intégrer des pratique d'IA responsable dans votre organisation avec le programme TRAIL. Inscrivez-vous à la prochaine cohorte qui débutera le 15 avril.
Avantage IA : productivité dans la fonction publique
Apprenez à tirer parti de l’IA générative pour soutenir et améliorer votre productivité au travail. La prochaine cohorte se déroulera en ligne les 28 et 30 avril 2026.
Nous utilisons des témoins pour analyser le trafic et l’utilisation de notre site web, afin de personnaliser votre expérience. Vous pouvez désactiver ces technologies à tout moment, mais cela peut restreindre certaines fonctionnalités du site. Consultez notre Politique de protection de la vie privée pour en savoir plus.
Paramètre des cookies
Vous pouvez activer et désactiver les types de cookies que vous souhaitez accepter. Cependant certains choix que vous ferez pourraient affecter les services proposés sur nos sites (ex : suggestions, annonces personnalisées, etc.).
Cookies essentiels
Ces cookies sont nécessaires au fonctionnement du site et ne peuvent être désactivés. (Toujours actif)
Cookies analyse
Acceptez-vous l'utilisation de cookies pour mesurer l'audience de nos sites ?
Lecteur Multimédia
Acceptez-vous l'utilisation de cookies pour afficher et vous permettre de regarder les contenus vidéo hébergés par nos partenaires (YouTube, etc.) ?
Virtual cell modeling represents an emerging frontier at the intersection of artificial intelligence and biology, aiming to predict quantiti… (voir plus)es such as responses to diverse perturbations quantitatively. However, autonomously building computational models for virtual cells is challenging due to the complexity of biological systems, the heterogeneity of data modalities, and the need for domain-specific expertise across multiple disciplines. Here, we introduce CellForge, an agentic system that leverages a multi-agent framework that transforms presented biological datasets and research objectives directly into optimized computational models for virtual cells. More specifically, given only raw single-cell multi-omics data and task descriptions as input, CellForge outputs both an optimized model architecture and executable code for training virtual cell models and inference. The framework integrates three core modules: Task Analysis for presented dataset characterization and relevant literature retrieval, Method Design, where specialized agents collaboratively develop optimized modeling strategies, and Experiment Execution for automated generation of code. The agents in the Design module are separated into experts with differing perspectives and a central moderator, and have to collaboratively exchange solutions until they achieve a reasonable consensus. We demonstrate CellForge's capabilities in single-cell perturbation prediction, using six diverse datasets that encompass gene knockouts, drug treatments, and cytokine stimulations across multiple modalities. CellForge consistently outperforms task-specific state-of-the-art methods. Overall, CellForge demonstrates how iterative interaction between LLM agents with differing perspectives provides better solutions than directly addressing a modeling challenge. Our code is publicly available at https://github.com/gersteinlab/CellForge.
Many real-world applications require recognition models that are robust to different operational conditions and modalities, but at the same … (voir plus)time run on small embedded devices, with limited hardware. While for normal size models, pre-training is known to be very beneficial in accuracy and robustness, for small models, that can be employed for embedded and edge devices, its effect is not clear. In this work, we investigate the effect of ImageNet pretraining on increasingly small backbone architectures (ultra-small models, with
This special day event on Responsible Artificial Intelligence (AI) brings together researchers, practitioners, and policymakers to explore h… (voir plus)ow data mining and machine learning systems can be designed to align with ethical principles, societal values, and human well-being. As AI technologies increasingly influence decisions in healthcare, finance, governance, and social systems, there is a critical need to develop frameworks that embed fairness, accountability, and privacy directly into the foundations of knowledge discovery. This full-day event will feature a mix of invited talks, interactive debates, expert panels, and peer-reviewed research presentations, all focused on the practical integration of ethical design into data-driven systems. The Responsible AI Day builds on the success of Canada's NSERC CREATE Program on Responsible AI, an interdisciplinary initiative training the next generation of AI researchers across computer science, law, bioethics, public health, and media studies. Topics will span scalable AI governance, privacy-preserving computation, algorithmic bias mitigation, and the socio-legal tensions emerging in generative AI. By positioning responsible AI as a sociotechnical challenge, this special day aligns with KDD's mission of advancing data science that is not only technically robust but also socially conscious.
2025-08-02
Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.2 (publié)
One-Shot Federated Learning (OSFL) restricts communication between the server and clients to a single round, significantly reducing communic… (voir plus)ation costs and minimizing privacy leakage risks compared to traditional Federated Learning (FL), which requires multiple rounds of communication. However, existing OSFL frameworks remain vulnerable to distributional heterogeneity, as they primarily focus on model heterogeneity while neglecting data heterogeneity. To bridge this gap, we propose FedHydra, a unified, data-free, OSFL framework designed to effectively address both model and data heterogeneity. Unlike existing OSFL approaches, FedHydra introduces a novel two-stage learning mechanism. Specifically, it incorporates model stratification and heterogeneity-aware stratified aggregation to mitigate the challenges posed by both model and data heterogeneity. By this design, the data and model heterogeneity issues are simultaneously monitored from different aspects during learning. Consequently, FedHydra can effectively mitigate both issues by minimizing their inherent conflicts. We compared FedHydra with five SOTA baselines on four benchmark datasets. Experimental results show that our method outperforms the previous OSFL methods in both homogeneous and heterogeneous settings. The code is available at https://github.com/Jun-B0518/FedHydra.
2025-08-02
Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.2 (publié)
Idiopathic pulmonary fibrosis (IPF) is a progressive and lethal disease characterized by excessive extracellular matrix deposition. Current … (voir plus)IPF therapies slow disease progression but do not stop or reverse it. The (myo)fibroblasts are thought to be the main cellular contributors to excessive extracellular matrix production in IPF. Here we show that fibrotic alveolar type II cells regulate production and crosslinking of extracellular matrix via the co-transcriptional activator YAP. YAP leads to increased expression of Lysl oxidase (LOX) and subsequent LOX-mediated crosslinking by fibrotic alveolar type II cells. Pharmacological YAP inhibition via verteporfin reverses fibrotic alveolar type II cell reprogramming and LOX expression in experimental lung fibrosis in vivo and in human fibrotic tissue ex vivo. We thus identify YAP-TEAD/LOX inhibition in alveolar type II cells as a promising potential therapy for IPF patients.
Performance is a critical quality attribute in software development, yet the impact of method-level code changes on performance evolution re… (voir plus)mains poorly understood. While developers often make intuitive assumptions about which types of modifications are likely to cause performance regressions or improvements, these beliefs lack empirical validation at a fine-grained level. We conducted a large-scale empirical study analyzing performance evolution in 15 mature open-source Java projects hosted on GitHub. Our analysis encompassed 739 commits containing 1,499 method-level code changes, using Java Microbenchmark Harness (JMH) for precise performance measurement and rigorous statistical analysis to quantify both the significance and magnitude of performance variations. We employed bytecode instrumentation to capture method-specific execution metrics and systematically analyzed four key aspects: temporal performance patterns, code change type correlations, developer and complexity factors, and domain-size interactions. Our findings reveal that 32.7% of method-level changes result in measurable performance impacts, with regressions occurring 1.3 times more frequently than improvements. Contrary to conventional wisdom, we found no significant differences in performance impact distributions across code change categories, challenging risk-stratified development strategies. Algorithmic changes demonstrate the highest improvement potential but carry substantial regression risk. Senior developers produce more stable changes with fewer extreme variations, while code complexity correlates with increased regression likelihood. Domain-size interactions reveal significant patterns, with web server + small projects exhibiting the highest performance instability. Our study provides empirical evidence for integrating automated performance testing into continuous integration pipelines.
Facial expression recognition (FER) models are employed in many video-based affective computing applications, such as human-computer interac… (voir plus)tion and healthcare monitoring. However, deep FER models often struggle with subtle expressions and high inter-subject variability, limiting their performance in real-world applications. To improve their performance, source-free domain adaptation (SFDA) methods have been proposed to personalize a pretrained source model using only unlabeled target domain data, thereby avoiding data privacy, storage, and transmission constraints. This paper addresses a challenging scenario where source data is unavailable for adaptation, and only unlabeled target data consisting solely of neutral expressions is available. SFDA methods are not typically designed to adapt using target data from only a single class. Further, using models to generate facial images with non-neutral expressions can be unstable and computationally intensive. In this paper, personalized feature translation (PFT) is proposed for SFDA. Unlike current image translation methods for SFDA, our lightweight method operates in the latent space. We first pre-train the translator on the source domain data to transform the subject-specific style features from one source subject into another. Expression information is preserved by optimizing a combination of expression consistency and style-aware objectives. Then, the translator is adapted on neutral target data, without using source data or image synthesis. By translating in the latent space, PFT avoids the complexity and noise of face expression generation, producing discriminative embeddings optimized for classification. Using PFT eliminates the need for image synthesis, reduces computational overhead (using a lightweight translator), and only adapts part of the model, making the method efficient compared to image-based translation.