Publications
GFETM: Genome Foundation-based Embedded Topic Model for scATAC-seq Modeling
Single-cell Assay for Transposase-Accessible Chromatin with sequencing (scATAC-seq) has emerged as a powerful technique for investigating open chromatin landscapes at single-cell resolution. However, analyzing scATAC-seq data remains challenging due to its sparsity and noise. Genome Foundation Models (GFMs), pre-trained on massive DNA sequences, have proven effective at genome analysis. Given that open chromatin regions (OCRs) harbour salient sequence features, we hypothesize that leveraging GFMs' sequence embeddings can improve the accuracy and generalizability of scATAC-seq modeling. Here, we introduce the Genome Foundation Embedded Topic Model (GFETM), an interpretable deep learning framework that combines GFMs with the Embedded Topic Model (ETM) for scATAC-seq data analysis. By integrating the DNA sequence embeddings extracted by a GFM from OCRs, GFETM demonstrates superior accuracy and generalizability and captures cell-state-specific TF activity with both zero-shot inference and attention mechanism analysis. Finally, the topic mixtures inferred by GFETM reveal biologically meaningful epigenomic signatures of kidney diabetes.
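As a rough illustration of the embedded-topic-model idea mentioned in this abstract, the sketch below computes topic-over-peak distributions as a softmax of dot products between topic embeddings and peak embeddings, then mixes them with a cell's topic proportions. It follows the generic ETM formulation, not the GFETM code: the function names, the toy vectors, and the assumption that the peak embeddings stand in for GFM sequence embeddings of OCRs are all hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def topic_peak_distributions(topic_emb, peak_emb):
    """For each topic, a distribution over peaks: softmax(topic . peak)."""
    return [
        softmax([sum(t * p for t, p in zip(topic, peak)) for peak in peak_emb])
        for topic in topic_emb
    ]


def expected_accessibility(theta, beta):
    """Mix the topic distributions using a cell's topic proportions theta."""
    n_peaks = len(beta[0])
    return [
        sum(theta[k] * beta[k][j] for k in range(len(theta)))
        for j in range(n_peaks)
    ]


# Toy values (hypothetical): 3 peaks and 2 topics in a 2-d embedding space.
# In a GFETM-like setup the peak embeddings would come from a GFM.
peak_emb = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
topic_emb = [[2.0, 0.0], [0.0, 2.0]]
theta = [0.7, 0.3]  # topic mixture for one cell

beta = topic_peak_distributions(topic_emb, peak_emb)
profile = expected_accessibility(theta, beta)
```

Because each row of `beta` and the mixture `theta` sum to one, `profile` is itself a distribution over peaks, which is what makes the topic mixtures directly interpretable.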
Recent progress in self-supervised (SSL) visual representation learning has led to the development of several different proposed frameworks that rely on augmentations of images but use different loss functions. However, there are few theoretically grounded principles to guide practice, so practical implementation of each SSL framework requires several heuristics to achieve competitive performance. In this work, we build on recent analytical results to design practical recommendations for competitive and efficient SSL that are grounded in theory. Specifically, recent theory tells us that existing SSL frameworks are minimizing the same idealized loss, which is to learn features that best match the data similarity kernel defined by the augmentations used. We show how this idealized loss can be reformulated to a functionally equivalent loss that is more efficient to compute. We study the implicit bias of using gradient descent to minimize our reformulated loss function and find that using a stronger orthogonalization constraint with a reduced projector dimensionality should yield good representations. Furthermore, the theory tells us that approximating the reformulated loss should be improved by increasing the number of augmentations, and as such using multiple augmentations should lead to improved convergence. We empirically verify our findings on CIFAR, STL and ImageNet datasets, wherein we demonstrate an improved linear readout performance when training a ResNet-backbone using our theoretically grounded recommendations. Remarkably, we also demonstrate that by leveraging these insights, we can reduce the pretraining dataset size by up to 2
Hint Marginalization for Improved Reasoning in Large Language Models
Soumyasundar Pal
Didier Chételat
Yingxue Zhang
Mark J. Coates
Large Language Models (LLMs) have exhibited an impressive capability to perform reasoning tasks, especially if they are encouraged to generate a sequence of intermediate steps. Reasoning performance can be improved by suitably combining multiple LLM responses, generated either in parallel in a single query, or via sequential interactions with LLMs throughout the reasoning process. Existing strategies for combination, such as self-consistency and progressive-hint-prompting, make inefficient use of the LLM responses. We present Hint Marginalization, a novel and principled algorithmic framework to enhance the reasoning capabilities of LLMs. Our approach can be viewed as an iterative sampling strategy for forming a Monte Carlo approximation of an underlying distribution of answers, with the goal of identifying the mode, that is, the most likely answer. Empirical evaluation on several benchmark datasets for arithmetic reasoning demonstrates the superiority of the proposed approach.
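The notion of a Monte Carlo approximation of an answer distribution and its mode can be illustrated with a minimal sketch. This is not the Hint Marginalization algorithm itself, whose update rule the abstract does not spell out. It shows only the basic building block: sample answers repeatedly, estimate their probabilities by frequency, and pick the most frequent one. The sampler below is a hypothetical stand-in for an LLM call, and all names are illustrative.

```python
import random
from collections import Counter


def estimate_mode(sample_answer, num_samples, rng):
    """Estimate the answer distribution by sampling and return its mode."""
    counts = Counter(sample_answer(rng) for _ in range(num_samples))
    total = sum(counts.values())
    probs = {answer: count / total for answer, count in counts.items()}
    mode = max(probs, key=probs.get)
    return mode, probs


def toy_sampler(rng):
    """Hypothetical stand-in for an LLM: returns a noisy answer string."""
    return rng.choices(["42", "41", "24"], weights=[0.6, 0.3, 0.1])[0]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
mode, probs = estimate_mode(toy_sampler, 200, rng)
```

With a real model, `toy_sampler` would be replaced by a function that queries the LLM and extracts its final answer. The empirical frequencies in `probs` then serve as the Monte Carlo estimate whose mode is reported.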
The integration of diverse clinical modalities such as medical imaging and the tabular data extracted from patients' Electronic Health Records (EHRs) is a crucial aspect of modern healthcare. Integrative analysis of multiple sources can provide a comprehensive understanding of the clinical condition of a patient, improving diagnosis and treatment decisions. Deep Neural Networks (DNNs) consistently demonstrate outstanding performance in a wide range of multimodal tasks in the medical domain. However, the complex endeavor of effectively merging medical imaging with clinical, demographic and genetic information represented as numerical tabular data remains a highly active and ongoing research pursuit.
We present a novel framework based on hypernetworks to fuse clinical imaging and tabular data by conditioning the image processing on the EHR's values and measurements. This approach aims to leverage the complementary information present in these modalities to enhance the accuracy of various medical applications. We demonstrate the strength and generality of our method on two different brain Magnetic Resonance Imaging (MRI) analysis tasks, namely, brain age prediction conditioned on the subject's sex and multi-class Alzheimer's Disease (AD) classification conditioned on tabular data. We show that our framework outperforms both single-modality models and state-of-the-art MRI and tabular data fusion methods. A link to our code can be found at https://github.com/daniel4725/HyperFusion
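To make the hypernetwork idea concrete, here is a minimal sketch in which a small linear hypernetwork maps a tabular vector to the weights and bias of a layer that then processes image features. It illustrates the general conditioning mechanism only and is not taken from the HyperFusion repository: the class `HyperLinear`, its dimensions, and the toy inputs are all assumptions made for illustration.

```python
import random


def linear(weights, bias, x):
    """Apply y = W x + b using plain Python lists."""
    return [
        sum(w * xi for w, xi in zip(row, x)) + b
        for row, b in zip(weights, bias)
    ]


class HyperLinear:
    """Linear layer whose parameters are generated from tabular input."""

    def __init__(self, tab_dim, in_dim, out_dim, rng):
        self.in_dim, self.out_dim = in_dim, out_dim
        n_params = out_dim * in_dim + out_dim  # weights plus bias
        # Hypernetwork: a single linear map from tabular data to parameters.
        self.hyper_w = [
            [rng.gauss(0.0, 0.1) for _ in range(tab_dim)]
            for _ in range(n_params)
        ]
        self.hyper_b = [0.0] * n_params

    def forward(self, image_features, tabular):
        # Generate the primary layer's parameters from the tabular vector.
        params = linear(self.hyper_w, self.hyper_b, tabular)
        split = self.out_dim * self.in_dim
        w = [
            params[i * self.in_dim:(i + 1) * self.in_dim]
            for i in range(self.out_dim)
        ]
        b = params[split:]
        # Process the image features with the generated parameters.
        return linear(w, b, image_features)


layer = HyperLinear(tab_dim=2, in_dim=4, out_dim=3, rng=random.Random(0))
features = [0.5, -1.0, 2.0, 0.1]  # hypothetical image feature vector
out_a = layer.forward(features, [1.0, 0.0])  # e.g. one-hot tabular value A
out_b = layer.forward(features, [0.0, 1.0])  # e.g. one-hot tabular value B
```

The same image features yield different outputs for different tabular inputs, because the tabular data changes the layer's parameters instead of being concatenated to the features.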
Decentralized Anonymous Credential (DAC) systems are increasingly relevant, especially when enhancing revocation mechanisms in the face of complex traceability challenges. This paper introduces IDEA-DAC, a paradigm shift from the conventional revoke-and-reissue methods, promoting direct and Integrity-Driven Editing (IDE) for Accountable DACs, which results in better integrity accountability, traceability, and system simplicity. We further incorporate an Edit-bound Conformity Check that ensures tailored integrity standards during credential amendments using R1CS-based ZK-SNARKs. Delving deeper, we propose ZK-JSON, a unique R1CS circuit design tailored for IDE over generic JSON documents. This design imposes strictly O(N) rank-1 constraints for variable-length JSON documents of up to N bytes in length, encompassing serialization, encryption, and edit-bound conformity checks. Additionally, our circuits only necessitate a one-time compilation, setup, and smart contract deployment for homogeneous JSON documents up to a specified size. While preserving core DAC features such as selective disclosure, anonymity, and predicate provability, IDEA-DAC achieves precise data modification checks without revealing private content, ensuring only authorized edits are permitted. In summary, IDEA-DAC offers an enhanced methodology for large-scale JSON-formatted credential systems, setting a new standard in decentralized identity management efficiency and precision.