Nous utilisons des témoins pour analyser le trafic et l’utilisation de notre site web, afin de personnaliser votre expérience. Vous pouvez désactiver ces technologies à tout moment, mais cela peut restreindre certaines fonctionnalités du site. Consultez notre Politique de protection de la vie privée pour en savoir plus.
Paramètre des cookies
Vous pouvez activer et désactiver les types de cookies que vous souhaitez accepter. Cependant certains choix que vous ferez pourraient affecter les services proposés sur nos sites (ex : suggestions, annonces personnalisées, etc.).
Cookies essentiels
Ces cookies sont nécessaires au fonctionnement du site et ne peuvent être désactivés. (Toujours actif)
Cookies analyse
Acceptez-vous l'utilisation de cookies pour mesurer l'audience de nos sites ?
Multimedia Player
Acceptez-vous l'utilisation de cookies pour afficher et vous permettre de regarder les contenus vidéo hébergés par nos partenaires (YouTube, etc.) ?
Publications
GAPS Phase III: incorporation of capacity based weighting in the global assessment for pediatric surgery
"Just Accepted" papers have undergone full peer review and have been accepted for publication in Radiology: Artificial Intelligence. This ar… (voir plus)ticle will undergo copyediting, layout, and proof review before it is published in its final version. Please note that during production of the final copyedited article, errors may be discovered which could affect the content. Purpose To develop a deep learning tool for the automatic segmentation of the spinal cord and intramedullary lesions in spinal cord injury (SCI) on T2-weighted MRI scans. Materials and Methods This retrospective study included MRI data acquired between July 2002 and February 2023 from 191 patients with SCI (mean age, 48.1 years ± 17.9 [SD]; 142 males). The data consisted of T2-weighted MRI acquired using different scanner manufacturers with various image resolutions (isotropic and anisotropic) and orientations (axial and sagittal). Patients had different lesion etiologies (traumatic, ischemic, and hemorrhagic) and lesion locations across the cervical, thoracic and lumbar spine. A deep learning model, SCIseg, was trained in a three-phase process involving active learning for the automatic segmentation of intramedullary SCI lesions and the spinal cord. The segmentations from the proposed model were visually and quantitatively compared with those from three other open-source methods (PropSeg, DeepSeg and contrast-agnostic, all part of the Spinal Cord Toolbox). Wilcoxon signed-rank test was used to compare quantitative MRI biomarkers of SCI (lesion volume, lesion length, and maximal axial damage ratio) derived from the manual reference standard lesion masks and biomarkers obtained automatically with SCIseg segmentations. Results SCIseg achieved a Dice score of 0.92 ± 0.07 (mean ± SD) and 0.61 ± 0.27 for spinal cord and SCI lesion segmentation, respectively. There was no evidence of a difference between lesion length (P = .42) and maximal axial damage ratio (P = .16) computed from manually annotated lesions and the lesion segmentations obtained using SCIseg. Conclusion SCIseg accurately segmented intramedullary lesions on a diverse dataset of T2-weighted MRI scans and extracted relevant lesion biomarkers (namely, lesion volume, lesion length, and maximal axial damage ratio). SCIseg is open-source and accessible through the Spinal Cord Toolbox (v6.2 and above). Published under a CC BY 4.0 license.
Text-to-SQL enables users to interact with databases through natural language, simplifying access to structured data. Although highly capabl… (voir plus)e large language models (LLMs) achieve strong accuracy for complex queries, they incur unnecessary latency and dollar cost for simpler ones. In this paper, we introduce the first LLM routing approach for Text-to-SQL, which dynamically selects the most cost-effective LLM capable of generating accurate SQL for each query. We present two routing strategies (score- and classification-based) that achieve accuracy comparable to the most capable LLM while reducing costs. We design the routers for ease of training and efficient inference. In our experiments, we highlight a practical and explainable accuracy-cost trade-off on the BIRD dataset.
Predicting molecular impact on cellular function is a core challenge in therapeutic design.
Phenomic experiments, designed to capture cellu… (voir plus)lar morphology, utilize microscopy based techniques and demonstrate a high throughput solution for uncovering molecular impact on the cell. In this work, we learn a joint latent space between molecular structures and microscopy phenomic experiments, aligning paired samples with contrastive learning. Specifically, we study the problem of Contrastive PhenoMolecular Retrieval, which consists of zero-shot molecular structure identification conditioned on phenomic experiments. We assess challenges in multi-modal learning of phenomics and molecular modalities such as experimental batch effect, inactive molecule perturbations, and encoding perturbation concentration. We demonstrate improved multi-modal learner retrieval through (1) a uni-modal pre-trained phenomics model, (2) a novel inter sample similarity aware loss, and (3) models conditioned on a representation of molecular concentration. Following this recipe, we propose MolPhenix, a molecular phenomics model. MolPhenix leverages a pre-trained phenomics model to demonstrate significant performance gains across perturbation concentrations, molecular scaffolds, and activity thresholds. In particular, we demonstrate an 8.1
Code auditing ensures that the developed code adheres to standards, regulations, and copyright protection by verifying that it does not cont… (voir plus)ain code from protected sources. The recent advent of Large Language Models (LLMs) as coding assistants in the software development process poses new challenges for code auditing. The dataset for training these models is mainly collected from publicly available sources. This raises the issue of intellectual property infringement as developers' codes are already included in the dataset. Therefore, auditing code developed using LLMs is challenging, as it is difficult to reliably assert if an LLM used during development has been trained on specific copyrighted codes, given that we do not have access to the training datasets of these models. Given the non-disclosure of the training datasets, traditional approaches such as code clone detection are insufficient for asserting copyright infringement. To address this challenge, we propose a new approach, TraWiC; a model-agnostic and interpretable method based on membership inference for detecting code inclusion in an LLM's training dataset. We extract syntactic and semantic identifiers unique to each program to train a classifier for detecting code inclusion. In our experiments, we observe that TraWiC is capable of detecting 83.87% of codes that were used to train an LLM. In comparison, the prevalent clone detection tool NiCad is only capable of detecting 47.64%. In addition to its remarkable performance, TraWiC has low resource overhead in contrast to pair-wise clone detection that is conducted during the auditing process of tools like CodeWhisperer reference tracker, across thousands of code snippets.
2024-11-02
ACM Transactions on Software Engineering and Methodology (publié)
The CA1 region of the hippocampus is one of the most studied regions of the rodent brain, thought to play an important role in cognitive fun… (voir plus)ctions such as memory and spatial navigation. Despite a wealth of experimental data on its structure and function, it has been challenging to integrate information obtained from diverse experimental approaches. To address this challenge, we present a community-based, full-scale in silico model of the rat CA1 that integrates a broad range of experimental data, from synapse to network, including the reconstruction of its principal afferents, the Schaffer collaterals, and a model of the effects that acetylcholine has on the system. We tested and validated each model component and the final network model, and made input data, assumptions, and strategies explicit and transparent. The unique flexibility of the model allows scientists to potentially address a range of scientific questions. In this article, we describe the methods used to set up simulations to reproduce in vitro and in vivo experiments. Among several applications in the article, we focus on theta rhythm, a prominent hippocampal oscillation associated with various behavioral correlates and use our computer model to reproduce experimental findings. Finally, we make data, code, and model available through the hippocampushub.eu portal, which also provides an extensive set of analyses of the model and a user-friendly interface to facilitate adoption and usage. This community-based model represents a valuable tool for integrating diverse experimental data and provides a foundation for further research into the complex workings of the hippocampal CA1 region.