Publications
Flaky Performances when Pretraining on Relational Databases
Re-using whole repositories as a starting point for new projects is often done by maintaining a variant fork parallel to the original. However, the common artifacts between both are not always kept up to date. As a result, patches are not optimally integrated across the two repositories, which may lead to sub-optimal maintenance between the variant and the original project. A bug existing in both repositories can be patched in one but not the other (we see this as a missed opportunity), or it can be manually patched in both, probably by different developers (we see this as effort duplication). In this paper we present a tool (named PaReCo) which relies on clone detection to mine cases of missed opportunity and effort duplication from a pool of patches. We analyzed 364 (source to target) variant pairs with 8,323 patches, resulting in a curated dataset containing 1,116 cases of effort duplication and 1,008 cases of missed opportunities. We achieve a precision of 91%, recall of 80%, accuracy of 88%, and F1-score of 85%. Furthermore, we investigated the time interval between patches and found that, on average, missed patches in the target variants had been introduced in the source variants 52 weeks earlier. Consequently, PaReCo can be used to manage variability in “time” by automatically identifying interesting patches in later project releases to be backported to supported earlier releases.
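As a quick consistency check on the numbers reported above, the short sketch below recomputes the F1-score from the stated precision and recall using the standard harmonic-mean formula; it reflects nothing of the PaReCo implementation itself.

```python
# Consistency check only: recompute F1 from the precision and recall reported
# in the abstract above. The harmonic-mean formula is standard and is not
# specific to PaReCo.
precision = 0.91
recall = 0.80

f1 = 2 * precision * recall / (precision + recall)
print(f"F1 = {f1:.2%}")  # ~85%, matching the reported F1-score
```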
Inferring accurate posteriors for high-dimensional representations of the brightness of gravitationally-lensed sources is a major challenge, in part due to the difficulties of accurately quantifying the priors. Here, we report the use of a score-based model to encode the prior for the inference of undistorted images of background galaxies. This model is trained on a set of high-resolution images of undistorted galaxies. By adding the likelihood score to the prior score and using a reverse-time stochastic differential equation solver, we obtain samples from the posterior. Our method produces independent posterior samples and models the data almost down to the noise level. We show how the balance between the likelihood and the prior meets our expectations in an experiment with out-of-distribution data.
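The abstract above combines a learned prior score with a likelihood score inside a reverse-time SDE solver. The sketch below is a minimal, hypothetical Euler–Maruyama version of that idea, assuming a variance-exploding diffusion with zero forward drift; `prior_score`, `likelihood_score`, and `sigma` are placeholder callables, not the authors' code.

```python
# Minimal sketch of posterior sampling with a score-based prior, assuming a
# variance-exploding diffusion and an Euler-Maruyama reverse-time SDE solver.
# `prior_score` (a trained score network) and `likelihood_score` (the gradient
# of the data log-likelihood) are hypothetical placeholders.
import numpy as np

def reverse_sde_sample(x_T, prior_score, likelihood_score, sigma, n_steps=1000):
    """Draw one posterior sample by integrating the reverse-time SDE."""
    x = x_T.copy()
    dt = 1.0 / n_steps
    for i in range(n_steps, 0, -1):
        t = i * dt
        # Posterior score = prior score + likelihood score (Bayes' rule in score form).
        score = prior_score(x, t) + likelihood_score(x, t)
        g = sigma(t)  # diffusion coefficient at time t
        # Reverse-time Euler-Maruyama step (zero forward drift assumed).
        x = x + (g ** 2) * score * dt + g * np.sqrt(dt) * np.random.randn(*x.shape)
    return x
```

Repeated calls with independent initial noise `x_T` would yield the independent posterior samples mentioned in the abstract.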
Bayesian causal structure learning aims to learn a posterior distribution over directed acyclic graphs (DAGs), and the mechanisms that define the relationship between parent and child variables. By taking a Bayesian approach, it is possible to reason about the uncertainty of the causal model. The notion of modelling the uncertainty over models is particularly crucial for causal structure learning since the model could be unidentifiable when given only a finite amount of observational data. In this paper, we introduce a novel method to jointly learn the structure and mechanisms of the causal model using Variational Bayes, which we call Variational Bayes-DAG-GFlowNet (VBG). We extend the method of Bayesian causal structure learning using GFlowNets to learn not only the posterior distribution over the structure, but also the parameters of a linear-Gaussian model. Our results on simulated data suggest that VBG is competitive against several baselines in modelling the posterior over DAGs and mechanisms, while offering several advantages over existing methods, including the guarantee to sample acyclic graphs, and the flexibility to generalize to non-linear causal mechanisms.
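For readers unfamiliar with the linear-Gaussian model class mentioned above, the sketch below shows ancestral sampling from such a model given a DAG adjacency matrix and edge weights. It only illustrates the assumed mechanism; it is not the VBG inference procedure.

```python
# Minimal sketch of the linear-Gaussian mechanism: each variable is a linear
# combination of its parents plus Gaussian noise. Illustrative only.
import numpy as np

def sample_linear_gaussian(adj, weights, noise_std, n_samples, seed=None):
    """Ancestral sampling from a linear-Gaussian DAG.

    adj[i, j] = 1 means an edge i -> j; `weights` holds the edge coefficients.
    Nodes are assumed to be indexed in topological order (parents before children).
    """
    rng = np.random.default_rng(seed)
    d = adj.shape[0]
    X = np.zeros((n_samples, d))
    for j in range(d):  # topological order assumed
        parents = np.flatnonzero(adj[:, j])
        mean = X[:, parents] @ weights[parents, j] if parents.size else 0.0
        X[:, j] = mean + noise_std * rng.standard_normal(n_samples)
    return X
```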
New neurons are continuously generated in the subgranular zone of the dentate gyrus throughout adulthood. These new neurons gradually integrate into hippocampal circuits, forming new naive synapses. Viewed from this perspective, these new neurons may represent a significant source of “wiring” noise in hippocampal networks. In machine learning, such noise injection is commonly used as a regularization technique. Regularization techniques help prevent overfitting training data and allow models to generalize learning to new, unseen data. Using a computational modeling approach, here we ask whether a neurogenesis-like process similarly acts as a regularizer, facilitating generalization in a category learning task. In a convolutional neural network (CNN) trained on the CIFAR-10 object recognition dataset, we modeled neurogenesis as a replacement/turnover mechanism, where weights for a randomly chosen small subset of hidden layer neurons were reinitialized to new values as the model learned to categorize 10 different classes of objects. We found that neurogenesis enhanced generalization on unseen test data compared to networks with no neurogenesis. Moreover, neurogenic networks either outperformed or performed similarly to networks with conventional noise injection (i.e., dropout, weight decay, and neural noise). These results suggest that neurogenesis can enhance generalization in hippocampal learning through noise injection, expanding on the roles that neurogenesis may have in cognition.
2022-11-01
Proceedings of the National Academy of Sciences of the United States of America (published)
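The neurogenesis abstract above models new neurons as a turnover mechanism that reinitializes the weights of a small, randomly chosen subset of hidden units during training. The sketch below is one hedged way such a turnover step could look for a pair of fully connected layers in PyTorch; the layer choice, initialization scale, and turnover fraction are illustrative assumptions, not the authors' exact setup.

```python
# Hypothetical neurogenesis-like turnover step: reinitialize the incoming and
# outgoing weights of a random fraction of hidden units between two layers.
import torch
import torch.nn as nn

@torch.no_grad()
def neurogenesis_turnover(fc_in: nn.Linear, fc_out: nn.Linear, fraction: float = 0.05):
    """Reinitialize a random fraction of the hidden units fed by fc_in and read by fc_out."""
    n_hidden = fc_in.out_features
    n_replace = max(1, int(fraction * n_hidden))
    idx = torch.randperm(n_hidden)[:n_replace]

    # New incoming weights/biases for the "newborn" units (scale is an assumption).
    fc_in.weight[idx] = torch.empty_like(fc_in.weight[idx]).normal_(std=0.01)
    if fc_in.bias is not None:
        fc_in.bias[idx] = 0.0
    # New (naive) outgoing synapses from those units.
    fc_out.weight[:, idx] = torch.empty_like(fc_out.weight[:, idx]).normal_(std=0.01)
```

Such a function would typically be called periodically during training, for example once per epoch, so that a small subset of units is repeatedly "replaced" as learning proceeds.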
Spinal cord cross-sectional area (CSA) is a relevant biomarker to assess spinal cord atrophy in neurodegenerative diseases. However, the considerable inter-subject variability among healthy participants currently limits its usage. Previous studies explored factors contributing to the variability, yet the normalization models required manual intervention and used vertebral levels as a reference, which is an imprecise prediction of the spinal levels. In this study we implemented a method to measure CSA automatically from a spatial reference based on the central nervous system (the pontomedullary junction, PMJ), we investigated factors to explain variability, and developed normalization strategies on a large cohort (N = 804). Following automatic spinal cord segmentation, vertebral labeling and PMJ labeling, the spinal cord CSA was computed on T1w MRI scans from the UK Biobank database. The CSA was computed using two methods. For the first method, the CSA was computed at the level of the C2–C3 intervertebral disc. For the second method, the CSA was computed at 64 mm caudally from the PMJ, this distance corresponding to the average distance between the PMJ and the C2–C3 disc across all participants. The effect of various demographic and anatomical factors was explored, and a stepwise regression found significant predictors; the coefficients of the best fit model were used to normalize CSA. CSA measured at C2–C3 disc and using the PMJ differed significantly (paired t-test, p-value = 0.0002). The best normalization model included thalamus, brain volume, sex and the interaction between brain volume and sex. The coefficient of variation went down for PMJ CSA from 10.09 (without normalization) to 8.59%, a reduction of 14.85%. For CSA at C2–C3, it went down from 9.96 to 8.42%, a reduction of 15.13%. This study introduces an end-to-end automatic pipeline to measure and normalize cord CSA from a neurological reference. This approach requires further validation to assess atrophy in longitudinal studies. The inter-subject variability of CSA can be partly accounted for by demographics and anatomical factors.
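The normalization strategy described above regresses CSA on thalamus volume, brain volume, sex, and a brain-volume-by-sex interaction, then uses the fitted coefficients to remove predictable inter-subject variability. The sketch below is a hypothetical version of that idea; the column names, sex coding, and exact normalization formula are assumptions rather than the published pipeline.

```python
# Hypothetical CSA normalization sketch: fit a linear model with the predictors
# named in the abstract, remove the predicted component, and compare the
# coefficient of variation before and after. Column names are assumptions.
import numpy as np
import pandas as pd

def normalize_csa(df: pd.DataFrame) -> pd.DataFrame:
    # Design matrix with an interaction term; `sex` is assumed coded 0/1.
    X = np.column_stack([
        np.ones(len(df)),
        df["thalamus_volume"],
        df["brain_volume"],
        df["sex"],
        df["brain_volume"] * df["sex"],
    ])
    y = df["csa"].to_numpy()
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    # Keep the cohort mean, remove subject-specific predicted deviations.
    out = df.copy()
    out["csa_norm"] = y - X @ beta + y.mean()
    return out

def coefficient_of_variation(x):
    """CoV in percent, as reported in the abstract."""
    return 100.0 * np.std(x) / np.mean(x)
```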
How can we study social interactions on evolving topics at a mass scale? Over the past decade, researchers from diverse fields such as economics, political science, and public health have often done this by querying Twitter's public API endpoints with hand-picked topical keywords to search or stream discussions. However, despite the API's accessibility, it remains difficult to select and update keywords to collect high-quality data relevant to topics of interest. In this paper, we propose an active learning method for rapidly refining query keywords to increase both the yielded topic relevance and dataset size. We leverage a large open-source COVID-19 Twitter dataset to illustrate the applicability of our method in tracking Tweets around the key sub-topics of Vaccine, Mask, and Lockdown. Our experiments show that our method achieves an average topic-related keyword recall 2x higher than baselines. We open-source our code along with a web interface for keyword selection to make data collection from Twitter more systematic for researchers.
2022-10-31
2022 IEEE International Conference on Data Mining Workshops (ICDMW) (published)
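The abstract above describes iteratively refining query keywords with active learning. The sketch below shows one hypothetical refinement round: score candidate tokens by how often they occur in tweets labelled topic-relevant versus irrelevant, and add the top-scoring ones to the query set. The scoring rule and parameters are illustrative assumptions, not the authors' algorithm.

```python
# Hypothetical single round of keyword refinement from labelled tweets.
from collections import Counter

def refine_keywords(labeled_tweets, current_keywords, top_k=5):
    """labeled_tweets: iterable of (text, is_relevant) pairs.
    current_keywords: set of keywords already in the query."""
    relevant, irrelevant = Counter(), Counter()
    for text, is_relevant in labeled_tweets:
        tokens = set(text.lower().split())
        (relevant if is_relevant else irrelevant).update(tokens)

    # Smoothed relevance ratio for tokens not already used as keywords.
    scores = {
        w: relevant[w] / (relevant[w] + irrelevant[w] + 1)
        for w in relevant
        if w not in current_keywords
    }
    new_keywords = sorted(scores, key=scores.get, reverse=True)[:top_k]
    return current_keywords | set(new_keywords)
```

In an active-learning loop, the newly added keywords would drive the next round of collection, and a fresh batch of labels would feed the next refinement step.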