Publications
Switching between tasks can cause AI to lose the ability to learn
In model-based reinforcement learning, an agent can leverage a learned model to improve its behavior in different ways. Two prevalent approaches are decision-time planning and background planning. In this study, we are interested in understanding under what conditions and in which settings one of these two planning styles will perform better than the other in domains that require fast responses. After viewing them through the lens of dynamic programming, we first consider the classical instantiations of these planning styles and provide theoretical results and hypotheses on which one will perform better in the pure planning, planning & learning, and transfer learning settings. We then consider the modern instantiations of these planning styles and provide hypotheses on which one will perform better in the last two of the considered settings. Lastly, we perform several illustrative experiments to empirically validate both our theoretical results and hypotheses. Overall, our findings suggest that even though decision-time planning does not perform as well as background planning in their classical instantiations, in their modern instantiations it can perform on par with or better than background planning in both the planning & learning and transfer learning settings.
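To make the distinction concrete, here is a minimal sketch of the two planning styles on a toy MDP with a known model; the function names and the toy setup are illustrative, not taken from the paper.

```python
# A minimal sketch contrasting the two planning styles on a toy MDP.
# Names (background_plan, decision_time_plan) are illustrative only.
import numpy as np

n_states, n_actions, gamma = 5, 2, 0.9
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # learned transition model
R = rng.normal(size=(n_states, n_actions))                        # learned rewards

def background_plan(sweeps=100):
    """Dyna-style: amortize planning into a value table between decisions."""
    V = np.zeros(n_states)
    for _ in range(sweeps):
        V = (R + gamma * P @ V).max(axis=1)   # one value-iteration sweep
    return V

def decision_time_plan(s, V, depth=2):
    """At decision time, search a short lookahead tree rooted at state s,
    bootstrapping from the (possibly stale) background value table V."""
    if depth == 0:
        return V[s], None
    q = np.empty(n_actions)
    for a in range(n_actions):
        q[a] = R[s, a] + gamma * sum(
            P[s, a, s2] * decision_time_plan(s2, V, depth - 1)[0]
            for s2 in range(n_states))
    return q.max(), int(q.argmax())

V = background_plan()              # cheap to act with: argmax over cached values
_, a = decision_time_plan(0, V)    # costlier per step, but corrects a stale V
print("decision-time action at state 0:", a)
```

Background planning pays its compute between decisions and acts cheaply; decision-time planning spends compute at each decision but can correct a stale value estimate, which mirrors the trade-off the abstract studies.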
Strong Gravitational Lensing as a Probe of Dark Matter
S. Vegetti
S. Birrer
G. Despali
C.D. Fassnacht
D. Gilman
Y. Hezaveh
L. Perreault Levasseur
J.P. McKean
D.M. Powell
C.M. O'Riordan
G. Vernardos
Dark matter structures within strong gravitational lens galaxies and along their line of sight leave a gravitational imprint on the multiple images of lensed sources. Strong gravitational lensing therefore provides a key test of different dark matter models in a way that is independent of the baryonic content of matter structures on subgalactic scales. In this chapter, we describe how galaxy-scale strong gravitational lensing observations are sensitive to the physical nature of dark matter. We provide a historical perspective of the field and review its current status. We discuss the challenges and advances, in terms of data, treatment of systematic errors, and theoretical predictions, that will enable a stringent and robust test of different dark matter models in the near future. With the advent of the next generation of sky surveys, the number of known strong gravitational lens systems is expected to increase by several orders of magnitude. Coupled with high-resolution follow-up observations, these data will provide a key opportunity to constrain the properties of dark matter with strong gravitational lensing.
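As a reference point, the sensitivity described above enters through the standard thin-lens equation, in which low-mass dark matter structures add a small perturbation to the smooth deflection field (standard notation, not necessarily the chapter's own):

```latex
% Source position \beta, image position \theta, scaled deflection angle \alpha.
% Substructure and line-of-sight halos contribute the perturbation \delta\alpha_{\rm sub}.
\beta = \theta - \alpha(\theta), \qquad
\alpha(\theta) = \alpha_{\rm smooth}(\theta) + \delta\alpha_{\rm sub}(\theta)
```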
Single-cell multi-omics data reveal complex cellular states, providing significant insights into cellular dynamics and disease. Yet, integration of multi-omics data presents challenges. Some modalities have not reached the robustness or clarity of established transcriptomics. Coupled with data scarcity for less established modalities and integration intricacies, these challenges limit our ability to maximize single-cell omics benefits. We introduce scCross, a tool leveraging variational autoencoders, generative adversarial networks, and the mutual nearest neighbors (MNN) technique for modality alignment. By enabling single-cell cross-modal data generation, multi-omics data simulation, and in silico cellular perturbations, scCross enhances the utility of single-cell multi-omics studies.
The online version contains supplementary material available at 10.1186/s13059-024-03338-z.
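As one illustration of the alignment idea, the sketch below shows a generic mutual-nearest-neighbors pairing step between two modality embeddings; it is not scCross's actual API, and the names (mnn_pairs, z_rna, z_atac) are hypothetical stand-ins for VAE latent codes.

```python
# A generic mutual-nearest-neighbors (MNN) pairing step between two
# modality embeddings; an illustration only, not scCross's actual API.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def mnn_pairs(z_rna, z_atac, k=5):
    """Return index pairs (i, j) where cell i (RNA latent) and cell j
    (ATAC latent) are each other's k-nearest neighbors."""
    nn_rna = NearestNeighbors(n_neighbors=k).fit(z_rna)
    nn_atac = NearestNeighbors(n_neighbors=k).fit(z_atac)
    to_atac = nn_atac.kneighbors(z_rna, return_distance=False)  # RNA -> ATAC
    to_rna = nn_rna.kneighbors(z_atac, return_distance=False)   # ATAC -> RNA
    return [(i, j) for i in range(len(z_rna)) for j in to_atac[i]
            if i in to_rna[j]]

rng = np.random.default_rng(0)
z_rna, z_atac = rng.normal(size=(100, 16)), rng.normal(size=(100, 16))
print(len(mnn_pairs(z_rna, z_atac)), "MNN anchor pairs")
```

In scCross-style pipelines, such anchor pairs supply the cross-modal correspondence signal that pulls the two latent spaces together during training.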
Incident reporting and learning systems provide an opportunity to identify systemic vulnerabilities that contribute to incidents and potentially degrade quality. The narrative of an incident is intended to provide a clear, easy-to-understand description of an incident. Unclear, incomplete, or poorly organized narratives compromise the ability to learn from them. This report provides guidance for drafting effective narratives, with particular attention to the use of narratives in incident reporting and learning systems (IRLS). Examples are given that compare effective and less effective narratives. This report is mostly directed to organizations that maintain IRLS, but may also be helpful for individuals who wish to write a useful narrative for entry into such a system. Recommendations include the following: (1) systems should allow a one- or two-sentence, free-text synopsis of an incident without guessing at causes; (2) information included should form a chronological sequence of events; and (3) reporting and learning systems should consider using the suggested headings to guide the reporter through the narrative: (a) incident occurrences and actions, by role; (b) prior circumstances and actions; (c) method by which the incident was identified; (d) equipment-related details, if relevant; (e) recovery actions, by role; (f) relevant time span between responses; and (g) how individuals were affected during or immediately after the incident. When possible and appropriate, supplementary information, including relevant data elements, should be captured using numerical scales or drop-down choices outside of the narrative. Information that should not be included in the narrative comprises: (a) patient health information (PHI); (b) conjecture or blame; (c) jargon, abbreviations, or details without specifying their significance; and (d) causal analysis.
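A hypothetical structured record following the recommended headings might look like the sketch below; the field names paraphrase recommendations (a) through (g) and are not a published schema.

```python
# A hypothetical structured record mirroring the report's suggested
# narrative headings; field names are paraphrases, not a published schema.
from dataclasses import dataclass

@dataclass
class IncidentNarrative:
    synopsis: str                      # one- or two-sentence free text, no guessed causes
    occurrences_and_actions: str       # (a) incident occurrences and actions, by role
    prior_circumstances: str           # (b) prior circumstances and actions
    identification_method: str         # (c) how the incident was identified
    equipment_details: str = ""        # (d) equipment-related details, if relevant
    recovery_actions: str = ""         # (e) recovery actions, by role
    response_time_span: str = ""       # (f) relevant time span between responses
    effects_on_individuals: str = ""   # (g) how individuals were affected
    # Excluded by design: PHI, conjecture/blame, unexplained jargon, causal analysis.
```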
Offline reinforcement learning has shown promise for solving tasks in safety-critical settings, such as clinical decision support. Its application, however, has been limited by the lack of interpretability and interactivity for clinicians. To address these challenges, we propose the medical decision transformer (MeDT), a novel and versatile framework based on the goal-conditioned reinforcement learning paradigm for sepsis treatment recommendation. MeDT uses the decision transformer architecture to learn a policy for drug dosage recommendation. During offline training, MeDT utilizes collected treatment trajectories to predict administered treatments for each time step, incorporating known treatment outcomes, target acuity scores, past treatment decisions, and current and past medical states. This analysis enables MeDT to capture complex dependencies among a patient's medical history, treatment decisions, outcomes, and short-term effects on stability. Our proposed conditioning uses acuity scores to address sparse reward issues and to facilitate clinician-model interactions, enhancing decision-making. Following training, MeDT can generate tailored treatment recommendations by conditioning on the desired positive outcome (survival) and user-specified short-term stability improvements. We carry out rigorous experiments on data from the MIMIC-III dataset and use off-policy evaluation to demonstrate that MeDT recommends interventions that outperform or are competitive with existing offline reinforcement learning methods while enabling a more interpretable, personalized and clinician-directed approach.
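The goal-conditioned sequence modeling described above can be sketched roughly as follows; the module, shapes, and token layout are illustrative assumptions, not the authors' implementation.

```python
# A rough sketch of a goal-conditioned, decision-transformer-style dosing
# policy. Shapes, names, and token layout are illustrative assumptions.
import torch
import torch.nn as nn

class GoalConditionedDT(nn.Module):
    def __init__(self, state_dim, n_doses, d_model=64):
        super().__init__()
        self.embed_state = nn.Linear(state_dim, d_model)
        self.embed_dose = nn.Embedding(n_doses, d_model)
        self.embed_goal = nn.Linear(2, d_model)   # (target outcome, target acuity)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, n_doses)   # next-treatment logits

    def forward(self, goals, states, doses):
        # Concatenate goal, state, and past-treatment tokens along the
        # sequence axis (a real decision transformer interleaves per step).
        tokens = torch.cat([self.embed_goal(goals),
                            self.embed_state(states),
                            self.embed_dose(doses)], dim=1)
        return self.head(self.encoder(tokens)[:, -1])  # predict next dose

model = GoalConditionedDT(state_dim=48, n_doses=25)
logits = model(torch.randn(1, 1, 2), torch.randn(1, 8, 48),
               torch.randint(0, 25, (1, 8)))
print(logits.shape)  # torch.Size([1, 25])
```

At inference time, conditioning amounts to setting the goal tokens (desired outcome and target acuity) before decoding the next treatment, which is what enables the clinician-directed interaction the abstract describes.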
Reinforcement learning practitioners often avoid hierarchical policies, especially in image-based observation spaces. Typically, the single-task performance improvement over flat-policy counterparts does not justify the additional complexity associated with implementing a hierarchy. However, by introducing multiple decision-making levels, hierarchical policies can compose lower-level policies to generalize between tasks more effectively, highlighting the need for multi-task evaluations. We analyze the benefits of hierarchy through simulated multi-task robotic control experiments from pixels. Our results show that hierarchical policies trained with task conditioning can (1) increase performance on training tasks, (2) lead to improved reward and state-space generalizations in similar tasks, and (3) decrease the complexity of fine-tuning required to solve novel tasks. Thus, we believe that hierarchical policies should be considered when building reinforcement learning architectures capable of generalizing between tasks.
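A minimal sketch of the two-level structure in question appears below: a task-conditioned high-level policy selects among reusable low-level skills. All names and shapes are illustrative, not from the paper.

```python
# A minimal two-level hierarchical policy: a task-conditioned high level
# picks among reusable low-level skills. Illustrative names and shapes only.
import torch
import torch.nn as nn

class HierarchicalPolicy(nn.Module):
    def __init__(self, obs_dim, n_tasks, n_skills, act_dim):
        super().__init__()
        self.task_embed = nn.Embedding(n_tasks, 16)
        self.high = nn.Linear(obs_dim + 16, n_skills)          # picks a skill
        self.skills = nn.ModuleList(
            nn.Linear(obs_dim, act_dim) for _ in range(n_skills))

    def forward(self, obs, task_id):
        ctx = torch.cat([obs, self.task_embed(task_id)], dim=-1)
        skill = self.high(ctx).argmax(dim=-1)                  # high-level decision
        # Low-level skills are task-agnostic, so they can be reused across tasks.
        return torch.stack([self.skills[s](o)
                            for s, o in zip(skill.tolist(), obs)])

policy = HierarchicalPolicy(obs_dim=32, n_tasks=4, n_skills=3, act_dim=6)
actions = policy(torch.randn(2, 32), torch.tensor([0, 3]))
print(actions.shape)  # torch.Size([2, 6])
```

Only the high-level module sees the task identity here, which is one way hierarchy can separate what transfers between tasks (the skills) from what does not (the skill-selection rule).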
Large language models (LLMs) have been increasingly applied to tasks in language understanding and interactive decision-making, with their impressive performance largely attributed to the extensive domain knowledge embedded within them. However, the depth and breadth of this knowledge can vary across domains. Many existing approaches assume that LLMs possess a comprehensive understanding of their environment, often overlooking potential gaps in their grasp of actual world dynamics. To address this, we introduce Discover, Verify, and Evolve (DiVE), a framework that discovers world dynamics from a small number of demonstrations, verifies the accuracy of these dynamics, and evolves new, advanced dynamics tailored to the current situation. Through extensive evaluations, we assess the impact of each component on performance and compare the dynamics generated by DiVE to human-annotated dynamics. Our results show that LLMs guided by DiVE make more informed decisions, achieving rewards comparable to human players in the Crafter environment and surpassing methods that require prior task-specific training in the MiniHack environment.
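Schematically, the Discover-Verify-Evolve loop can be read as follows; llm and environment.observe are hypothetical stand-ins, and the prompts paraphrase the abstract rather than reproducing the paper's actual implementation.

```python
# A schematic of the Discover-Verify-Evolve loop as the abstract describes
# it. `llm` and `environment.observe` are hypothetical stand-ins.
def llm(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM client here")

def dive(demonstrations, environment, rounds=3):
    # Discover: propose candidate world dynamics from a few demonstrations.
    dynamics = llm(f"List the world dynamics implied by: {demonstrations}")
    for _ in range(rounds):
        # Verify: keep only rules consistent with what the environment shows.
        dynamics = llm(f"Remove rules contradicted by {environment.observe()}:\n"
                       f"{dynamics}")
        # Evolve: derive more advanced, situation-specific rules from the kept ones.
        dynamics = llm(f"Refine these rules for the current situation "
                       f"{environment.observe()}:\n{dynamics}")
    return dynamics  # used to guide the agent's decisions
```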