TRAIL: Responsible AI for professionals and leaders
Learn to integrate responsible AI practices into your organization with the TRAIL program. Register for the information session on March 12 to learn more about the program.
Avantage IA: productivity in the public service
Learn to leverage generative AI to support and improve your productivity at work. The next cohort will run online on April 28 and 30, 2026.
Publications
Fast Proteome-Scale Protein Interaction Retrieval via Residue-Level Factorization
Protein-protein interactions (PPIs) are mediated at the residue level. Most sequence-based PPI models consider residue-residue interactions across two proteins, which can yield accurate interaction scores but are too slow to scale. At proteome scale, identifying candidate PPIs requires evaluating nearly *all possible protein pairs*. For …
2025-12-31
International Conference on Learning Representations (Accept (Poster))
The segmentation clock is an emergent embryonic oscillator that controls the periodic formation of vertebrae precursors (or somites). It relies on the self-organization at the presomitic mesoderm (PSM) level of multiple coupled cellular oscillators. Dissociation-reaggregation experiments have further revealed that ensembles made of such cellular oscillators self-organize into an oscillatory bidimensional system, showing concentric waves around multiple foci. Here, we systematically study the dynamics of a two-dimensional lattice of phase oscillators locally coupled to their nearest neighbors through a biharmonic coupling function of the form sin θ + Λ sin 2θ. This coupling was inferred from the phase response curve of entrainment experiments on cell cultures, leading to the formulation of a minimal Elliptic Radial Isochron Cycle (ERIC) phase model. We show that such ERIC-based coupling parsimoniously explains the emergence of self-organized concentric phase wave patterns around multiple foci for a range of weak couplings and wide distributions of initial random phases, closely mimicking experimental conditions. We further study extended modalities of this problem to derive an atlas of possible behaviors. In particular, we predict the dominant observation of spirals over target wave patterns for initial phase distributions wider than approximately π. Since PSM cells further display properties of an excitable system, we also introduce excitability into our simple model and show that it also supports the observation of concentric phase waves for the conditions of the experiment. Our work suggests important modifications that can be made to the simple phase model with Kuramoto coupling, which can provide further layers of complexity and aid in the explanation of the spatial aspects of self-organization in the segmentation clock.
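The lattice dynamics described above can be sketched as a minimal simulation of phase oscillators with biharmonic nearest-neighbor coupling. Lattice size, coupling strength K, time step, natural frequency, and periodic boundaries are illustrative assumptions here, not values taken from the paper:

```python
import numpy as np

def step(theta, lam=0.5, K=0.1, omega=1.0, dt=0.01):
    """One Euler step for a 2D lattice of phase oscillators with
    biharmonic nearest-neighbor coupling sin(dθ) + Λ sin(2dθ).
    theta: (N, N) array of phases. Periodic boundaries via np.roll."""
    coupling = np.zeros_like(theta)
    for shift in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        d = np.roll(theta, shift, axis=(0, 1)) - theta  # phase difference to each neighbor
        coupling += np.sin(d) + lam * np.sin(2 * d)
    return theta + dt * (omega + K * coupling)

# Wide random initial phases, mimicking the dissociation-reaggregation setting.
rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, (32, 32))
for _ in range(1000):
    theta = step(theta)
```

Visualizing `np.sin(theta)` over time would reveal whether concentric waves or spirals emerge, depending on the width of the initial phase distribution.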
Most applications of generative AI involve a sequential interaction in which a person inputs a prompt and waits for a response, and where reaction time and adaptivity are not important factors. In contrast, live jamming is a collaborative interaction that requires real-time coordination and adaptation without access to the other player's future moves, while preserving diversity to sustain a creative flow. Reinforcement learning post-training enables effective adaptation through on-policy interaction, yet it often reduces output diversity by exploiting coherence-based rewards. This collapse, known as "reward hacking", affects many RL post-training pipelines, but is especially harmful in live jamming, where musical creativity relies on dynamic variation and mutual responsiveness. In this paper, we propose a novel adversarial training method on policy-generated trajectories to mitigate reward hacking in RL post-training for melody-to-chord accompaniment. A co-evolving discriminator separates policy trajectories from the data distribution, while the policy maximizes the discriminator output in addition to coherence rewards to prevent collapse to trivial outputs. We evaluate accompaniment quality and output diversity in simulation with both fixed test melodies and learned melody agents, and we conduct a user study with the model deployed in a real-time interactive system with expert musicians. Quantitative evaluation and user feedback demonstrate improved output diversity, harmonic coherence, adaptation speed and user agency. Our results demonstrate a simple yet effective method to mitigate reward hacking in RL post-training of generative sequence models.
2025-12-31
International Conference on Learning Representations (Accept (Poster))
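The reward shaping in the accompaniment paper above (coherence reward augmented by a discriminator's output) can be sketched minimally. The sigmoid squashing and the mixing weight alpha are illustrative assumptions for this sketch, not the paper's exact formulation:

```python
import numpy as np

def combined_reward(coherence_reward, disc_score, alpha=0.5):
    """Augment a coherence-based reward with a discriminator score.
    The discriminator's raw logit is squashed to (0, 1); a trajectory
    the discriminator judges close to the data distribution earns a
    larger bonus, discouraging collapse to trivial, repetitive outputs."""
    realism = 1.0 / (1.0 + np.exp(-disc_score))  # sigmoid -> (0, 1)
    return coherence_reward + alpha * realism

# A trajectory the discriminator is unsure about (logit 0) gets half the bonus.
r = combined_reward(1.0, 0.0)
```

The key design point is that the discriminator co-evolves with the policy, so a trivial output that initially fools the coherence reward stops earning the realism bonus once the discriminator learns to flag it.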
Building reliable computer-use agents requires grounding: accurately connecting natural language instructions to the correct on-screen elements. While large datasets exist for web and mobile interactions, high-quality resources for desktop environments are limited. To address this gap, we introduce GroundCUA, a large-scale desktop grounding dataset built from expert human demonstrations. It covers 87 applications across 12 categories and includes 56K screenshots, with every on-screen element carefully annotated for a total of over 3.56M human-verified annotations. From these demonstrations, we generate diverse instructions that capture a wide range of real-world tasks, providing high-quality data for model training. Using GroundCUA, we develop the GroundNext family of models that map instructions to their target UI elements. At both 3B and 7B scales, GroundNext achieves state-of-the-art results across five benchmarks using supervised fine-tuning, while requiring less than one-tenth the training data of prior work. Reinforcement learning post-training further improves performance. These results demonstrate the critical role of high-quality, expert-driven datasets in advancing general-purpose computer-use agents.
2025-12-31
International Conference on Learning Representations (Accept (Poster))
In-context reinforcement learning (ICRL) promises fast adaptation to unseen environments without parameter updates, but current methods either cannot improve beyond the training distribution or require near-optimal data, limiting practical adoption. We introduce SPICE, a Bayesian ICRL method that learns a prior over Q-values via a deep ensemble and updates this prior at test time using in-context information through Bayesian updates. To recover from poor priors resulting from training on sub-optimal data, our online inference follows an Upper-Confidence Bound rule that favours exploration and adaptation. We prove that SPICE achieves regret-optimal behaviour in both stochastic bandits and finite-horizon MDPs, even when pretrained only on suboptimal trajectories. We validate these findings empirically across bandit and control benchmarks. SPICE achieves near-optimal decisions on unseen tasks and substantially reduces regret compared to prior ICRL and meta-RL approaches, while rapidly adapting to unseen tasks and remaining robust under distribution shift.
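The UCB-style action rule over an ensemble prior described above can be sketched as follows. Using the ensemble's standard deviation as the exploration bonus and the weight beta are illustrative assumptions, not details from the paper:

```python
import numpy as np

def ucb_action(q_ensemble, beta=1.0):
    """Pick an action via an Upper-Confidence Bound over an ensemble
    of Q-value heads. q_ensemble: (n_heads, n_actions) array.
    Ensemble disagreement serves as an uncertainty estimate, so
    under-explored actions get an optimism bonus."""
    mean = q_ensemble.mean(axis=0)  # posterior mean over heads
    std = q_ensemble.std(axis=0)    # disagreement = uncertainty
    return int(np.argmax(mean + beta * std))

# Three heads, two actions: action 0 has lower mean but higher disagreement.
q = np.array([[1.0, 0.5],
              [0.2, 0.9],
              [0.6, 0.7]])
a = ucb_action(q)
```

With beta=0 the rule is purely greedy on the ensemble mean; increasing beta shifts choices toward actions the pretrained prior is least certain about, which is what lets the agent recover from a prior trained on suboptimal data.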
Ensuring that deep learning models are well-calibrated in terms of their predictive uncertainty is essential to maintaining their trustworthiness and reliability, yet despite increasing advances in foundation model research, the relationship between large language models (LLMs) and their calibration remains an open area of research. In this work, we look at a critical gap in the calibration of LLMs within multilingual settings, in an attempt to better understand how data scarcity can lead to different calibration effects and how commonly used techniques apply in these settings. Our analysis on two multilingual benchmarks, covering 29 and 42 languages respectively, reveals that even in low-resource languages, model confidence can increase significantly after instruction-tuning on high-resource language SFT datasets. However, improvements in accuracy are marginal or non-existent, resulting in mis-calibration and highlighting a critical shortcoming of standard SFT for multilingual settings. Furthermore, we observe label smoothing to be a reasonable method to alleviate this concern, again without any need for low-resource SFT data, maintaining better calibration across all languages. Overall, this highlights the importance of multilingual considerations in both training and tuning LLMs in order to improve their reliability and fairness in downstream use.
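Label smoothing as mentioned above can be sketched with a smoothed cross-entropy loss. The uniform eps/n smoothing and eps=0.1 are common defaults for this technique in general, not values taken from the paper:

```python
import numpy as np

def smoothed_cross_entropy(logits, label, eps=0.1):
    """Cross-entropy against a smoothed target: (1 - eps) on the true
    class plus eps spread uniformly over all classes. Keeping some
    mass off the true class penalizes over-confident predictions,
    which is how smoothing helps calibration."""
    n = logits.shape[-1]
    z = logits - logits.max()                  # stabilized log-softmax
    log_p = z - np.log(np.exp(z).sum())
    target = np.full(n, eps / n)
    target[label] += 1.0 - eps
    return -np.sum(target * log_p)

logits = np.array([2.0, 0.5, 0.1])
loss = smoothed_cross_entropy(logits, 0)
```

With eps=0 this reduces to the standard cross-entropy; for a confidently correct prediction, any eps > 0 increases the loss, pushing the model toward softer output distributions.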
Do machine learning methods make better predictions than conventional ones in pharmacoepidemiology? A systematic review, meta-analysis, and network meta-analysis.