Publications
AI Agent Safety is a Reinforcement Learning Problem
With the rapid advancement and deployment of agentic AI, our scientific understanding of capabilities and limitations has not kept pace, leading to cases where AI agents cause harm. We argue that many of these safety limitations are not novel problems. Instead, the safety challenges currently facing AI agents can be seen as instances of problems the reinforcement learning (RL) community has studied rigorously for decades. The core of this argument concerns the problem formulation of AI agents. AI agents are designed to solve sequential decision-making problems: problems with long-term objectives in which actions have delayed consequences. To model this type of problem, the problem is set up such that the agent receives observations and feedback on its progress, and then takes actions. This is precisely the formulation of the RL problem. In this paper, we formalize the problem equivalence, which we then leverage to argue that AI agent safety is a reinforcement learning problem: the failure modes currently observed in deployed AI agents are structural instances of problems RL has formalized for decades, and the RL safety literature provides principled tools to diagnose and address them. We conclude with a call for deliberate collaboration between the RL and AI agent research communities: AI agent researchers gain access to principled frameworks, while RL researchers gain a class of real-world problems that could expose fundamental gaps in current RL benchmarks and theory.
2026-05-22
AIWILD @ International Conference on Machine Learning (published)
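The sequential decision-making formulation described in the abstract above (the agent receives observations and feedback on its progress, then takes actions) can be illustrated with a minimal interaction loop. This is only a sketch and not code from the paper: `CorridorEnv`, `run_episode`, and `always_right` are hypothetical names, and the toy corridor task is an assumption chosen to show a delayed reward.

```python
from dataclasses import dataclass


@dataclass
class Step:
    observation: int
    reward: float
    done: bool


class CorridorEnv:
    """Hypothetical toy environment: walk from cell 0 to the last cell."""

    def __init__(self, length: int = 5):
        self.length = length
        self.position = 0

    def reset(self) -> int:
        self.position = 0
        return self.position

    def step(self, action: int) -> Step:
        # action is -1 (left) or +1 (right); the position is clamped to the corridor.
        self.position = max(0, min(self.length - 1, self.position + action))
        done = self.position == self.length - 1
        # Reward arrives only at the goal, so earlier actions have delayed consequences.
        return Step(self.position, 1.0 if done else 0.0, done)


def run_episode(env, policy, max_steps: int = 100):
    """Run the observe -> act -> feedback loop; return (total reward, steps taken)."""
    observation = env.reset()
    total_reward = 0.0
    for t in range(max_steps):
        action = policy(observation)   # the agent acts on its current observation
        step = env.step(action)        # the environment returns feedback
        total_reward += step.reward
        observation = step.observation
        if step.done:
            return total_reward, t + 1
    return total_reward, max_steps


def always_right(observation: int) -> int:
    """Hypothetical fixed policy: always move right."""
    return 1


result = run_episode(CorridorEnv(length=5), always_right)
```

In this sketch the policy is fixed; a learning agent would use the reward signal to update `policy` between or during episodes.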
Consistent Identification of Top-$K$ Nodes in Noisy Networks
Hui Shen
Eric D. Kolaczyk
Identifying the most influential nodes in a network, typically using centrality measures, is a central task in applied network analysis. However, real-world networks are often constructed from noisy or incomplete data, which can distort rankings and lead to errors in identifying the true top-
Real-world tasks require decisions at varying granularities, and humans excel at this by leveraging a unified cognitive representation where planning is fundamentally understood as a high-level form of action. However, current Large Language Model (LLM)-based agents lack this crucial capability to operate fluidly across decision granularities. This limitation stems from existing paradigms that enforce a rigid separation between high-level planning and low-level action, which impairs dynamic adaptability and limits generalization. We propose **ReCode** (**Re**cursive **Code** Generation), a novel paradigm that addresses this limitation by unifying planning and action within a single code representation. In this representation, ReCode treats high-level plans as abstract placeholder functions, which the agent then recursively decomposes into finer-grained sub-functions until reaching primitive actions. This recursive approach dissolves the rigid boundary between plan and action, enabling the agent to dynamically control its decision granularity. Furthermore, the recursive structure inherently generates rich, multi-granularity training data, enabling models to learn hierarchical decision-making processes. Extensive experiments show ReCode significantly surpasses advanced baselines in inference performance and demonstrates exceptional data efficiency in training, validating our core insight that unifying planning and action through recursive code generation is a powerful and effective approach to achieving universal granularity control.
2026-05-22
AIWILD @ International Conference on Machine Learning (published)
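The recursive decomposition described in the ReCode abstract above (placeholder functions expanded into sub-functions until primitive actions are reached) can be sketched as follows. This is not the authors' implementation: the lookup table `EXPANSIONS` is a hypothetical stand-in for the LLM that would generate each expansion, and all step names are invented for illustration.

```python
from typing import Dict, List

# Hypothetical primitive actions the environment can execute directly.
PRIMITIVES = {"open_fridge", "take_egg", "turn_on_stove", "crack_egg", "fry"}

# Hypothetical stand-in for the model: maps a placeholder to its sub-steps.
EXPANSIONS: Dict[str, List[str]] = {
    "make_breakfast": ["get_ingredients", "cook_egg"],
    "get_ingredients": ["open_fridge", "take_egg"],
    "cook_egg": ["turn_on_stove", "crack_egg", "fry"],
}


def expand(step: str) -> List[str]:
    """Return the finer-grained steps for a placeholder (an LLM call in practice)."""
    return EXPANSIONS[step]


def execute(step: str, trace: List[str], depth: int = 0, max_depth: int = 10) -> None:
    """Recursively decompose `step` until only primitive actions remain."""
    if step in PRIMITIVES:
        trace.append(step)  # base case: a primitive action is executed as-is
        return
    if depth >= max_depth:
        raise RecursionError(f"could not reduce {step!r} to primitives")
    for sub_step in expand(step):
        execute(sub_step, trace, depth + 1, max_depth)


trace: List[str] = []
execute("make_breakfast", trace)
```

The depth limit is a design choice of this sketch: it guards against expansions that never bottom out in primitives.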
Web agents powered by large language and vision-language models are increasingly applied to realistic browser work that spans heterogeneous applications, multimodal content, and stateful workflows. However, existing reproducible web-agent benchmarks cover only a small number of web applications drawn from a few software categories, and restrict modality to text and vision. Live benchmarks broaden site coverage but sacrifice reproducibility, since pages and data drift between runs. Moreover, existing benchmarks do not meaningfully evaluate whether agents can understand and use audio and video content embedded within web tasks. To address these gaps, we introduce WebArena-Pro, a benchmark comprising 300 tasks across 20 self-hosted web applications in six domain categories, spanning distinct interface conventions, workflows, and data models. Across the evaluated agents, the best performance is achieved by Gemini 3.1 Pro, which attains 37.0% success under a 50-step budget, while open-source models' performance does not exceed 27.7% success. Among reproducible, human-curated web agent benchmarks, WebArena-Pro provides the broadest application coverage and the most comprehensive multimodal support to date. The benchmark treats audio and video as core observations alongside text and vision, with dedicated actions for extracting information from each. WebArena-Pro runs each task in isolation and supports reproducible, parallel evaluation. Tasks are authored through a dedicated annotator interface, filtered by LLM-assisted triage, and finally validated by humans before release.
2026-05-22
AIWILD @ International Conference on Machine Learning (published)
Learned indexes have emerged as a promising alternative to traditional index structures, offering higher throughput and lower memory usage by approximating the cumulative key distribution function with lightweight models. Despite these benefits, adoption in production systems remains limited, partly because learned indexes that support concurrency and persistence as effectively as, e.g., the B+-Tree, do not yet exist, while many research prototypes introduce substantial complexity. In this paper, we investigate whether off-the-shelf learned indexes can be integrated into a production database with minimal storage-engine redesign. Using RocksDB as a case study, we exploit its separation between in-memory Memtables and immutable on-disk files to deploy specialized indexes at each level. We show that directly applying existing learned indexes is insufficient under write-heavy workloads because frequent Memtable replacement prevents models from fully adapting. To address this, we introduce a reuse mechanism that preserves structural knowledge across Memtable instances. At the storage level, we replace RocksDB's disk index with a learned index without modifying the storage layer or read path. We further adapt a read-only learned index to be block-aware, enabling worst-case single-I/O lookups. We implement these techniques in MountDB, an extension of RocksDB. Experiments on large-scale workloads with diverse data distributions and access patterns show up to 1.5X higher write throughput and 2.1X higher read throughput than state-of-the-art systems, demonstrating that established learned indexes can be integrated into production systems with minimal overhead and substantial performance benefits.
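The general idea of a learned index mentioned in the abstract above (approximating the cumulative key distribution with a lightweight model) can be sketched with a single linear model plus a bounded local search. This is a generic illustration, not MountDB or any RocksDB API: the class name `LinearLearnedIndex` is hypothetical, and a one-segment least-squares fit is an assumption made for brevity.

```python
import bisect


class LinearLearnedIndex:
    """Hypothetical read-only index: a linear model predicts a key's position."""

    def __init__(self, keys):
        self.keys = sorted(set(keys))
        n = len(self.keys)
        if n == 0:
            raise ValueError("at least one key is required")
        # Least-squares fit of position ~ slope * key + intercept.
        mean_k = sum(self.keys) / n
        mean_p = (n - 1) / 2
        var = sum((k - mean_k) ** 2 for k in self.keys)
        cov = sum((k - mean_k) * (p - mean_p) for p, k in enumerate(self.keys))
        self.slope = cov / var if var else 0.0
        self.intercept = mean_p - self.slope * mean_k
        # The worst prediction error over stored keys bounds the search window.
        self.max_error = max(abs(self._predict(k) - p) for p, k in enumerate(self.keys))

    def _predict(self, key) -> int:
        return int(round(self.slope * key + self.intercept))

    def lookup(self, key):
        """Return the position of `key` in the sorted key array, or None if absent."""
        n = len(self.keys)
        guess = self._predict(key)
        lo = min(max(0, guess - self.max_error), n)
        hi = max(lo, min(n, guess + self.max_error + 1))
        # Binary search only inside the error window around the model's guess.
        i = bisect.bisect_left(self.keys, key, lo, hi)
        if i < n and self.keys[i] == key:
            return i
        return None


index = LinearLearnedIndex([10, 20, 30, 45, 70, 100])
```

Because `max_error` is measured over every stored key, each stored key is guaranteed to fall inside its search window; the closer the model fits the key distribution, the smaller that window becomes.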
Historically, Alzheimer's disease (AD) and Parkinson's disease (PD) have been investigated as two distinct disorders of the brain. However, a few similarities in neuropathology and clinical symptoms have been documented over the years. Traditional single-gene centric studies, such as differential gene expression analyses, have struggled to unravel the molecular basis for the observed pathological links between AD and PD. To address this, we tailor a latent factor framework to analyze synchronous gene co-expression at sub-cell-type resolution. Utilizing large, single-nucleus transcriptomics datasets in AD (70,634 nuclei) and PD (340,902 nuclei) from postmortem human brains, we systematically extract and juxtapose disease-critical molecular signatures in the brain. Our transcriptomic analysis reveals shared molecular programs between AD and PD that systematically localize to specific glial and neuronal cell types. In neurons, convergent gene groups in AD and PD relate to cytoskeletal dynamics and mitochondrial stress mechanisms. Similarly, overlapping gene groups in microglia modules implicate T cell activation mechanisms and synapse pruning pathways. In parallel, AD- and PD-associated genes in astrocytes are involved in heavy metal processing; oligodendrocytes highlight convergent dysregulation in myelin synthesis. In addition, our analysis reveals that APOE, an AD GWAS gene, has disease-predictive roles in PD-associated gene modules. Conversely, SNCA, a PD GWAS gene, emerges within AD-associated gene modules. Our multi-module sub-cell-type approach offers unique insights into the molecular basis of shared neuropathology in AD and PD.
External validation of cough-based algorithms for pulmonary tuberculosis screening from the CODA TB DREAM challenge using cough data from Peru
Alexandra J. Zimmer
Patricia Espinoza-Lopez
Vijay Ravi
Solveig K. Sieberts
Samira Abbasgholizadeh Rahimi
Madhukar Pai
César Ugarte-Gil
Simon Grandjean Lapierre
The COugh Diagnostic Algorithm for Tuberculosis (CODA TB) DREAM Challenge recently evaluated the performance of artificial intelligence (AI) algorithms for tuberculosis (TB) screening using cough sounds. Eleven AI models were developed using a dataset of 733,756 cough sounds collected from 2143 adults from seven countries. This study evaluates the CODA Challenge AI models with an external independent cough dataset from Peru. Cough recordings from 303 coughing adults were collected from health facilities in Lima, Peru. The AUCs of the models ranged from 0.480 to 0.615, showing a decrease in performance compared to their performance when internally validated using the CODA Challenge, which ranged from 0.689 to 0.743. The best performing model in the CODA Challenge was also the best performing model in this external validation. Sub-group analyses revealed that models performed better in older (≥ 35 years) populations and among people with prior TB. The external validation revealed limitations in the generalizability of the CODA Challenge models to other settings. While some models showed promise, the overall performance decline highlights the need for continued model validation on external datasets. It also underscores the importance of developing context-specific models to account for population-specific factors that influence cough characteristics and TB prevalence.
Model stealing attacks, where adversaries create high-fidelity surrogate models, are a significant threat to the intellectual property of machine learning services. Conventional wisdom suggests these surrogates could provide adversaries with economic leverage comparable to the original service providers. This paper challenges this assumption by evaluating model stealing attacks beyond mere fidelity to the target model. Because query-based extraction provides only partial supervision of the target's input-output behavior, the surrogate is not uniquely identified: many near-optimal surrogates can achieve comparable fidelity while differing in deployment-relevant properties. Instead of performing a classic learning-based model stealing attack, we compute the Rashomon Set (i.e., the set of almost-equally-accurate models) of surrogate models, and evaluate its diversity using multiplicity metrics (ambiguity, discrepancy, and Rashomon capacity) and group fairness metrics. Our experiments on real-world datasets reveal that despite exhibiting similar fidelity to the target model, surrogate models can display significant variance in other critical performance metrics. These findings cast doubt on the presumed equivalence between high-fidelity surrogates and the target model in practical deployment scenarios.
2026-05-19
DMP @ Canadian Conference on Artificial Intelligence and Conference on Robots and Vision (oral presentation)
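Two of the multiplicity metrics named in the model stealing abstract above, ambiguity and discrepancy, can be sketched using their common definitions from the predictive multiplicity literature: ambiguity is the fraction of inputs whose prediction is changed by at least one competing model, and discrepancy is the largest fraction of predictions changed by any single competing model. The choice of baseline and the toy predictions below are assumptions for illustration; the paper may compute these quantities differently.

```python
from typing import List, Sequence


def ambiguity(baseline: Sequence[int], competitors: List[Sequence[int]]) -> float:
    """Fraction of inputs where at least one competitor disagrees with the baseline."""
    n = len(baseline)
    flipped = sum(
        any(model[i] != baseline[i] for model in competitors) for i in range(n)
    )
    return flipped / n


def discrepancy(baseline: Sequence[int], competitors: List[Sequence[int]]) -> float:
    """Largest fraction of inputs on which a single competitor disagrees."""
    n = len(baseline)
    return max(
        sum(model[i] != baseline[i] for i in range(n)) / n for model in competitors
    )


# Hypothetical predictions on five inputs: one baseline surrogate and three
# near-equally-accurate alternatives from the same Rashomon set.
baseline_preds = [1, 0, 1, 1, 0]
rashomon_preds = [
    [1, 0, 1, 0, 0],
    [1, 1, 1, 1, 0],
    [0, 0, 1, 0, 0],
]

amb = ambiguity(baseline_preds, rashomon_preds)    # 3 of 5 inputs can flip
disc = discrepancy(baseline_preds, rashomon_preds)  # worst single model flips 2 of 5
```

A large gap between the two values indicates that disagreement is spread across many models rather than concentrated in one.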
Humans can effortlessly describe what they see, yet establishing a shared representational format between vision and language remains a significant challenge. Emerging evidence suggests that human brain representations in both vision and language are well predicted by semantic feature spaces obtained from large language models (LLMs). This raises the possibility that sensory systems converge in their inherent ability to transform their inputs onto a shared, embedding-like representational space. However, it remains unclear how such a space manifests in human behavior. To investigate this, 63 participants performed behavioral similarity judgments separately on 100 natural scene images and 100 corresponding sentence captions from the Natural Scenes Dataset. We found that visual and linguistic similarity judgments not only converge at the behavioral level but also predict a remarkably similar network of functional magnetic resonance imaging brain responses evoked by viewing the natural scene images. Furthermore, computational models trained to map images onto LLM embeddings outperformed both category-trained and AlexNet controls in predicting the behavioral similarity structure. These findings demonstrate that human visual and linguistic similarity judgments are grounded in a shared, modality-agnostic representational structure that mirrors how the visual system encodes experience. The convergence between sensory and artificial systems observed here suggests a common capacity in how conceptual representations are formed: not as arbitrary products of first-order, modality-specific input, but as structured representations that reflect the stable, relational properties of the external world.