Build foundational skills in responsible artificial intelligence (AI) through self-paced courses led by internationally recognized Mila experts.
The Mila AI Policy Fellowship turns deep AI expertise into rigorous public-interest policy. Read the latest publication, "Combler la disparité en matière d’expertise : mécanismes de transfert des connaissances pour la réglementation de l’IA", by Moritz von Knebel.
This program supports AI startups year-round. Benefit from cutting-edge resources and tailored guidance to accelerate the development of your technology.
Publications
Optimizing User Profiles via Contextual Bandits for Retrieval-Augmented LLM Personalization
Large Language Models (LLMs) excel at general-purpose tasks, yet adapting their responses to individual users remains challenging. Retrieval augmentation provides a lightweight alternative to fine-tuning by conditioning LLMs on user history records, and existing approaches typically select these records based on semantic relevance. We argue that relevance serves as an unreliable proxy for utility: a record may be semantically similar to a query yet fail to improve generation quality or even degrade it due to redundancy or conflicting information. To bridge this gap, we propose PURPLE, a contextual bandit framework that oPtimizes UseR Profiles for Llm pErsonalization. In contrast to a greedy selection of the most relevant records, PURPLE treats profile construction as a set generation process and utilizes a Plackett-Luce ranking model to capture complex inter-record dependencies. By training with dense feedback provided by the likelihood of the reference response, our method aligns retrieval directly with generation quality. Extensive experiments on nine personalization tasks demonstrate that PURPLE consistently outperforms strong heuristic and retrieval-augmented baselines in both effectiveness and efficiency, establishing a principled and scalable solution for optimizing user profiles.
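The Plackett-Luce model mentioned in the abstract above can be illustrated with a minimal sketch. This is the textbook form of the model (sequential sampling without replacement, with a softmax over the items still available), not PURPLE's implementation; the function names and the fixed per-record scores are assumptions made for illustration only.

```python
import math
import random


def plackett_luce_log_prob(scores, ordering):
    """Log-probability of picking `ordering` (distinct indices into `scores`)
    one item at a time, each time via a softmax over the remaining items."""
    remaining = list(range(len(scores)))
    log_prob = 0.0
    for idx in ordering:
        # Stable log-sum-exp over the items that are still available.
        m = max(scores[i] for i in remaining)
        log_z = m + math.log(sum(math.exp(scores[i] - m) for i in remaining))
        log_prob += scores[idx] - log_z
        remaining.remove(idx)
    return log_prob


def plackett_luce_sample(scores, k, rng):
    """Sample k distinct indices without replacement; return (indices, log_prob)."""
    remaining = list(range(len(scores)))
    chosen = []
    for _ in range(k):
        m = max(scores[i] for i in remaining)
        weights = [math.exp(scores[i] - m) for i in remaining]
        pick = rng.choices(range(len(remaining)), weights=weights, k=1)[0]
        chosen.append(remaining.pop(pick))
    return chosen, plackett_luce_log_prob(scores, chosen)


# Hypothetical relevance-like scores for four user-history records.
example_scores = [2.0, 0.5, -1.0, 1.0]
profile, profile_log_prob = plackett_luce_sample(example_scores, 2, random.Random(0))
```

In a bandit-style setup, a log-probability of this kind is what a policy-gradient update would weight by the observed reward; how PURPLE computes its scores and reward is described in the paper, not here.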
Despite remarkable progress in Multimodal Large Language Models (MLLMs), these models still struggle with fine-grained understanding tasks. In this work, we propose **Procedurally Generated Tasks (PGT)**, a simple data-driven framework that serves a dual purpose: inducing fine-grained visual understanding and acting as a low-cost diagnostic tool to identify the source of perception failures. By overlaying unambiguous geometric primitives on images, PGT generates additional dense supervision that disentangles visual grounding capability from semantic priors. Extensive experiments on relational, quantitative, and 3D/depth understanding benchmarks show that PGT yields remarkable gains across diverse architectures. Instruction tuning MLLMs on LLaVA-v1.5-Instruct augmented with PGT data results in improvements of up to +20% on the What’sUp benchmark and +13.3% on CV-Bench-2D, while maintaining general perception capabilities. Moreover, finetuning state-of-the-art MLLMs on PGT data leads to boosts of up to +5.5% on What’sUp and +8.3% on CV-Bench-2D. These findings demonstrate that PGT effectively addresses the bottleneck of fine-grained perception, revealing that many spatial reasoning deficits stem from inadequate supervision signals rather than inherent architectural or resolution limitations.
2025-12-31
International Conference on Machine Learning (Accept (regular))
Phenome-wide association studies rely on disease definitions derived from diagnostic codes, often failing to leverage the full richness of electronic health records (EHR). We present MixEHR-SAGE, a PheCode-guided multi-modal topic model that integrates diagnoses, procedures, and medications to enhance phenotyping from large-scale EHRs. By combining expert-informed priors with probabilistic inference, MixEHR-SAGE identifies over 1000 interpretable phenotype topics from UK Biobank data. Applied to 350,000 individuals with high-quality genetic data, MixEHR-SAGE-derived risk scores accurately predict incident type 2 diabetes (T2D) and leukemia diagnoses. Subsequent genome-wide association studies using these continuous risk scores uncovered novel disease-associated loci, including PPP1R15A for T2D and JMJD6/SRSF2 for leukemia, that were missed by traditional binary case definitions. These results highlight the potential of probabilistic phenotyping from multi-modal EHRs to improve genetic discovery. The MixEHR-SAGE software is publicly available at: https://github.com/li-lab-mcgill/MixEHR-SAGE.
Interpretability research on large language models (LLMs) has produced methods that align model components to high-level concepts, yet their use has been accompanied by recurring failures: findings that do not generalise, and causal language that outruns the evidence. Our position is that Pearl’s causal hierarchy formally defines what constitutes a good alignment, what data or assumptions it requires, and what inferences it supports. Specifically, observations of model behaviour support only associational claims; interventions enable cause-effect claims, but not necessarily predictions of model behaviour; counterfactuals, or predictions of behaviour on unseen examples, are often unverifiable in current studies. We show how interpretability research can benefit from causal representation learning (CRL), which provides tools for provably extracting semantic variables and their relationships from activations, and outline practical requirements for generalisable insights: robustness to distribution shifts, sensitivity to assumptions, and compositionality of interventions. Our diagnostic framework helps practitioners select appropriate methods and mitigate failures to ensure that claims match evidence and findings generalise.
2025-12-31
International Conference on Machine Learning (Accept (regular))
This position paper argues that AI agents with chain-of-thought reasoning capabilities are predisposed to exhibit collusive behavior and should be required to obtain behavioral certification before making decisions that affect economic markets. This is because integrating these agents into society could collapse the legal evidentiary distinction between competition and collusion among independent firms without eroding the economic harm distinction. Experiments with DeepSeek-R1 agents in the Bertrand oligopoly pricing domain reveal a tendency towards tacit collusion that persists even when humans prompt the agents not to collude. We further show that the chain-of-thought of these agents can be steered toward either extremely collusive or highly competitive behavior in a way that is not semantically detectable by another LLM analyzing the reasoning traces. As a result, deploying reasoning agents for market decisions leads to collusive economic outcomes without any evidence of conspiracy or intent. Thus, certification based on observed behavior in representative situations is necessary to prevent collusion. We provide preliminary evidence that such agents can be steered in a generalizable way toward efficient competitive equilibria. However, developing a comprehensive behavioral certification will be required before these models can be deployed in real-world markets while ensuring their stability and efficiency.
2025-12-31
International Conference on Machine Learning (Accept (regular))
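For readers unfamiliar with the Bertrand pricing setting referenced in the position paper above on collusion among reasoning agents, the following is a minimal textbook sketch, not the paper's experimental setup. It assumes a homogeneous good, linear demand D(p) = a - p, a constant marginal cost, and that the lowest-priced firm serves the whole market; the function name and parameter values are hypothetical.

```python
def bertrand_profits(prices, cost=1.0, a=10.0):
    """Per-firm profit when the lowest price captures all demand (ties split evenly).

    Assumed toy model: demand D(p) = max(a - p, 0) at the lowest posted price.
    """
    low = min(prices)
    demand = max(a - low, 0.0)
    winners = [i for i, p in enumerate(prices) if p == low]
    return [
        (p - cost) * demand / len(winners) if i in winners else 0.0
        for i, p in enumerate(prices)
    ]


monopoly_price = (10.0 + 1.0) / 2  # maximizes (p - cost) * (a - p) in this toy model

collusive = bertrand_profits([monopoly_price, monopoly_price])  # both hold the high price
undercut = bertrand_profits([monopoly_price - 0.1, monopoly_price])  # firm 0 deviates
competitive = bertrand_profits([1.0, 1.0])  # price equals marginal cost
```

In this toy model each firm earns more by slightly undercutting a shared high price than by matching it, which is why sustained prices above marginal cost are read as a sign of tacit collusion.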
The accelerated development, deployment and adoption of artificial intelligence systems has been fuelled by the increasing presence of big tech in the AI field. This trend has been accompanied by growing ethical concerns and intensified societal and environmental impacts. This position paper argues that irresponsible AI development is strongly driven by big tech's influence and involvement in the field. We develop this argument by laying out the factors through which this influence leads to irresponsible AI. First, we examine the growing and disproportionate influence of big tech in AI research and argue that its drive for scaling and general-purpose systems is fundamentally at odds with the responsible, ethical, and sustainable development of AI. Second, we review key current environmental and societal negative impacts of AI and trace their connections to big tech's influence. Third, we discuss the underlying economic forces driving big tech's actions. Finally, as a call to action, we highlight the need for AI researchers to counter big tech's influence, and review and propose strategies that build on the responsibility of implicated actors and collective action.
2025-12-31
International Conference on Machine Learning (Accept (spotlight))
In this position paper, we argue that current safety alignment research efforts for large language models are hindered by many intertwined sources of noise, such as small datasets, methodological inconsistencies, and unreliable evaluation setups. This can, at times, make it impossible to evaluate and compare attacks and defenses fairly, thereby slowing research progress. We systematically analyze the LLM safety evaluation pipeline, covering dataset curation, optimization strategies for automated red-teaming, response generation, and response evaluation using LLM judges. At each stage, we identify key issues and highlight their practical impact. We also propose a set of guidelines for reducing noise and bias in evaluations of future attack and defense papers. Lastly, we offer an opposing perspective, highlighting practical reasons for existing limitations. We believe that addressing the outlined problems in future research will improve the field’s ability to generate easily comparable results and make measurable progress.
2025-12-31
International Conference on Machine Learning (Accept (regular))
Foundation models have transformed machine learning through large-scale pretraining, massive parameterization, and increased test-time compute. Despite surpassing human performance in several domains, these models remain fundamentally limited in continuous operation, experience accumulation, and personalization, capabilities that are central to adaptive intelligence. While continual learning research has long targeted these goals, its historical focus on in-weight learning, i.e., updating a single model’s parameters to absorb new knowledge, has rendered catastrophic forgetting a persistent challenge. **Our position is that combining the strengths of In-Weight Learning (IWL) and the newly emerged capabilities of In-Context Learning (ICL) through the design of modular memory is the missing piece for continual adaptation at scale.** We outline a conceptual framework for modular memory-centric architectures that leverage ICL for rapid adaptation and knowledge accumulation, and IWL for stable updates to model capabilities, thereby mitigating catastrophic forgetting and charting a practical roadmap toward continually learning agents.
2025-12-31
International Conference on Machine Learning (Accept (spotlight))
LLM-based social simulations, in which many language model agents interact over multiple turns, are rapidly proliferating across policy analysis, epidemiology, and computational social science. Yet the field lacks consensus on how to validate these simulations, with evaluation methods that are sparse, inconsistent, and rarely shared across disciplinary silos. We argue this creates a serious risk: premature deployment of unvalidated simulators in high-stakes domains. Our position is that the field must pivot from expansion to consolidation, prioritizing methodological standardization: shared benchmarks, open data, and reproducible evaluation protocols grounded in social science and complex systems research. We outline a concrete research program organized around specific learning problems/benchmarks, providing a path toward answering the fundamental question: when are LLM social simulations useful modelling objects?
2025-12-31
International Conference on Machine Learning (Accept (regular))
In this work, we propose an optimal reactive power dispatch (ORPD) stochastic program for volt-var optimization (VVO) of power distribution networks. The formulation considers not only control settings of conventional VVO devices, e.g., voltage regulators, capacitor banks, and on-load tap changers, but also optimal settings for volt-var droop curves of distributed energy resources (DERs), compliant with the IEEE 1547-2018 standard. Instead of including the power flow equations in the optimization problem, which makes it nonlinear and nonconvex, a power flow solver is utilized and the problem is solved by blackbox optimization (BBO). The feasibility of the derived solution is improved by using unbalanced power flow simulations. The solution is effective under various demand and DER generation scenarios such that device settings are not frequently changed, making it practical for in-field implementations. Through numerical simulations on IEEE test feeders, we illustrate the performance of the solutions of our proposed approach on both in-sample and out-of-sample scenarios. We show that our approach outperforms a benchmark reinforcement learning method, and is also scalable to large-scale distribution networks.
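As a rough illustration of the volt-var droop curves mentioned in the abstract above, the sketch below implements a generic piecewise-linear curve with a deadband: reactive power is injected at low voltage and absorbed at high voltage. The function name, breakpoints, and reactive power limit are illustrative assumptions, not settings taken from the paper or quoted from IEEE 1547-2018; in the paper's formulation such breakpoints are among the quantities being optimized.

```python
def volt_var_droop(v_pu, v1=0.92, v2=0.98, v3=1.02, v4=1.08, q_max=0.44):
    """Reactive power setpoint (per unit of rating) for a measured voltage in per unit.

    Positive values mean injection, negative values mean absorption.
    All breakpoints and q_max are assumed, illustrative values.
    """
    if v_pu <= v1:
        return q_max  # saturated injection at very low voltage
    if v_pu < v2:
        return q_max * (v2 - v_pu) / (v2 - v1)  # linear ramp down to the deadband
    if v_pu <= v3:
        return 0.0  # deadband around nominal voltage
    if v_pu < v4:
        return -q_max * (v_pu - v3) / (v4 - v3)  # linear ramp into absorption
    return -q_max  # saturated absorption at very high voltage


# Setpoints across a sweep of voltages from 0.90 to 1.10 per unit.
sweep = [(v / 100, volt_var_droop(v / 100)) for v in range(90, 111, 2)]
```

A blackbox optimizer of the kind described would treat parameters such as `v1` to `v4` as decision variables and score each candidate by running a power flow solver, rather than embedding the power flow equations in the optimization problem.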