Build foundational skills in responsible artificial intelligence (AI) through self-paced courses led by internationally recognized experts from Mila.
The Mila AI Policy Fellowship turns in-depth AI expertise into rigorous public-interest policy. Discover the latest publication, Combler la disparité en matière d’expertise : mécanismes de transfert des connaissances pour la réglementation de l’IA (Bridging the expertise gap: knowledge transfer mechanisms for AI regulation), by Moritz von Knebel.
This program supports AI startups at any time of the year. Benefit from cutting-edge resources and tailored support to accelerate the development of your technology.
Publications
Understanding Performance Gap Between Parallel and Sequential Sampling in Large Reasoning Models
Large Reasoning Models (LRMs) have shown remarkable performance on challenging questions, such as math and coding. However, to obtain a high-quality solution, one may need to sample more than once. In principle, there are two sampling strategies that can be composed to form more complex processes: sequential sampling and parallel sampling. In this paper, we first compare these two approaches rigorously and observe, in line with previous work, that parallel sampling seems to outperform sequential sampling even though the latter should have more representational power. To understand the underlying reasons, we formulate three hypotheses for this behavior: (i) parallel sampling outperforms because of the aggregation operator; (ii) sequential sampling is harmed by the need to use longer contexts; (iii) sequential sampling leads to less exploration because it conditions on previous answers. The empirical evidence across various model families and sizes (Qwen3, DeepSeek-R1 distilled models, Gemini 2.5) and question domains (math and coding) suggests that aggregation and context length do not appear to be the main culprits behind the performance gap. In contrast, the lack of exploration seems to play a considerably larger role, and we argue that it is one main cause of the performance gap.
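The two strategies compared above can be sketched as follows. This is a minimal illustration, not the paper's experimental setup. The `Sampler` interface is a hypothetical stand-in for an LRM call: a callable that takes the question and the answers generated so far. Majority voting is assumed as the aggregation operator for parallel sampling. Sequential sampling feeds every previous answer back in as context, which is where hypotheses (ii) and (iii) come from.

```python
from collections import Counter
from typing import Callable, List

# Hypothetical sampler interface: given a question and the answers produced so far
# (empty for independent draws), return one candidate final answer.
Sampler = Callable[[str, List[str]], str]


def majority_vote(answers: List[str]) -> str:
    """Aggregate independent answers; ties go to the answer seen first."""
    return Counter(answers).most_common(1)[0][0]


def parallel_sample(sampler: Sampler, question: str, n: int,
                    aggregate: Callable[[List[str]], str] = majority_vote) -> str:
    # Each draw sees only the question, so the n attempts explore independently.
    answers = [sampler(question, []) for _ in range(n)]
    return aggregate(answers)


def sequential_sample(sampler: Sampler, question: str, n: int) -> str:
    # Each draw is conditioned on all previous answers (a growing context);
    # the last answer is taken as the final one.
    history: List[str] = []
    for _ in range(n):
        history.append(sampler(question, list(history)))
    return history[-1]
```

Consider a sampler that simply repeats its previous answer. Every sequential chain then collapses onto its first draw, a toy caricature of the reduced exploration in hypothesis (iii). Parallel draws are unaffected because they never see one another.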
Named Entity Recognition (NER) is a foundational NLP task, yet research in Yorùbá has been constrained by limited and domain-specific resources. Existing resources, such as MasakhaNER (a manually annotated news-domain corpus) and WikiAnn (automatically created from Wikipedia), are valuable but restricted in domain coverage. To address this gap, we present YoNER, a new multidomain Yorùbá NER dataset that extends entity coverage beyond news and Wikipedia. The dataset comprises about 5,000 sentences and 100,000 tokens collected from five domains (Bible, blogs, movies, radio broadcasts and Wikipedia). It is annotated with three entity types, Person (PER), Organization (ORG) and Location (LOC), following CoNLL-style guidelines. Annotation was conducted manually by three native Yorùbá speakers, with an inter-annotator agreement of over 0.70, ensuring high quality and consistency. We benchmark several transformer encoder models in cross-domain experiments with MasakhaNER 2.0. We also assess the effect of few-shot in-domain data using YoNER, as well as cross-lingual setups with English datasets. Our results show that African-centric models outperform general multilingual models for Yorùbá, but cross-domain performance drops substantially, particularly for the blog and movie domains. Furthermore, we observe that closely related formal domains, such as news and Wikipedia, transfer more effectively. In addition, we introduce a new Yorùbá-specific language model (OyoBERT) that outperforms multilingual models in in-domain evaluation. We publicly release the YoNER dataset and pretrained OyoBERT models to support future research on Yorùbá natural language processing.
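CoNLL-style NER data are commonly encoded with BIO tags: B- begins an entity, I- continues it, and O marks non-entity tokens. Such benchmarks are usually scored with exact-match, entity-level F1. The sketch below assumes that encoding and metric. The abstract does not specify the exact tagging scheme or evaluation script used for YoNER, and the helper names are hypothetical.

```python
from typing import List, Optional, Set, Tuple

# (sentence index, entity type, start token, end token exclusive)
Entity = Tuple[int, str, int, int]


def bio_to_entities(tags: List[str], sent_id: int = 0) -> Set[Entity]:
    """Decode BIO tags such as B-PER / I-PER / O into entity spans.

    An I- tag that does not continue an entity of the same type starts a new
    entity (a lenient convention similar to the CoNLL evaluation script).
    """
    entities: Set[Entity] = set()
    etype: Optional[str] = None
    start = 0
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" flushes the last entity
        if tag.startswith("I-") and tag[2:] == etype:
            continue  # still inside the current entity
        if etype is not None:
            entities.add((sent_id, etype, start, i))
        if tag == "O":
            etype = None
        else:
            etype, start = tag[2:], i
    return entities


def entity_prf(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Exact-match, entity-level precision, recall and F1 over a corpus."""
    g: Set[Entity] = set()
    p: Set[Entity] = set()
    for k, (gt, pt) in enumerate(zip(gold, pred)):
        g |= bio_to_entities(gt, k)
        p |= bio_to_entities(pt, k)
    tp = len(g & p)
    precision = tp / len(p) if p else 0.0
    recall = tp / len(g) if g else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1
```

Each span must match in type and both boundaries to count, which is one reason cross-domain scores can drop sharply when boundary conventions differ between domains.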
Vision-language Models (VLMs), despite achieving strong performance on multimodal benchmarks, often misinterpret straightforward visual concepts that humans identify effortlessly, such as counting, spatial reasoning, and viewpoint understanding. Previous studies identified these weaknesses manually and found that they often stem from deficits in specific skills. However, such manual efforts are costly, unscalable, and subject to human bias: annotators often overlook subtle details in favor of salient objects, resulting in an incomplete understanding of a model's vulnerabilities. To address these limitations, we propose a Reinforcement Learning (RL)-based framework that automatically discovers the failure modes, or blind spots, of any candidate VLM on a given data distribution without human intervention. Our framework trains a questioner agent that adaptively generates queries based on the candidate VLM's responses in order to elicit incorrect answers. As training progresses, our approach increases question complexity by focusing on fine-grained visual details and distinct skill compositions, and it identifies 36 novel failure modes in which VLMs struggle. We demonstrate the broad applicability of our framework by showing that it generalizes across various model combinations.
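The core reward signal of such a questioner can be illustrated with a toy REINFORCE bandit: questions that elicit a wrong answer from the candidate VLM are rewarded. This is a deliberately simplified stand-in, not the paper's framework. The questioner here only chooses among fixed question templates. `vlm_is_correct` is a hypothetical oracle that replaces the candidate VLM plus answer checking.

```python
import math
import random
from typing import Callable, List


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_questioner(templates: List[str], vlm_is_correct: Callable[[str], bool],
                     steps: int = 2000, lr: float = 0.2, seed: int = 0) -> List[float]:
    """REINFORCE over a softmax policy on templates; reward 1 for a wrong VLM answer."""
    rng = random.Random(seed)
    logits = [0.0] * len(templates)
    baseline = 0.0
    for _ in range(steps):
        probs = softmax(logits)
        k = rng.choices(range(len(templates)), weights=probs)[0]
        reward = 0.0 if vlm_is_correct(templates[k]) else 1.0
        advantage = reward - baseline
        baseline += 0.05 * (reward - baseline)  # running-mean baseline reduces variance
        for j in range(len(logits)):
            grad = (1.0 if j == k else 0.0) - probs[j]  # d log pi(k) / d logit_j
            logits[j] += lr * advantage * grad
    return softmax(logits)
```

Take a mock model that only fails on counting questions. The learned distribution concentrates on the counting template. The real framework is meant to surface blind spots like this automatically, on real images and with free-form questions.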
AI-based writing assistants are ubiquitous, yet little is known about how users' mental models shape their use. We examine two types of mental models, functional (related to what the system does) and structural (related to how the system works). We study how they affect control behavior (how users request, accept, or edit AI suggestions as they write) and writing outcomes. We primed participants (
Evolution is an extraordinary engine for enzymatic diversity, yet the chemistry it has explored remains a narrow slice of what DNA can encode. Deep generative models can design new proteins that bind ligands, but none have created enzymes without pre-specifying catalytic residues. We introduce DISCO (DIffusion for Sequence-structure CO-design), a multimodal model that co-designs protein sequence and 3D structure around arbitrary biomolecules, as well as inference-time scaling methods that optimize objectives across both modalities. Conditioned solely on reactive intermediates, DISCO designs diverse heme enzymes with novel active-site geometries. These enzymes catalyze new-to-nature carbene-transfer reactions, including alkene cyclopropanation, spirocyclopropanation, B-H, and C(sp
Psychedelic drugs are re-emerging as promising scientific and clinical tools. However, despite a rapidly expanding literature on their therapeutic value, the neural mechanisms underlying psychedelic effects remain unclear. Resting-state functional magnetic resonance imaging studies of acute psychedelic effects, conducted independently by several research groups, have so far yielded fragmented and sometimes inconsistent findings. Here, to facilitate greater convergence, we conducted a "mega-analysis" integrating 11 independent resting-state functional magnetic resonance imaging datasets. These cover five psychedelic drugs (psilocybin, lysergic acid diethylamide, mescaline, N,N-dimethyltryptamine and ayahuasca) from research groups spanning three continents and five countries. By applying a uniform preprocessing pipeline and a Bayesian hierarchical modeling framework, we discovered several common features in the induced alterations to brain function across drugs and sites. Most prominently, we identified a core signature of increased functional connectivity between transmodal (default, frontoparietal and limbic) and unimodal (visual and somatomotor) networks, with subnetwork specificity. Furthermore, key subcortical regions (thalamus, caudate and putamen) and the cerebellum exhibited altered coupling with sensorimotor networks. In contrast to several single-site reports, Bayesian modeling revealed weak-to-moderate and selective reductions in within-network functional connectivity, with substantial variability across drugs and networks. Together, these findings extend past work by demonstrating that psychedelics reconfigure large-scale cortical organization while selectively engaging subcortical circuitry. This study provides the most comprehensive synthesis of psychedelic brain action to date, helping to resolve inconsistencies and offering a probabilistic map of how psychedelics alter large-scale brain organization.
It thereby provides a cornerstone for benchmarking and guiding future psychedelic neuroimaging research.
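Two ingredients mentioned above, functional connectivity and Bayesian hierarchical pooling across datasets, can be illustrated with a short NumPy sketch. It assumes connectivity is measured as a Fisher z-transformed Pearson correlation between two regional time series. It also replaces the study's full Bayesian hierarchical model with a normal-normal model whose between-dataset variance `tau2` is fixed in advance. The function names are hypothetical. The sampling variance of a z value is often approximated by 1/(T - 3) for T independent time points, though fMRI autocorrelation reduces the effective T.

```python
import numpy as np


def fisher_z_connectivity(ts_a: np.ndarray, ts_b: np.ndarray) -> float:
    """Functional connectivity as the Fisher z-transformed Pearson correlation."""
    r = np.corrcoef(ts_a, ts_b)[0, 1]
    return float(np.arctanh(r))


def partial_pool(effects, variances, tau2: float):
    """Normal-normal hierarchical model with known between-dataset variance tau2.

    effects[j] ~ N(theta_j, variances[j]), theta_j ~ N(mu, tau2), flat prior on mu.
    Returns the posterior mean of mu and of each dataset-level effect theta_j.
    """
    y = np.asarray(effects, dtype=float)
    v = np.asarray(variances, dtype=float)
    w = 1.0 / (v + tau2)                   # marginal precision of each dataset
    mu = float(np.sum(w * y) / np.sum(w))  # precision-weighted pooled effect
    shrink = tau2 / (tau2 + v)             # 0 = full pooling, 1 = no pooling
    theta = mu + shrink * (y - mu)         # each estimate pulled toward mu
    return mu, theta
```

Shrinkage is what lets a pooled analysis temper noisy single-site estimates. Datasets with large sampling variance are pulled strongly toward the pooled effect, while precise datasets keep most of their own estimate.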
Decoder-Transformers have achieved remarkable success and have laid the groundwork for the development of Large Language Models (LLMs). At the core of these models is the self-attention matrix, which allows different tokens to interact with each other. This process is remarkably similar to the message-passing mechanism used in Graph Neural Networks (GNNs). As such, decoder-Transformers suffer from many of the optimization difficulties studied extensively in the GNN literature. In this paper, we present a unified graph perspective that bridges the theoretical understanding of decoder-Transformers and GNNs. We systematically examine how well-known phenomena in GNNs, such as over-smoothing and over-squashing, directly manifest as analogous issues like rank collapse and representational collapse in deep Transformer architectures. By interpreting Transformers' self-attention as a learned adjacency operator, we reveal shared underlying principles governing signal propagation. We also demonstrate how insights from one field can illuminate challenges and solutions in the other. We analyze the role of architectural components like residual connections, normalization, and causal masking in these issues. We aim to provide a framework for understanding how information flows through deep learning models that perform sequence mixing through an adjacency operator. We also highlight areas for cross-pollination of research and provide a comprehensive reference for researchers interested in the underpinnings of these architectures.
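The graph view of self-attention can be made concrete. With a causal mask, the softmax attention matrix is a row-stochastic, lower-triangular adjacency operator, and applying it repeatedly is message passing on that graph. The NumPy sketch below makes several illustrative assumptions: random query/key weights, one fixed operator reused at every layer, and no residual connections, normalization or MLPs. It shows the over-smoothing effect. The spread of token representations can never increase from one layer to the next, and it shrinks toward a single shared vector, a simple form of rank collapse.

```python
import numpy as np


def causal_attention_matrix(x: np.ndarray, seed: int = 0) -> np.ndarray:
    """Row-stochastic, lower-triangular attention ("adjacency") matrix.

    Query/key weights are random here; in a trained model they are learned.
    """
    n, d = x.shape
    rng = np.random.default_rng(seed)
    wq = rng.normal(size=(d, d)) / np.sqrt(d)
    wk = rng.normal(size=(d, d)) / np.sqrt(d)
    scores = (x @ wq) @ (x @ wk).T / np.sqrt(d)
    future = np.triu(np.ones((n, n), dtype=bool), k=1)  # causal mask: no looking ahead
    scores = np.where(future, -np.inf, scores)
    scores = scores - scores.max(axis=1, keepdims=True)
    a = np.exp(scores)                                   # exp(-inf) = 0 on masked edges
    return a / a.sum(axis=1, keepdims=True)


def spread(h: np.ndarray) -> float:
    """Largest distance of any token representation from the first token's."""
    return float(np.max(np.linalg.norm(h - h[0], axis=1)))


def propagate(x: np.ndarray, a: np.ndarray, layers: int):
    """Apply the same attention operator repeatedly (no residual, norm or MLP)."""
    h = x.copy()
    spreads = [spread(h)]
    for _ in range(layers):
        h = a @ h  # one round of message passing over the attention graph
        spreads.append(spread(h))
    return h, spreads
```

Under the causal mask, the first token can only attend to itself, so it acts as an absorbing node and every other token is gradually pulled toward its representation. Residual connections and normalization alter these dynamics. The abstract lists them, together with causal masking, among the components analyzed.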
2026-04-03
Transactions on Machine Learning Research (accepted)
Multivariate count models are often justified by their ability to capture latent dependence, but researchers receive little guidance on when this added structure improves on simpler penalized marginal Poisson regression. We study this question using real microbiome data under a unified held-out evaluation framework. For count prediction, we compare PLN and GLMNet(Poisson) on 20 datasets spanning 32 to 18,270 samples and 24 to 257 taxa. The criterion is held-out Poisson deviance under leave-one-taxon-out prediction with 3-fold sample cross-validation, rather than synthetic or in-sample criteria. For network inference, we compare PLNNetwork and GLMNet(Poisson) neighborhood selection on five publicly available datasets with experimentally validated microbial interaction truth. PLN outperforms GLMNet(Poisson) on most count-prediction datasets, with gains of up to 38 percent. The primary predictor of the winner is the sample-to-taxon ratio. Mean absolute correlation is the strongest secondary signal, and overdispersion is an additional predictor. PLNNetwork performs best on broad undirected interaction benchmarks, whereas GLMNet(Poisson) is better aligned with local or directional effects. Taken together, these results provide guidance for choosing between latent multivariate count models and penalized Poisson regression in biological count prediction and interaction recovery.
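The count-prediction protocol can be sketched in NumPy: leave-one-taxon-out prediction, 3-fold cross-validation over samples, and held-out Poisson deviance. The penalized Poisson regression here is an assumption. It is a ridge-penalized (L2) Poisson GLM fitted by Newton's method, standing in for GLMNet(Poisson), which uses an elastic-net penalty and coordinate descent. Using log1p-transformed counts of the other taxa as predictors is likewise a hypothetical choice, and all function names are illustrative.

```python
import numpy as np


def poisson_deviance(y, mu) -> float:
    """Total Poisson deviance 2 * sum(y*log(y/mu) - (y - mu)), with 0*log(0) = 0."""
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    safe_y = np.where(y > 0, y, 1.0)
    term = np.where(y > 0, y * np.log(safe_y / mu), 0.0)
    return float(2.0 * np.sum(term - (y - mu)))


def fit_ridge_poisson(X, y, lam=1.0, iters=50):
    """Ridge-penalized Poisson GLM (log link) fitted by Newton's method."""
    Xd = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(Xd.shape[1])
    beta[0] = np.log(y.mean() + 1e-8)  # start at the intercept-only fit
    penalty = lam * np.eye(Xd.shape[1])
    penalty[0, 0] = 0.0                # leave the intercept unpenalized
    for _ in range(iters):
        mu = np.exp(np.clip(Xd @ beta, -30, 30))
        grad = Xd.T @ (y - mu) - penalty @ beta
        hess = Xd.T @ (Xd * mu[:, None]) + penalty
        beta = beta + np.linalg.solve(hess, grad)
    return beta


def predict_counts(beta, X):
    return np.exp(np.clip(beta[0] + X @ beta[1:], -30, 30))


def loto_cv_deviance(counts, n_folds=3, lam=1.0, seed=0):
    """Held-out deviance per taxon: predict taxon t from all other taxa,
    using n_folds-fold cross-validation over samples."""
    n, p = counts.shape
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), n_folds)
    features = np.log1p(counts)
    scores = []
    for t in range(p):
        X = np.delete(features, t, axis=1)  # leave the target taxon out
        y = counts[:, t].astype(float)
        total = 0.0
        for test_idx in folds:
            train_idx = np.setdiff1d(np.arange(n), test_idx)
            beta = fit_ridge_poisson(X[train_idx], y[train_idx], lam)
            total += poisson_deviance(y[test_idx], predict_counts(beta, X[test_idx]))
        scores.append(total)
    return scores
```

A latent multivariate model such as PLN would presumably plug into the same loop by replacing `fit_ridge_poisson` and `predict_counts` with its own conditional predictions. Lower held-out deviance wins.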
Large language model (LLM) agents learn by interacting with environments, but long-horizon training remains fundamentally bottlenecked by sparse and delayed rewards. Existing methods typically address this challenge through post-hoc credit assignment or external reward models. These provide limited guidance at inference time and often separate reward improvement from policy improvement. We propose Self-Guide, a self-generated internal reward for language agents that supports both inference-time guidance and training-time supervision. Specifically, the agent uses Self-Guide as a short self-guidance signal to steer the next action during inference. During training, it converts the same signal into a step-level internal reward for denser policy optimization. This creates a co-evolving loop: a better policy produces better guidance, and better guidance, used as internal reward, further improves the policy. Across three agent benchmarks, inference-time self-guidance alone already yields clear gains. Jointly evolving the policy and internal reward with GRPO brings further improvements (8%) over baselines trained solely with environment reward. Overall, our results suggest that language agents can improve not only by collecting more experience, but also by learning to generate and refine their own internal reward during acting and learning.
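One way to picture turning per-step self-guidance into a GRPO training signal is sketched below. The shaping rule is hypothetical: environment reward plus a weighted average of the agent's own step scores, with a weight of 0.1. The abstract does not specify how Self-Guide's step-level rewards are combined with the environment reward. The group-relative normalization is the standard GRPO idea: rollouts for the same task are scored relative to their group's mean and standard deviation.

```python
from statistics import mean, pstdev
from typing import List


def trajectory_return(env_reward: float, step_guidance: List[float],
                      weight: float = 0.1) -> float:
    """Hypothetical shaping: sparse environment reward plus the averaged
    step-level self-guidance scores produced by the agent itself."""
    dense = mean(step_guidance) if step_guidance else 0.0
    return env_reward + weight * dense


def group_relative_advantages(returns: List[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantages: normalize returns within a group of rollouts
    sampled for the same task (subtract group mean, divide by group std)."""
    mu = mean(returns)
    sd = pstdev(returns)
    return [(r - mu) / (sd + eps) for r in returns]
```

With a sparse 0/1 environment reward, rollouts that tie on the outcome would receive identical advantages. The internal reward breaks such ties, which is the kind of denser signal the abstract describes.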
Many real-world scenarios involve solving bilevel optimization problems in which there is an outer discrete optimization problem and an inner problem involving expensive or black-box computation. This arises in space-time–dependent variants of the traveling salesman problem, such as when planning space missions that visit multiple astronomical objects. Planning these missions presents significant challenges due to the constant relative motion of the objects involved. There is an outer combinatorial problem of finding the optimal order in which to visit the objects. There is also an inner optimization problem that requires finding the optimal departure time and trajectory to travel between each pair of objects. The constant motion of the objects complicates the inner problem, making it computationally expensive. This paper introduces a novel framework utilizing decision diagrams (DDs) and a DD-based branch-and-bound technique, peel-and-bound. The framework obtains exact solutions to such bilevel optimization problems, assuming the inner-problem optimizer is of sufficient quality. It leverages problem-specific knowledge to speed up the search and minimize the number of expensive evaluations required. As a case study, we apply this framework to the asteroid routing problem, a benchmark problem in global trajectory optimization. Experimental results demonstrate the framework's scalability and its ability to generate robust heuristic solutions for the tested instances. Many of these solutions are exact, contingent on the assumed quality of the inner problem's optimizer. History: Accepted by Andrea Lodi, Area Editor for Design & Analysis of Algorithms–Discrete. Supplemental Material: The software that supports the findings of this study is available within the paper and its Supplemental Information (https://pubsonline.informs.org/doi/suppl/10.1287/ijoc.2024.0866) as well as from the IJOC GitHub software repository (https://github.com/INFORMSJoC/2024.0866).
The complete IJOC Software and Data Repository is available at https://informsjoc.github.io/ .
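The bilevel structure can be illustrated with a plain depth-first branch-and-bound over visiting orders, in which every leg calls a memoized stand-in for the expensive inner optimizer. This is not peel-and-bound and uses no decision diagrams. It only shows the outer/inner split, the time dependence of each transfer, and why caching and pruning reduce expensive evaluations. The cost matrix `BASE`, the oscillating cost model and all function names are hypothetical.

```python
import itertools
import math
from functools import lru_cache
from typing import List, Tuple

# Hypothetical nominal costs between 5 objects (object 0 is the start).
BASE = [
    [0, 3, 4, 2, 7],
    [3, 0, 4, 6, 3],
    [4, 4, 0, 5, 8],
    [2, 6, 5, 0, 6],
    [7, 3, 8, 6, 0],
]
EVALUATIONS = {"count": 0}


@lru_cache(maxsize=None)
def inner_transfer(i: int, j: int, t: float) -> Tuple[float, float]:
    """Stand-in for the expensive inner optimizer: cost and duration of the
    transfer from object i to object j when departing at time t. The cost
    oscillates with departure time to mimic moving targets."""
    EVALUATIONS["count"] += 1
    cost = BASE[i][j] * (1.5 + math.sin(0.7 * t + i + j))  # always >= 0
    duration = float(BASE[i][j])
    return cost, duration


def branch_and_bound(n: int, start: int = 0) -> Tuple[float, List[int]]:
    best_cost = math.inf
    best_order: List[int] = []

    def dfs(node, t, cost, order, visited):
        nonlocal best_cost, best_order
        if cost >= best_cost:  # prune: remaining transfer costs are non-negative
            return
        if len(order) == n:
            best_cost, best_order = cost, list(order)
            return
        for nxt in range(n):
            if nxt in visited:
                continue
            c, dt = inner_transfer(node, nxt, t)  # inner problem, cached
            visited.add(nxt)
            order.append(nxt)
            dfs(nxt, t + dt, cost + c, order, visited)
            order.pop()
            visited.remove(nxt)

    dfs(start, 0.0, 0.0, [start], {start})
    return best_cost, best_order


def tour_cost(order: List[int]) -> float:
    t = total = 0.0
    for a, b in zip(order, order[1:]):
        c, dt = inner_transfer(a, b, t)
        total += c
        t += dt
    return total


def brute_force(n: int, start: int = 0) -> float:
    """Exhaustive check over all orders (only feasible for tiny n)."""
    others = [k for k in range(n) if k != start]
    return min(tour_cost([start, *p]) for p in itertools.permutations(others))
```

`EVALUATIONS["count"]` records how many distinct inner solves were needed. Pruning a partial order whose cost already exceeds the incumbent is valid only because transfer costs are non-negative. Tighter lower bounds, for example ones derived from relaxed decision diagrams, would prune more.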
Rationale: Heart failure with preserved ejection fraction (HFpEF) is a heterogeneous syndrome with substantial unmet diagnostic and therapeutic needs. Circulating lipid metabolism is increasingly implicated in HFpEF pathophysiology but has not been systematically leveraged for molecular stratification. Objective: To determine whether plasma lipidomics can identify molecular phenogroups of HFpEF associated with distinct clinical characteristics and outcomes. Methods and Results: Untargeted plasma lipidomics was performed in non-HF subjects and HFpEF patients from a primary Belgian cohort and an independent Canadian cohort (n=177 in each cohort). In the Belgian cohort, 235 unique lipids spanning 19 subclasses were annotated, including 96 significantly associated with HFpEF (q<0.02). Unsupervised analyses revealed marked lipidomic heterogeneity, with a distinct HFpEF subgroup separable from non-HF subjects. Hierarchical clustering identified three phenogroups with divergent lipid profiles and clinical features. One phenogroup exhibited severe atrial dysfunction, congestion-related biomarkers, elevated indices of cardiac and liver fibrosis, and markedly reduced survival. A second was characterized by prominent metabolic syndrome features, and a third by preserved renal function. Cross-cohort comparison using a supervised classifier trained on 158 shared lipids confirmed analogous lower-risk phenogroups in the Canadian cohort, while the high-risk phenotype was underrepresented. A signature of 10 lipids across six subclasses, including long-chain acylcarnitines, ether phosphatidylcholines, and oxidized sphingomyelins, discriminated the high-risk group and correlated with markers of disease severity. Conclusion: Our findings demonstrate that HFpEF comprises metabolically distinct patient subgroups across cohorts. They reveal specific lipidomic dysfunctions that deepen our understanding of HFpEF heterogeneity and its underlying pathophysiology.
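The abstract reports significance as q<0.02 without naming the correction method. The sketch below assumes q-values from the Benjamini-Hochberg false discovery rate procedure, a common choice in untargeted lipidomics, though the study may have used another method. It converts per-lipid p-values to q-values and applies the threshold. The function names are hypothetical.

```python
from typing import List


def bh_qvalues(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (q-values) for FDR control."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping q monotone in the p-value rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q


def significant(pvalues: List[float], threshold: float = 0.02) -> List[int]:
    """Indices of features (e.g. lipids) whose q-value is below the threshold."""
    return [i for i, qv in enumerate(bh_qvalues(pvalues)) if qv < threshold]
```

With 235 annotated lipids, each p-value is effectively scaled by 235 divided by its rank. This is why only lipids with very small raw p-values clear q<0.02.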