Join us on the Venture Scientist Bootcamp, a full time, 4-month incubator at Mila, built specifically for deep tech founders with elite STEM backgrounds.
Learn how to leverage generative AI to support and improve your productivity at work. The next cohort will take place online on April 28 and 30, 2026, in French.
We use cookies to analyze the browsing and usage of our website and to personalize your experience. You can disable these technologies at any time, but this may limit certain functionalities of the site. Read our Privacy Policy for more information.
Setting cookies
You can enable and disable the types of cookies you wish to accept. However certain choices you make could affect the services offered on our sites (e.g. suggestions, personalised ads, etc.).
Essential cookies
These cookies are necessary for the operation of the site and cannot be deactivated. (Still active)
Analytics cookies
Do you accept the use of cookies to measure the audience of our sites?
Multimedia Player
Do you accept the use of cookies to display and allow you to watch the video content hosted by our partners (YouTube, etc.)?
The primary focus of multi-agent reinforcement learning (MARL) has been to study interactions among a fixed number of agents embedded in an … (see more)environment. However, in the real world, the number of agents is neither fixed nor known a priori. Moreover, an agent can decide to create other agents (for example, a cell may divide, or a company may spin off a division). In this paper, we propose a framework that allows agents to create other agents; we call this a fluid-agent environment. We present game-theoretic solution concepts for fluid-agent games and empirically evaluate the performance of several MARL algorithms within this framework. Our experiments include fluid variants of established benchmarks such as Predator-Prey and Level-Based Foraging, where agents can dynamically spawn, as well as a new environment we introduce that highlights how fluidity can unlock novel solution strategies beyond those observed in fixed-population settings. We demonstrate that this framework yields agent teams that adjust their size dynamically to match environmental demands.
Large-scale analysis of axon and myelin morphometry in nervous tissues is fundamental to neuroscience research, yet manual quantification re… (see more)mains a profound bottleneck, limiting the scale and efficiency of studies. To address this, we introduce the Axon Segmentation Training Initiative for Histology (ASTIH), a publicly accessible resource designed to propel the development and validation of automated histomorphometry tools. ASTIH comprises five meticulously curated datasets, standardized for machine learning applications, featuring over 69,000 manually segmented axon fibers. These datasets exhibit significant diversity, spanning three microscopy modalities (TEM, SEM, bright-field), three species (mouse, rat, rabbit), and three distinct anatomical regions (brain, spinal cord, peripheral nerves) with varying pixel resolutions (from ~0.2 to 0.002
Medical imaging systems increasingly rely on large vision language foundation models (VLFMs) trained on diverse biomedical corpora, yet thes… (see more)e models remain difficult to adapt to new clinical tasks without costly fine-tuning and large annotated datasets. We present PIKACHU (Prototypical In-Context Knowledge Adaptation for Clinical Heterogeneous Usage), a lightweight and generalizable framework that enables rapid few-shot adaptation of frozen medical FMs using only a handful of labeled examples. Unlike prior approaches that modify backbone weights or introduce heavy attention-based adapters, PIKACHU performs all task adaptation directly in the FM feature space through in-context prototypical reasoning. Given a small support set, the framework constructs class prototypes by averaging normalized embeddings from a frozen VLFM image encoder and performs prediction on query images using temperature-scaled cosine similarity. Only a single temperature parameter is learned. We evaluate PIKACHU across three heterogeneous medical imaging datasets - dermatological images (ISIC), Optical Coherence Tomography (OCT), and Diabetic Retinopathy (DR), using established vision models (SigLIP, PubMedCLIP, DinoV2, and ViT) as backbones. The proposed in-context learning (ICL) strategy consistently outperforms the baseline (zero-shot) approaches across all datasets and architectures, achieving substantial improvements in both accuracy and AUC. Notably, with PubMedCLIP as the backbone, PIKACHU achieves 0.69/0.76 (Acc./AUC) on the ISIC dataset, 0.72/0.78 on OCT, and 0.79/0.88 on DR, demonstrating robust generalization across diverse clinical imaging modalities. These results highlight the promise of feature-space in-context learning as efficient and deployable paradigms for test-time adaptation of foundation models, without the need for extensive retraining.
Learning to coordinate many agents in partially observable and highly dynamic environments requires both informative representations and dat… (see more)a-efficient training. To address this challenge, we present a novel model-based multi-agent reinforcement learning framework that unifies joint state-action representation learning with imaginative roll-outs. We design a world model trained with variational auto-encoders and augment the model using the state-action learned embedding (SALE). SALE is injected into both the imagination module that forecasts plausible future roll-outs and the joint agent network whose individual action values are combined through a mixing network to estimate the joint action-value function. By coupling imagined trajectories with SALE-based action values, the agents acquire a richer understanding of how their choices influence collective outcomes, leading to improved long-term planning and optimization under limited real-environment interactions. Empirical studies on well-established multi-agent benchmarks, including StarCraft II Micro-Management, Multi-Agent MuJoCo, and Level-Based Foraging challenges, demonstrate consistent gains of our method over baseline algorithms and highlight the effectiveness of joint state-action learned embeddings within a multi-agent model-based paradigm.
AI Researchers' Views on Automating AI R&D and Intelligence Explosions
Severin Field
Raymond Douglas
David Krueger
Many leading AI researchers expect AI development to exceed the transformative impact of all previous technological revolutions. This belief… (see more) is based on the idea that AI will be able to automate the process of AI research itself, leading to a positive feedback loop. In August and September of 2025, we interviewed 25 leading researchers from frontier AI labs and academia, including participants from Google DeepMind, OpenAI, Anthropic, Meta, UC Berkeley, Princeton, and Stanford to understand researcher perspectives on these scenarios. Though AI systems have not yet been able to recursively improve, 20 of the 25 researchers interviewed identified automating AI research as one of the most severe and urgent AI risks. Participants converged on predictions that AI agents will become more capable at coding, math and eventually AI development, gradually transitioning from `assistants'or `tools'to `autonomous AI developers,'after which point, predictions diverge. While researchers agreed upon the possibility of recursive improvement, they disagreed on basic questions of timelines or appropriate governance mechanisms. For example, an epistemic divide emerged between frontier lab researchers and academic researchers, the latter of which expressed more skepticism about explosive growth scenarios. Additionally, 17/25 participants expected AI systems with advanced coding or R&D capabilities to be increasingly reserved for internal use at AI companies or governments, unseen by the public. Participants were split as to whether setting regulatory ``red lines"was a good idea, though almost all favored transparency-based mitigations.
To further improve secondary battery materials, we are increasingly exploring highly complex composition spaces in attempts to optimize mult… (see more)iple properties simultaneously. While our past work has done this in systematic manners using high-throughput experimentation, the exponential increase in the search space with triple doping makes grid search prohibitively expensive. Here, we demonstrate a closed-loop, multi-objective machine learning approach to guide the high-throughput workflow to efficiently navigate a space with approximately 14 million unique combinations. The test system is LiCoPO4 which we have previously explored using systematic codoping that was effective in optimizing one property only: energy density. To learn multiple electrochemical metrics, we first pretrain a set transformer on the public Materials Project database as a feature extractor, then attach a multi-task Gaussian process head and finetune the entire model on our high-throughput data. Through 3 rounds of active learning, we demonstrate that with a very small number of samples (as few as 125 random compositions and 63 predicted) we are able to simultaneously optimize four key electrochemical properties. Relative to the undoped system, the best composition raises our composite figure of merit by up to five times. This establishes an end-to-end workflow for accelerated battery materials design to be used in the rapidly growing field of autonomous materials discovery.
Pregnancy AI: Development and Internal Validation of an Artificial Intelligence Tool to Predict Live Births in ICSI and IVF Cycles Using Clinical Features and Embryo Images
Foundation models have achieved remarkable success, yet their growing parameter counts pose significant computational and memory challenges.… (see more) Low-rank factorization offers a promising route to reduce training and inference costs, but the community lacks a stable recipe for training models from scratch using exclusively low-rank weights while matching the performance of the dense model. We demonstrate that Large Language Models (LLMs) can be trained from scratch using exclusively low-rank factorized weights for all non-embedding matrices without auxiliary"full-rank"guidance required by prior methods. While native low-rank training often suffers from instability and loss spikes, we identify uncontrolled growth in the spectral norm (largest singular value) of the weight matrix update as the dominant factor. To address this, we introduce Spectron: Spectral renormalization with orthogonalization, which dynamically bounds the resultant weight updates based on the current spectral norms of the factors. Our method enables stable, end-to-end factorized training with negligible overhead. Finally, we establish compute-optimal scaling laws for natively low-rank transformers, demonstrating predictable power-law behavior and improved inference efficiency relative to dense models.
Context: In the fast-paced evolution of software development, Large Language Models (LLMs) have become indispensable tools for tasks such as… (see more) code generation, completion, analysis, and bug fixing. Ensuring the robustness of these models against potential vulnerabilities from handling diverse inputs is critical, as variations in input can lead to incorrect or insecure code outputs. Objective: This work aims to improve the robustness of LLMs for coding-related tasks against potential adversarial inputs. Specifically, we investigate how fine-tuning LLMs with perturbed datasets impacts their robustness against input perturbations. Method: We systematically evaluated LLM robustness by fine-tuning models using datasets perturbed at character-level, word-level, and sentence-level, comparing results against base models and models fine-tuned on unperturbed datasets. Results: Fine-tuning LLMs with perturbed datasets significantly improves model robustness (RD usually drops around 4\% - 6\%), especially for models with relatively weak robustness. However, this fine-tuning process typically results in a slight performance decrease (pass@1 usually drops around 1\% - 3\%) compared to fine-tuning with unperturbed datasets, although occasional performance improvements are observed. Conclusion \&Implications: Fine-tuning LLMs for coding tasks with perturbed data effectively enhances their robustness at the cost of a minor performance reduction, emphasizing the importance of balancing the robustness and performance of LLMs for coding applications.
Residual reinforcement learning (RL) enables stable online refinement of expressive pretrained policies by freezing the base and learning on… (see more)ly bounded corrections. However, value learning in residual RL poses unique challenges that remain poorly understood. In this work, we identify two key bottlenecks: cold start pathology, where the critic lacks knowledge of the value landscape around the base policy, and structural scale mismatch, where the residual contribution is dwarfed by the base action. Through systematic investigation, we uncover the mechanisms underlying these bottlenecks, revealing that simple yet principled solutions suffice: base-policy transitions serve as an essential value anchor for implicit warmup, and critic normalization effectively restores representation sensitivity for discerning value differences. Based on these insights, we propose DAWN (Data-Anchored Warmup and Normalization), a minimal approach targeting efficient value learning in residual RL. By addressing these bottlenecks, DAWN demonstrates substantial efficiency gains across diverse benchmarks, policy architectures, and observation modalities.