Mila’s AI for Climate Studio aims to bridge the gap between technology and impact to unlock the potential of AI in tackling the climate crisis rapidly and on a massive scale.
The program recently published its first policy brief, titled "Policy Considerations at the Intersection of Quantum Technologies and Artificial Intelligence," authored by Padmapriya Mohan.
Learn how to leverage generative AI to improve your productivity at work. The next cohort takes place online on August 26 and 28, 2025.
Traditional recommendation systems represent user preferences in dense representations obtained through black-box encoder models. While these models often provide strong recommendation performance, they lack interpretability, leaving users unable to understand or control how the system models their preferences. This limitation is especially challenging in music recommendation, where user preferences are highly personal and often evolve based on nuanced qualities like mood, genre, tempo, or instrumentation.
In this paper, we propose an audio prototypical network for controllable music recommendation. This network expresses user preferences in terms of prototypes representative of semantically meaningful features pertaining to musical qualities. We show that the model obtains competitive recommendation performance compared to popular baseline models while also providing interpretable and controllable user profiles.
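To make the prototype idea concrete, here is a minimal PyTorch sketch of a prototype-based scorer; the class name, dimensions, and cosine-similarity choice are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of a prototype-based recommender, assuming a pretrained
# audio encoder supplies track embeddings; names and dims are illustrative.
import torch
import torch.nn as nn

class AudioPrototypeRecommender(nn.Module):
    def __init__(self, audio_dim: int = 512, n_prototypes: int = 64):
        super().__init__()
        # Learnable prototypes in the audio-embedding space; each is meant
        # to capture a semantic musical quality (tempo, mood, ...).
        self.prototypes = nn.Parameter(torch.randn(n_prototypes, audio_dim))

    def track_profile(self, audio_emb: torch.Tensor) -> torch.Tensor:
        # Similarity of each track embedding (batch, audio_dim) to every
        # prototype -> an interpretable (batch, n_prototypes) profile.
        return torch.cosine_similarity(
            audio_emb.unsqueeze(1), self.prototypes.unsqueeze(0), dim=-1
        )

    def score(self, user_weights: torch.Tensor, audio_emb: torch.Tensor) -> torch.Tensor:
        # A user is a vector of prototype affinities (n_prototypes,) that
        # can be inspected and edited; score = match with the track profile.
        return self.track_profile(audio_emb) @ user_weights
```

Because the user profile is just a vector of prototype affinities, a listener could in principle dampen or boost individual musical qualities directly.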
Large language model (LLM) agents for web interfaces have advanced rapidly, yet open-source systems still lag behind proprietary agents. Bridging this gap is key to enabling customizable, efficient, and privacy-preserving agents. Two challenges hinder progress: reproducibility issues in RL and LLM agent training, where results often depend on sensitive factors like seeds and decoding parameters, and the focus of prior work on single-step tasks, overlooking the complexities of web-based, multi-step decision-making.
We address these gaps by providing a statistically driven study of training LLM agents for web tasks. Our two-stage pipeline combines imitation learning from a Llama 3.3 70B teacher with on-policy fine-tuning via Group Relative Policy Optimization (GRPO) on a Llama 3.1 8B student. Through 240 configuration sweeps and rigorous bootstrapping, we chart the first compute allocation curve for open-source LLM web agents. Our findings show that dedicating one-third of compute to teacher traces and the rest to RL improves MiniWoB++ success by 6 points and closes 60% of the gap to GPT-4o on WorkArena, while cutting GPU costs by 45%. We introduce a principled hyperparameter sensitivity analysis, offering actionable guidelines for robust and cost-effective agent training.
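The group-relative advantage that gives GRPO its name can be sketched in a few lines; this is a generic illustration of the estimator under assumed tensor shapes, not the authors' training code.

```python
# Hedged sketch of GRPO's group-relative advantage: each rollout's reward
# is standardized within its own group of samples for the same prompt,
# so no learned value function (critic) is needed.
import torch

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """rewards: (n_prompts, group_size) rewards for sampled rollouts."""
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    return (rewards - mean) / (std + eps)  # (n_prompts, group_size)
```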
LLM-based web agents have recently made significant progress, but much of it has occurred in closed-source systems, widening the gap with open-source alternatives. Progress has been held back by two key challenges: first, a narrow focus on single-step tasks that overlooks the complexity of multi-step web interactions; and second, the high compute costs required to post-train LLM-based web agents. To address these challenges, we present the first statistically grounded study on compute allocation for LLM web-agent post-training. Our approach uses a two-stage pipeline, training a Llama 3.1 8B student to imitate a Llama 3.3 70B teacher via supervised fine-tuning (SFT), followed by on-policy reinforcement learning. We find this process highly sensitive to hyperparameter choices, making exhaustive sweeps impractical. To spare others from expensive trial-and-error, we sample 1,370 configurations and use bootstrapping to estimate effective hyperparameters. Our results show that combining SFT with on-policy RL consistently outperforms either approach alone on both WorkArena and MiniWoB++. Further, this strategy requires only 55% of the compute to match the peak performance of pure SFT on MiniWoB++, effectively pushing the compute-performance Pareto frontier, and is the only strategy that can close the gap with closed-source models.
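As a rough illustration of the bootstrapping idea, the sketch below estimates the expected best success rate attainable from a random subset of k configurations, one common way to quantify hyperparameter sensitivity under a compute budget; the function and array shapes are assumptions, not the paper's analysis code.

```python
# Illustrative bootstrap over sampled hyperparameter configurations.
import numpy as np

def bootstrap_expected_max(success: np.ndarray, k: int, n_boot: int = 10_000,
                           rng=np.random.default_rng(0)) -> tuple[float, float]:
    """success: per-configuration success rates, shape (n_configs,).

    Estimates the expected best success rate when only k randomly chosen
    configs fit the budget, plus a bootstrap standard error.
    """
    draws = rng.choice(success, size=(n_boot, k), replace=True)
    best = draws.max(axis=1)          # best config within each resample
    return float(best.mean()), float(best.std())
```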
Concept Bottleneck Models (CBMs) propose to enhance the trustworthiness of AI systems by constraining their decisions on a set of human-understandable concepts. However, CBMs typically assume that datasets contain accurate concept labels, an assumption often violated in practice, which we show can significantly degrade performance (by 25% in some cases). To address this, we introduce the Concept Preference Optimization (CPO) objective, a new loss function based on Direct Preference Optimization, which effectively mitigates the negative impact of concept mislabeling on CBM performance. We provide an analysis of key properties of the CPO objective, showing that it directly optimizes for the concept's posterior distribution, and contrast it with Binary Cross Entropy (BCE), showing that CPO is inherently less sensitive to concept noise. We empirically confirm our analysis, finding that CPO consistently outperforms BCE on three real-world datasets, with and without added label noise.
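For reference, the Direct Preference Optimization objective that CPO builds on has the following form; reading c+ as the annotated concept vector and c- as a contrasting one is our hedged sketch of how it transfers to CBMs, not the paper's exact loss.

```latex
% DPO objective (Rafailov et al., 2023) that CPO builds on; c^+ is the
% annotated (preferred) concept vector, c^- a contrasting one -- a hedged
% reading of the concept-level instantiation, not the paper's exact loss.
\mathcal{L}_{\mathrm{DPO}}
  = -\,\mathbb{E}_{(x,\,c^{+},\,c^{-})}\!\left[
      \log \sigma\!\left(
        \beta \log \frac{\pi_\theta(c^{+}\mid x)}{\pi_{\mathrm{ref}}(c^{+}\mid x)}
        - \beta \log \frac{\pi_\theta(c^{-}\mid x)}{\pi_{\mathrm{ref}}(c^{-}\mid x)}
      \right)
    \right]
```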
Traditional recommender systems rely on high-dimensional (latent) embeddings for modeling user-item interactions, often resulting in opaque representations that lack interpretability. Moreover, these systems offer limited control to users over their recommendations. Inspired by recent work, we introduce TExtuAl Representations for Scrutable recommendations (TEARS) to address these challenges. Instead of representing a user's interests through latent embeddings, TEARS encodes them in natural text, providing transparency and allowing users to edit them. To encode such preferences, we use modern LLMs to generate high-quality user summaries, which we find uniquely capture user preferences. Using these summaries, we take a hybrid approach, applying an optimal transport procedure to align the summaries' representations with the representations of a standard VAE for collaborative filtering. We find this approach can surpass the performance of three popular VAE models while providing user-controllable recommendations. We further analyze the controllability of TEARS through three simulated user tasks, evaluating the effectiveness of user edits on their summaries. Our code and all user summaries are available in an anonymized repository.
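As a sketch of what an optimal-transport alignment between text and VAE representations might look like, here is a standard entropy-regularized (Sinkhorn) coupling; the epsilon value, squared-distance cost, and uniform marginals are assumptions, not TEARS' actual alignment procedure.

```python
# Entropy-regularized optimal transport (Sinkhorn) between two embedding
# sets; a generic sketch, not TEARS' implementation.
import torch

def sinkhorn_plan(text_emb: torch.Tensor, vae_emb: torch.Tensor,
                  eps: float = 0.05, n_iters: int = 100) -> torch.Tensor:
    """Couples text-summary embeddings (n, d) with VAE user embeddings (m, d)."""
    cost = torch.cdist(text_emb, vae_emb) ** 2   # (n, m) pairwise costs
    cost = cost / cost.max()                     # normalize for stability
    K = torch.exp(-cost / eps)                   # Gibbs kernel
    n, m = cost.shape
    a = torch.full((n,), 1.0 / n)                # uniform source marginal
    b = torch.full((m,), 1.0 / m)                # uniform target marginal
    u = torch.ones(n)
    for _ in range(n_iters):                     # Sinkhorn iterations
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]           # transport plan diag(u) K diag(v)
```

An alignment loss could then be `(plan * cost).sum()`, pulling the geometry of the text-summary embeddings toward that of the collaborative-filtering VAE.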