Accueil

Inspirer le développement de l'intelligence artificielle au bénéfice de tous·tes

Un professeur s'entretient avec ses étudiants dans un café/lounge.

Situé au cœur de l’écosystème québécois en intelligence artificielle (IA), Mila rassemble une communauté de plus de 1200 personnes spécialisées en apprentissage automatique et dédiées à l’excellence scientifique et l’innovation.

À propos

À la une
À la une
À la une

Corps professoral

Fondé en 1993 par le professeur Yoshua Bengio, Mila regroupe aujourd'hui plus de 140 professeur·e·s affilié·e·s à l'Université de Montréal, l'Université McGill, Polytechnique Montréal et HEC Montréal. L'institut accueille également des professeur·e·s de l'Université Laval, de l'Université de Sherbrooke, de l'École de technologie supérieure (ÉTS) et de l'Université Concordia.

Consultez l'annuaire en ligne

Photo de Yoshua Bengio

Publications récentes

Adaptation, Comparison and Practical Implementation of Fairness Schemes in Kidney Exchange Programs
In Kidney Exchange Programs (KEPs), each participating patient is registered together with an incompatible donor. Donors without an incompat… (voir plus)ible patient can also register. Then, KEPs typically maximize overall patient benefit through donor exchanges. This aggregation of benefits calls into question potential individual patient disparities in terms of access to transplantation in KEPs. Considering solely this utilitarian objective may become an issue in the case where multiple exchange plans are optimal or near-optimal. In fact, current KEP policies are all-or-nothing, meaning that only one exchange plan is determined. Each patient is either selected or not as part of that unique solution. In this work, we seek instead to find a policy that contemplates the probability of patients of being in a solution. To guide the determination of our policy, we adapt popular fairness schemes to KEPs to balance the usual approach of maximizing the utilitarian objective. Different combinations of fairness and utilitarian objectives are modelled as conic programs with an exponential number of variables. We propose a column generation approach to solve them effectively in practice. Finally, we make an extensive comparison of the different schemes in terms of the balance of utility and fairness score, and validate the scalability of our methodology for benchmark instances from the literature.
Detecting High-Stakes Interactions with Activation Probes
Alex McKenzie
Urja Pawar
Phil Blandfort
William Bankes
Ekdeep Singh Lubana
Dmitrii Krasheninnikov
Monitoring is an important aspect of safely deploying Large Language Models (LLMs). This paper examines activation probes for detecting"high… (voir plus)-stakes"interactions -- where the text indicates that the interaction might lead to significant harm -- as a critical, yet underexplored, target for such monitoring. We evaluate several probe architectures trained on synthetic data, and find them to exhibit robust generalization to diverse, out-of-distribution, real-world data. Probes' performance is comparable to that of prompted or finetuned medium-sized LLM monitors, while offering computational savings of six orders-of-magnitude. Our experiments also highlight the potential of building resource-aware hierarchical monitoring systems, where probes serve as an efficient initial filter and flag cases for more expensive downstream analysis. We release our novel synthetic dataset and codebase to encourage further study.
Discrete Audio Tokens: More Than a Survey!
Pooneh Mousavi
Gallil Maimon
Adel Moumen
Darius Petermann
Jiatong Shi
Haibin Wu
Haici Yang
Anastasia Kuznetsova
Artem Ploujnikov
Ricard Marxer
Bhuvana Ramabhadran
Benjamin Elizalde
Loren Lugosch
Jinyu Li
Phil Woodland
Minje Kim
Hung-yi Lee
Shinji Watanabe
Yossi Adi … (voir 1 de plus)
Discrete audio tokens are compact representations that aim to preserve perceptual quality, phonetic content, and speaker characteristics whi… (voir plus)le enabling efficient storage and inference, as well as competitive performance across diverse downstream tasks.They provide a practical alternative to continuous features, enabling the integration of speech and audio into modern large language models (LLMs). As interest in token-based audio processing grows, various tokenization methods have emerged, and several surveys have reviewed the latest progress in the field. However, existing studies often focus on specific domains or tasks and lack a unified comparison across various benchmarks. This paper presents a systematic review and benchmark of discrete audio tokenizers, covering three domains: speech, music, and general audio. We propose a taxonomy of tokenization approaches based on encoder-decoder, quantization techniques, training paradigm, streamability, and application domains. We evaluate tokenizers on multiple benchmarks for reconstruction, downstream performance, and acoustic language modeling, and analyze trade-offs through controlled ablation studies. Our findings highlight key limitations, practical considerations, and open challenges, providing insight and guidance for future research in this rapidly evolving area. For more information, including our main results and tokenizer database, please refer to our website: https://poonehmousavi.github.io/dates-website/.
On Selecting Robust Approaches for Learning Predictive Biomarkers in Metabolomics Data Sets.
Thibaud Godon
Pier-Luc Plante
Metabolomics, the study of small molecules within biological systems, offers insights into metabolic processes and, consequently, holds grea… (voir plus)t promise for advancing health outcomes. Biomarker discovery in metabolomics represents a significant challenge, notably due to the high dimensionality of the data. Recent work has addressed this problem by analyzing the most important variables in machine learning models. Unfortunately, this approach relies on prior hypotheses about the structure of the data and may overlook simple patterns. To assess the true usefulness of machine learning methods, we evaluate them on a collection of 835 metabolomics data sets. This effort provides valuable insights for metabolomics researchers regarding where and when to use machine learning. It also establishes a benchmark for the evaluation of future methods. Nonetheless, the results emphasize the high diversity of data sets in metabolomics and the complexity of finding biologically relevant biomarkers. As a result, we propose a novel approach applicable across all data sets, offering guidance for future analyses. This method involves directly comparing univariate and multivariate models. We demonstrate through selected examples how this approach can guide data analysis across diverse data set structures, representative of the observed variability. Code and data are available for research purposes.

IA pour l'humanité

Le développement socialement responsable et bénéfique de l'IA est une dimension fondamentale de la mission de Mila. En tant que chef de file, nous souhaitons contribuer au dialogue social et au développement d'applications qui seront bénéfiques pour la société.

En savoir plus

Une personne regarde un ciel étoilé.