Mila’s AI for Climate Studio aims to bridge the gap between technology and impact to unlock the potential of AI in tackling the climate crisis rapidly and on a massive scale.
The program recently published its first policy brief, titled "Policy Considerations at the Intersection of Quantum Technologies and Artificial Intelligence," authored by Padmapriya Mohan.
Hugo Larochelle appointed Scientific Director of Mila
An adjunct professor at the Université de Montréal and former head of Google's AI lab in Montréal, Hugo Larochelle is a pioneer in deep learning and one of Canada’s most respected researchers.
Publications
Multimodal foundation world models for generalist embodied agents
State-of-the-art semi-supervised learning (SSL) approaches rely on highly confident predictions to serve as pseudo-labels that guide the training on unlabeled samples. An inherent drawback of this strategy stems from the quality of the uncertainty estimates, as pseudo-labels are filtered only based on their degree of uncertainty, regardless of the correctness of their predictions. Thus, assessing and enhancing the uncertainty of network predictions is of paramount importance in the pseudo-labeling process. In this work, we empirically demonstrate that SSL methods based on pseudo-labels are significantly miscalibrated, and formally demonstrate that minimization of the min-entropy, a lower bound of the Shannon entropy, is a potential cause of miscalibration. To alleviate this issue, we integrate a simple penalty term that enforces the logit distances of the predictions on unlabeled samples to remain low, preventing the network predictions from becoming overconfident. Comprehensive experiments on a variety of SSL image classification benchmarks demonstrate that the proposed solution systematically improves the calibration performance of relevant SSL models while also enhancing their discriminative power, making it an appealing addition for tackling SSL tasks.
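As a rough illustration of the penalty described above, the sketch below adds a logit-distance term to a pseudo-label SSL objective. It assumes a PyTorch setup, uses the gap between the two largest logits as the "logit distance," and picks an arbitrary weight `lam`; the exact definition and weighting used in the paper may differ.

```python
import torch
import torch.nn.functional as F

def logit_distance_penalty(logits_u: torch.Tensor) -> torch.Tensor:
    # Gap between the largest and second-largest logit per unlabeled sample;
    # keeping it small discourages overconfident (miscalibrated) predictions.
    top2 = logits_u.topk(k=2, dim=-1).values
    return (top2[:, 0] - top2[:, 1]).mean()

def ssl_objective(logits_l, targets, logits_u, pseudo_labels, mask, lam=0.1):
    # Supervised loss on labeled data + masked pseudo-label loss + calibration penalty.
    sup = F.cross_entropy(logits_l, targets)
    unsup = (F.cross_entropy(logits_u, pseudo_labels, reduction="none") * mask).mean()
    return sup + unsup + lam * logit_distance_penalty(logits_u)
```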
Dioxin (DXN) is a persistent organic pollutant produced by municipal solid waste incineration (MSWI) processes. It is a crucial environmental indicator whose emission concentration should be minimized through optimization control, but it is difficult to monitor in real time. Aiming at online soft-sensing of DXN emission, a novel fuzzy tree broad learning system (FTBLS) is proposed, which includes offline training and online measurement. In the offline training part, weighted k-means is used to construct a typical sample pool that reduces the learning cost of both the offline and online phases. Moreover, the novel FTBLS, which comprises a feature-mapping layer, an enhancement layer, and an increment layer in which fuzzy decision trees take the place of conventional neurons, is applied to construct the offline model. In the online measurement part, recursive principal component analysis is used to monitor the time-varying characteristics of the MSWI process. To measure DXN emission, the offline FTBLS is reused for normal samples; for drift samples, fast incremental learning is used for online updates. DXN data from an actual MSWI process are employed to demonstrate the usefulness of FTBLS, with RMSE values of 0.0099 and 0.0216 on the training and testing data, respectively. These results show that FTBLS can effectively realize online DXN prediction.
2024-01-01
IEEE Transactions on Industrial Informatics (published)
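The online measurement workflow in the dioxin soft-sensing abstract above (recursive PCA flags drift; normal samples reuse the offline FTBLS, drift samples trigger fast incremental learning) can be sketched as a simple routing step. The object names below (`pca_monitor`, `offline_model`, `incremental_update`) are hypothetical placeholders, not interfaces from the paper.

```python
import numpy as np

def measure_dxn(x, offline_model, pca_monitor, incremental_update):
    # Recursive PCA monitors the time-varying MSWI process; a sample flagged
    # as drift triggers a fast incremental update of the model before prediction.
    x = np.atleast_2d(x)
    if pca_monitor.is_drift(x):
        incremental_update(x)
    return offline_model.predict(x)
```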
Throughout its history, Operational Research has evolved to include a variety of methods, models and algorithms that have been applied to a diverse and wide range of contexts. This encyclopedic article consists of two main sections: methods and applications. The first aims to summarise the up-to-date knowledge and provide an overview of the state-of-the-art methods and key developments in the various subdomains of the field. The second offers a wide-ranging list of areas where Operational Research has been applied. The article is meant to be read in a nonlinear fashion. It should be used as a point of reference or first port of call for a diverse pool of readers: academics, researchers, students, and practitioners. The entries within the methods and applications sections are presented in alphabetical order. The authors dedicate this paper to the 2023 Turkey/Syria earthquake victims. We sincerely hope that advances in OR will play a role towards minimising the pain and suffering caused by this and future catastrophes.
This paper explores a scenario in which a malicious actor employs a multi-armed attack strategy to manipulate data samples, offering them various avenues to introduce noise into the dataset. Our central objective is to protect the data by detecting any alterations to the input. We approach this defensive strategy with utmost caution, operating in an environment where the defender possesses significantly less information compared to the attacker. Specifically, the defender is unable to utilize any data samples for training a defense model or verifying the integrity of the channel. Instead, the defender relies exclusively on a set of pre-existing detectors readily available "off the shelf". To tackle this challenge, we derive an innovative information-theoretic defense approach that optimally aggregates the decisions made by these detectors, eliminating the need for any training data. We further explore a practical use-case scenario for empirical evaluation, where the attacker possesses a pre-trained classifier and launches well-known adversarial attacks against it. Our experiments highlight the effectiveness of our proposed solution, even in scenarios that deviate from the optimal setup.
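For intuition only, the sketch below combines the scores of several pre-existing detectors into a single decision without any training data. A plain average of log-odds is used here as a simple stand-in for the paper's information-theoretic aggregation rule; the score format and threshold are assumptions.

```python
import numpy as np

def aggregate_detectors(scores: np.ndarray) -> np.ndarray:
    # scores: shape (n_detectors, n_samples), each entry in (0, 1) and read as
    # a detector's probability that the corresponding input was tampered with.
    eps = 1e-6
    log_odds = np.log(scores + eps) - np.log(1.0 - scores + eps)
    # Average log-odds across detectors; a positive total means "flag as attacked".
    return log_odds.mean(axis=0) > 0.0
```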
The common modus operandi of fine-tuning large pre-trained Transformer models entails the adaptation of all their parameters (i.e., full fine-tuning). While achieving striking results on multiple tasks, this approach becomes unfeasible as the model size and the number of downstream tasks increase. In natural language processing and computer vision, parameter-efficient approaches like prompt-tuning and adapters have emerged as solid alternatives by fine-tuning only a small number of extra parameters, without sacrificing performance accuracy. For audio classification tasks, the Audio Spectrogram Transformer model shows impressive results. However, surprisingly, how to efficiently adapt it to several downstream tasks has not been tackled before. In this paper, we bridge this gap and present a detailed investigation of common parameter-efficient methods, revealing that adapters and LoRA consistently outperform the other methods across four benchmarks. Whereas adapters prove to be more efficient in few-shot learning settings, LoRA turns out to scale better as we increase the number of learnable parameters. We finally carry out ablation studies to find the best configuration for adapters and LoRA.
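As a minimal sketch of one of the methods compared above (LoRA), the snippet below wraps a frozen linear layer of a pre-trained Transformer, such as the Audio Spectrogram Transformer, with a trainable low-rank update. The rank and scaling values are illustrative and not the configurations studied in the paper.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Low-rank adaptation of a frozen linear layer: only the small matrices
    A and B are trained, so the learnable-parameter count is set by the rank r."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                      # freeze pre-trained weights
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Frozen projection plus the scaled low-rank update B @ A applied to x.
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)
```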