Language and image

Artificial intelligence (AI) systems can process data collected from multiple sources through a variety of sensors to help computers make predictions and decisions. Mila researchers are pioneers in natural language processing and computer vision, and continue to explore the intersections of these two technologies.

Featured projects

Ubisoft-Mila Industrial Research Chair

Designed to guide technological innovation in the video game industry, the Ubisoft-Mila Industrial Research Chair explores the ethical use of AI in game development.

ConceptGraphs

ConceptGraphs is a mapping system that builds 3D scene graphs of objects and their relationships, enabling robots to perform complex navigation and object manipulation tasks.

Advances in large language models have propelled AI into a new phase, raising many new and important questions for Mila researchers. These questions concern the gap that remains between state-of-the-art AI and human cognitive abilities (notably reasoning, understanding cause and effect, and accepting doubt), with significant consequences for the deployment and safety of these systems.

Multimodal AI systems can make predictions and decisions based on several forms of data, including vision, natural language, and audio. They can, for example, provide live captions and answer questions about images.

Through multimodal machine learning research, Mila experts are helping to develop AI systems that better understand how humans perceive the world, which enables these systems to better meet the needs of society.

"Multimodal machine learning research allows us to build AI systems that are closer to how humans perceive the world, and that will be better able to serve humanity in the future."

Aishwarya Agrawal, Assistant Professor, Université de Montréal; Core Academic Member, Mila

Resources

Understanding LLM Understanding

Co-organized by Mila, this summer school, held in June 2024, brought together experts from fields such as computer science, neuroscience, and psychology to explore large language models from a variety of perspectives.

Watch all presentations: https://www.youtube.com/playlist?list=PLvSH07QabjqaXzsM5hidFKjYKz-K_bx1l

MAPL

MAPL is a multimodal AI system that can understand images and text and generate free-form text as output.

Learn more: https://github.com/mair-lab/mapl

ConceptGraphs

ConceptGraphs is an open-vocabulary, graph-structured representation for 3D scenes.

Learn more: https://concept-graphs.github.io/

Research labs

Mila professors who explore this topic in their research.

Faculty

Core Academic Member

David Ifeoluwa Adelani

McGill University

Canada CIFAR AI Chair

Core Academic Member

Aishwarya Agrawal

Assistant Professor, Université de Montréal, Department of Computer Science and Operations Research (DIRO)

Canada CIFAR AI Chair

Core Academic Member

Sarath Chandar

Associate Professor, Polytechnique Montréal, Department of Computer Science and Software Engineering

Canada CIFAR AI Chair

Core Academic Member

Laurent Charlin

Associate Professor, HEC Montréal, Department of Decision Sciences

Canada CIFAR AI Chair

Core Academic Member

Jackie Cheung

Associate Scientific Director, Mila; Associate Professor, McGill University, School of Computer Science

Canada CIFAR AI Chair

Associate Academic Member

James Clark

Full Professor, McGill University

Affiliate Member

Maria Cutumisu

Associate Professor, McGill University

Associate Industry Member

Alexandre Drouin

Research Scientist, ServiceNow

Affiliate Member

Samira Ebrahimi Kahou

Assistant Professor, University of Calgary, Department of Electrical and Software Engineering

Canada CIFAR AI Chair

Associate Academic Member

Christian Gagné

Full Professor, Université Laval, Department of Electrical and Computer Engineering

Canada CIFAR AI Chair

Associate Academic Member

Warren Gross

Professor, McGill University, Department of Electrical and Computer Engineering

Associate Academic Member

Toby Dylan Hocking

Associate Professor, Université de Sherbrooke, Department of Computer Science

Affiliate Member

Mahdi Hosseini

Assistant Professor, Concordia University

Affiliate Member

Shin (Alexandre) Koseki

Assistant Professor, Université de Montréal, School of Urban Planning and Landscape Architecture

Associate Academic Member

Xue (Steve) Liu

Full Professor, McGill University, School of Computer Science

Core Academic Member

Tegan Maharaj

Assistant Professor in Machine Learning, HEC Montréal, Department of Decision Sciences

Associate Academic Member

Eilif Benjamin Muller

Assistant Professor, Université de Montréal, Department of Neuroscience

Canada CIFAR AI Chair

Core Academic Member

Derek Nowrouzezahrai

Associate Professor, McGill University, Department of Electrical and Computer Engineering

Canada CIFAR AI Chair

Core Academic Member

Timothy O'Donnell

Assistant Professor, McGill University, Department of Linguistics

Canada CIFAR AI Chair

Core Academic Member

Chris Pal

Full Professor, Polytechnique Montréal, Department of Computer Engineering and Software Engineering

Canada CIFAR AI Chair

Associate Academic Member

Laurence Perreault-Levasseur

Assistant Professor, Université de Montréal, Department of Physics

Associate Academic Member

Pablo Piantanida

Full Professor, Université Paris-Saclay

Core Academic Member

Guillaume Rabusseau

Assistant Professor, Université de Montréal, Department of Computer Science and Operations Research

Canada CIFAR AI Chair

Core Academic Member

Siamak Ravanbakhsh

Assistant Professor, McGill University, School of Computer Science

Canada CIFAR AI Chair

Core Academic Member

Siva Reddy

Assistant Professor, McGill University, School of Computer Science and Department of Linguistics

Canada CIFAR AI Chair

Associate Academic Member

Ayla Rigouts Terryn

Assistant Professor, Université de Montréal, Department of Linguistics and Translation

Core Academic Member

Irina Rish

Full Professor, Université de Montréal, Department of Computer Science and Operations Research

Canada CIFAR AI Chair

Associate Industry Member

Fabio Viola

Senior Research Engineer, Google DeepMind

Associate Industry Member

Kory Wallace Mathewson

Research Scientist, DeepMind

Associate Academic Member

Amal Zouaq

Full Professor, Polytechnique Montréal, Department of Computer Engineering and Software Engineering

Publications

IG-RL: Inductive Graph Reinforcement Learning for Massive-Scale Traffic Signal Control

François-Xavier Devailly

Denis Larocque

Laurent Charlin

Scaling adaptive traffic signal control involves dealing with combinatorial state and action spaces. Multi-agent reinforcement learning attempts to address this challenge by distributing control to specialized agents. However, specialization hinders generalization and transferability, and the computational graphs underlying neural-network architectures (dominating in the multi-agent setting) do not offer the flexibility to handle an arbitrary number of entities which changes both between road networks, and over time as vehicles traverse the network. We introduce Inductive Graph Reinforcement Learning (IG-RL) based on graph-convolutional networks which adapts to the structure of any road network, to learn detailed representations of traffic signal controllers and their surroundings. Our decentralized approach enables learning of a transferable-adaptive-traffic-signal-control policy. After being trained on an arbitrary set of road networks, our model can generalize to new road networks and traffic distributions, with no additional training and a constant number of parameters, enabling greater scalability compared to prior methods. Furthermore, our approach can exploit the granularity of available data by capturing the (dynamic) demand at both the lane level and the vehicle level. The proposed method is tested on both road networks and traffic settings never experienced during training. We compare IG-RL to multi-agent reinforcement learning and domain-specific baselines. In both synthetic road networks and in a larger experiment involving the control of the 3,971 traffic signals of Manhattan, we show that different instantiations of IG-RL outperform baselines.

2020-03-06

ArXiv (preprint)

doi.org

arxiv.org

MAPL: Parameter-Efficient Adaptation of Unimodal Pre-Trained Models for Vision-Language Few-Shot Prompting

Oscar Mañas

Pau Rodriguez

Saba Ahmadi

Aida Nematzadeh

Yash Goyal

Aishwarya Agrawal

Large pre-trained models have proved to be remarkable zero- and (prompt-based) few-shot learners in unimodal vision and language tasks. We propose MAPL, a simple and parameter-efficient method that reuses frozen pre-trained unimodal models and leverages their strong generalization capabilities in multimodal vision-language (VL) settings. MAPL learns a lightweight mapping between the representation spaces of unimodal models using aligned image-text data, and can generalize to unseen VL tasks from just a few in-context examples. The small number of trainable parameters makes MAPL effective at low-data and in-domain learning. Moreover, MAPL’s modularity enables easy extension to other pre-trained models. Extensive experiments on several visual question answering and image captioning benchmarks show that MAPL achieves superior or competitive performance compared to similar methods while training orders of magnitude fewer parameters. MAPL can be trained in just a few hours using modest computational resources and public datasets. We release our code and pre-trained model weights at https://github.com/oscmansan/mapl.

2023-05-01

Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics (published)

doi.org

arxiv.org

MCVD: Masked Conditional Video Diffusion for Prediction, Generation, and Interpolation

Vikram Voleti

Alexia Jolicoeur-Martineau

Chris Pal

Video prediction is a challenging task. The quality of video frames from current state-of-the-art (SOTA) generative models tends to be poor and generalization beyond the training data is difficult. Furthermore, existing prediction frameworks are typically not capable of simultaneously handling other video-related tasks such as unconditional generation or interpolation. In this work, we devise a general-purpose framework called Masked Conditional Video Diffusion (MCVD) for all of these video synthesis tasks using a probabilistic conditional score-based denoising diffusion model, conditioned on past and/or future frames. We train the model in a manner where we randomly and independently mask all the past frames or all the future frames. This novel but straightforward setup allows us to train a single model that is capable of executing a broad range of video tasks, specifically: future/past prediction -- when only future/past frames are masked; unconditional generation -- when both past and future frames are masked; and interpolation -- when neither past nor future frames are masked. Our experiments show that this approach can generate high-quality frames for diverse types of videos. Our MCVD models are built from simple non-recurrent 2D-convolutional architectures, conditioning on blocks of frames and generating blocks of frames. We generate videos of arbitrary lengths autoregressively in a block-wise manner. Our approach yields SOTA results across standard video prediction and interpolation benchmarks, with computation times for training models measured in 1-12 days using

openreview.net

MeshDiffusion: Score-based Generative 3D Mesh Modeling

Zhen Liu

Yao Feng

Michael J. Black

Derek Nowrouzezahrai

Liam Paull

Weiyang Liu

We consider the task of generating realistic 3D shapes, which is useful for a variety of applications such as automatic scene generation and physical simulation. Compared to other 3D representations like voxels and point clouds, meshes are more desirable in practice, because (1) they enable easy and arbitrary manipulation of shapes for relighting and simulation, and (2) they can fully leverage the power of modern graphics pipelines which are mostly optimized for meshes. Previous scalable methods for generating meshes typically rely on sub-optimal post-processing, and they tend to produce overly-smooth or noisy surfaces without fine-grained geometric details. To overcome these shortcomings, we take advantage of the graph structure of meshes and use a simple yet very effective generative modeling method to generate 3D meshes. Specifically, we represent meshes with deformable tetrahedral grids, and then train a diffusion model on this direct parameterization. We demonstrate the effectiveness of our model on multiple generative tasks.

2023-02-01

ICLR.cc/2023/Conference (notable)

doi.org

openreview.net
