Language and Image

Artificial intelligence (AI) systems can process data collected from multiple sources, through a variety of sensors, to help computers make predictions and decisions. Mila’s researchers are pioneers in the fields of natural language processing and computer vision, and continue to explore the intersections of both technologies.


Advances in large language models have propelled AI into a new phase, raising many new and important questions for Mila researchers. These questions concern the remaining gap between state-of-the-art AI and human cognitive abilities, including reasoning, properly understanding cause and effect, and recognizing their own uncertainty, with important consequences for the deployment and safety of these systems.

Multimodal AI systems can make predictions and decisions based on multiple forms of data, including vision, natural language, and audio, and can, for example, deliver live captioning or answer questions about images.


Through multimodal research in machine learning, Mila’s experts are helping to develop AI systems that are more capable of understanding how humans perceive the world, which makes them better at serving the needs of society.

Featured Projects


Ubisoft-Mila Industrial Research Chair

Designed to guide technological innovation in the video game industry, the Ubisoft-Mila Industrial Research Chair explores the ethical use of AI in game development.


ConceptGraphs

ConceptGraphs is a mapping system that builds 3D scene-graphs of objects and their relationships, enabling robots to perform complex navigation and object manipulation tasks.
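The underlying idea of a scene graph (objects as nodes, spatial relationships as edges) can be sketched in a few lines. The classes, relation names, and coordinates below are purely illustrative, not ConceptGraphs' actual API:

```python
from dataclasses import dataclass, field

@dataclass
class SceneObject:
    label: str       # open-vocabulary label, e.g. "coffee mug"
    centroid: tuple  # (x, y, z) position in metres

@dataclass
class SceneGraph:
    objects: dict = field(default_factory=dict)    # id -> SceneObject
    relations: list = field(default_factory=list)  # (id_a, relation, id_b)

    def add_object(self, oid, label, centroid):
        self.objects[oid] = SceneObject(label, centroid)

    def relate(self, a, relation, b):
        self.relations.append((a, relation, b))

    def query(self, relation):
        """Return (label_a, label_b) pairs linked by the given relation."""
        return [(self.objects[a].label, self.objects[b].label)
                for a, r, b in self.relations if r == relation]

g = SceneGraph()
g.add_object("o1", "mug", (0.2, 0.0, 0.9))
g.add_object("o2", "table", (0.0, 0.0, 0.7))
g.relate("o1", "on top of", "o2")
print(g.query("on top of"))  # [('mug', 'table')]
```

A robot planner can traverse such a graph symbolically ("find the mug on the table") instead of reasoning over raw point clouds.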


Multimodal research in machine learning allows us to make AI systems that are closer to how humans perceive the world, making them more suitable to serve humanity in the future. 

Aishwarya Agrawal, Assistant Professor, Université de Montréal, Core Academic Member, Mila

Resources

Understanding LLM Understanding
Co-sponsored by Mila, this summer school held in June 2024 brought together experts from diverse fields such as computer science, neuroscience and psychology to deepen our understanding of large language models through various lenses.
MAPL
MAPL is a multimodal AI system capable of understanding images and text, while generating free-form text as output.
ConceptGraphs
ConceptGraphs is an open-vocabulary graph-structured representation for 3D scenes.

Research Labs

Mila professors who explore this subject as part of their research.

Mila Faculty
David Ifeoluwa Adelani, McGill University (Core Academic Member, Canada CIFAR AI Chair)
Aishwarya Agrawal, Assistant Professor, Université de Montréal, Department of Computer Science and Operations Research (Core Academic Member, Canada CIFAR AI Chair)
Sarath Chandar, Assistant Professor, Polytechnique Montréal, Department of Computer Engineering and Software Engineering (Core Academic Member, Canada CIFAR AI Chair)
Laurent Charlin, Associate Professor, HEC Montréal, Department of Decision Sciences (Core Academic Member, Canada CIFAR AI Chair)
Jackie Cheung, Associate Scientific Director, Mila, Associate Professor, McGill University, School of Computer Science (Core Academic Member, Canada CIFAR AI Chair)
James Clark, Full Professor, McGill University (Associate Academic Member)
Maria Cutumisu, Associate Professor, McGill University (Affiliate Member)
Alexandre Drouin, Research Scientist, ServiceNow (Associate Industry Member)
Samira Ebrahimi Kahou, Assistant Professor, University of Calgary, Department of Electrical and Software Engineering (Affiliate Member, Canada CIFAR AI Chair)
Christian Gagné, Full Professor, Université Laval, Department of Electrical and Computer Engineering (Associate Academic Member, Canada CIFAR AI Chair)
Warren Gross, Professor, McGill University, Department of Electrical and Computer Engineering (Associate Academic Member)
Toby Dylan Hocking, Associate Professor, Université de Sherbrooke, Department of Computer Science (Associate Academic Member)
Mahdi Hosseini, Assistant Professor, Concordia University (Affiliate Member)
Shin (Alexandre) Koseki, Assistant Professor, Université de Montréal, School of Urban Planning and Landscape Architecture (Affiliate Member)
Xue (Steve) Liu, Full Professor, McGill University, School of Computer Science (Associate Academic Member)
Tegan Maharaj, Assistant Professor in Machine Learning, HEC Montréal, Department of Decision Science (Core Academic Member)
Eilif Benjamin Muller, Assistant Professor, Université de Montréal, Department of Neurosciences (Associate Academic Member, Canada CIFAR AI Chair)
Timothy O'Donnell, Assistant Professor, McGill University, Department of Linguistics (Core Academic Member, Canada CIFAR AI Chair)
Chris Pal, Full Professor, Polytechnique Montréal, Department of Computer Engineering and Software Engineering (Core Academic Member, Canada CIFAR AI Chair)
Laurence Perreault-Levasseur, Assistant Professor, Université de Montréal, Department of Physics (Associate Academic Member)
Pablo Piantanida, Full Professor, Université Paris-Saclay (Associate Academic Member)
Guillaume Rabusseau, Assistant Professor, Université de Montréal, Department of Computer Science and Operations Research (Core Academic Member, Canada CIFAR AI Chair)
Siamak Ravanbakhsh, Assistant Professor, McGill University, School of Computer Science (Core Academic Member, Canada CIFAR AI Chair)
Siva Reddy, Assistant Professor, McGill University, School of Computer Science and Department of Linguistics (Core Academic Member, Canada CIFAR AI Chair)
Ayla Rigouts Terryn, Assistant Professor, Université de Montréal, Department of Linguistics and Translation (Associate Academic Member)
Irina Rish, Full Professor, Université de Montréal, Department of Computer Science and Operations Research (Core Academic Member, Canada CIFAR AI Chair)
Fabio Viola, Senior Research Engineer, Google DeepMind (Associate Industry Member)
Kory Wallace Mathewson, Research Scientist, DeepMind (Associate Industry Member)
Amal Zouaq, Full Professor, Polytechnique Montréal, Department of Computer Engineering and Software Engineering (Associate Academic Member)

Publications

IG-RL: Inductive Graph Reinforcement Learning for Massive-Scale Traffic Signal Control
François-Xavier Devailly
Denis Larocque
Scaling adaptive traffic signal control involves dealing with combinatorial state and action spaces. Multi-agent reinforcement learning attempts to address this challenge by distributing control to specialized agents. However, specialization hinders generalization and transferability, and the computational graphs underlying neural-network architectures—dominating in the multi-agent setting—do not offer the flexibility to handle an arbitrary number of entities which changes both between road networks, and over time as vehicles traverse the network. We introduce Inductive Graph Reinforcement Learning (IG-RL) based on graph-convolutional networks which adapts to the structure of any road network, to learn detailed representations of traffic signal controllers and their surroundings. Our decentralized approach enables learning of a transferable-adaptive-traffic-signal-control policy. After being trained on an arbitrary set of road networks, our model can generalize to new road networks and traffic distributions, with no additional training and a constant number of parameters, enabling greater scalability compared to prior methods. Furthermore, our approach can exploit the granularity of available data by capturing the (dynamic) demand at both the lane level and the vehicle level. The proposed method is tested on both road networks and traffic settings never experienced during training. We compare IG-RL to multi-agent reinforcement learning and domain-specific baselines. In both synthetic road networks and in a larger experiment involving the control of the 3,971 traffic signals of Manhattan, we show that different instantiations of IG-RL outperform baselines.
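The mechanism that gives this approach its transferability (a graph convolution whose weights are shared across nodes, so the same parameters apply to road networks of any size or topology) can be sketched as a single message-passing step. This is an illustrative reconstruction in plain numpy, not the authors' implementation:

```python
import numpy as np

def gcn_step(adj, feats, weights):
    """One graph-convolution step.

    adj:     (n, n) adjacency matrix with self-loops
    feats:   (n, d_in) per-node features (e.g. lane-level observations)
    weights: (d_in, d_out) shared projection, independent of n
    """
    deg = adj.sum(axis=1, keepdims=True)          # node degrees
    norm = adj / deg                              # row-normalised aggregation
    return np.maximum(norm @ feats @ weights, 0)  # aggregate, project, ReLU

rng = np.random.default_rng(0)
# Three intersections connected in a line (hypothetical toy network).
adj = np.array([[1, 1, 0],
                [1, 1, 1],
                [0, 1, 1]], dtype=float)
feats = rng.normal(size=(3, 4))  # per-node observations
w = rng.normal(size=(4, 2))      # shared weights: reusable on any graph
out = gcn_step(adj, feats, w)
print(out.shape)  # (3, 2): one embedding per intersection
```

Because `w` has a fixed size regardless of `n`, the same trained layer can be applied unchanged to a new road network with a different number of intersections.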
MeshDiffusion: Score-based Generative 3D Mesh Modeling
Zhen Liu
Yao Feng
Michael J. Black
Weiyang Liu
We consider the task of generating realistic 3D shapes, which is useful for a variety of applications such as automatic scene generation and physical simulation. Compared to other 3D representations like voxels and point clouds, meshes are more desirable in practice, because (1) they enable easy and arbitrary manipulation of shapes for relighting and simulation, and (2) they can fully leverage the power of modern graphics pipelines which are mostly optimized for meshes. Previous scalable methods for generating meshes typically rely on sub-optimal post-processing, and they tend to produce overly-smooth or noisy surfaces without fine-grained geometric details. To overcome these shortcomings, we take advantage of the graph structure of meshes and use a simple yet very effective generative modeling method to generate 3D meshes. Specifically, we represent meshes with deformable tetrahedral grids, and then train a diffusion model on this direct parameterization. We demonstrate the effectiveness of our model on multiple generative tasks.
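Training a diffusion model on a direct shape parameterization starts from the standard forward (noising) process. The sketch below shows that process applied to a stand-in for tetrahedral-grid vertex positions; the schedule, shapes, and function names are illustrative assumptions, not MeshDiffusion's code:

```python
import numpy as np

def q_sample(x0, t, betas, rng):
    """Sample x_t ~ q(x_t | x_0) under a linear variance schedule."""
    alpha_bar = np.cumprod(1.0 - betas)[t]  # cumulative product up to step t
    noise = rng.normal(size=x0.shape)
    x_t = np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * noise
    return x_t, noise

rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 1000)  # 1000-step noise schedule (assumed)
grid = rng.normal(size=(64, 3))        # toy stand-in: 64 grid vertices in 3D
x_t, eps = q_sample(grid, t=500, betas=betas, rng=rng)
# A denoising network would be trained to predict `eps` from (x_t, t);
# sampling then runs the learned reverse process from pure noise.
print(x_t.shape)  # (64, 3)
```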
MAPL: Parameter-Efficient Adaptation of Unimodal Pre-Trained Models for Vision-Language Few-Shot Prompting
Oscar Mañas
Pau Rodriguez
Saba Ahmadi
Aida Nematzadeh
Yash Goyal
Large pre-trained models have proved to be remarkable zero- and (prompt-based) few-shot learners in unimodal vision and language tasks. We propose MAPL, a simple and parameter-efficient method that reuses frozen pre-trained unimodal models and leverages their strong generalization capabilities in multimodal vision-language (VL) settings. MAPL learns a lightweight mapping between the representation spaces of unimodal models using aligned image-text data, and can generalize to unseen VL tasks from just a few in-context examples. The small number of trainable parameters makes MAPL effective at low-data and in-domain learning. Moreover, MAPL’s modularity enables easy extension to other pre-trained models. Extensive experiments on several visual question answering and image captioning benchmarks show that MAPL achieves superior or competitive performance compared to similar methods while training orders of magnitude fewer parameters. MAPL can be trained in just a few hours using modest computational resources and public datasets. We release our code and pre-trained model weights at https://github.com/oscmansan/mapl.
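The core idea (keep both unimodal models frozen and train only a small mapper that carries image embeddings into the language model's input space as visual "prefix" tokens) can be sketched as follows. Dimensions, names, and the linear form of the mapper are assumptions for illustration, not the released MAPL code:

```python
import numpy as np

rng = np.random.default_rng(0)
d_vision, d_lm, n_prefix = 768, 1024, 4  # hypothetical embedding sizes

def frozen_vision_encoder(image):
    # Stand-in for a frozen CLIP-like encoder; returns fixed-size features.
    return rng.normal(size=(d_vision,))

# The ONLY trainable parameters: a mapping from vision space to
# n_prefix tokens in the frozen LM's embedding space.
mapper = rng.normal(size=(d_vision, n_prefix * d_lm)) * 0.01

def image_to_prefix(image):
    feats = frozen_vision_encoder(image)            # frozen
    return (feats @ mapper).reshape(n_prefix, d_lm)  # trainable mapping

prefix = image_to_prefix(image=None)  # toy call; a real image would go here
print(prefix.shape)  # (4, 1024): prepended to text embeddings at the LM input
```

Training updates only `mapper` (here about 3M values) while both large models stay fixed, which is what keeps the method parameter-efficient.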
MCVD: Masked Conditional Video Diffusion for Prediction, Generation, and Interpolation
Vikram Voleti
Alexia Jolicoeur-Martineau
Video prediction is a challenging task. The quality of video frames from current state-of-the-art (SOTA) generative models tends to be poor and generalization beyond the training data is difficult. Furthermore, existing prediction frameworks are typically not capable of simultaneously handling other video-related tasks such as unconditional generation or interpolation. In this work, we devise a general-purpose framework called Masked Conditional Video Diffusion (MCVD) for all of these video synthesis tasks using a probabilistic conditional score-based denoising diffusion model, conditioned on past and/or future frames. We train the model in a manner where we randomly and independently mask all the past frames or all the future frames. This novel but straightforward setup allows us to train a single model that is capable of executing a broad range of video tasks, specifically: future/past prediction -- when only future/past frames are masked; unconditional generation -- when both past and future frames are masked; and interpolation -- when neither past nor future frames are masked. Our experiments show that this approach can generate high-quality frames for diverse types of videos. Our MCVD models are built from simple non-recurrent 2D-convolutional architectures, conditioning on blocks of frames and generating blocks of frames. We generate videos of arbitrary lengths autoregressively in a block-wise manner. Our approach yields SOTA results across standard video prediction and interpolation benchmarks, with computation times for training models measured in 1-12 days using
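The masking scheme described above (independently dropping all past frames and/or all future frames so one model covers four tasks) can be sketched directly. Shapes and names are illustrative, not the paper's code:

```python
import numpy as np

def mask_conditioning(past, future, rng):
    """Randomly and independently mask the past and future frame blocks."""
    mask_past = rng.random() < 0.5
    mask_future = rng.random() < 0.5
    p = np.zeros_like(past) if mask_past else past
    f = np.zeros_like(future) if mask_future else future
    if mask_past and mask_future:
        task = "unconditional generation"   # no conditioning at all
    elif mask_future:
        task = "future prediction"          # condition on past only
    elif mask_past:
        task = "past prediction"            # condition on future only
    else:
        task = "interpolation"              # condition on both sides
    return p, f, task

rng = np.random.default_rng(3)
past = np.ones((2, 64, 64, 3))    # 2 past frames, 64x64 RGB (toy sizes)
future = np.ones((2, 64, 64, 3))  # 2 future frames
p, f, task = mask_conditioning(past, future, rng)
print(task)
```

During training the denoiser sees the (possibly zeroed) conditioning blocks alongside the noised target frames, so a single set of weights learns all four behaviours.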