ConceptGraphs
Combining vision and language to help robots navigate the world.
To operate effectively in complex environments, robots need to build 3D representations of their surroundings that can be used for task planning and execution. This is the so-called scene understanding problem, which combines various fields such as computer vision, natural language processing and 3D modelling.
Existing approaches generally categorize objects using a fixed set of semantic labels, which is often insufficient for complex tasks. However, advances in multimodal foundation models now make it possible to develop more flexible "open vocabulary" solutions that address these limitations.
ConceptGraphs is a step towards robots performing tasks directly from natural language instructions. It is a mapping system that integrates the geometric information of traditional 3D mapping approaches with the rich semantic information of vision-language foundation models.
From raw sensor data, ConceptGraphs builds a 3D scene graph of objects and their relationships, in which each object's semantic features are not restricted to a predefined class label. This enables robots to perform complex navigation and object manipulation tasks, as demonstrated in a series of real-world experiments.
The input is a "scan" of the scene, specifically an RGB video with accompanying depth and camera-pose information, and the output is an incrementally constructed 3D graph structure. Each node is an object, and the edges represent the relationships between objects, for example a cup sitting "on top of" a table.
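To make this concrete, here is a minimal sketch of what such a scene graph could look like as a data structure. The class and field names (ObjectNode, SceneGraph, add_relation) are illustrative assumptions for this article, not the authors' actual code.

```python
# A minimal sketch of the kind of 3D scene graph ConceptGraphs builds.
# Names and fields are illustrative, not the project's actual code.
from dataclasses import dataclass, field

import numpy as np


@dataclass
class ObjectNode:
    """One object in the scene, fused from detections across RGB-D frames."""
    node_id: int
    caption: str          # e.g. "a white ceramic cup"
    embedding: np.ndarray  # vision-language feature vector for the object
    points: np.ndarray     # (N, 3) fused 3D point cloud
    colors: np.ndarray     # (N, 3) per-point RGB values


@dataclass
class SceneGraph:
    """Objects plus pairwise spatial relationships between them."""
    nodes: dict[int, ObjectNode] = field(default_factory=dict)
    # Edges map (subject_id, object_id) -> relation, e.g. "on top of"
    edges: dict[tuple[int, int], str] = field(default_factory=dict)

    def add_relation(self, subj: int, obj: int, relation: str) -> None:
        self.edges[(subj, obj)] = relation
```

As each frame of the scan arrives, new detections are either merged into an existing node, when their geometry and features overlap, or added as a new node, which is what makes the construction incremental.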
For each object, large vision-language models are used to extract vector embeddings and text captions, rather than the simple semantic class labels of previous work. The geometry and visual appearance of each object are also stored in the form of an RGB point cloud. The result is a complete 3D map of the scene, in which a user can search for objects using natural language queries such as "a plush toy" or "red sneakers". This provides robots with a wide range of perceptual and task-planning capabilities.
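To illustrate how such a query could be answered, here is a hedged sketch building on the SceneGraph structure above. It assumes each object's embedding comes from a CLIP-style model and uses the open_clip package for the text encoder; the model choice, weights, and function names are assumptions for illustration, not the project's actual implementation.

```python
# A sketch of open-vocabulary retrieval over the map: encode the text query
# into the shared vision-language feature space, then rank objects by cosine
# similarity. Assumes ObjectNode/SceneGraph from the sketch above.
import numpy as np
import open_clip
import torch

_model, _, _ = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion2b_s34b_b79k"
)
_tokenizer = open_clip.get_tokenizer("ViT-B-32")


def embed_text(query: str) -> np.ndarray:
    """Encode a natural language query with the CLIP-style text encoder."""
    with torch.no_grad():
        feats = _model.encode_text(_tokenizer([query]))
    return feats[0].numpy()


def query_scene(graph: SceneGraph, query: str, top_k: int = 3) -> list[ObjectNode]:
    """Return the top_k objects whose embeddings best match the query."""
    q = embed_text(query)
    q = q / np.linalg.norm(q)
    scores = {
        nid: float(q @ (n.embedding / np.linalg.norm(n.embedding)))
        for nid, n in graph.nodes.items()
    }
    best = sorted(scores, key=scores.get, reverse=True)[:top_k]
    return [graph.nodes[nid] for nid in best]


# e.g. query_scene(graph, "a plush toy") or query_scene(graph, "red sneakers")
```

Because both the query and the stored object features live in the same embedding space, no fixed label set is needed: any phrase the text encoder can represent becomes a valid search term.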
ConceptGraphs is a collaboration among 8 research institutions, with 16 authors in total.
Given only a natural language description, ConceptGraphs enabled a wheeled robot to identify, locate, and navigate to 30 different objects in a cluttered environment at the Robotics and Embodied AI Lab (REAL) in Montreal.
Human annotators from Amazon Mechanical Turk rated the nodes and edges of the constructed scene graph as 71% and 88% accurate, respectively, evaluated on Meta's Replica 3D dataset.
ConceptGraphs lets us leverage the power of large vision-language models for robot world representations. This enables robots to perform some impressively abstract tasks right out of the box.