Alihusein Kuwajerwala

alihusein.kuwajerwala@mila.quebec

PhD - Université de Montréal

Supervisor

Publications

ConceptGraphs: Open-Vocabulary 3D Scene Graphs for Perception and Planning

Qiao Gu

Alihusein Kuwajerwala

Sacha Morin

Krishna Murthy

Bipasha Sen

Aditya Agarwal

Corban Rivera

William Paul

Kirsty Ellis

Rama Chellappa

Chuang Gan

Celso M de Melo

Joshua B. Tenenbaum

Antonio Torralba

Florian Shkurti

Liam Paull

For robots to perform a wide variety of tasks, they require a 3D representation of the world that is semantically rich, yet compact and efficient for task-driven perception and planning. Recent approaches have attempted to leverage features from large vision-language models to encode semantics in 3D representations. However, these approaches tend to produce maps with per-point feature vectors, which do not scale well in larger environments, nor do they contain semantic spatial relationships between entities in the environment, which are useful for downstream planning. In this work, we propose ConceptGraphs, an open-vocabulary graph-structured representation for 3D scenes. ConceptGraphs is built by leveraging 2D foundation models and fusing their output to 3D by multi-view association. The resulting representations generalize to novel semantic classes, without the need to collect large 3D datasets or finetune models. We demonstrate the utility of this representation through a number of downstream planning tasks that are specified through abstract (language) prompts and require complex reasoning over spatial and semantic concepts. (Project page: https://concept-graphs.github.io/ Explainer video: https://youtu.be/mRhNkQwRYnc )

2023-10-20

robot-learning.org/CoRL/2023/Workshop/LangRob (poster)

doi.org

openreview.net

ConceptFusion: Open-set Multimodal 3D Mapping

Krishna Murthy

Alihusein Kuwajerwala

Qiao Gu

Mohd Omama

Tao Chen

Shuang Li

Alaa Maalouf

Ganesh Subramanian Iyer

Soroush Saryazdi

Nikhil Varma Keetha

Ayush Tewari

Joshua B. Tenenbaum

Celso M de Melo

Madhava Krishna

Liam Paull

Florian Shkurti

Antonio Torralba

Building 3D maps of the environment is central to robot navigation, planning, and interaction with objects in a scene. Most existing approaches that integrate semantic concepts with 3D maps largely remain confined to the closed-set setting: they can only reason about a finite set of concepts, pre-defined at training time. Further, these maps can only be queried using class labels, or in recent work, using text prompts. We address both these issues with ConceptFusion, a scene representation that is: (i) fundamentally open-set, enabling reasoning beyond a closed set of concepts (ii) inherently multi-modal, enabling a diverse range of possible queries to the 3D map, from language, to images, to audio, to 3D geometry, all working in concert. ConceptFusion leverages the open-set capabilities of today’s foundation models pre-trained on internet-scale data to reason about concepts across modalities such as natural language, images, and audio. We demonstrate that pixel-aligned open-set features can be fused into 3D maps via traditional SLAM and multi-view fusion approaches. This enables effective zero-shot spatial reasoning, not needing any additional training or finetuning, and retains long-tailed concepts better than supervised approaches, outperforming them by more than 40% margin on 3D IoU. We extensively evaluate ConceptFusion on a number of real-world datasets, simulated home environments, a real-world tabletop manipulation task, and an autonomous driving platform. We showcase new avenues for blending foundation models with 3D open-set multimodal mapping.

2023-05-06

ICRA.org/2023/Workshop/Pretraining4Robotics (published)

doi.org

openreview.net

