Felipe Codevilla

Scaling Self-Supervised End-to-End Driving with Multi-View Attention Learning

Yi Xiao

Felipe Codevilla

Diego Porres

Antonio M. López

2022-12-31

arXiv.org (preprint)

doi.org

Latent Variable Sequential Set Transformers for Joint Multi-agent Motion Prediction

Jim Aldon D'Souza

Samira E. Kahou

Felix Heide

Christopher Pal

Robust multi-agent trajectory prediction is essential for the safe control of robotic systems. A major challenge is to efficiently learn a representation that approximates the true joint distribution of contextual, social, and temporal information to enable planning. We propose Latent Variable Sequential Set Transformers, which are encoder-decoder architectures that generate scene-consistent multi-agent trajectories. We refer to these architectures as "AutoBots". The encoder is a stack of interleaved temporal and social multi-head self-attention (MHSA) modules which alternately perform equivariant processing across the temporal and social dimensions. The decoder employs learnable seed parameters in combination with temporal and social MHSA modules, allowing it to efficiently perform inference over the entire future scene in a single forward pass. AutoBots can produce either the trajectory of one ego-agent or a distribution over the future trajectories for all agents in the scene. For the single-agent prediction case, our model achieves top results on the global nuScenes vehicle motion prediction leaderboard, and produces strong results on the Argoverse vehicle prediction challenge. In the multi-agent setting, we evaluate on the synthetic partition of the TrajNet++ dataset to showcase the model's socially consistent predictions. We also demonstrate our model on general sequences of sets and provide illustrative experiments modelling the sequential structure of the multiple strokes that make up symbols in the Omniglot data. A distinguishing feature of AutoBots is that all models are trainable on a single desktop GPU (1080 Ti) in under 48h.

2021-12-31

ICLR (published)

doi.org

openreview.net

Learned Image Compression for Machine Perception

Felipe Codevilla

Jean Gabriel Simard

Ross Goroshin

Chris Pal

Recent work has shown that learned image compression strategies can outperform standard hand-crafted compression algorithms that have been developed over decades of intensive research on the rate-distortion trade-off. With growing applications of computer vision, high-quality image reconstruction from a compressible representation is often a secondary objective. Compression that ensures high accuracy on computer vision tasks such as image segmentation, classification, and detection therefore has the potential for significant impact across a wide variety of settings. In this work, we develop a framework that produces a compression format suitable for both human perception and machine perception. We show that representations can be learned that simultaneously optimize for compression and performance on core vision tasks. Our approach allows models to be trained directly from compressed representations, and this approach yields increased performance on new tasks and in low-shot learning settings. We present results that improve upon segmentation and detection performance compared to standard high-quality JPEGs, but with representations that are four to ten times smaller in terms of bits per pixel. Further, unlike naive compression methods, at a level ten times smaller than standard JPEGs, segmentation and detection models trained from our format suffer only minor degradation in performance.

2021-11-02

arXiv (preprint)

doi.org

arxiv.org

Action-Based Representation Learning for Autonomous Driving

Yi Xiao

Felipe Codevilla

Christopher Pal

Antonio M. López

Human drivers produce a vast amount of data which could, in principle, be used to improve autonomous driving systems. Unfortunately, seemingly straightforward approaches for creating end-to-end driving models that map sensor data directly into driving actions are problematic in terms of interpretability, and typically have significant difficulty dealing with spurious correlations. Alternatively, we propose to use this kind of action-based driving data for learning representations. Our experiments show that an affordance-based driving model pre-trained with this approach can leverage a relatively small amount of weakly annotated imagery and outperform pure end-to-end driving models, while being more interpretable. Further, we demonstrate how this strategy outperforms previous methods based on learning inverse dynamics models as well as other methods based on heavy human supervision (ImageNet).

2021-10-03

Proceedings of the 2020 Conference on Robot Learning (published)

proceedings.mlr.press
