Portrait of David Vázquez

David Vázquez

Associate Industry Member
Adjunct Professor, Polytechnique Montréal, Department of Computer Engineering and Software Engineerin
ServiceNow
Research Topics
Computer Vision
Conversational AI
Deep Learning
Generative Models
Large Language Models (LLM)
Multimodal Learning
Representation Learning

Publications

A Benchmark for Endoluminal Scene Segmentation of Colonoscopy Images
Jorge Bernal
F. Javier Sánchez
Gloria Fernández-Esparrach
Antonio M. López
Adriana Romero
Colorectal cancer (CRC) is the third cause of cancer death worldwide. Currently, the standard approach to reduce CRC-related mortality is to… (see more) perform regular screening in search for polyps and colonoscopy is the screening tool of choice. The main limitations of this screening procedure are polyp miss-rate and inability to perform visual assessment of polyp malignancy. These drawbacks can be reduced by designing Decision Support Systems (DSS) aiming to help clinicians in the different stages of the procedure by providing endoluminal scene segmentation. Thus, in this paper, we introduce an extended benchmark of colonoscopy image, with the hope of establishing a new strong benchmark for colonoscopy image analysis research. We provide new baselines on this dataset by training standard fully convolutional networks (FCN) for semantic segmentation and significantly outperforming, without any further post-processing, prior results in endoluminal scene segmentation.
PixelVAE: A Latent Variable Model for Natural Images
Natural image modeling is a landmark challenge of unsupervised learning. Variational Autoencoders (VAEs) learn a useful latent representatio… (see more)n and model global structure well but have difficulty capturing small details. PixelCNN models details very well, but lacks a latent code and is difficult to scale for capturing large structures. We present PixelVAE, a VAE model with an autoregressive decoder based on PixelCNN. Our model requires very few expensive autoregressive layers compared to PixelCNN and learns latent codes that are more compressed than a standard VAE while still capturing most non-trivial structure. Finally, we extend our model to a hierarchy of latent variables at different scales. Our model achieves state-of-the-art performance on binarized MNIST, competitive performance on 64 × 64 ImageNet, and high-quality samples on the LSUN bedrooms dataset.