Portrait de David Vázquez

David Vázquez

Membre industriel associé
Professeur associé, Polytechnique Montréal, Département d'informatique et de génie logiciel
ServiceNow
Sujets de recherche
Apprentissage de représentations
Apprentissage multimodal
Apprentissage profond
Grands modèles de langage (LLM)
IA conversationnelle
Modèles génératifs
Vision par ordinateur

Publications

A Benchmark for Endoluminal Scene Segmentation of Colonoscopy Images
Jorge Bernal
F. Javier Sánchez
Gloria Fernández-Esparrach
Antonio M. López
Adriana Romero
Colorectal cancer (CRC) is the third cause of cancer death worldwide. Currently, the standard approach to reduce CRC-related mortality is to… (voir plus) perform regular screening in search for polyps and colonoscopy is the screening tool of choice. The main limitations of this screening procedure are polyp miss-rate and inability to perform visual assessment of polyp malignancy. These drawbacks can be reduced by designing Decision Support Systems (DSS) aiming to help clinicians in the different stages of the procedure by providing endoluminal scene segmentation. Thus, in this paper, we introduce an extended benchmark of colonoscopy image, with the hope of establishing a new strong benchmark for colonoscopy image analysis research. We provide new baselines on this dataset by training standard fully convolutional networks (FCN) for semantic segmentation and significantly outperforming, without any further post-processing, prior results in endoluminal scene segmentation.
PixelVAE: A Latent Variable Model for Natural Images
Natural image modeling is a landmark challenge of unsupervised learning. Variational Autoencoders (VAEs) learn a useful latent representatio… (voir plus)n and model global structure well but have difficulty capturing small details. PixelCNN models details very well, but lacks a latent code and is difficult to scale for capturing large structures. We present PixelVAE, a VAE model with an autoregressive decoder based on PixelCNN. Our model requires very few expensive autoregressive layers compared to PixelCNN and learns latent codes that are more compressed than a standard VAE while still capturing most non-trivial structure. Finally, we extend our model to a hierarchy of latent variables at different scales. Our model achieves state-of-the-art performance on binarized MNIST, competitive performance on 64 × 64 ImageNet, and high-quality samples on the LSUN bedrooms dataset.