Michal Drozdzal

Alumni

Publications

Controllable Image Generation via Collage Representations
Arantxa Casanova
Marlene Careil
Jakob Verbeek
Instance-Conditioned GAN Data Augmentation for Representation Learning
Pietro Astolfi
Arantxa Casanova
Jakob Verbeek
Learning to Substitute Ingredients in Recipes
Bahare Fatemi
Quentin Duval
Rohit Girdhar
Recipe personalization through ingredient substitution has the potential to help people meet their dietary needs and preferences, avoid potential allergens, and ease culinary exploration in everyone's kitchen. To address ingredient substitution, we build a benchmark composed of a dataset of substitution pairs with standardized splits, evaluation metrics, and baselines. We further introduce the Graph-based Ingredient Substitution Module (GISMo), a novel model that leverages the context of a recipe as well as generic ingredient relational information encoded within a graph to rank plausible substitutions. We show through comprehensive experimental validation that GISMo surpasses the best performing baseline by a large margin in terms of mean reciprocal rank. Finally, we highlight the benefits of GISMo by integrating it in an improved image-to-recipe generation pipeline, enabling recipe personalization through user intervention. Quantitative and qualitative results show the efficacy of our proposed system, paving the road towards truly personalized cooking and tasting experiences.
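The mean reciprocal rank mentioned in the abstract above can be illustrated with a minimal sketch. It assumes each query yields a ranked list of candidate substitutes and has exactly one ground-truth substitute; the function names and the toy ingredient lists are illustrative assumptions, not the paper's benchmark code or data.

```python
from typing import Hashable, Sequence


def reciprocal_rank(ranked: Sequence[Hashable], target: Hashable) -> float:
    """Return 1 / (1-based position of target in ranked), or 0.0 if absent."""
    for position, candidate in enumerate(ranked, start=1):
        if candidate == target:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(
    rankings: Sequence[Sequence[Hashable]],
    targets: Sequence[Hashable],
) -> float:
    """Average the reciprocal rank over all queries."""
    if len(rankings) != len(targets):
        raise ValueError("rankings and targets must have the same length")
    if not rankings:
        return 0.0
    total = sum(reciprocal_rank(r, t) for r, t in zip(rankings, targets))
    return total / len(rankings)


# Toy example (hypothetical data): three substitution queries.
rankings = [
    ["margarine", "oil", "ghee"],  # ground truth "oil" is ranked 2nd -> 1/2
    ["honey", "maple syrup"],      # ground truth "honey" is ranked 1st -> 1
    ["tofu"],                      # ground truth "egg" is missing -> 0
]
targets = ["oil", "honey", "egg"]
mrr = mean_reciprocal_rank(rankings, targets)
```

A higher value means the correct substitute tends to appear nearer the top of the ranked list; a value of 1.0 would mean it is always ranked first.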
ImageNet-X: Understanding Model Mistakes with Factor of Variation Annotations
Badr Youbi Idrissi
Diane Bouchacourt
Randall Balestriero
Ivan Evtimov
Caner Hazirbas
David Lopez-Paz
Mark Ibrahim
Deep learning vision systems are widely deployed across applications where reliability is critical. However, even today's best models can fail to recognize an object when its pose, lighting, or background varies. While existing benchmarks surface examples challenging for models, they do not explain why such mistakes arise. To address this need, we introduce ImageNet-X, a set of sixteen human annotations of factors such as pose, background, or lighting for the entire ImageNet-1k validation set as well as a random subset of 12k training images. Equipped with ImageNet-X, we investigate 2,200 current recognition models and study the types of mistakes as a function of a model's (1) architecture, e.g., transformer vs. convolutional, (2) learning paradigm, e.g., supervised vs. self-supervised, and (3) training procedures, e.g., data augmentation. Regardless of these choices, we find models have consistent failure modes across ImageNet-X categories. We also find that while data augmentation can improve robustness to certain factors, it induces spill-over effects to other factors. For example, color-jitter augmentation improves robustness to color and brightness, but surprisingly hurts robustness to pose. Together, these insights suggest that, to advance the robustness of modern vision models, future research should focus on collecting additional data and understanding data augmentation schemes. Along with these insights, we release a toolkit based on ImageNet-X to spur further study into the mistakes image recognition systems make.
The Liver Tumor Segmentation Benchmark (LiTS)
Patrick Bilic
Patrick Christ
Hongwei Bran Li
Grzegorz Chlebus
Hao Chen
Qi Dou
Chi-Wing Fu
Xu Han
Gabriel Efrain Humpire Mamani
Pheng Ann Heng
Jürgen Hesser
Samuel Kadoury
Julian Walter Holch
Tomasz Konopczynski
Miao Yue
Chunming Li
X. Li
Jana Lipková
John Lowengrub
Michal Marianne Amitai
Hans Meine
J. Moltz
Marie Piraud
Ivan Ezhov
Xiaojuan Qi
Fernando Navarro
Jin Qi
Florian Kofler
Markus Rempfler
Johannes C. Paetzold
Suprosanna Shit
Andrea Schenk
Xiaobin Hu
Anjany Sekuboyina
Ping Zhou
Christian Hülsemeyer
Marcel Beetz
Jan Kirschke
Florian Ettlinger
Felix Gruen
Benedikt Wiestler
Zhiheng Zhang
Georgios Kaissis
Fabian Lohöfer
Rickmer Braren
J. Holch
Michela Antonelli
Felix Hofmann
Woong Bae
Wieland Sommer
Míriam Bellver
Volker Heinemann
Lei Bi
Colin Jacobs
G. Mamani
Bram van Ginneken
Erik B. Dam
Gabriel Chartrand
An Tang
Bogdan Georgescu
Avi Ben-Cohen
Xavier Giró-i-Nieto
Eyal Klang
M. Amitai
E. Konen
Hayit Greenspan
Johan Moreau
Jan Hendrik Moltz
Alexandre Hostettler
Christian Igel
Luc Soler
Fabian Isensee
Refael Vivanti
Paul Jäger
Adi Szeskin
Fucang Jia
Naama Lev-Cohain
Krishna Chaitanya Kaluva
Jacob Sosna
Mahendra Khened
Leo Joskowicz
Ildoo Kim
Bjoern Menze
Jae-Hun Kim
Zengming Shen
Sungwoong Kim
Simon Kohl
Avinash Kori
Ganapathy Krishnamurthi
Fan Li
Hongchao Li
Junbo Li
Xiaomeng Li
Jun Ma
Klaus Maier-Hein
Kevis-Kokitsi Maninis
Dorit Merhof
Akshay Pai
Mathias Perslev
Jens Petersen
Jordi Pont-Tuset
Oliver Rippel
Ignacio Sarasua
Jordi Torres
Christian Wachinger
Chunliang Wang
Leon Weninger
Jianrong Wu
Daguang Xu
Xiaoping Yang
Simon Chun-Ho Yu
Yading Yuan
Liping Zhang
Jorge Cardoso
Spyridon Bakas
Active 3D Shape Reconstruction from Vision and Touch
Edward J. Smith
Luis Pineda
Roberto Calandra
Jitendra Malik
Humans build 3D understandings of the world through active object exploration, jointly using their senses of vision and touch. However, in 3D shape reconstruction, most recent progress has relied on static datasets of limited sensory data such as RGB images, depth maps or haptic readings, leaving the active exploration of the shape largely unexplored. In active touch sensing for 3D reconstruction, the goal is to actively select the tactile readings that maximize the improvement in shape reconstruction accuracy. However, the development of deep learning-based active touch models is largely limited by the lack of frameworks for shape exploration. In this paper, we focus on this problem and introduce a system composed of: 1) a haptic simulator leveraging high spatial resolution vision-based tactile sensors for active touching of 3D objects; 2) a mesh-based 3D shape reconstruction model that relies on tactile or visuotactile signals; and 3) a set of data-driven solutions with either tactile or visuotactile priors to guide the shape exploration. Our framework enables the development of the first fully data-driven solutions to active touch on top of learned models for object understanding. Our experiments show the benefits of such solutions in the task of 3D shape understanding where our models consistently outperform natural baselines. We provide our framework as a tool to foster future research in this direction.
3D Shape Reconstruction from Vision and Touch
Edward J. Smith
Roberto Calandra
Georgia Gkioxari
Jitendra Malik
When a toddler is presented with a new toy, their instinctual behaviour is to pick it up and inspect it with their hand and eyes in tandem, clearly searching over its surface to properly understand what they are playing with. Here, touch provides high fidelity localized information while vision provides complementary global context. However, in 3D shape reconstruction, the complementary fusion of visual and haptic modalities remains largely unexplored. In this paper, we study this problem and present an effective chart-based approach to fusing vision and touch, which leverages advances in graph convolutional networks. To do so, we introduce a dataset of simulated touch and vision signals from the interaction between a robotic hand and a large array of 3D objects. Our results show that (1) leveraging both vision and touch signals consistently improves over single-modality baselines; (2) our approach outperforms alternative modality fusion methods and strongly benefits from the proposed chart-based structure; (3) the reconstruction quality increases with the number of grasps provided; and (4) the touch information not only enhances the reconstruction at the touch site but also extrapolates to its local neighborhood.
On the Iterative Refinement of Densely Connected Representation Levels for Semantic Segmentation
Arantxa Casanova
Guillem Cucurull
State-of-the-art semantic segmentation approaches increase the receptive field of their models by using either a downsampling path composed of poolings/strided convolutions or successive dilated convolutions. However, it is not clear which operation leads to the best results. In this paper, we systematically study the differences introduced by distinct receptive field enlargement methods and their impact on the performance of a novel architecture, called Fully Convolutional DenseResNet (FC-DRN). FC-DRN has a densely connected backbone composed of residual networks. Following standard image segmentation architectures, receptive field enlargement operations that change the representation level are interleaved among residual networks. This allows the model to exploit the benefits of both residual and dense connectivity patterns, namely: gradient flow, iterative refinement of representations, multi-scale feature combination and deep supervision. In order to highlight the potential of our model, we test it on the challenging CamVid urban scene understanding benchmark and make the following observations: 1) downsampling operations outperform dilations when the model is trained from scratch, 2) dilations are useful during the finetuning step of the model, 3) coarser representations require fewer refinement steps, and 4) ResNets (by model construction) are good regularizers, since they can reduce the model capacity when needed. Finally, we compare our architecture to alternative methods and report state-of-the-art results on the CamVid dataset, with at most half as many parameters.
Convolutional neural networks for mesh-based parcellation of the cerebral cortex
Guillem Cucurull
Konrad Wagstyl
Arantxa Casanova
Estrid Jakobsen
Alan C. Evans
In order to understand the organization of the cerebral cortex, it is necessary to create a map or parcellation of cortical areas. Reconstructions of the cortical surface created from structural MRI scans are frequently used in neuroimaging as a common coordinate space for representing multimodal neuroimaging data. These meshes are used to investigate healthy brain organization as well as abnormalities in neurological and psychiatric conditions. We frame cerebral cortex parcellation as a mesh segmentation task, and address it by taking advantage of recent advances in generalizing convolutions to the graph domain. In particular, we propose to assess graph convolutional networks and graph attention networks, which, in contrast to previous mesh parcellation models, exploit the underlying structure of the data to make predictions. We show experimentally on the Human Connectome Project dataset that the proposed graph convolutional models outperform the current state of the art and baselines, highlighting the potential and applicability of these methods to tackle neuroimaging challenges, paving the road towards a better characterization of brain diseases.
Learnable Explicit Density for Continuous Latent Space and Variational Inference
In this paper, we study two aspects of the variational autoencoder (VAE): the prior distribution over the latent variables and its corresponding posterior. First, we decompose the learning of VAEs into layerwise density estimation, and argue that having a flexible prior is beneficial to both sample generation and inference. Second, we analyze the family of inverse autoregressive flows (inverse AF) and show that with further improvement, inverse AF could be used as a universal approximation to any complicated posterior. Our analysis results in a unified approach to parameterizing a VAE, without the need to restrict ourselves to using factorial Gaussians in the latent real space.
A Benchmark for Endoluminal Scene Segmentation of Colonoscopy Images
Jorge Bernal
F. Javier Sánchez
Gloria Fernández-Esparrach
Antonio M. López
Colorectal cancer (CRC) is the third leading cause of cancer death worldwide. Currently, the standard approach to reduce CRC-related mortality is to perform regular screening in search of polyps, and colonoscopy is the screening tool of choice. The main limitations of this screening procedure are polyp miss rate and the inability to perform visual assessment of polyp malignancy. These drawbacks can be reduced by designing decision support systems (DSS) aiming to help clinicians in the different stages of the procedure by providing endoluminal scene segmentation. Thus, in this paper, we introduce an extended benchmark of colonoscopy image segmentation, with the hope of establishing a new strong benchmark for colonoscopy image analysis research. The proposed dataset consists of 4 relevant classes to inspect the endoluminal scene, targeting different clinical needs. Together with the dataset and taking advantage of advances in semantic segmentation literature, we provide new baselines by training standard fully convolutional networks (FCNs). We perform a comparative study to show that FCNs significantly outperform, without any further postprocessing, prior results in endoluminal scene segmentation, especially with respect to polyp segmentation and localization.