Marco Pedersoli

Affiliate Member
Associate Professor, École de technologie supérieure
Research Topics
Representation Learning
Multimodal Learning
Deep Learning
Generalization
Satellite Imagery
Generative Models
Robustness
Weak Supervision
Building Energy Management Systems
Vision and Language
Computer Vision

Biography

I am an associate professor at ÉTS Montréal, a member of LIVIA (le Laboratoire d'Imagerie, Vision et Intelligence Artificielle), and a member of the International Laboratory on Learning Systems (ILLS). I am also a member of ELLIS, the European network of excellence in AI. Since 2021, I have been co-holder of the Distech industrial research chair on embedded neural networks for connected building control.

My research centers on deep learning methods and algorithms, with a focus on visual recognition and on the automatic interpretation and understanding of images and videos. One of the main goals of my work is to advance artificial intelligence by minimizing two critical factors: computational cost and the need for human supervision. These reductions are essential for scalable AI, enabling more efficient, adaptive, and embedded systems. In recent work, I have contributed to the development of neural networks for smart buildings, integrating AI-based solutions to improve energy efficiency and comfort in smart environments.

Current Students

Research Master's - École de technologie supérieure
Principal supervisor:

Publications

Multi-Source Domain Adaptation for Object Detection with Prototype-based Mean Teacher
Atif Belal
Akhil Meethal
Francisco Perdigon Romero
Eric Granger
Attention-based Class-Conditioned Alignment for Multi-Source Domain Adaptation of Object Detectors
Atif Belal
Akhil Meethal
Francisco Perdigon Romero
Eric Granger
Domain adaptation methods for object detection (OD) strive to mitigate the impact of distribution shifts by promoting feature alignment across source and target domains. Multi-source domain adaptation (MSDA) allows leveraging multiple annotated source datasets and unlabeled target data to improve the accuracy and robustness of the detection model. Most state-of-the-art MSDA methods for OD perform feature alignment in a class-agnostic manner. This is challenging since the objects have unique modality information due to variations in object appearance across domains. A recent prototype-based approach proposed a class-wise alignment, yet it suffers from error accumulation caused by noisy pseudo-labels that can negatively affect adaptation with imbalanced data. To overcome these limitations, we propose an attention-based class-conditioned alignment method for MSDA, designed to align instances of each object category across domains. In particular, an attention module combined with an adversarial domain classifier allows learning domain-invariant and class-specific instance representations. Experimental results on multiple benchmarking MSDA datasets indicate that our method outperforms state-of-the-art methods and exhibits robustness to class imbalance, achieved through a conceptually simple class-conditioning strategy. Our code is available at: https://github.com/imatif17/ACIA.
Evaluating Supervision Levels Trade-Offs for Infrared-Based People Counting
David Latortue
Moetez Kdayem
Fidel A. Guerrero Peña
Eric Granger
Object detection models are commonly used for people counting (and localization) in many applications but require a dataset with costly bounding box annotations for training. Given the importance of privacy in people counting, these models rely more and more on infrared images, making the task even harder. In this paper, we explore how weaker levels of supervision affect the performance of deep person counting architectures for image classification and point-level localization. Our experiments indicate that counting people using a convolutional neural network with image-level annotation achieves a level of accuracy that is competitive with YOLO detectors and point-level localization models yet provides a higher frame rate and a similar amount of model parameters. Our code is available at: https://github.com/tortueTortue/IRPeopleCounting.
Joint Multimodal Transformer for Dimensional Emotional Recognition in the Wild
Paul Waligora
Muhammad Osama Zeeshan
Muhammad Haseeb Aslam
Soufiane Belharbi
Alessandro Lameiras Koerich
Simon Bacon
Eric Granger
Audiovisual emotion recognition (ER) in videos has immense potential over unimodal performance. It effectively leverages the inter- and intra-modal dependencies between visual and auditory modalities. This work proposes a novel audio-visual emotion recognition system utilizing a joint multimodal transformer architecture with key-based cross-attention. This framework aims to exploit the complementary nature of audio and visual cues (facial expressions and vocal patterns) in videos, leading to superior performance compared to solely relying on a single modality. The proposed model leverages separate backbones for capturing intra-modal temporal dependencies within each modality (audio and visual). Subsequently, a joint multimodal transformer architecture integrates the individual modality embeddings, enabling the model to effectively capture inter-modal (between audio and visual) and intra-modal (within each modality) relationships. Extensive evaluations on the challenging Affwild2 dataset demonstrate that the proposed model significantly outperforms baseline and state-of-the-art methods in ER tasks.
Do not trust what you trust: Miscalibration in Semi-supervised Learning
Shambhavi Mishra
Balamurali Murugesan
Ismail Ben Ayed
Jose Dolz
State-of-the-art semi-supervised learning (SSL) approaches rely on highly confident predictions to serve as pseudo-labels that guide the training on unlabeled samples. An inherent drawback of this strategy stems from the quality of the uncertainty estimates, as pseudo-labels are filtered only based on their degree of uncertainty, regardless of the correctness of their predictions. Thus, assessing and enhancing the uncertainty of network predictions is of paramount importance in the pseudo-labeling process. In this work, we empirically demonstrate that SSL methods based on pseudo-labels are significantly miscalibrated, and formally demonstrate the minimization of the min-entropy, a lower bound of the Shannon entropy, as a potential cause for miscalibration. To alleviate this issue, we integrate a simple penalty term, which enforces the logit distances of the predictions on unlabeled samples to remain low, preventing the network predictions from becoming overconfident. Comprehensive experiments on a variety of SSL image classification benchmarks demonstrate that the proposed solution systematically improves the calibration performance of relevant SSL models, while also enhancing their discriminative power, being an appealing addition to tackle SSL tasks.
DiPS: Discriminative Pseudo-Label Sampling with Self-Supervised Transformers for Weakly Supervised Object Localization
Shakeeb Murtaza
Soufiane Belharbi
Aydin Sarraf
Eric Granger