Dimitrios Sinodinos

PhD - McGill
Principal supervisor
Research Topics
Representation Learning
Reinforcement Learning
Deep Learning
Computer Vision

Publications

Collision-Aware Vision-Language Learning for End-to-End Driving with Multimodal Infraction Datasets
Alex Koran
Takuya Nanri
Fangge Chen
High infraction rates remain the primary bottleneck for end-to-end (E2E) autonomous driving, as evidenced by the low driving scores on the CARLA Leaderboard. Despite collision-related infractions being the dominant failure mode in closed-loop evaluations, collision-aware representation learning has received limited attention. To address this gap, we first develop a Video-Language-Augmented Anomaly Detector (VLAAD), leveraging a Multiple Instance Learning (MIL) formulation to obtain stable, temporally localized collision signals for proactive prediction. To transition these capabilities into closed-loop simulations, we must overcome the limitations of existing simulator datasets, which lack multimodality and are frequently restricted to simple intersection scenarios. Therefore, we introduce CARLA-Collide, a large-scale multimodal dataset capturing realistic collision events across highly diverse road networks. Trained on this diverse simulator data, VLAAD serves as a collision-aware plug-in module that can be seamlessly integrated into existing E2E driving models. By integrating our module into a pretrained TransFuser++ agent, we demonstrate a 14.12% relative increase in driving score with minimal fine-tuning. Beyond closed-loop evaluation, we further assess the generalization capability of VLAAD in an open-loop setting using real-world driving data. To support this analysis, we introduce Real-Collide, a multimodal dataset of diverse dashcam videos paired with semantically rich annotations for collision detection and prediction. On this benchmark, despite containing only 0.6B parameters, VLAAD outperforms a multi-billion-parameter vision-language model, achieving a 23.3% improvement in AUC.
Multitask-Informed Prior for In-Context Learning on Tabular Data: Application to Steel Property Prediction
Bahareh Nikpour
Jack Y. Wei
Sushant Sinha
Xiaoping Ma
Kashif Rehman
Stephen Yue
Accurate prediction of mechanical properties of steel during hot rolling processes, such as Thin Slab Direct Rolling (TSDR), remains challenging due to complex interactions among chemical compositions, processing parameters, and resultant microstructures. Traditional empirical and experimental methodologies, while effective, are often resource-intensive and lack adaptability to varied production conditions. Moreover, most existing approaches do not explicitly leverage the strong correlations among key mechanical properties, missing an opportunity to improve predictive accuracy through multitask learning. To address this, we present a multitask learning framework that injects multitask awareness into the prior of TabPFN--a transformer-based foundation model for in-context learning on tabular data--through novel fine-tuning strategies. Because TabPFN was originally designed for single-target regression or classification, we augment its prior with two complementary approaches: (i) target averaging, which provides a unified scalar signal compatible with TabPFN's single-target architecture, and (ii) task-specific adapters, which introduce task-specific supervision during fine-tuning. These strategies jointly guide the model toward a multitask-informed prior that captures cross-property relationships among key mechanical metrics. Extensive experiments on an industrial TSDR dataset demonstrate that our multitask adaptations outperform classical machine learning methods and recent state-of-the-art tabular learning models across multiple evaluation metrics. Notably, our approach enhances both predictive accuracy and computational efficiency compared to task-specific fine-tuning, demonstrating that multitask-aware prior adaptation enables foundation models for tabular data to deliver scalable, rapid, and reliable deployment for automated industrial quality control and process optimization in TSDR.
MultiTab: A Scalable Foundation for Multitask Learning on Tabular Data
Tabular data is the most abundant data type in the world, powering systems in finance, healthcare, e-commerce, and beyond. As tabular datasets grow and span multiple related targets, there is an increasing need to exploit shared task information for improved multitask generalization. Multitask learning (MTL) has emerged as a powerful way to improve generalization and efficiency, yet most existing work focuses narrowly on large-scale recommendation systems, leaving its potential in broader tabular domains largely underexplored. Also, existing MTL approaches for tabular data predominantly rely on multi-layer perceptron-based backbones, which struggle to capture complex feature interactions and often fail to scale when data is abundant, a limitation that transformer architectures have overcome in other domains. Motivated by this, we introduce MultiTab-Net, the first multitask transformer architecture specifically designed for large tabular data. MultiTab-Net employs a novel multitask masked-attention mechanism that dynamically models feature-feature dependencies while mitigating task competition. Through extensive experiments, we show that MultiTab-Net consistently achieves higher multitask gain than existing MTL architectures and single-task transformers across diverse domains including large-scale recommendation data, census-like socioeconomic data, and physics datasets, spanning a wide range of task counts, task types, and feature modalities. In addition, we contribute MultiTab-Bench, a generalized multitask synthetic dataset generator that enables systematic evaluation of multitask dynamics by tuning task count, task correlations, and relative task complexity. Our code is publicly available at https://github.com/Armanfard-Lab/MultiTab.
EMA-Net: Efficient Multitask Affinity Learning for Dense Scene Predictions
Deep Reinforcement Learning in Human Activity Recognition: A Survey and Outlook
Human activity recognition (HAR) is a popular research field in computer vision that has already been widely studied. However, it is still an active research field since it plays an important role in many current and emerging real-world intelligent systems, like visual surveillance and human-computer interaction. Deep reinforcement learning (DRL) has recently been used to address the activity recognition problem with various purposes, such as finding attention in video data or obtaining the best network structure. DRL-based HAR has only been around for a short time, and it is a challenging, novel field of study. Therefore, to facilitate further research in this area, we have constructed a comprehensive survey on activity recognition methods that incorporate DRL. Throughout the article, we classify these methods according to their shared objectives and delve into how they are ingeniously framed within the DRL framework. As we navigate through the survey, we conclude by shedding light on the prominent challenges and lingering questions that await the attention of future researchers, paving the way for further advancements and breakthroughs in this exciting domain.