Amine Mhedhbi

Associate Academic Member
Assistant Professor, Polytechnique Montréal, Department of Computer Engineering and Software Engineering
Research Topics
Tabular Data
Machine Learning Operations (MLOps)
Information Retrieval
Data Science
Machine Learning Systems
Computer Systems

Biography

Amine Mhedhbi is an assistant professor in the Department of Computer Engineering and Software Engineering at Polytechnique Montréal, where he leads the Data and AI Systems (DAIS) group. He is also an associate academic member of Mila – Quebec Artificial Intelligence Institute and holds the FRQ-IVADO Chair in Multimodal Data Engineering.

His research interests span all aspects of data and information management, with a particular focus on the architectures of analytical and AI-driven data systems. His work addresses performance, debuggability, interface design, and data applications.

Amine earned his PhD at the University of Waterloo, where he received the distinguished dissertation award in computer science and was a Microsoft PhD Fellow.

Current Students

PhD - Polytechnique
Research Master's - Polytechnique
Research Master's - Polytechnique
Co-supervisor:
PhD - Polytechnique

Publications

Factorized and Vectorized Execution: Optimizing Analytical and Semantic Queries over Relations
Many-to-many joins are central to analytical and semantic workloads such as fraud detection, network analysis, and recommendation, where insights arise from relationships between entities. These workloads often suffer from an explosion of intermediate results, sometimes orders of magnitude larger than the inputs. Factorized representations address this problem by exploiting conditional independence among attributes to encode intermediates more compactly. In some cases, they can reduce the output size asymptotically below the worst-case output size. However, adopting factorization in modern vectorized query processors remains challenging: factorized representations are hierarchical, whereas vectorized execution is built around flat, block-oriented processing. Prior approaches either rely on full materialization or support only restricted factorization layouts, sacrificing many of the benefits of both factorization and vectorization. We present FFX, a novel engine for Fast Factorized eXecution. FFX is the first pipelined engine to support arbitrary factorization schemes while preserving full vectorization. The engine introduces packed factorized vectors and operators that maintain cache-friendly, contiguous layouts. Beyond analytics, FFX also co-optimizes semantic operators by serializing factorized intermediates into compact prompts for large language models (LLMs), substantially reducing token usage and inference cost while maintaining output quality and, in some cases, improving it. Together, these contributions enable efficient execution of join-heavy analytical queries, including queries augmented with semantic operators.
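To make the idea of a factorized intermediate concrete, here is a minimal toy sketch in Python. It is not FFX's packed factorized vectors or its operators; the relations `r` and `s`, and the helpers `flat_join`, `factorized_join`, `expand`, and `stored_values`, are hypothetical names chosen for illustration. It assumes a two-table join R(a, b) ⋈ S(b, c), where, for a fixed join key b, the a-values and c-values are independent and can be stored side by side instead of as their full cross product.

```python
from collections import defaultdict
from itertools import product


def flat_join(r, s):
    """Materialize every (a, b, c) tuple of R(a, b) JOIN S(b, c)."""
    c_by_b = defaultdict(list)
    for b, c in s:
        c_by_b[b].append(c)
    return [(a, b, c) for a, b in r for c in c_by_b.get(b, [])]


def factorized_join(r, s):
    """Return {b: (a_values, c_values)}; the cross product stays implicit."""
    a_by_b = defaultdict(list)
    c_by_b = defaultdict(list)
    for a, b in r:
        a_by_b[b].append(a)
    for b, c in s:
        c_by_b[b].append(c)
    return {b: (a_by_b[b], c_by_b[b]) for b in a_by_b if b in c_by_b}


def expand(fact):
    """Enumerate the flat tuples encoded by a factorized result."""
    return [
        (a, b, c)
        for b, (a_vals, c_vals) in fact.items()
        for a, c in product(a_vals, c_vals)
    ]


def stored_values(fact):
    """Count stored values: one key plus its a-values and c-values."""
    return sum(1 + len(a_vals) + len(c_vals) for a_vals, c_vals in fact.values())


# Hypothetical data: 100 a-values and 100 c-values share one join key.
r = [(a, "hub") for a in range(100)]
s = [("hub", c) for c in range(100)]

flat = flat_join(r, s)
fact = factorized_join(r, s)
```

In this toy case the flat result holds 10,000 tuples (30,000 values), while the factorized form stores 201 values and still expands to the same tuples.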
Semantic Commit: Helping Users Update Intent Specifications for AI Memory at Scale
Priyan Vaithilingam
Daniel Lee
Elena L. Glassman
Towards Optimizing SQL Generation via LLM Routing
Mohammadhossein Malekpour
Text-to-SQL enables users to interact with databases through natural language, simplifying access to structured data. Although highly capable large language models (LLMs) achieve strong accuracy for complex queries, they incur unnecessary latency and dollar cost for simpler ones. In this paper, we introduce the first LLM routing approach for Text-to-SQL, which dynamically selects the most cost-effective LLM capable of generating accurate SQL for each query. We present two routing strategies (score- and classification-based) that achieve accuracy comparable to the most capable LLM while reducing costs. We design the routers for ease of training and efficient inference. In our experiments, we highlight a practical and explainable accuracy-cost trade-off on the BIRD dataset.
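As a rough illustration of what score-based routing could look like, the sketch below picks the cheapest model whose predicted success score clears a threshold. It is an assumption-laden toy, not the paper's routers: the `Model` class, the `route` function, the 0.8 threshold, the model names and costs, and the `toy_score` heuristic (a stand-in for a trained scorer) are all hypothetical. It also assumes the most expensive model is the most capable and uses it as the fallback.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Model:
    name: str
    cost_per_query: float  # hypothetical relative cost, not a real price


def route(
    question: str,
    models: Sequence[Model],
    score: Callable[[str, Model], float],
    threshold: float = 0.8,
) -> Model:
    """Return the cheapest model whose score meets the threshold.

    Falls back to the most expensive model, assumed here to be the
    most capable one, when no model clears the threshold.
    """
    ranked = sorted(models, key=lambda m: m.cost_per_query)
    for model in ranked:
        if score(question, model) >= threshold:
            return model
    return ranked[-1]


def toy_score(question: str, model: Model) -> float:
    """Hypothetical stand-in for a trained scorer: longer questions are harder."""
    difficulty = min(len(question.split()) / 20, 1.0)
    capability = {"small": 0.5, "medium": 0.75, "large": 1.0}[model.name]
    return 1.0 if capability >= difficulty else 0.0


MODELS = [Model("large", 10.0), Model("small", 1.0), Model("medium", 3.0)]

easy = route("List all customers", MODELS, toy_score)
hard = route(" ".join(["word"] * 18), MODELS, toy_score)
```

With this toy scorer, the short question is routed to the cheapest model and the 18-word question to the most expensive one; a real router would replace `toy_score` with a learned predictor.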