Publications

DialEgg: Dialect-Agnostic MLIR Optimizer using Equality Saturation with Egglog

Abd-El-Aziz Zayed

MLIR’s ability to optimize programs at multiple levels of abstraction is key to enabling domain-specific optimizing compilers. However, ex… (voir plus)pressing optimizations remains tedious. Optimizations can interact in unexpected ways, making it hard to unleash full performance. Equality saturation promises to solve these challenges. First, it simplifies the expression of optimizations using rewrite rules. Secondly, it considers all possible optimization interactions, through saturation, selecting the best program variant. Despite these advantages, equality saturation remains absent from production compilers such as MLIR. This paper proposes to integrate Egglog, a recent equality saturation engine, with MLIR, in a dialect-agnostic manner. This paper shows how the main MLIR constructs such as operations, types or attributes can be modeled in Egglog. It also presents DialEgg, a tool that pre-defines a large set of common MLIR constructs in Egglog and automatically translates between the MLIR and Egglog program representations. This paper uses a few use cases to demonstrate the potential for combining equality saturation and MLIR.

2025-03-01

IEEE/ACM International Symposium on Code Generation and Optimization (publié)

doi.org

Ensemble machine learning to accelerate industrial decarbonization: Prediction of Hansen solubility parameters for streamlined chemical solvent selection

Eslam G. Al-Sakkari

Ahmed Ragab

Mostafa Amer

Olumoye Ajao

Marzouk Benali

Daria C. Boffito

Hanane Dagdougui

Mouloud Amazouz

2025-03-01

Digital Chemical Engineering (publié)

doi.org

Implicit Generative Modeling by Kernel Similarity Matching

Shubham Choudhary

Paul Masset

Demba Ba

2025-03-01

ArXiv (prépublication)

arxiv.org

Implicit Generative Modeling by Kernel Similarity Matching

Shubham Choudhary

Paul Masset

Demba Ba

Understanding how the brain encodes stimuli has been a fundamental problem in computational neuroscience. Insights into this problem have le… (voir plus)d to the design and development of artificial neural networks that learn representations by incorporating brain-like learning abilities. Recently, learning representations by capturing similarity between input samples has been studied to tackle this problem. This approach, however, has thus far been used to only learn downstream features from an input and has not been studied in the context of a generative paradigm, where one can map the representations back to the input space, incorporating not only bottom-up interactions (stimuli to latent) but also learning features in a top-down manner (latent to stimuli). We investigate a kernel similarity matching framework for generative modeling. Starting with a modified sparse coding objective for learning representations proposed in prior work, we demonstrate that representation learning in this context is equivalent to maximizing similarity between the input kernel and a latent kernel. We show that an implicit generative model arises from learning the kernel structure in the latent space and show how the framework can be adapted to learn manifold structures, potentially providing insights as to how task representations can be encoded in the brain. To solve the objective, we propose a novel Alternate Direction Method of Multipliers (ADMM) based algorithm and discuss the interpretation of the optimization process. Finally, we discuss how this representation learning problem can lead towards a biologically plausible architecture to learn the model parameters that ties together representation learning using similarity matching (a bottom-up approach) with predictive coding (a top-down approach).

2025-03-01

ArXiv (prépublication)

arxiv.org

Improving clustering quality evaluation in noisy Gaussian mixtures

Renato Cordeiro De Amorim

Vladimir Makarenkov

2025-03-01

arXiv (publié)

doi.org

arxiv.org

Improving clustering quality evaluation in noisy Gaussian mixtures

Renato Cordeiro De Amorim

Vladimir Makarenkov

2025-03-01

ArXiv (prépublication)

arxiv.org

Improving internal cluster quality evaluation in noisy Gaussian mixtures

Renato Cordeiro De Amorim

Vladimir Makarenkov

Clustering is a fundamental technique in machine learning and data analysis, widely used across various domains. Internal clustering validat… (voir plus)ion measures, such as the Average Silhouette Width, Calinski-Harabasz, and Davies-Bouldin indices, play a crucial role in assessing clustering quality when external ground truth labels are unavailable. However, these measures can be affected by feature relevance, potentially leading to unreliable evaluations in high-dimensional or noisy data sets. In this paper, we introduce a Feature Importance Rescaling (FIR) method designed to enhance internal clustering validation by adjusting feature contributions based on their dispersion. Our method systematically attenuates noise features making clustering compactness and separation clearer, and by consequence aligning internal validation measures more closely with the ground truth. Through extensive experiments on synthetic data sets under different configurations, we demonstrate that FIR consistently improves the correlation between internal validation indices and the ground truth, particularly in settings with noisy or irrelevant features. The results show that FIR increases the robustness of clustering evaluation, reduces variability in performance across different data sets, and remains effective even when clusters exhibit significant overlap. These findings highlight the potential of FIR as a valuable enhancement for internal clustering validation, making it a practical tool for unsupervised learning tasks where labelled data is not available.

2025-03-01

ArXiv (prépublication)

arxiv.org

Improving internal cluster quality evaluation in noisy Gaussian mixtures

Renato Cordeiro De Amorim

Vladimir Makarenkov

Clustering is a fundamental technique in machine learning and data analysis, widely used across various domains. Internal clustering validat… (voir plus)ion measures, such as the Average Silhouette Width, Calinski-Harabasz, and Davies-Bouldin indices, play a crucial role in assessing clustering quality when external ground truth labels are unavailable. However, these measures can be affected by feature relevance, potentially leading to unreliable evaluations in high-dimensional or noisy data sets. In this paper, we introduce a Feature Importance Rescaling (FIR) method designed to enhance internal clustering validation by adjusting feature contributions based on their dispersion. Our method systematically attenuates noise features making clustering compactness and separation clearer, and by consequence aligning internal validation measures more closely with the ground truth. Through extensive experiments on synthetic data sets under different configurations, we demonstrate that FIR consistently improves the correlation between internal validation indices and the ground truth, particularly in settings with noisy or irrelevant features. The results show that FIR increases the robustness of clustering evaluation, reduces variability in performance across different data sets, and remains effective even when clusters exhibit significant overlap. These findings highlight the potential of FIR as a valuable enhancement for internal clustering validation, making it a practical tool for unsupervised learning tasks where labelled data is not available.

2025-03-01

ArXiv (prépublication)

arxiv.org

An interpretable and reliable framework for alloy discovery in thermomechanical processing

Sushant Sinha

Xiaopin Ma

Kashif Rehman

Narges Armanfard

Sheng‐Jie Yue

2025-03-01

Materials Today Communications (publié)

doi.org

Interpretable deep learning for deconvolutional analysis of neural signals

Bahareh Tolooshams

Sara Matias

Hao Wu

Simona Temereanca

Naoshige Uchida

Venkatesh N. Murthy

Paul Masset

Demba Ba

2025-03-01

Neuron (publié)

doi.org

Interval Regression: A Comparative Study with Proposed Models

Tung L. Nguyen

Toby Dylan Hocking

Regression models are essential for a wide range of real-world applications. However, in practice, target values are not always precisely kn… (voir plus)own; instead, they may be represented as intervals of acceptable values. This challenge has led to the development of Interval Regression models. In this study, we provide a comprehensive review of existing Interval Regression models and introduce alternative models for comparative analysis. Experiments are conducted on both real-world and synthetic datasets to offer a broad perspective on model performance. The results demonstrate that no single model is universally optimal, highlighting the importance of selecting the most suitable model for each specific scenario.

2025-03-01

arXiv (publié)

doi.org

arxiv.org

Large language models deconstruct the clinical intuition behind diagnosing autism

Emmett Rabot

L. Mottron

2025-03-01