Nous utilisons des témoins pour analyser le trafic et l’utilisation de notre site web, afin de personnaliser votre expérience. Vous pouvez désactiver ces technologies à tout moment, mais cela peut restreindre certaines fonctionnalités du site. Consultez notre Politique de protection de la vie privée pour en savoir plus.
Paramètre des cookies
Vous pouvez activer et désactiver les types de cookies que vous souhaitez accepter. Cependant certains choix que vous ferez pourraient affecter les services proposés sur nos sites (ex : suggestions, annonces personnalisées, etc.).
Cookies essentiels
Ces cookies sont nécessaires au fonctionnement du site et ne peuvent être désactivés. (Toujours actif)
Cookies analyse
Acceptez-vous l'utilisation de cookies pour mesurer l'audience de nos sites ?
Multimedia Player
Acceptez-vous l'utilisation de cookies pour afficher et vous permettre de regarder les contenus vidéo hébergés par nos partenaires (YouTube, etc.) ?
Publications
DialEgg: Dialect-Agnostic MLIR Optimizer using Equality Saturation with Egglog
MLIR’s ability to optimize programs at multiple levels of abstraction is key to enabling domain-specific optimizing compilers. However, ex… (voir plus)pressing optimizations remains tedious. Optimizations can interact in unexpected ways, making it hard to unleash full performance. Equality saturation promises to solve these challenges. First, it simplifies the expression of optimizations using rewrite rules. Secondly, it considers all possible optimization interactions, through saturation, selecting the best program variant. Despite these advantages, equality saturation remains absent from production compilers such as MLIR. This paper proposes to integrate Egglog, a recent equality saturation engine, with MLIR, in a dialect-agnostic manner. This paper shows how the main MLIR constructs such as operations, types or attributes can be modeled in Egglog. It also presents DialEgg, a tool that pre-defines a large set of common MLIR constructs in Egglog and automatically translates between the MLIR and Egglog program representations. This paper uses a few use cases to demonstrate the potential for combining equality saturation and MLIR.
2025-03-01
IEEE/ACM International Symposium on Code Generation and Optimization (publié)
Ensemble machine learning to accelerate industrial decarbonization: Prediction of Hansen solubility parameters for streamlined chemical solvent selection
Understanding how the brain encodes stimuli has been a fundamental problem in computational neuroscience. Insights into this problem have le… (voir plus)d to the design and development of artificial neural networks that learn representations by incorporating brain-like learning abilities. Recently, learning representations by capturing similarity between input samples has been studied to tackle this problem. This approach, however, has thus far been used to only learn downstream features from an input and has not been studied in the context of a generative paradigm, where one can map the representations back to the input space, incorporating not only bottom-up interactions (stimuli to latent) but also learning features in a top-down manner (latent to stimuli). We investigate a kernel similarity matching framework for generative modeling. Starting with a modified sparse coding objective for learning representations proposed in prior work, we demonstrate that representation learning in this context is equivalent to maximizing similarity between the input kernel and a latent kernel. We show that an implicit generative model arises from learning the kernel structure in the latent space and show how the framework can be adapted to learn manifold structures, potentially providing insights as to how task representations can be encoded in the brain. To solve the objective, we propose a novel Alternate Direction Method of Multipliers (ADMM) based algorithm and discuss the interpretation of the optimization process. Finally, we discuss how this representation learning problem can lead towards a biologically plausible architecture to learn the model parameters that ties together representation learning using similarity matching (a bottom-up approach) with predictive coding (a top-down approach).
Clustering is a fundamental technique in machine learning and data analysis, widely used across various domains. Internal clustering validat… (voir plus)ion measures, such as the Average Silhouette Width, Calinski-Harabasz, and Davies-Bouldin indices, play a crucial role in assessing clustering quality when external ground truth labels are unavailable. However, these measures can be affected by feature relevance, potentially leading to unreliable evaluations in high-dimensional or noisy data sets. In this paper, we introduce a Feature Importance Rescaling (FIR) method designed to enhance internal clustering validation by adjusting feature contributions based on their dispersion. Our method systematically attenuates noise features making clustering compactness and separation clearer, and by consequence aligning internal validation measures more closely with the ground truth. Through extensive experiments on synthetic data sets under different configurations, we demonstrate that FIR consistently improves the correlation between internal validation indices and the ground truth, particularly in settings with noisy or irrelevant features. The results show that FIR increases the robustness of clustering evaluation, reduces variability in performance across different data sets, and remains effective even when clusters exhibit significant overlap. These findings highlight the potential of FIR as a valuable enhancement for internal clustering validation, making it a practical tool for unsupervised learning tasks where labelled data is not available.
Clustering is a fundamental technique in machine learning and data analysis, widely used across various domains. Internal clustering validat… (voir plus)ion measures, such as the Average Silhouette Width, Calinski-Harabasz, and Davies-Bouldin indices, play a crucial role in assessing clustering quality when external ground truth labels are unavailable. However, these measures can be affected by feature relevance, potentially leading to unreliable evaluations in high-dimensional or noisy data sets. In this paper, we introduce a Feature Importance Rescaling (FIR) method designed to enhance internal clustering validation by adjusting feature contributions based on their dispersion. Our method systematically attenuates noise features making clustering compactness and separation clearer, and by consequence aligning internal validation measures more closely with the ground truth. Through extensive experiments on synthetic data sets under different configurations, we demonstrate that FIR consistently improves the correlation between internal validation indices and the ground truth, particularly in settings with noisy or irrelevant features. The results show that FIR increases the robustness of clustering evaluation, reduces variability in performance across different data sets, and remains effective even when clusters exhibit significant overlap. These findings highlight the potential of FIR as a valuable enhancement for internal clustering validation, making it a practical tool for unsupervised learning tasks where labelled data is not available.
Developing reliable and generalizable deep learning systems for medical imaging faces significant obstacles due to spurious correlations, da… (voir plus)ta imbalances, and limited text annotations in datasets. Addressing these challenges requires architectures robust to the unique complexities posed by medical imaging data. The rapid advancements in vision-language foundation models within the natural image domain prompt the question of how they can be adapted for medical imaging tasks. In this work, we present PRISM, a framework that leverages foundation models to generate high-resolution, language-guided medical image counterfactuals using Stable Diffusion. Our approach demonstrates unprecedented precision in selectively modifying spurious correlations (the medical devices) and disease features, enabling the removal and addition of specific attributes while preserving other image characteristics. Through extensive evaluation, we show how PRISM advances counterfactual generation and enables the development of more robust downstream classifiers for clinically deployable solutions. To facilitate broader adoption and research, we make our code publicly available at https://github.com/Amarkr1/PRISM.
A key challenge in AI alignment is guiding large language models (LLMs) to follow desired behaviors at test time. Activation steering, which… (voir plus) modifies internal model activations during inference, offers a potential solution. However, prior work in dense activation spaces struggles with superposition, wherein multiple features become entangled, limiting interpretability and precise control. In contrast, sparse representations provide an untapped opportunity for more interpretable behavior modulation. In this work, we introduce sparse activation steering (SAS), a method that leverages sparse autoencoders (SAEs) to steer LLM behavior in sparse spaces. By isolating behavior-specific features through a contrastive prompt-pairing approach, we define a set of features that can selectively reinforce or suppress behaviors. Experiments on Gemma 2 LLMs show that SAS vectors enable nuanced behavioral modulation and finer-grained control. Furthermore, scaling SAEs improves monosemanticity of SAS vectors, suggesting more reliable and interpretable interventions.