Improving internal cluster quality evaluation in noisy Gaussian mixtures
Renato Cordeiro De Amorim
Clustering is a fundamental technique in machine learning and data analysis, widely used across various domains. Internal clustering validation measures, such as the Average Silhouette Width, Calinski-Harabasz, and Davies-Bouldin indices, play a crucial role in assessing clustering quality when external ground truth labels are unavailable. However, these measures can be affected by feature relevance, potentially leading to unreliable evaluations in high-dimensional or noisy data sets. In this paper, we introduce a Feature Importance Rescaling (FIR) method designed to enhance internal clustering validation by adjusting feature contributions based on their dispersion. Our method systematically attenuates noise features, making clustering compactness and separation clearer and, as a consequence, aligning internal validation measures more closely with the ground truth. Through extensive experiments on synthetic data sets under different configurations, we demonstrate that FIR consistently improves the correlation between internal validation indices and the ground truth, particularly in settings with noisy or irrelevant features. The results show that FIR increases the robustness of clustering evaluation, reduces variability in performance across different data sets, and remains effective even when clusters exhibit significant overlap. These findings highlight the potential of FIR as a valuable enhancement for internal clustering validation, making it a practical tool for unsupervised learning tasks where labelled data is not available.
Interpretable deep learning for deconvolutional analysis of neural signals
Bahareh Tolooshams
Sara Matias
Hao Wu
Simona Temereanca
Naoshige Uchida
Venkatesh N. Murthy
Demba Ba
Interval Regression: A Comparative Study with Proposed Models
Tung L. Nguyen
Regression models are essential for a wide range of real-world applications. However, in practice, target values are not always precisely known; instead, they may be represented as intervals of acceptable values. This challenge has led to the development of Interval Regression models. In this study, we provide a comprehensive review of existing Interval Regression models and introduce alternative models for comparative analysis. Experiments are conducted on both real-world and synthetic datasets to offer a broad perspective on model performance. The results demonstrate that no single model is universally optimal, highlighting the importance of selecting the most suitable model for each specific scenario.
Large language models deconstruct the clinical intuition behind diagnosing autism
Jack Stanley
Emmett Rabot
L. Mottron
LIVS: A Pluralistic Alignment Dataset for Inclusive Public Spaces
Rashid A. Mushkani
Shravan Nayak
Hugo Berard
Allison Cohen
Hadrien Bertrand
LLM-Safety Evaluations Lack Robustness
Tim Beyer
Sophie Xhonneux
Simon Geisler
Leo Schwinn
Stephan Günnemann
In this paper, we argue that current safety alignment research efforts for large language models are hindered by many intertwined sources of noise, such as small datasets, methodological inconsistencies, and unreliable evaluation setups. This can, at times, make it impossible to evaluate and compare attacks and defenses fairly, thereby slowing progress. We systematically analyze the LLM safety evaluation pipeline, covering dataset curation, optimization strategies for automated red-teaming, response generation, and response evaluation using LLM judges. At each stage, we identify key issues and highlight their practical impact. We also propose a set of guidelines for reducing noise and bias in evaluations of future attack and defense papers. Lastly, we offer an opposing perspective, highlighting practical reasons for existing limitations. We believe that addressing the outlined problems in future research will improve the field's ability to generate easily comparable results and make measurable progress.
Negotiative Alignment: Embracing Disagreement to Achieve Fairer Outcomes -- Insights from Urban Studies
Rashid A. Mushkani
Hugo Berard
Normalizing Spinal Cord Compression Measures in Degenerative Cervical Myelopathy
Sandrine Bédard
Jan Valošek
Maryam Seif
Armin Curt
Simon Schading-Sassenhausen
Nikolai Pfender
P. Freund
Markus Hupp
PRISM: High-Resolution & Precise Counterfactual Medical Image Generation using Language-guided Stable Diffusion
Amar Kumar
Anita Kriz
Mohammad Havaei
Developing reliable and generalizable deep learning systems for medical imaging faces significant obstacles due to spurious correlations, data imbalances, and limited text annotations in datasets. Addressing these challenges requires architectures robust to the unique complexities posed by medical imaging data. The rapid advancements in vision-language foundation models within the natural image domain prompt the question of how they can be adapted for medical imaging tasks. In this work, we present PRISM, a framework that leverages foundation models to generate high-resolution, language-guided medical image counterfactuals using Stable Diffusion. Our approach demonstrates unprecedented precision in selectively modifying spurious correlations (the medical devices) and disease features, enabling the removal and addition of specific attributes while preserving other image characteristics. Through extensive evaluation, we show how PRISM advances counterfactual generation and enables the development of more robust downstream classifiers for clinically deployable solutions. To facilitate broader adoption and research, we make our code publicly available at https://github.com/Amarkr1/PRISM.
RL4Med-DDPO: Reinforcement Learning for Controlled Guidance Towards Diverse Medical Image Generation using Vision-Language Foundation Models
Parham Saremi
Amar Kumar
Mohammed Mohammed
Zahra Tehraninasab
SafeArena: Evaluating the Safety of Autonomous Web Agents
Ada Defne Tur
Nicholas Meade
Xing Han Lu
Alejandra Zambrano
Arkil Patel
Esin Durmus
Spandana Gella
Karolina Stanczak