Publications

Beyond Cosine Decay: On the effectiveness of Infinite Learning Rate Schedule for Continual Pre-training

Vaibhav Singh

Paul Janson

Paria Mehrbod

Adam Ibrahim

Benjamin Thérien

The ever-growing availability of unlabeled data presents both opportunities and challenges for training artificial intelligence systems. Whi… (see more)le self-supervised learning (SSL) has emerged as a powerful paradigm for extracting meaningful representations from vast amounts of unlabeled data, existing methods still struggle to adapt to the non-stationary, non-IID nature of real-world data streams without forgetting previously learned knowledge. Recent works have adopted a repeated cosine annealing schedule for large-scale continual pre-training; however, these schedules (1) inherently cause forgetting during the re-warming phase and (2) have not been systematically compared to existing continual SSL methods. In this work, we systematically compare the widely used cosine schedule with the recently proposed infinite learning rate schedule and empirically find the latter to be a more effective alternative. Our extensive empirical evaluation across diverse image and language datasets demonstrates that the infinite learning rate schedule consistently enhances continual pre-training performance compared to a repeated cosine decay without being restricted to a fixed iteration budget. For instance, in a small-scale MAE pre-training setup, it outperforms several strong baselines from the literature. We then scale up our experiments to larger MAE pre-training and autoregressive language model pre-training. Our results show that the infinite learning rate schedule remains effective at scale, surpassing repeated cosine decay for both MAE pre-training and zero-shot LM benchmarks.

2025-03-01

arXiv (published)

doi.org

arxiv.org

DialEgg: Dialect-Agnostic MLIR Optimizer using Equality Saturation with Egglog

Abd-El-Aziz Zayed

Christophe Dubach

MLIR’s ability to optimize programs at multiple levels of abstraction is key to enabling domain-specific optimizing compilers. However, ex… (see more)pressing optimizations remains tedious. Optimizations can interact in unexpected ways, making it hard to unleash full performance. Equality saturation promises to solve these challenges. First, it simplifies the expression of optimizations using rewrite rules. Secondly, it considers all possible optimization interactions, through saturation, selecting the best program variant. Despite these advantages, equality saturation remains absent from production compilers such as MLIR. This paper proposes to integrate Egglog, a recent equality saturation engine, with MLIR, in a dialect-agnostic manner. This paper shows how the main MLIR constructs such as operations, types or attributes can be modeled in Egglog. It also presents DialEgg, a tool that pre-defines a large set of common MLIR constructs in Egglog and automatically translates between the MLIR and Egglog program representations. This paper uses a few use cases to demonstrate the potential for combining equality saturation and MLIR.

2025-03-01

IEEE/ACM International Symposium on Code Generation and Optimization (published)

doi.org

Ensemble machine learning to accelerate industrial decarbonization: Prediction of Hansen solubility parameters for streamlined chemical solvent selection

Eslam G. Al-Sakkari

Ahmed Ragab

Mostafa Amer

Olumoye Ajao

Marzouk Benali

Daria C. Boffito

Hanane Dagdougui

Mouloud Amazouz

2025-03-01

Digital Chemical Engineering (published)

doi.org

Feynman-Kac Correctors in Diffusion: Annealing, Guidance, and Product of Experts

Marta Skreta

Tara Akhound-Sadegh

Viktor Ohanesian

Roberto Bondesan

Alán Aspuru-Guzik

Arnaud Doucet

Rob Brekelmans

Alexander Tong

Kirill Neklyudov

While score-based generative models are the model of choice across diverse domains, there are limited tools available for controlling infere… (see more)nce-time behavior in a principled manner, e.g. for composing multiple pretrained models. Existing classifier-free guidance methods use a simple heuristic to mix conditional and unconditional scores to approximately sample from conditional distributions. However, such methods do not approximate the intermediate distributions, necessitating additional 'corrector' steps. In this work, we provide an efficient and principled method for sampling from a sequence of annealed, geometric-averaged, or product distributions derived from pretrained score-based models. We derive a weighted simulation scheme which we call Feynman-Kac Correctors (FKCs) based on the celebrated Feynman-Kac formula by carefully accounting for terms in the appropriate partial differential equations (PDEs). To simulate these PDEs, we propose Sequential Monte Carlo (SMC) resampling algorithms that leverage inference-time scaling to improve sampling quality. We empirically demonstrate the utility of our methods by proposing amortized sampling via inference-time temperature annealing, improving multi-objective molecule generation using pretrained models, and improving classifier-free guidance for text-to-image generation. Our code is available at https://github.com/martaskrt/fkc-diffusion.

2025-03-01

arXiv (published)

doi.org

arxiv.org

Implicit Generative Modeling by Kernel Similarity Matching

Shubham Choudhary

Paul Masset

Demba Ba

2025-03-01

ArXiv (preprint)

arxiv.org

Implicit Generative Modeling by Kernel Similarity Matching

Shubham Choudhary

Paul Masset

Demba Ba

Understanding how the brain encodes stimuli has been a fundamental problem in computational neuroscience. Insights into this problem have le… (see more)d to the design and development of artificial neural networks that learn representations by incorporating brain-like learning abilities. Recently, learning representations by capturing similarity between input samples has been studied to tackle this problem. This approach, however, has thus far been used to only learn downstream features from an input and has not been studied in the context of a generative paradigm, where one can map the representations back to the input space, incorporating not only bottom-up interactions (stimuli to latent) but also learning features in a top-down manner (latent to stimuli). We investigate a kernel similarity matching framework for generative modeling. Starting with a modified sparse coding objective for learning representations proposed in prior work, we demonstrate that representation learning in this context is equivalent to maximizing similarity between the input kernel and a latent kernel. We show that an implicit generative model arises from learning the kernel structure in the latent space and show how the framework can be adapted to learn manifold structures, potentially providing insights as to how task representations can be encoded in the brain. To solve the objective, we propose a novel Alternate Direction Method of Multipliers (ADMM) based algorithm and discuss the interpretation of the optimization process. Finally, we discuss how this representation learning problem can lead towards a biologically plausible architecture to learn the model parameters that ties together representation learning using similarity matching (a bottom-up approach) with predictive coding (a top-down approach).

2025-03-01

ArXiv (preprint)

arxiv.org

Improving clustering quality evaluation in noisy Gaussian mixtures

Renato Cordeiro De Amorim

Vladimir Makarenkov

2025-03-01

ArXiv (preprint)

arxiv.org

Improving internal cluster quality evaluation in noisy Gaussian mixtures

Renato Cordeiro De Amorim

Vladimir Makarenkov

Clustering is a fundamental technique in machine learning and data analysis, widely used across various domains. Internal clustering validat… (see more)ion measures, such as the Average Silhouette Width, Calinski-Harabasz, and Davies-Bouldin indices, play a crucial role in assessing clustering quality when external ground truth labels are unavailable. However, these measures can be affected by feature relevance, potentially leading to unreliable evaluations in high-dimensional or noisy data sets. In this paper, we introduce a Feature Importance Rescaling (FIR) method designed to enhance internal clustering validation by adjusting feature contributions based on their dispersion. Our method systematically attenuates noise features making clustering compactness and separation clearer, and by consequence aligning internal validation measures more closely with the ground truth. Through extensive experiments on synthetic data sets under different configurations, we demonstrate that FIR consistently improves the correlation between internal validation indices and the ground truth, particularly in settings with noisy or irrelevant features. The results show that FIR increases the robustness of clustering evaluation, reduces variability in performance across different data sets, and remains effective even when clusters exhibit significant overlap. These findings highlight the potential of FIR as a valuable enhancement for internal clustering validation, making it a practical tool for unsupervised learning tasks where labelled data is not available.

2025-03-01

ArXiv (preprint)

arxiv.org

Improving internal cluster quality evaluation in noisy Gaussian mixtures

Renato Cordeiro De Amorim

Vladimir Makarenkov

Clustering is a fundamental technique in machine learning and data analysis, widely used across various domains. Internal clustering validat… (see more)ion measures, such as the Average Silhouette Width, Calinski-Harabasz, and Davies-Bouldin indices, play a crucial role in assessing clustering quality when external ground truth labels are unavailable. However, these measures can be affected by feature relevance, potentially leading to unreliable evaluations in high-dimensional or noisy data sets. In this paper, we introduce a Feature Importance Rescaling (FIR) method designed to enhance internal clustering validation by adjusting feature contributions based on their dispersion. Our method systematically attenuates noise features making clustering compactness and separation clearer, and by consequence aligning internal validation measures more closely with the ground truth. Through extensive experiments on synthetic data sets under different configurations, we demonstrate that FIR consistently improves the correlation between internal validation indices and the ground truth, particularly in settings with noisy or irrelevant features. The results show that FIR increases the robustness of clustering evaluation, reduces variability in performance across different data sets, and remains effective even when clusters exhibit significant overlap. These findings highlight the potential of FIR as a valuable enhancement for internal clustering validation, making it a practical tool for unsupervised learning tasks where labelled data is not available.

2025-03-01

ArXiv (preprint)

arxiv.org

Interpretable deep learning for deconvolutional analysis of neural signals

Bahareh Tolooshams

Sara Matias

Hao Wu

Simona Temereanca

Naoshige Uchida

Venkatesh N. Murthy

Paul Masset

Demba Ba

2025-03-01

Neuron (published)

doi.org

Interval Regression: A Comparative Study with Proposed Models

Tung L. Nguyen

Toby Dylan Hocking

Regression models are essential for a wide range of real-world applications. However, in practice, target values are not always precisely kn… (see more)own; instead, they may be represented as intervals of acceptable values. This challenge has led to the development of Interval Regression models. In this study, we provide a comprehensive review of existing Interval Regression models and introduce alternative models for comparative analysis. Experiments are conducted on both real-world and synthetic datasets to offer a broad perspective on model performance. The results demonstrate that no single model is universally optimal, highlighting the importance of selecting the most suitable model for each specific scenario.

2025-03-01

arXiv (published)

doi.org

arxiv.org

Large language models deconstruct the clinical intuition behind diagnosing autism

Jack Stanley

Emmett Rabot

Siva Reddy

Eugene Belilovsky

L. Mottron

Danilo Bzdok

2025-03-01