Recent advances in word embeddings and language models use large-scale, unlabelled data and self-supervised learning to boost NLP performance. Multilingual models, often trained on web-sourced data like Wikipedia, face challenges: few low-resource languages are included, their data is often noisy, and the lack of labelled datasets makes it hard to evaluate performance outside high-resource languages like English. In this dissertation, we focus on languages spoken in Sub-Saharan Africa, where all the indigenous languages can be regarded as low-resourced in terms of the availability of both labelled data for NLP tasks and unlabelled data found on the web. We analyse the noise in publicly available corpora and curate a high-quality corpus, demonstrating that the quality of semantic representations learned in word embeddings depends not only on the amount of pre-training data but also on its quality. We demonstrate empirically the limitations of word embeddings, and the opportunities that multilingual pre-trained language models (PLMs) offer, especially for languages unseen during pre-training and in low-resource scenarios. We further study how to adapt and specialize multilingual PLMs to unseen African languages using a small amount of monolingual text. To address the under-representation of African languages in NLP research, we developed large-scale human-annotated labelled datasets for 21 African languages in two impactful NLP tasks: named entity recognition and machine translation. We conduct an extensive empirical evaluation using state-of-the-art methods across supervised, weakly-supervised, and transfer learning settings.
Large language models (LLMs) possess vast semantic knowledge but often struggle with complex reasoning tasks, particularly in relational reasoning problems such as kinship or spatial reasoning. In this paper, we present Path-of-Thoughts (PoT), a novel framework designed to tackle relational reasoning by decomposing the task into three key stages: graph extraction, path identification, and reasoning. Unlike previous approaches, PoT efficiently extracts a task-agnostic graph that identifies crucial entities, relations, and attributes within the problem context. Subsequently, PoT identifies relevant reasoning chains within the graph corresponding to the posed question, facilitating inference of potential answers. Experimental evaluations on four benchmark datasets demanding long reasoning chains demonstrate that PoT surpasses state-of-the-art baselines by a significant margin (up to 21.3%) without necessitating fine-tuning or extensive LLM calls. Furthermore, as opposed to prior neuro-symbolic methods, PoT exhibits improved resilience against LLM errors by leveraging the compositional nature of graphs.
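The three-stage decomposition described in the abstract can be illustrated on a toy kinship problem. This is a minimal sketch under stated assumptions: the fact triples, the breadth-first path search, and the `COMPOSE` rule table are illustrative stand-ins, whereas the paper extracts the graph with an LLM.

```python
from collections import deque

def extract_graph(facts):
    """Stage 1: build an entity-relation graph from (head, relation, tail) facts."""
    graph = {}
    for head, rel, tail in facts:
        graph.setdefault(head, []).append((rel, tail))
    return graph

def find_path(graph, source, target):
    """Stage 2: identify a reasoning chain (relation path) from source to target."""
    queue = deque([(source, [])])
    seen = {source}
    while queue:
        node, rels = queue.popleft()
        if node == target:
            return rels
        for rel, nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, rels + [rel]))
    return None

# Toy composition table (an assumption): how two relations chain into one.
COMPOSE = {("parent", "parent"): "grandparent"}

def reason(rels):
    """Stage 3: fold the relation chain into a single answer."""
    out = rels[0]
    for rel in rels[1:]:
        out = COMPOSE.get((out, rel), f"{out}-of-{rel}")
    return out

facts = [("Ann", "parent", "Bob"), ("Bob", "parent", "Cal")]
graph = extract_graph(facts)
path = find_path(graph, "Ann", "Cal")
print(reason(path))  # grandparent
```

Because each stage only consumes the previous stage's output, an error in extraction stays localized to the graph rather than propagating through free-form chain-of-thought text, which is the compositional robustness the abstract alludes to.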
The cervical spinal cord (cSC) is highly relevant to clinical dysfunction in multiple sclerosis (MS) but remains understudied using quantitative magnetic resonance imaging (MRI). We assessed magnetization transfer ratio (MTR), a semi-quantitative MRI measure sensitive to MS-related tissue microstructural changes, in the cSC and its relationship with clinical outcomes in radiologically isolated syndrome (RIS) and MS.
MTR data were acquired from 52 RIS, 201 relapsing–remitting MS (RRMS), 47 primary progressive MS (PPMS), and 43 control (CON) participants across four sites in the Canadian Prospective Cohort Study to Understand Progression in MS (CanProCo) using 3.0 T MRI systems. Mean MTR was compared between groups in the whole cSC and in sub-regions between C2 and C4. Multiple linear regression was used to evaluate relationships between MTR and clinical outcomes, including the Expanded Disability Status Scale (EDSS), walking speed test (WST), and manual dexterity test (MDT).
There were consistent group differences in MTR, which were most pronounced between PPMS and CON (−5.8% to −3.7%, p ≤ 0.01). In PPMS, lower MTR was associated with greater disability as measured by EDSS (β = −0.3 to −0.1, p ≤ 0.03), WST (β = −0.9 to −0.5, p ≤ 0.04), and MDT (β = −0.6 and −0.5, p = 0.04). In RRMS, MTR was associated with EDSS only (β = −0.1, p ≤ 0.03).
In this large sample of RIS and MS, cSC MTR was lowest in PPMS, with associations between MTR and clinical outcomes in MS but not RIS. These findings suggest that MTR provides important information about the underlying tissue microstructural integrity of the cSC relevant to clinical disability in established MS.
2025-06-28
Annals of Clinical and Translational Neurology (published)
The transmission network expansion planning (TNEP) problem is inherently complex because of its nonlinear and nonconvex nature, arising from the inclusion of AC power flow constraints, discrete investment decisions, and multiple operating scenarios. These characteristics make the problem computationally challenging, particularly when scaling to larger systems with multistage planning horizons. Addressing this complexity requires advanced methodologies that balance solution accuracy and computational efficiency. This paper presents a novel two-step framework for TNEP that first applies Benders decomposition to separate investment and operational decisions, followed by semidefinite linearization to reformulate the operational subproblems. The proposed approach enhances solution quality by ensuring convexity in the subproblems and improves computational efficiency through decomposition. Numerical results for 6-, 10-, and 24-bus test systems demonstrate that the proposed method achieves superior performance compared to existing approaches in terms of solution accuracy and computational efficiency.
Latency-Aware Pruning and Quantization of Self-Supervised Speech Transformers for Edge Devices
Seyed Milad Ebrahimipour
Seyyed Hasan Mozafari
James J. Clark
Warren J. Gross
Brett H. Meyer
The growing adoption of self-supervised learning transformers for speech (speech SSL) is constrained by their significant computational and memory demands, making deployment on resource-constrained edge devices challenging. We propose a latency-aware compression framework that integrates structured pruning and quantization to address these challenges. Guided by a latency model that considers the combined effects of pruning and quantization, our method dynamically identifies and removes less critical blocks while maintaining task performance, avoiding the inefficiencies of over-pruning and under-pruning seen in prior approaches. Unlike prior methods that specialize either in post-training compression without fine-tuning data or in settings where fine-tuning data is available, our method is effective in both settings. Experimental results show that, in task-agnostic compression, our method achieves a 4.2× speedup on the Hikey970 edge development platform, outperforming previous task-agnostic pruning methods in most tasks, while requiring only 21–24 GPU hours, a 3× reduction compared to prior methods. Additionally, our method achieves a lower word error rate of 7.8% using task-specific pruning, while reducing computational overhead by approximately 19.4% in terms of GFLOPs compared to previous task-specific methods. Finally, our method consistently achieves higher accuracy than the state-of-the-art post-training compression approach across various latency speedup constraints, even without fine-tuning data.
2025-06-27
ACM Transactions on Embedded Computing Systems (published)
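The latency-guided structured pruning described above can be sketched as a greedy loop that drops the least-important blocks until a latency budget is met. The importance scores, per-block latencies, and the greedy strategy itself are illustrative assumptions; the paper's latency model additionally accounts for quantization effects.

```python
def prune_to_budget(blocks, latency_budget):
    """Greedy latency-aware pruning sketch.

    blocks: list of (name, importance, latency_ms) tuples.
    Returns (kept_names, removed_names), removing the least-important
    blocks first until total latency fits the budget.
    """
    kept = sorted(blocks, key=lambda b: b[1])   # least important first
    total = sum(b[2] for b in kept)
    removed = []
    while total > latency_budget and kept:
        name, _, lat = kept.pop(0)              # drop least critical block
        total -= lat
        removed.append(name)
    return [name for name, _, _ in kept], removed

# Hypothetical 4-block model: each block costs 10 ms, budget is 25 ms.
blocks = [("b0", 0.9, 10), ("b1", 0.2, 10), ("b2", 0.5, 10), ("b3", 0.8, 10)]
kept, removed = prune_to_budget(blocks, latency_budget=25)
print(kept, removed)  # ['b3', 'b0'] ['b1', 'b2']
```

A latency-aware criterion like this is what prevents the over-/under-pruning the abstract mentions: removal stops exactly when the budget is satisfied rather than at a fixed sparsity ratio.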
Augmenting large language models (LLMs) with external context significantly improves their performance in natural language processing (NLP) tasks. However, LLMs struggle to answer queries reliably when the provided context lacks information, often resorting to ungrounded speculation or internal knowledge. Groundedness (generating responses strictly supported by the context) is essential for ensuring factual consistency and trustworthiness. This study focuses on detecting whether a given query is grounded in a document provided in context before costly answer generation by LLMs. Such a detection mechanism can significantly reduce both inference time and resource consumption. We show that lightweight, task-specific encoder models such as RoBERTa and NomicBERT, fine-tuned on curated datasets, can achieve accuracy comparable to state-of-the-art LLMs, such as Llama3 8B and GPT4o, in groundedness detection while reducing inference latency by orders of magnitude. The code is available at: https://github.com/chandarlab/Hallucinate-less
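The gating idea above, running a cheap groundedness check before invoking a costly LLM, can be sketched as follows. The term-overlap scorer is a deliberately crude placeholder standing in for the fine-tuned RoBERTa/NomicBERT classifier; the function names and threshold are assumptions for illustration only.

```python
def groundedness_score(query, document):
    """Toy proxy for a groundedness classifier: fraction of query terms
    that literally appear in the document."""
    q_terms = set(query.lower().split())
    d_terms = set(document.lower().split())
    return len(q_terms & d_terms) / max(len(q_terms), 1)

def answer_if_grounded(query, document, llm_call, threshold=0.5):
    """Skip the expensive LLM call when the query is not grounded in context."""
    if groundedness_score(query, document) < threshold:
        return "Insufficient context to answer."
    return llm_call(query, document)

doc = "the eiffel tower is in paris and was completed in 1889"
print(answer_if_grounded("when was the eiffel tower completed", doc,
                         lambda q, d: "1889"))        # answered
print(answer_if_grounded("capital of japan", doc,
                         lambda q, d: "unused"))      # refused, no LLM call
```

The latency saving comes from the asymmetry: the detector runs once per query in microseconds, while the `llm_call` it guards is only invoked for grounded queries.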
Dynamic graph learning methods have recently emerged as powerful tools for modelling relational data evolving through time. However, despite extensive benchmarking efforts, it remains unclear whether current Temporal Graph Neural Networks (TGNNs) effectively capture core temporal patterns such as periodicity, cause-and-effect, and long-range dependencies. In this work, we introduce the Temporal Graph Reasoning Benchmark (T-GRAB), a comprehensive set of synthetic tasks designed to systematically probe the capabilities of TGNNs to reason across time. T-GRAB provides controlled, interpretable tasks that isolate key temporal skills: counting/memorizing periodic repetitions, inferring delayed causal effects, and capturing long-range dependencies over both spatial and temporal dimensions. We evaluate 11 temporal graph learning methods on these tasks, revealing fundamental shortcomings in their ability to generalize temporal patterns. Our findings offer actionable insights into the limitations of current models, highlight challenges hidden by traditional real-world benchmarks, and motivate the development of architectures with stronger temporal reasoning abilities. The code for T-GRAB can be found at: https://github.com/alirezadizaji/T-GRAB.
2025-06-25
MLoG-GenAI @ ACM SIGKDD Conference on Knowledge Discovery and Data Mining (oral presentation)
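A periodicity probe in the spirit of T-GRAB can be sketched as a synthetic task where edge sets repeat with a fixed period and the model must predict the next snapshot from history. The patterns, period, and the trivial memorization baseline below are illustrative assumptions, not the benchmark's actual task configuration.

```python
import itertools

def periodic_snapshots(patterns, num_steps):
    """Generate (time, edge_set) snapshots cycling through `patterns`."""
    cycle = itertools.cycle(patterns)
    return [(t, next(cycle)) for t in range(num_steps)]

def memorization_baseline(history, period):
    """Predict the next snapshot by replaying the one seen `period` steps ago.
    A TGNN that has captured the periodicity should match this oracle."""
    return history[-period][1]

# Three edge patterns on nodes {0, 1, 2}, repeating with period 3.
patterns = [{(0, 1)}, {(1, 2)}, {(2, 0)}]
history = periodic_snapshots(patterns, 9)       # snapshots for t = 0..8
prediction = memorization_baseline(history, period=3)
print(prediction)  # {(0, 1)} — the pattern due at t = 9
```

Because the generating process is fully known, any gap between a TGNN's predictions and this oracle isolates a failure to count or memorize periodic repetitions, which is exactly the kind of controlled diagnosis the benchmark is built for.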
Recently, the minimal requirements for inter-brain coupling have attracted attention. Moreover, researchers have found that brains can couple not only when individuals are in the same space, but also during technologically mediated interactions. Here we investigate whether inter-brain synchronization occurs when both conditions of spatial isolation and minimal interaction are satisfied. In particular, we use a real-time interaction paradigm, the Perceptual Crossing Experiment (PCE), where individuals must locate their partners in a minimal virtual space using tactile stimuli alone. We report novel findings that contribute to our understanding of inter-brain synchronization and the minimal conditions of social interaction in virtual spaces: 1) inter-brain synchronization is present in the Alpha band during online minimal interaction, 2) five behavioral patterns and three inter-brain patterns can be found in the PCE, and 3) different behavioral patterns in the interaction environment recruit different inter-brain networks, such that frontal-fronto-central synchrony occurs when people are further apart in space, interacting with a multitude of objects. These findings have important implications for the understanding of social interaction processes, showing that inter-brain coupling can occur even without extensive communication channels.
Pixel-wise annotations are notoriously laborious and costly to obtain in the medical domain. To mitigate this burden, weakly supervised approaches based on bounding box annotations, which are much easier to acquire, offer a practical alternative. Vision foundation models have recently shown noteworthy segmentation performance when provided with prompts such as points or bounding boxes. Prompt learning exploits these models by adapting them to downstream tasks and automating segmentation, thereby reducing user intervention. However, existing prompt learning approaches depend on fully annotated segmentation masks. This paper proposes a novel framework that combines the representational power of foundation models with the annotation efficiency of weakly supervised segmentation. More specifically, our approach automates prompt generation for foundation models using only bounding box annotations. Our proposed optimization scheme integrates multiple constraints derived from box annotations with pseudo-labels generated by the prompted foundation model. Extensive experiments across multi-modal datasets reveal that our weakly supervised method achieves an average Dice score of 84.90% in a limited data setting, outperforming existing fully-supervised and weakly-supervised approaches. The code will be available upon acceptance.
2025-06-23
IEEE Transactions on Biomedical Engineering (published)
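Two ingredients of the framework above can be sketched from a box annotation alone: a prompt for a promptable segmentation model, and a box-derived constraint on the predicted mask. Both the SAM-style prompt dictionary and the area-based tightness check are plausible assumptions for illustration, not the paper's exact formulation.

```python
def box_to_prompts(box):
    """Derive prompts from a box annotation alone.
    box = (x_min, y_min, x_max, y_max) in pixel coordinates."""
    x_min, y_min, x_max, y_max = box
    center = ((x_min + x_max) / 2.0, (y_min + y_max) / 2.0)
    # Dict keys mimic a typical promptable-model interface (an assumption).
    return {"box": box, "point": center, "point_label": 1}  # 1 = foreground

def box_area_violation(mask_area, box):
    """One example of a box-derived constraint: the predicted mask
    should not cover more pixels than its bounding box contains."""
    x_min, y_min, x_max, y_max = box
    box_area = (x_max - x_min) * (y_max - y_min)
    return max(0.0, mask_area - box_area)

prompts = box_to_prompts((10, 20, 50, 80))
print(prompts["point"])                     # (30.0, 50.0)
print(box_area_violation(3000, (10, 20, 50, 80)))  # 600: mask spills past box
```

Penalties of this form are what let box annotations supervise a pixel-wise output: the foundation model's pseudo-labels fill in shape detail, while the box constraints keep those pseudo-labels honest.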
The hippocampus has long been regarded as a neural map of physical space, with its neurons categorized as spatially or non-spatially tuned according to their response selectivity. However, growing evidence suggests that this dichotomy oversimplifies the complex roles hippocampal neurons play in integrating spatial and non-spatial information. Through computational modeling and in-vivo electrophysiology in macaques, we show that neurons classified as spatially tuned primarily encode linear combinations of immediate behaviorally relevant factors, while those labeled as non-spatially tuned rely on nonlinear mechanisms to integrate temporally distant experiences. Furthermore, our findings reveal a temporal gradient in the primate CA3 region, where spatial selectivity diminishes as neurons encode increasingly distant past events. Finally, using artificial neural networks, we demonstrate that nonlinear recurrent connections are crucial for capturing the response dynamics of non-spatially tuned neurons, particularly those encoding memory-related information. These findings challenge the traditional dichotomy of spatial versus non-spatial representations and instead suggest a continuum of linear and nonlinear computations that underpin hippocampal function. This framework provides new insights into how the hippocampus bridges perception and memory, informing on its role in episodic memory, spatial navigation, and associative learning.
Identifying the genetic and molecular drivers of phenotypic heterogeneity among individuals is vital for understanding human health and for diagnosing, monitoring, and treating diseases. To this end, international consortia such as the Human Cell Atlas and the Tabula Sapiens are creating comprehensive cellular references. Due to the massive volume of data generated, machine learning methods, especially transformer architectures, have been widely employed in related studies. However, applying machine learning to cellular data presents several challenges. One such challenge is making the methods interpretable with respect to both the input cellular information and its context. Another, less explored, challenge is the accurate representation of cells outside existing references, referred to as out-of-distribution (OOD) cells. Out-of-distribution cells may arise from various physiological conditions, such as comparing diseased cells, particularly tumor cells, with healthy reference data, or from significant technical variation, such as transfer learning from a single-cell reference to spatial query data. Inspired by the global workspace theory in cognitive neuroscience, we introduce CellMemory, a bottlenecked Transformer with improved generalization capabilities designed for the hierarchical interpretation of OOD cells unseen during reference building. Even without pre-training, it exceeds the performance of large language models pre-trained with tens of millions of cells. In particular, when deciphering spatially resolved single-cell transcriptomics data, CellMemory demonstrates the ability to interpret data accurately at the granule level. Finally, we harness CellMemory's robust representational capabilities to elucidate malignant cells and their founder cells in different patients, providing reliable characterizations of the cellular changes caused by the disease.
One of the hallmark features of neocortical anatomy is the presence of extensive top-down projections into primary sensory areas, with many impinging on the distal apical dendrites of pyramidal neurons. While it is known that they exert a modulatory effect, altering the gain of responses, their functional role remains an active area of research. It is hypothesized that these top-down projections carry contextual information that can help animals resolve ambiguities in sensory data. One proposed mechanism of contextual integration is a non-linear integration of distinct input streams at the apical and basal dendrites of pyramidal neurons. Computationally, however, it has yet to be demonstrated how such an architecture could leverage distinct compartments for flexible contextual integration and sensory processing when both sensory and context signals can be unreliable. Here, we implement an augmented deep neural network with distinct apical and basal compartments that integrates, via a biophysically inspired rule, a) contextual information from top-down projections to apical compartments, and b) sensory representations driven by bottom-up projections to basal compartments. In addition, we develop a new multi-scenario contextual integration task using a generative image modeling approach. Beyond generalizing previous contextual integration tasks, it better captures the diversity of scenarios where neither contextual nor sensory information is fully reliable. To solve this task, the model successfully learns to select among integration strategies. We find that our model outperforms those without the "apical prior" when contextual information contradicts sensory input. Altogether, this suggests that the apical prior and a biophysically inspired integration rule could be key components for handling the ambiguities that animals encounter in the diverse contexts of the real world.
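The two-compartment idea can be sketched as a single unit whose bottom-up (basal) drive is gain-modulated, rather than directly driven, by a top-down (apical) context signal. The multiplicative rule below is one common biophysically inspired choice and an assumption for illustration; the paper's exact integration rule may differ.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def two_compartment_unit(basal_drive, apical_context):
    """Apical input scales the gain of the basal response instead of
    driving the unit on its own — a modulatory, not driving, effect."""
    gain = 1.0 + sigmoid(apical_context)     # gain stays in (1, 2)
    return gain * max(basal_drive, 0.0)      # rectified basal drive

# Without apical input the unit still responds to sensory drive;
# a matching top-down context amplifies that response.
baseline = two_compartment_unit(1.0, 0.0)    # 1.5
amplified = two_compartment_unit(1.0, 4.0)   # ~1.98
print(baseline < amplified)  # True
```

Note that apical input alone produces no output (`two_compartment_unit(0.0, 4.0)` is 0), which mirrors the anatomical observation that top-down projections modulate the gain of sensory responses rather than substitute for them.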