Publications

Spatial CAPTCHA: Generatively Benchmarking Spatial Reasoning for Human-Machine Differentiation

Arina Kharlamova

Bowei He

Xue Liu

Online services rely on CAPTCHAs as a first line of defense against automated abuse, yet recent advances in multi-modal large language models (MLLMs) have eroded the effectiveness of conventional designs that focus on text recognition or 2D image understanding. To address this challenge, we present **Spatial CAPTCHA**, a novel human-verification framework that leverages fundamental differences in spatial reasoning between humans and MLLMs. Unlike existing CAPTCHAs that rely on low-level perception tasks vulnerable to modern AI, Spatial CAPTCHA generates dynamic questions requiring geometric reasoning, perspective-taking, occlusion handling, and mental rotation, skills that are intuitive for humans but difficult for current AI systems. The system employs a procedural generation pipeline with constraint-based difficulty control, automated correctness verification, and human-in-the-loop validation to ensure scalability, robustness, and adaptability. Evaluation on a corresponding benchmark, **Spatial-CAPTCHA-Bench**, demonstrates that humans vastly outperform 10 state-of-the-art MLLMs, with the best model achieving only 31.0% Pass@1 accuracy. A comparison of results with Google reCAPTCHA further confirms the effectiveness of Spatial CAPTCHA as both a security mechanism and a diagnostic tool for spatial reasoning in AI.

2025-12-31

International Conference on Learning Representations (Accept (Poster))

doi.org

openreview.net

Spatial pattern regression for meteorological fields interpolation

Vihotogbé Houssou

Julie Carreau

2025-12-31

SSRN Electronic Journal (accepted)

doi.org

Species Loss Scenarios Identify Canada's Northern Ecosystems as Disproportionately Vulnerable

Isaac Eckert

Dominique Caron

Laura J. Pollock

2025-12-31

Journal of Biogeography (published)

doi.org

SSFL: Discovering Sparse Unified Subnetworks at Initialization for Efficient Federated Learning

Riyasat Ohib

Bishal Thapaliya

Gintare Karolina Dziugaite

Jingyu Liu

Vince D. Calhoun

Sergey Plis

In this work, we propose Salient Sparse Federated Learning (SSFL), a streamlined approach for sparse federated learning with efficient communication. SSFL identifies a sparse subnetwork prior to training, leveraging parameter saliency scores that are computed separately on local client data in non-IID scenarios and then aggregated to determine a global mask. Only the sparse model weights are trained and communicated each round between the clients and the server. On standard benchmarks including CIFAR-10, CIFAR-100, and Tiny-ImageNet, SSFL consistently improves the accuracy-sparsity trade-off, achieving more than 20% relative error reduction on CIFAR-10 compared to the strongest sparse baseline, while reducing communication costs by

2025-12-31

Trans. Mach. Learn. Res. (published)

arxiv.org

Supervised Multimodal Model for Plasma Spray Diagnostics and Spray Health Monitoring

Cormac Cureton

Abhijeet Praveen

Sareh Soleimani

Aman Sidhu

Cristian Cojocaru

Kintak Raymond Yu

Narges Armanfard

2025-12-31

SSRN Electronic Journal (accepted)

doi.org

Suspected Biliary Atresia in Brazil: Impact of Regional Healthcare Variations on Diagnostic Timeliness

Luiza Telles

Paulo Henrique Moreira Melo

Ana Maria Bicudo Diniz

Gabriele Lech

Ayla Gerk

Lauren Kratky

Dan Poenaru

David P. Mooney

Joaquim Bustorff-Silva

2025-12-31

Journal of Pediatric Surgery Open (published)

doi.org

Tequila: Deadzone-free Ternary Quantization for Large Language Models

Hong Huang

Decheng Wu

Rui Cen

Guanghua Yu

Zonghang Li

Kai Liu

Jianchen Zhu

Peng Chen

Xue Liu

Dapeng Wu

Quantization techniques are essential for the deployment of Large Language Models (LLMs) on edge devices. However, prevailing methods often rely on mixed-precision multiplication that lacks efficient hardware support, making it infeasible. Ternary weight quantization addresses this by constraining weights to {-1, 0, 1}, replacing expensive multiplications with hardware-efficient additions. However, such aggressive compression leads to significant accuracy degradation, even after costly quantization-aware training with massive data. We identify the core issue as _**deadzone trapping**: a large number of weights are trapped at the deadzone boundary._ This occurs because these weights receive only noisy, less informative gradients, preventing stable escape from the deadzone and severely impeding model capacity and optimization. To address this issue, we propose **Tequila**, a trapping-free quantization optimization method that reactivates deadzone-trapped weights by repurposing them as dynamic biases. This allows the repurposed weights to provide a continuous signal in the forward pass and, critically, receive direct, meaningful gradient signals during backpropagation, thereby enhancing model capacity and optimization with nearly _zero_ inference overhead. Extensive evaluations demonstrate that Tequila outperforms state-of-the-art (SOTA) ternary quantization methods across five benchmarks. Specifically, on the ARC benchmark, it achieves

2025-12-31

International Conference on Learning Representations (Accept (Poster))

openreview.net

The 2nd Workshop on Foundation Models for Science: Real-World Impact and Science-First Design

Wuyang Chen

Yongji Wang

N. Benjamin Erichson

Laurence Perreault-Levasseur

Bo Li

Damian Borth

Swarat Chaudhuri

Scientific foundation models should be built for science, not for generic AI tastes or leaderboard prestige. This workshop centers problem-driven design: models that measurably advance real scientific inquiries, e.g., forecasting extreme climate events, accelerating materials discovery, understanding biological mechanisms, co-developed with domain experts and validated against field data, experiments, and downstream impact. We argue that foundation models for science must be built differently from language and vision. Scientific data are physical, causal, spatiotemporal, and often scarce or biased; objectives must reflect mechanistic fidelity, not just predictive accuracy. This calls for scientific priors and constraints, robust uncertainty quantification (UQ), and architectures that natively handle multi-modality (e.g., grids, meshes, spectra, time series, point clouds, text, images, code). It also demands tight integration with classical scientific tools (simulators, PDE solvers, optimization and inference engines, and HPC workflows) to yield hybrid systems that are faster, more accurate, and more trustworthy. We will highlight opportunities and hard problems unique to science: enforcing conservation laws and symmetries; learning across vast spatial and temporal scales; representing extreme events and tipping points; calibrating and validating UQ; and developing evaluation protocols that reward mechanistic insight and actionable reliability. The goal is a roadmap for building, training, and deploying scientific foundation models that accelerate discovery while respecting the structure of the natural world.

2025-12-31

Workshop Proposals @ International Conference on Learning Representations (published)

openreview.net

The Design Space of Tri-Modal Masked Diffusion Models

Louis Bethune

Victor Turrisi

Bruno Kacper Mlodozeniec

Pau Rodriguez Lopez

Lokesh Boominathan

Nikhil Bhendawade

Amitis Shidani

Joris Pelemans

Theo X. Olausson

Devon Hjelm

Paul Dixon

Joao Monteiro

Pierre Ablin

Vishnu Banna

Arno Blaas

Nick Henderson

Kari Noriy

Dan Busbridge

Josh Susskind

Marco Cuturi

Irina Belousova

Luca Zappella

Russ Webb

Jason Ramapuram

Discrete diffusion models have emerged as strong alternatives to autoregressive language models, with recent work initializing and fine-tuning a base unimodal model for bimodal generation. Diverging from previous approaches, we introduce the first tri-modal masked diffusion model pretrained from scratch on text, image-text, and audio-text data. We systematically analyze multimodal scaling laws, modality mixing ratios, noise schedules, and batch-size effects, and we provide optimized inference sampling defaults. Our batch-size analysis yields a novel stochastic differential equation (SDE)-based reparameterization that eliminates the need for tuning the optimal batch size as reported in recent work. This reparameterization decouples the physical batch size, often chosen based on compute constraints (GPU saturation, FLOP efficiency, wall-clock time), from the logical batch size, chosen to balance gradient variance during stochastic optimization. Finally, we pretrain a preliminary 3B-parameter tri-modal model on 6.4T tokens, demonstrating the capabilities of a unified design and achieving strong results in text generation, text-to-image tasks, and text-to-speech tasks. Our work represents the largest-scale systematic open study of multimodal discrete diffusion models conducted to date, providing insights into scaling behaviors across multiple modalities.

2025-12-31

arXiv (preprint)

doi.org

arxiv.org

The Expressive Limits of Diagonal SSMs for State-Tracking

State-Space Models (SSMs) have recently been shown to achieve strong empirical performance on a variety of long-range sequence modeling tasks while remaining efficient and highly parallelizable. However, the theoretical understanding of their expressive power remains limited. In this work, we study the expressivity of input-Dependent Complex-valued Diagonal (DCD) State-Space Models (SSMs) on sequential state-tracking tasks for abstract groups. It is easy to show that a single DCD SSM layer with a universal decoder can track any Abelian group at finite precision by decomposing it into a product of cyclic groups. We show that this is tight by proving that such a model cannot track any non-Abelian group at finite precision. We further establish the expressivity of multi-layer DCD SSMs. We show that a

2025-12-31

International Conference on Learning Representations (Accept (Poster))

doi.org

openreview.net

The Geometry and Topology of Circuits: the Manifolds of Modular Addition

Gabriela Moisescu-Pareja

Colin Daniels

Jonathan Love

The Clock and Pizza interpretations, associated with architectures differing in either uniform or learnable attention, were introduced to argue that different architectural designs can yield distinct circuits for modular addition. In this work, we show that this is not the case, and that both the uniform and trainable attention architectures implement the same algorithm via topologically and geometrically equivalent representations. Our methodology goes beyond the interpretation of individual neurons and weights. Instead, we identify all of the neurons corresponding to each learned representation and then study the collective group of neurons as one entity. This method reveals that each learned representation is a manifold that we can study utilizing tools from topology. Based on this insight, we can statistically analyze the learned representations across hundreds of circuits to demonstrate the similarity between learned modular addition circuits that arise naturally from common deep learning paradigms.

2025-12-31

International Conference on Learning Representations (Accept (Poster))

openreview.net

The Impact of Pediatric Surgery Global Travel Fellowships: A Study by the Canadian Association of Paediatric Surgeons Global Partnership Committee.

Sacha Williams

Natasha Bejjani

Elena Guadagno

Robert Baird

Shahrzad Joharifard

Melanie Morris

Robin Petroze

Dan Poenaru

Sherif Emil

2025-12-31

Journal of Pediatric Surgery (published)

doi.org
