Publications

VOCALoco: Viability-Optimized Cost-aware Adaptive Locomotion

Simon Li

Anas El Houssaini

Recent advancements in legged robot locomotion have facilitated traversal over increasingly complex terrains. Despite this progress, many existing approaches rely on end-to-end deep reinforcement learning (DRL), which poses limitations in terms of safety and interpretability, especially when generalizing to novel terrains. To overcome these challenges, we introduce VOCALoco, a modular skill-selection framework that dynamically adapts locomotion strategies based on perceptual input. Given a set of pre-trained locomotion policies, VOCALoco evaluates their viability and energy consumption by predicting both the safety of execution and the anticipated cost of transport over a fixed planning horizon. This joint assessment enables the selection of policies that are both safe and energy-efficient, given the observed local terrain. We evaluate our approach on staircase locomotion tasks, demonstrating its performance in both simulated and real-world scenarios using a quadrupedal robot. Empirical results show that VOCALoco achieves improved robustness and safety during stair ascent and descent compared to a conventional end-to-end DRL policy.

2026-01-31

IEEE Robotics and Automation Letters (published)

doi.org

arxiv.org
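
The selection step described in the VOCALoco abstract above (predict each policy's safety and cost of transport, then pick a policy that is both safe and efficient) can be illustrated with a minimal sketch. The abstract does not state the actual selection rule, so everything here is an assumption made for illustration: the `SkillEstimate` type, the `select_skill` function, the 0.9 viability threshold, and the toy numbers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SkillEstimate:
    """Hypothetical per-policy predictions over the planning horizon."""
    name: str
    viability: float          # predicted probability of safe execution, in [0, 1]
    cost_of_transport: float  # predicted energy cost (lower is better)


def select_skill(estimates: list[SkillEstimate],
                 min_viability: float = 0.9) -> Optional[SkillEstimate]:
    """Return the cheapest skill among those predicted to be safe.

    Returns None when no skill clears the (assumed) viability threshold.
    """
    safe = [e for e in estimates if e.viability >= min_viability]
    if not safe:
        return None
    return min(safe, key=lambda e: e.cost_of_transport)


# Toy predictions for a staircase; the values are made up.
candidates = [
    SkillEstimate("walk", viability=0.55, cost_of_transport=0.8),
    SkillEstimate("stair_climb", viability=0.97, cost_of_transport=1.4),
    SkillEstimate("crawl", viability=0.99, cost_of_transport=2.1),
]
chosen = select_skill(candidates)
print(chosen.name if chosen else "no safe skill")
```

In this toy case the cheapest policy ("walk") is filtered out as unsafe, and the cheaper of the two remaining safe policies is chosen. Filtering on safety before minimizing cost is one simple way to combine the two predictions; a weighted objective would be another.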

Parallel Stochastic Gradient-Based Planning for World Models

Michael Psenka

Michael G. Rabbat

Aditi Krishnapriyan

Yann LeCun

Amir Bar

World models simulate environment dynamics from raw sensory inputs like video. However, using them for planning can be challenging due to the vast and unstructured search space. We propose a robust and highly parallelizable planner that leverages the differentiability of the learned world model for efficient optimization, solving long-horizon control tasks from visual input. Our method treats states as optimization variables ("virtual states") with soft dynamics constraints, enabling parallel computation and easier optimization. To facilitate exploration and avoid local optima, we introduce stochasticity into the states. To mitigate sensitive gradients through high-dimensional vision-based world models, we modify the gradient structure to descend towards valid plans while only requiring action-input gradients. Our planner, which we call GRASP (Gradient RelAxed Stochastic Planner), can be viewed as a stochastic version of a non-condensed or collocation-based optimal controller. We provide theoretical justification and experiments on video-based world models, where our resulting planner outperforms existing planning algorithms like the cross-entropy method (CEM) and vanilla gradient-based optimization (GD) on long-horizon experiments, both in success rate and time to convergence.

2026-01-30

ArXiv (preprint)

arxiv.org

SkeleShare: Algorithmic Skeletons and Equality Saturation for Hardware Resource Sharing

Jonathan Van der Cruysse

Tzung-Han Juang

Shakiba Bolbolian Khah

Christophe Dubach

Compiling functional programs into efficient Field Programmable Gate Array (FPGA) designs is difficult. Hardware resources must be explicitly allocated and shared to maximize resource efficiency. This requires careful orchestration of several transformations to expose and exploit sharing opportunities. This paper introduces SkeleShare, a novel approach that automates the problem of resource allocation and sharing. It leverages equality saturation and algorithmic skeletons to expose sharing opportunities across abstraction levels. A solver-based extractor then selects a design that consolidates computations, meeting resource constraints while maintaining performance. This approach is evaluated on neural networks and image processing targeting a real FPGA. The paper shows how SkeleShare is used to express the various algorithmic patterns and transformation rules inherent in neural network operators. The experimental evaluation demonstrates that SkeleShare’s fully automated resource allocation and sharing matches and exceeds the performance of prior work, which involves expert manual extraction of sharing opportunities.

2026-01-30

IEEE/ACM Symposium on Code Generation and Optimization (published)

doi.org

Synthesizing Specialized Sparse Tensor Accelerators for FPGAs via High-Level Functional Abstractions

Hamza Javed

Christophe Dubach

Sparsity is inherent in many applications such as machine learning and graph analytics. However, achieving high efficiency in sparse computations requires specialized hardware accelerators like FPGAs, as traditional accelerators typically cater to dense data. While high-level synthesis enables the automatic generation of FPGA-based accelerators, generic solutions produced via C-based synthesis flows often demand extensive development time, leading designers to prioritize broad applicability over fine-grained structural specialization. Consequently, these accelerators fail to fully exploit FPGA’s reconfigurability, leaving substantial performance and efficiency gains untapped. This paper pushes the boundary by automatically generating specialized accelerators that match a given fixed sparse structure (e.g., in static graph analytics and pruned neural networks). It achieves this by leveraging functional abstractions within high-level synthesis, an approach that has already proven effective in automating the generation of specialized dense tensor accelerators. Tensor shapes are encoded directly in the type system and specialized primitives for irregular data are introduced. Together, these innovations enable a concise specification of sparse accelerators and drive advanced optimizations, including dynamic partitioning and vector sharding, to produce hardware precisely tailored to the sparsity pattern of the underlying tensors. Compared to state-of-the-art generic accelerators (HiSparse, HiSpMV and GraphLily), the approach achieves up to a 2.8× improvement in bandwidth efficiency for sparse matrix computations and a 1.8× speedup on graph algorithms. Against the hls4ml neural network acceleration framework, it achieves up to a 1.8× improvement in throughput with a 4× reduction in resource usage, enabling scaling to larger networks. These results establish this approach as a flexible, powerful, and rapid solution for designing high-performance specialized sparse accelerators.

2026-01-30

IEEE/ACM Symposium on Code Generation and Optimization (published)

doi.org

Beyond Fixed Frames: Dynamic Character-Aligned Speech Tokenization

Luca Della Libera

Yusuf Cem Sübakan

Mirco Ravanelli

Neural audio codecs are at the core of modern conversational speech technologies, converting continuous speech into sequences of discrete tokens that can be processed by LLMs. However, existing codecs typically operate at fixed frame rates, allocating tokens uniformly in time and producing unnecessarily long sequences. In this work, we introduce DyCAST, a Dynamic Character-Aligned Speech Tokenizer that enables variable-frame-rate tokenization through soft character-level alignment and explicit duration modeling. DyCAST learns to associate tokens with character-level linguistic units during training and supports alignment-free inference with direct control over token durations at decoding time. To improve speech resynthesis quality at low frame rates, we further introduce a retrieval-augmented decoding mechanism that enhances reconstruction fidelity without increasing bitrate. Experiments show that DyCAST achieves competitive speech resynthesis quality and downstream performance while using significantly fewer tokens than fixed-frame-rate codecs. Code and checkpoints will be released publicly at https://github.com/lucadellalib/dycast.

2026-01-29

ArXiv (preprint)

arxiv.org

Beyond Fixed Frames: Dynamic Character-Aligned Speech Tokenization

Luca Della Libera

Cem Subakan

Mirco Ravanelli

Neural audio codecs are at the core of modern conversational speech technologies, converting continuous speech into sequences of discrete tokens that can be processed by LLMs. However, existing codecs typically operate at fixed frame rates, allocating tokens uniformly in time and producing unnecessarily long sequences. In this work, we introduce DyCAST, a Dynamic Character-Aligned Speech Tokenizer that enables variable-frame-rate tokenization through soft character-level alignment and explicit duration modeling. DyCAST learns to associate tokens with character-level linguistic units during training and supports alignment-free inference with direct control over token durations at decoding time. To improve speech resynthesis quality at low frame rates, we further introduce a retrieval-augmented decoding mechanism that enhances reconstruction fidelity without increasing bitrate. Experiments show that DyCAST achieves competitive speech resynthesis quality and downstream performance while using significantly fewer tokens than fixed-frame-rate codecs. Code and checkpoints will be released publicly at https://github.com/lucadellalib/dycast.

2026-01-29

Open MIND (preprint)

doi.org

arxiv.org

Dispersion Loss Counteracts Embedding Condensation and Improves Generalization in Small Language Models

Chen Liu

Xingzhi Sun

Xi Xiao

Alexandre Van Tassel

Ke Xu

Kristof Reimann

Danqi Liao

Mark B. Gerstein

Tianyang Wang

Xiao Wang

Smita Krishnaswamy

Large language models (LLMs) achieve remarkable performance through ever-increasing parameter counts, but scaling incurs steep computational costs. To better understand LLM scaling, we study representational differences between LLMs and their smaller counterparts, with the goal of replicating the representational qualities of larger models in the smaller models. We observe a geometric phenomenon which we term

2026-01-29

ArXiv (preprint)

arxiv.org

Dual-Phase Continual Learning: Supervised Adaptation Meets Unsupervised Retention

Vaibhav Singh

Rahaf Aljundi

Eugene Belilovsky

Foundational vision-language models (VLMs) excel across diverse tasks, but adapting them to new domains without forgetting prior knowledge remains a critical challenge. Continual Learning (CL) addresses this challenge by enabling models to learn sequentially from new data while mitigating the forgetting of prior information, typically under supervised settings involving label shift. Nonetheless, abrupt distribution shifts can still cause substantial forgetting, potentially nullifying the benefits of supervised updates, especially when storing or replaying past data is infeasible. In this work, we propose leveraging unlabeled test-time data in an unsupervised manner to reinforce prior task performance without requiring replay or stored examples. Unlike traditional Test-Time Adaptation (TTA), which primarily focuses on domain shift or corruption, our method improves performance on earlier tasks by exploiting representative test samples encountered during deployment. We introduce a simple teacher-student framework with gradient-based sparse parameter updates, and show that it effectively mitigates forgetting in class-incremental CL for VLMs, offering a memory-free alternative to episodic replay with strong empirical results.

2026-01-29

Transactions on Machine Learning Research (accepted)

doi.org

openreview.net

Localized, High-resolution Geographic Representations with Slepian Functions

Arjun Rao

Ruth Crasto

Tessa Ooms

David Rolnick

Konstantin Klemmer

Marc Rußwurm

Geographic data is fundamentally local. Disease outbreaks cluster in population centers, ecological patterns emerge along coastlines, and economic activity concentrates within country borders. Machine learning models that encode geographic location, however, distribute representational capacity uniformly across the globe, struggling at the fine-grained resolutions that localized applications require. We propose a geographic location encoder built from spherical Slepian functions that concentrate representational capacity inside a region-of-interest and scale to high resolutions without extensive computational demands. For settings requiring global context, we present a hybrid Slepian-Spherical Harmonic encoder that efficiently bridges the tradeoff between local-global performance, while retaining desirable properties such as pole-safety and spherical-surface-distance preservation. Across five tasks spanning classification, regression, and image-augmented prediction, Slepian encodings outperform baselines and retain performance advantages across a wide range of neural network architectures.

2026-01-29

arXiv (preprint)

doi.org

arxiv.org

Perplexity Cannot Always Tell Right from Wrong

Petar Veličković

Federico Barbero

Christos Perivolaropoulos

Simon Kayode Osindero

Razvan Pascanu

Perplexity -- a function measuring a model's overall level of "surprise" when encountering a particular output -- has gained significant traction in recent years, both as a loss function and as a simple-to-compute metric of model quality. Prior studies have pointed out several limitations of perplexity, often empirically. Here we leverage recent results on Transformer continuity to show in a rigorous manner how perplexity may be an unsuitable metric for model selection. Specifically, we prove that, if there is any sequence that a compact decoder-only Transformer model predicts accurately and confidently -- a necessary prerequisite for strong generalisation -- there must exist another sequence with very low perplexity that the same model does not predict correctly. Further, by analytically studying iso-perplexity plots, we find that perplexity will not always select for the more accurate model -- rather, any increase in model confidence must be accompanied by a commensurate rise in accuracy for the new model to be selected.

2026-01-29

ArXiv (preprint)

arxiv.org
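
The abstract above notes that perplexity will not always select the more accurate model. A small numeric sketch makes that concrete, using the standard definition of perplexity as the exponential of the mean negative log-probability assigned to the observed tokens. The two "models" and their probabilities are made-up toy values, not results from the paper, and the correctness flags are assumed rather than derived from full output distributions.

```python
import math


def perplexity(token_probs: list[float]) -> float:
    """exp of the mean negative log-probability of the observed tokens."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)


def accuracy(correct: list[bool]) -> float:
    """Fraction of positions where the top-1 prediction was the true token."""
    return sum(correct) / len(correct)


# Toy model A: always right, but never confident (true token gets 0.4,
# assumed to still be the top-1 choice among several alternatives).
probs_a = [0.4, 0.4, 0.4, 0.4]
correct_a = [True, True, True, True]

# Toy model B: very confident on three tokens, wrong on the fourth.
probs_b = [0.9, 0.9, 0.9, 0.2]
correct_b = [True, True, True, False]

print(f"A: perplexity={perplexity(probs_a):.3f}, accuracy={accuracy(correct_a):.2f}")
print(f"B: perplexity={perplexity(probs_b):.3f}, accuracy={accuracy(correct_b):.2f}")
```

Model B ends up with the lower perplexity even though model A is the more accurate one, so choosing by perplexity alone would pick the less accurate model in this toy case.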

Secure Tool Manifest and Digital Signing Solution for Verifiable MCP and LLM Pipelines

Saeid Jamshidi

Kawser Wazed Nafi

Arghavan Moradi Dakhel

Foutse Khomh

Amin Nikanjam

Mohammad Hamdaqa

2026-01-29

ArXiv (preprint)

arxiv.org

Securing Time in Energy IoT: A Clock-Dynamics-Aware Spatio-Temporal Graph Attention Network for Clock Drift Attacks and Y2K38 Failures

Saeid Jamshidi

Omar Abdel Wahab

Rolando Herrero

Foutse Khomh

The integrity of time in distributed Internet of Things (IoT) devices is crucial for reliable operation in energy cyber-physical systems, such as smart grids and microgrids. However, IoT systems are vulnerable to clock drift, time-synchronization manipulation, and timestamp discontinuities, such as the Year 2038 (Y2K38) Unix overflow, all of which disrupt temporal ordering. Conventional anomaly-detection models, which assume reliable timestamps, fail to capture temporal inconsistencies. This paper introduces STGAT (Spatio-Temporal Graph Attention Network), a framework that models both temporal distortion and inter-device consistency in energy IoT systems. STGAT combines drift-aware temporal embeddings and temporal self-attention to capture corrupted time evolution at individual devices, and uses graph attention to model spatial propagation of timing errors. A curvature-regularized latent representation geometrically separates normal clock evolution from anomalies caused by drift, synchronization offsets, and overflow events. Experimental results on energy IoT telemetry with controlled timing perturbations show that STGAT achieves 95.7% accuracy, outperforming recurrent, transformer, and graph-based baselines with significant improvements (d>1.8, p<0.001). Additionally, STGAT reduces detection delay by 26%, achieving a 2.3-time-step delay while maintaining stable performance under over

2026-01-29

ArXiv (preprint)

arxiv.org
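
The Year 2038 overflow mentioned in the abstract above is a property of signed 32-bit Unix timestamps, and the discontinuity it causes can be reproduced in a few lines. This is only an illustration of the overflow itself, not of STGAT; the helper names `wrap_int32` and `to_utc` are chosen for this sketch.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
INT32_MAX = 2**31 - 1  # largest value of a signed 32-bit counter


def wrap_int32(value: int) -> int:
    """Interpret an integer as a two's-complement signed 32-bit value."""
    return (value + 2**31) % 2**32 - 2**31


def to_utc(seconds: int) -> datetime:
    """Convert seconds since the Unix epoch to a UTC datetime."""
    return EPOCH + timedelta(seconds=seconds)


last_valid = INT32_MAX                  # final second before overflow
after_tick = wrap_int32(INT32_MAX + 1)  # one second later, after wrapping

print(to_utc(last_valid).isoformat())
print(to_utc(after_tick).isoformat())
print("ordering preserved:", after_tick > last_valid)
```

The last representable second is 2038-01-19T03:14:07 UTC; one tick later the wrapped counter lands on 1901-12-13T20:45:52 UTC, so a later event receives an earlier timestamp. This is the kind of break in temporal ordering the abstract refers to.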
