Publications

Stable Deep Reinforcement Learning via Isotropic Gaussian Representations

Johan Obando-Ceron

Deep reinforcement learning systems often suffer from unstable training dynamics due to non-stationarity, where learning objectives and data distributions evolve over time. We show that under non-stationary targets, isotropic Gaussian embeddings are provably advantageous. In particular, they induce stable tracking of time-varying targets for linear readouts, achieve maximal entropy under a fixed variance budget, and encourage a balanced use of all representational dimensions, all of which enable agents to be more adaptive and stable. Building on this insight, we propose the use of Sketched Isotropic Gaussian Regularization for shaping representations toward an isotropic Gaussian distribution during training. We demonstrate empirically, over a variety of domains, that this simple and computationally inexpensive method improves performance under non-stationarity while reducing representation collapse, neuron dormancy, and training instability.

2026-02-21

arXiv (preprint)

doi.org

openreview.net
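
As a hedged illustration of the maximal-entropy claim in the abstract above, the standard argument can be sketched as follows. This is a textbook derivation, not one taken from the paper itself; the symbols $z$, $\Sigma$, $\lambda_i$, $d$, and $c$ are notation assumed here for the embedding, its covariance, the covariance eigenvalues, the dimension, and the variance budget.

```latex
% Assumed notation: z \in \mathbb{R}^d is an embedding with covariance \Sigma
% (eigenvalues \lambda_1, \dots, \lambda_d \ge 0), and the variance budget is
% \operatorname{tr}(\Sigma) = \sum_i \lambda_i = c.
\begin{align*}
h(z) &\le \tfrac{1}{2}\log\!\big((2\pi e)^d \det\Sigma\big)
  && \text{(equality iff $z$ is Gaussian)} \\
\det\Sigma = \prod_{i=1}^{d} \lambda_i
  &\le \Big(\tfrac{1}{d}\sum_{i=1}^{d} \lambda_i\Big)^{d} = \Big(\tfrac{c}{d}\Big)^{d}
  && \text{(AM-GM; equality iff all $\lambda_i$ are equal)} \\
\Rightarrow\quad h(z) &\le \tfrac{d}{2}\log\!\big(2\pi e\, c/d\big)
  && \text{(attained by $z \sim \mathcal{N}(\mu, \tfrac{c}{d} I)$)}
\end{align*}
```

Both inequalities are tight only for a Gaussian whose covariance is a multiple of the identity, which is the isotropic case the abstract refers to.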

VectorGym: A Multitask Benchmark for SVG Code Generation, Sketching, and Editing

Juan Rodriguez

Haotian Zhang

Abhay Puri

Tianyang Zhang

Rishav Pramanik

Meng Lin

Xiaoqing Xie

Marco Terral

Aly Shariff

Sai Rajeswar

Christopher Pal

We introduce VectorGym, a comprehensive benchmark suite for Scalable Vector Graphics (SVG) that spans generation from text and sketches, complex editing, and visual understanding. VectorGym addresses the lack of realistic, challenging benchmarks aligned with professional design workflows. Our benchmark comprises four tasks with expert human-authored annotations: the novel Sketch2SVG task (VG-Sketch); a new SVG editing dataset (VG-Edit) featuring complex, multi-step edits with higher-order primitives; Text2SVG generation (VG-Text); and SVG captioning (VG-Cap). Unlike prior benchmarks that rely on synthetic edits, VectorGym provides gold-standard human annotations that require semantic understanding and design intent. We also propose a multi-task reinforcement learning approach that jointly optimizes across all four tasks using rendering-based rewards. Our method, built on GRPO with curriculum learning, trains a Qwen3-VL 8B model that achieves state-of-the-art performance among open-source models, surpassing much larger models including Qwen3-VL 235B and matching GPT-4o. We also introduce a VLM-as-a-Judge metric for SVG generation, validated through human correlation studies. Our evaluation of frontier VLMs reveals significant performance gaps, positioning VectorGym as a rigorous framework for advancing visual code generation. VectorGym is publicly available at huggingface.co/datasets/ServiceNow/VectorGym.

2026-02-21

arXiv (preprint)

doi.org

arxiv.org

EEG-based quantification of chronic pain in cats: A proof-of-concept study using the Piq algorithm

Aliénor Delsart

Colince Segning

Aude Castel

Colombe Otis

Guillaume Dumas

Maxim Moreau

Bertrand Lussier

Rubens Da Silva

Karen Barros Parron Fernandes

Johanne Martel-Pelletier

Jean-Pierre Pelletier

Eric Troncy

Suzy Ngomo

While chronic pain assessment in household pets remains challenging, the use of non-invasive electroencephalography (EEG) in cats has shown promise to identify pain more objectively in this species. A novel EEG-based algorithm - Pain identification and quantification (Piq) - was originally developed in humans to quantify pain intensity. In this proof-of-concept study, the objective was to evaluate the feasibility of using the Piq algorithm to identify and quantify chronic osteoarthritic (OA) pain in cats. Adult neutered cats (n = 5 including n = 2 with osteoarthritis, OA) were assessed for their functional impairment (Montreal instrument for cat arthritis testing for use by veterinarians, MI-CAT(V)) and neuro-sensitization at both peripheral (Paw Withdrawal Threshold, PWT) and spinal (response to mechanical temporal summation, RMTS) levels. Resting-state EEG recordings were acquired from Cz, C3/C4 under conscious and sedated conditions. The first five minutes of EEG data were analyzed using the Piq algorithm, with Piq scores ≥ 10 % used as an exploratory threshold transferred from human studies. Pain-free cats showed gamma frequency band Piq scores < 10 % while OA cats exceeded 10 % in both conscious and sedated conditions at Cz. Piq scores were negatively correlated with PWT, sug

2026-02-20

Veterinary Journal (published)

doi.org

Give Users the Wheel: Towards Promptable Recommendation Paradigm

Fuyuan Lyu

Chenglin Luo

Qiyuan Zhang

Yupeng Hou

Haolun Wu

Xing Tang

Xue Liu

Jin L.C. Guo

Xiuqiang He

Conventional sequential recommendation models have achieved remarkable success in mining implicit behavioral patterns. However, these architectures remain structurally blind to explicit user intent: they struggle to adapt when a user's immediate goal (e.g., expressed via a natural language prompt) deviates from their historical habits. While Large Language Models (LLMs) offer the semantic reasoning to interpret such intent, existing integration paradigms force a dilemma: the LLM-as-a-recommender paradigm sacrifices the efficiency and collaborative precision of ID-based retrieval, while reranking methods are inherently bottlenecked by the recall capabilities of the underlying model. In this paper, we propose Decoupled Promptable Sequential Recommendation (DPR), a model-agnostic framework that empowers conventional sequential backbones to natively support Promptable Recommendation, the ability to dynamically steer the retrieval process using natural language without abandoning collaborative signals. DPR modulates the latent user representation directly within the retrieval space. To achieve this, we introduce a Fusion module to align the collaborative and semantic signals, a Mixture-of-Experts (MoE) architecture that disentangles the conflicting gradients from positive and negative steering, and a three-stage training strategy that progressively aligns the semantic space of prompts with the collaborative space. Extensive experiments on real-world datasets demonstrate that DPR significantly outperforms state-of-the-art baselines in prompt-guided tasks while maintaining competitive performance in standard sequential recommendation scenarios.

2026-02-20

arXiv (preprint)

doi.org

arxiv.org

Sociodynamics of Reinforcement Learning

Yann Bouteiller

Karthik Soma

Giovanni Beltrame

Reinforcement Learning (RL) has emerged as a core algorithmic paradigm explicitly driving innovation in a growing number of industrial applications, including large language models and quantitative finance. Furthermore, computational neuroscience has long found evidence of natural forms of RL in biological brains. Therefore, it is crucial for the study of social dynamics to develop a scientific understanding of how RL shapes population behaviors. We leverage the framework of Evolutionary Game Theory (EGT) to provide building blocks and insights toward this objective. We propose a methodology that enables simulating large populations of RL agents in simple game theoretic interaction models. More specifically, we derive fast and parallelizable implementations of two fundamental revision protocols from multi-agent RL - Policy Gradient (PG) and Opponent-Learning Awareness (LOLA) - tailored for population simulations of random pairwise interactions in stateless normal-form games. Our methodology enables us to simulate large populations of 200,000 independent co-learning agents, yielding compelling insights into how non-stationarity-aware learners affect social dynamics. In particular, we find that LOLA learners promote cooperation in the Stag Hunt model, delay cooperative outcomes in the Hawk-Dove model, and reduce strategy diversity in the Rock-Paper-Scissors model.

2026-02-20

Transactions on Machine Learning Research (accepted)

openreview.net
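
To make the setting in the abstract above concrete, here is a minimal sketch of a population of independent policy-gradient (REINFORCE) learners matched at random in a stateless Stag Hunt game. It is an illustration under stated assumptions, not the authors' implementation: the payoff values, the one-logit-per-agent parameterization, the function name `simulate`, and all hyperparameters are hypothetical, and LOLA is not covered.

```python
import numpy as np

# Stag Hunt payoffs for the row player (illustrative values, not from the paper).
# Action 0 = Stag, action 1 = Hare.
PAYOFF = np.array([[4.0, 0.0],
                   [3.0, 3.0]])


def simulate(n_agents=1000, n_rounds=200, lr=0.1, seed=0):
    """Independent REINFORCE learners with random pairwise matching.

    Returns each agent's final probability of playing Stag.
    """
    assert n_agents % 2 == 0, "agents are matched in pairs"
    rng = np.random.default_rng(seed)
    # One logit per agent: P(Stag) = sigmoid(theta).
    theta = rng.normal(0.0, 1.0, size=n_agents)
    half = n_agents // 2
    for _ in range(n_rounds):
        p_stag = 1.0 / (1.0 + np.exp(-theta))
        # Sample actions for the whole population at once (vectorized).
        actions = np.where(rng.random(n_agents) < p_stag, 0, 1)
        # Random pairing: agent perm[i] meets agent perm[i + half].
        perm = rng.permutation(n_agents)
        opponent = np.empty(n_agents, dtype=int)
        opponent[perm[:half]] = perm[half:]
        opponent[perm[half:]] = perm[:half]
        rewards = PAYOFF[actions, actions[opponent]]
        # Gradient of log pi(action) with respect to theta for a Bernoulli policy.
        grad_logp = np.where(actions == 0, 1.0 - p_stag, -p_stag)
        theta += lr * rewards * grad_logp
    return 1.0 / (1.0 + np.exp(-theta))


if __name__ == "__main__":
    final_p = simulate()
    print("mean P(Stag):", final_p.mean())
```

Because every step is an array operation over the whole population, the same loop structure scales to larger populations by raising `n_agents`.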

Anisotropic local law for non-separable sample covariance matrices

Fan Zhou

Renyuan Ma

Elliot Paquette

Zhichao Wang

Zhou Fan

We establish local laws for sample covariance matrices …

2026-02-19

arXiv (preprint)

doi.org

arxiv.org

Mirror Descent Algorithms with Nearly Dimension-Independent Rates for Differentially-Private Stochastic Saddle-Point Problems

Tomas Gonzalez

Cristobal Guzman

Courtney Paquette

2026-02-19

SIAM Journal on Optimization (published)

doi.org

arxiv.org

On the Adversarial Robustness of Discrete Image Tokenizers

Rishika Bhagwatkar

Irina Rish

Nicolas Flammarion

Francesco Croce

Discrete image tokenizers encode visual inputs as sequences of tokens from a finite vocabulary and are gaining popularity in multimodal systems, including encoder-only, encoder-decoder, and decoder-only models. However, unlike CLIP encoders, their vulnerability to adversarial attacks has not been explored. Ours being the first work studying this topic, we first formulate attacks that aim to perturb the features extracted by discrete tokenizers, and thus change the extracted tokens. These attacks are computationally efficient, application-agnostic, and effective across classification, multimodal retrieval, and captioning tasks. Second, to defend against this vulnerability, inspired by recent work on robust CLIP encoders, we fine-tune popular tokenizers with unsupervised adversarial training, keeping all other components frozen. While unsupervised and task-agnostic, our approach significantly improves robustness to both unsupervised and end-to-end supervised attacks and generalizes well to unseen tasks and data. Unlike supervised adversarial training, our approach can leverage unlabeled images, making it more versatile. Overall, our work highlights the critical role of tokenizer robustness in downstream tasks and presents an important step in the development of safe multimodal foundation models.

2026-02-19

arXiv (preprint)

doi.org

arxiv.org

Dataless Weight Disentanglement in Task Arithmetic via Kronecker-Factored Approximate Curvature

Angelo Porrello

Pietro Buzzega

Felix Dangel

Thomas Sommariva

Riccardo Salami

Lorenzo Bonicelli

Simone Calderara

Task Arithmetic yields a modular, scalable way to adapt foundation models. Combining multiple task vectors, however, can lead to cross-task interference, causing representation drift and degraded performance. Representation drift regularization provides a natural remedy to disentangle task vectors; however, existing approaches typically require external task data, conflicting with modularity and data availability constraints (e.g., privacy requirements). We propose a dataless approach by framing regularization against representation drift as a curvature matrix approximation problem. This allows us to leverage well-established techniques; in particular, we adopt Kronecker-Factored Approximate Curvature and obtain a practical regularizer that achieves state-of-the-art results in task addition and negation. Our method has constant complexity in the number of tasks and promotes robustness to task vector rescaling, eliminating the need for held-out tuning.

2026-02-18

Open MIND (preprint)

doi.org

arxiv.org
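
For readers unfamiliar with the task addition and negation operations mentioned in the abstract above, the sketch below shows plain task arithmetic on toy weights. It assumes the usual definition of a task vector as fine-tuned weights minus pretrained weights; the helper names `task_vector` and `apply_task_vectors` and the scaling parameter `alpha` are hypothetical, and the paper's curvature-based regularizer is not included.

```python
import numpy as np


def task_vector(pretrained, finetuned):
    """Task vector: fine-tuned weights minus pretrained weights, per parameter."""
    return {name: finetuned[name] - pretrained[name] for name in pretrained}


def apply_task_vectors(pretrained, vectors, alpha=1.0):
    """Add scaled task vectors to the pretrained weights.

    alpha > 0 performs task addition; alpha < 0 performs task negation.
    """
    merged = {name: w.copy() for name, w in pretrained.items()}
    for tv in vectors:
        for name in merged:
            merged[name] += alpha * tv[name]
    return merged


if __name__ == "__main__":
    # Toy one-parameter "models" (illustrative values only).
    pre = {"w": np.zeros(3)}
    ft_a = {"w": np.array([1.0, 0.0, 0.0])}
    ft_b = {"w": np.array([0.0, 2.0, 0.0])}
    tv_a, tv_b = task_vector(pre, ft_a), task_vector(pre, ft_b)
    print(apply_task_vectors(pre, [tv_a, tv_b])["w"])      # task addition
    print(apply_task_vectors(pre, [tv_a], alpha=-1.0)["w"])  # task negation
```

In this toy case the two task vectors touch different coordinates, so they do not interfere; the cross-task interference the abstract describes arises when task vectors overlap in the parameters they modify.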

GASS: Geometry-Aware Spherical Sampling for Disentangled Diversity Enhancement in Text-to-Image Generation

Ye Zhu

Kaleb S. Newman

Johannes F. Lutzeyer

Adriana Romero

Michal Drozdzal

Olga Russakovsky

2026-02-18

arXiv (preprint)

arxiv.org

GeneZip: Region-Aware Compression for Long Context DNA Modeling

Hongyu Guo

Genomic sequences span billions of base pairs (bp), posing a fundamental challenge for genome-scale foundation models. Existing approaches largely sidestep this barrier by either scaling relatively small models to long contexts or relying on heavy multi-GPU parallelism. Here we introduce GeneZip, a DNA compression model that leverages a key biological prior: genomic information is highly imbalanced. Coding regions comprise only a small fraction (about 2 percent) yet are information-dense, whereas most non-coding sequence is comparatively information-sparse. GeneZip couples HNet-style dynamic routing with a region-aware compression-ratio objective, enabling adaptive allocation of representation budget across genomic regions. As a result, GeneZip learns region-aware compression and achieves 137.6x compression with only 0.31 perplexity increase. On downstream long-context benchmarks, GeneZip achieves comparable or better performance on contact map prediction, expression quantitative trait loci prediction, and enhancer-target gene prediction. By reducing effective sequence length, GeneZip unlocks simultaneous scaling of context and capacity: compared to the prior state-of-the-art model JanusDNA, it enables training models 82.6x larger at 1M-bp context, supporting a 636M-parameter GeneZip model at 1M-bp context. All experiments in this paper can be trained on a single A100 80GB GPU.

2026-02-18

arXiv (preprint)

doi.org

openreview.net

Offline-Online Retail Collaboration via Pickup Partnership

Zahra Jalali

Maxime C. Cohen

Necati Ertekin

Mehmet Gümüş

We study a growing retail strategy called pickup partnership, where online retailers partner with physical stores to offer in-store pickup services. In practice, two main policies are used in these partnerships: (i) a fixed fee policy, where the retailer pays the offline partner a set fee per pickup order, and (ii) a coupon policy, where customers receive a coupon for use at the offline partner’s store with each pickup order. Our goal is to evaluate these policies and determine which is most beneficial for online retailers. We develop a stylized model that captures the essential dynamics of pickup partnerships. We find that although the coupon policy allows the online retailer to gain greater market coverage compared with the fixed fee policy, it does not always lead to higher profits for the online retailer. The coupon policy is preferred when in-store fulfillment and pickup handling costs are low and direct-delivery costs are high, whereas the fixed fee policy is favored when these costs are moderate. We also find that both policies entail inefficiencies when the incentives of the two parties are not aligned. To alleviate such inefficiencies, we propose a new policy designed to better align incentives and improve partnership efficiency. This paper offers the first theoretical analysis of the in-store pickup partnership model and provides practical guidance for online retailers seeking to implement it. Our proposed policy aims to enhance the effectiveness and profitability of these partnerships beyond current industry practices. Supplemental Material: The online appendix is available at https://doi.org/10.1287/serv.2025.0118.

2026-02-18

Service Science (published)

doi.org
