Language Models over Canonical Byte-Pair Encodings
Tim Vieira
Tianyu Liu
Clemente Pasti
Yahya Emara
Brian DuSell
Benjamin LeBrun
Mario Giulianelli
Juan Luis Gastaldi
Ryan Cotterell
Learning Penalty for Optimal Partitioning via Automatic Feature Extraction
Tung L. Nguyen
Changepoint detection identifies significant shifts in data sequences, making it important in areas like finance, genetics, and healthcare. Optimal Partitioning algorithms detect these changes efficiently, using a penalty parameter to limit the number of changepoints. Determining an appropriate value for this penalty can be challenging. Traditionally, this involved manually extracting statistical features, such as sequence length or variance, to make the prediction. This study proposes a novel approach that uses recurrent neural networks to learn the penalty directly from raw sequences by extracting features automatically. Experiments on 20 benchmark genomic datasets show that the method surpasses traditional approaches in partitioning accuracy in most cases.
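The abstract describes a recurrent network that maps a raw sequence directly to a penalty value. Below is a minimal PyTorch sketch of that idea; the class name, layer sizes, and the log-penalty parameterization are illustrative assumptions, not the paper's architecture.

```python
# Minimal sketch: a GRU reads a raw sequence and predicts a scalar log-penalty
# for an Optimal Partitioning solver. Sizes and names are hypothetical.
import torch
import torch.nn as nn

class PenaltyPredictor(nn.Module):
    def __init__(self, hidden_size: int = 32):
        super().__init__()
        self.rnn = nn.GRU(input_size=1, hidden_size=hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, length, 1) raw sequence; the final hidden state serves as
        # an automatically extracted feature vector.
        _, h = self.rnn(x)
        return self.head(h[-1]).squeeze(-1)  # predicted log-penalty

model = PenaltyPredictor()
seq = torch.randn(1, 500, 1)   # one sequence of length 500
penalty = model(seq).exp()     # penalty value passed to the OP solver
```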
Leveraging Per-Instance Privacy for Machine Unlearning
Nazanin Mohammadi Sepahvand
Anvith Thudi
Berivan Isik
Ashmita Bhattacharyya
Nicolas Papernot
Eleni Triantafillou
Daniel M. Roy
LiSTEN: Learning Soft Token Embeddings for Neural Audio LLMs
Pooneh Mousavi
Shubham Gupta
Foundation models based on large language models (LLMs) have shown great success in handling various tasks and modalities. However, adapting these models for general-purpose audio-language tasks is challenging due to differences in acoustic environments and task variations. In this work, we introduce LiSTEN (Learning Soft Token Embeddings for Neural Audio LLMs), a framework for adapting LLMs to speech and audio tasks. LiSTEN uses a dynamic prompt selection strategy with learnable key-value pairs, allowing the model to balance general and task-specific knowledge while avoiding overfitting in a multitask setting. Our approach reduces dependence on large-scale ASR or captioning datasets, achieves competitive performance with fewer trainable parameters, and simplifies training by using a single-stage process. Additionally, LiSTEN enhances interpretability by analyzing the diversity and overlap of selected prompts across different tasks.
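To make the prompt-selection idea concrete, here is a small PyTorch sketch of a learnable key-value prompt pool queried by a pooled audio representation. All shapes, the cosine-similarity scoring, and the top-k selection are assumptions for illustration, not LiSTEN's actual implementation.

```python
# Sketch of dynamic prompt selection over a learnable key-value pool
# (hypothetical shapes and scoring, not the paper's code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class PromptPool(nn.Module):
    def __init__(self, num_prompts=20, prompt_len=4, dim=768, top_k=3):
        super().__init__()
        self.keys = nn.Parameter(torch.randn(num_prompts, dim))                # learnable keys
        self.values = nn.Parameter(torch.randn(num_prompts, prompt_len, dim))  # soft tokens
        self.top_k = top_k

    def forward(self, query: torch.Tensor) -> torch.Tensor:
        # query: (batch, dim) summary of the audio input, e.g. mean-pooled features.
        scores = F.cosine_similarity(query.unsqueeze(1), self.keys.unsqueeze(0), dim=-1)
        _, idx = scores.topk(self.top_k, dim=-1)   # (batch, top_k) selected prompts
        prompts = self.values[idx]                 # (batch, top_k, prompt_len, dim)
        return prompts.flatten(1, 2)               # soft tokens to prepend to the LLM input
```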
Locate 3D: Real-World Object Localization via Self-Supervised Learning in 3D
Paul McVay
Sergio Arnaud
Ada Martin
Arjun Majumdar
Krishna Murthy
Phillip Thomas
Ruslan Partsey
Daniel Dugas
Abha Gejji
Alexander Sax
Vincent-Pierre Berges
Mikael Henaff
Ayush Jain
Ang Cao
Ishita Prasad
Mrinal Kalakrishnan
Nicolas Ballas
Mahmoud Assran
Oleksandr Maksymets
Aravind Rajeswaran
Franziska Meier
Measure gradients, not activations! Enhancing neuronal activity in deep reinforcement learning
Jiashun Liu
Zihao Wu
Johan Samir Obando Ceron
Ling Pan
Mechanistic Unlearning: Robust Knowledge Unlearning and Editing via Mechanistic Localization
Phillip Huang Guo
Aaquib Syed
Abhay Sheshadri
Aidan Ewart
Mind the GAP! The Challenges of Scale in Pixel-based Deep Reinforcement Learning
Ghada Sokar
Mitigating Plasticity Loss in Continual Reinforcement Learning by Reducing Churn
Hongyao Tang
Johan Samir Obando Ceron
Plasticity, or the ability of an agent to adapt to new tasks, environments, or distributions, is crucial for continual learning. In this paper, we study the loss of plasticity in deep continual RL through the lens of churn: network output variability for out-of-batch data induced by mini-batch training. We demonstrate that (1) the loss of plasticity is accompanied by the exacerbation of churn due to the gradual rank decrease of the Neural Tangent Kernel (NTK) matrix; (2) reducing churn helps prevent rank collapse and adaptively adjusts the step size of regular RL gradients. Moreover, we introduce Continual Churn Approximated Reduction (C-CHAIN) and demonstrate that it improves learning performance and outperforms baselines in a diverse range of continual learning environments on the OpenAI Gym Control, ProcGen, DeepMind Control Suite, and MinAtar benchmarks.
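As a rough illustration of churn reduction (not the authors' C-CHAIN algorithm), the sketch below augments a standard RL update with a penalty on output drift over out-of-batch reference states, measured against a slowly updated target copy of the network. The interface, `beta`, and `tau` are all assumptions.

```python
# Illustrative churn-regularized update step (assumed interface, not C-CHAIN).
import torch

def churn_regularized_step(net, target_net, optimizer, rl_loss_fn,
                           train_batch, ref_states, beta=1.0, tau=0.01):
    with torch.no_grad():
        ref_out = target_net(ref_states)   # reference outputs on out-of-batch states
    optimizer.zero_grad()
    loss = rl_loss_fn(net, train_batch)    # regular RL objective on the mini-batch
    churn = (net(ref_states) - ref_out).pow(2).mean()  # output-drift penalty
    (loss + beta * churn).backward()
    optimizer.step()
    with torch.no_grad():                  # slow (Polyak) update of the target copy
        for p, tp in zip(net.parameters(), target_net.parameters()):
            tp.lerp_(p, tau)
```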
A Modular Approach for Clinical SLMs Driven by Synthetic Data with Pre-Instruction Tuning, Model Merging, and Clinical-Tasks Alignment
Jean-Philippe Corbeil
Amin Dada
Jean-Michel Attendu
Asma Ben Abacha
Lucas Caccia
François Beaulieu
Thomas Lin
Jens Kleesiek
Paul Vozila
High computation costs and latency of large language models such as GPT-4 have limited their deployment in clinical settings. Small language models (SLMs) offer a cost-effective alternative, but their limited capacity requires biomedical domain adaptation, which remains challenging. An additional bottleneck is the unavailability and high sensitivity of clinical data. To address these challenges, we propose a novel framework for adapting SLMs into high-performing clinical models. We introduce the MediPhi collection of 3.8B-parameter SLMs developed with our novel framework: pre-instruction tuning of experts on relevant medical and clinical corpora (PMC, Medical Guideline, MedWiki, etc.), model merging, and clinical-tasks alignment. To cover most clinical tasks, we extended the CLUE benchmark to CLUE+, doubling its size. Our expert models deliver relative improvements on this benchmark over the base model without any task-specific fine-tuning: 64.3% on medical entities, 49.5% on radiology reports, and 44% on ICD-10 coding (outperforming GPT-4-0125 by 14%). We unify the expert models into MediPhi via model merging, preserving gains across benchmarks. Furthermore, we built the MediFlow collection, a synthetic dataset of 2.5 million high-quality instructions covering 14 medical NLP tasks, 98 fine-grained document types, and JSON format support. Aligning MediPhi using supervised fine-tuning and direct preference optimization achieves further gains of 18.9% on average.
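The merging step can be illustrated with plain parameter averaging across same-architecture expert checkpoints; this uniform average is an assumption for illustration, and MediPhi's actual merging procedure may differ.

```python
# Sketch: merge expert checkpoints by uniform parameter averaging (illustrative only).
import torch

def merge_state_dicts(expert_state_dicts):
    """Average the parameters of same-architecture experts into one state dict."""
    merged = {}
    for name in expert_state_dicts[0]:
        merged[name] = torch.stack(
            [sd[name].float() for sd in expert_state_dicts]
        ).mean(dim=0)
    return merged

# Usage, assuming the experts share one architecture:
# merged = merge_state_dicts([e.state_dict() for e in experts])
# base_model.load_state_dict(merged)
```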
Monitoring morphometric drift in lifelong learning segmentation of the spinal cord
Enamundram Naga Karthik
Sandrine Bédard
Jan Valošek
Christoph Aigner
Elise Bannier
Josef Bednářík
Virginie Callot
Anna Combes
Armin Curt
Gergely David
Falk Eippert
Lynn Farner
M. G. Fehlings
Patrick Freund
Tobias Granberg
Cristina Granziera
RHSCIR Network Imaging Group
Ulrike Horn
Tomáš Horák
Suzanne Humphreys
Markus Hupp
Anne Kerbrat
Nawal Kinany
Shannon Kolind
Petr Kudlička
Anna Lebret
Lisa Eunyoung Lee
Caterina Mainero
Allan R. Martin
Megan McGrath
Govind Nair
Kristin P. O’Grady
Jiwon Oh
Russell Ouellette
Nikolai Pfender
Dario Pfyffer
P. Pradat
Alexandre Prat
Emanuele Pravatà
D. S. Reich
Ilaria Ricchi
Naama Rotem-Kohavi
Simon Schading-Sassenhausen
Maryam Seif
Andrew C. Smith
Seth Aaron Smith
Grace Sweeney
Roger Tam
Anthony Traboulsee
Constantina A. Treaba
Charidimos Tsagkas
Zachary Vavasour
Dimitri Van De Ville
Kenneth A. Weber
Monte Carlo Tree Diffusion for System 2 Planning
Jaesik Yoon
Hyeonseo Cho
Doojin Baek
Sungjin Ahn
Diffusion models have recently emerged as a powerful tool for planning. However, unlike Monte Carlo Tree Search (MCTS), whose performance naturally improves with additional test-time computation (TTC), standard diffusion-based planners offer only limited avenues for TTC scalability. In this paper, we introduce Monte Carlo Tree Diffusion (MCTD), a novel framework that integrates the generative strength of diffusion models with the adaptive search capabilities of MCTS. Our method reconceptualizes denoising as a tree-structured process, allowing partially denoised plans to be iteratively evaluated, pruned, and refined. By selectively expanding promising trajectories while retaining the flexibility to revisit and improve suboptimal branches, MCTD achieves the benefits of MCTS, such as controlling exploration-exploitation trade-offs, within the diffusion framework. Empirical results on challenging long-horizon tasks show that MCTD outperforms diffusion baselines, yielding higher-quality solutions as TTC increases.
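The following sketch shows what "denoising as a tree-structured process" could look like in plain Python. `denoise_step` (one partial denoising of a plan) and `evaluate` (estimated plan quality) are placeholder callables, and the UCB selection and branching factor are generic MCTS assumptions, not the paper's implementation.

```python
# Sketch of tree-structured denoising in the spirit of MCTD (illustrative only).
import math
import random

class Node:
    def __init__(self, plan, t, parent=None):
        self.plan, self.t = plan, t            # partially denoised plan at noise level t
        self.parent, self.children = parent, []
        self.visits, self.value = 0, 0.0

def ucb(node, c=1.4):
    if node.visits == 0:
        return float("inf")
    return node.value / node.visits + c * math.sqrt(
        math.log(node.parent.visits) / node.visits)

def mctd(root_plan, denoise_step, evaluate, T, iters=100, branching=4):
    root = Node(root_plan, T)
    for _ in range(iters):
        node = root
        while node.children:                   # selection: descend by UCB, so promising
            node = max(node.children, key=ucb) # and suboptimal branches can be revisited
        if node.t > 0:                         # expansion: one denoising step per child
            for _ in range(branching):
                node.children.append(Node(denoise_step(node.plan, node.t), node.t - 1, node))
            node = random.choice(node.children)
        r = evaluate(node.plan)                # evaluation of the partially denoised plan
        while node is not None:                # backpropagation of the estimated return
            node.visits += 1
            node.value += r
            node = node.parent
    # A full planner would descend to t == 0; the sketch returns the most visited branch.
    return max(root.children, key=lambda n: n.visits).plan
```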