Publications

The curriculum effect in visual learning: the role of readout dimensionality
Christopher C. Pack
Fresh in memory: Training-order recency is linearly encoded in language model activations
Dmitrii Krasheninnikov
Richard E. Turner
We show that language models' activations linearly encode when information was learned during training. Our setup involves creating a model with a known training order by sequentially fine-tuning Llama-3.2-1B on six disjoint but otherwise similar datasets about named entities. We find that the average activations of test samples corresponding to the six training datasets encode the training order: when projected into a 2D subspace, these centroids are arranged exactly in the order of training and lie on a straight line. Further, we show that linear probes can accurately (~90%) distinguish "early" vs. "late" entities, generalizing to entities unseen during the probes' own training. The model can also be fine-tuned to explicitly report an unseen entity's training stage (~80% accuracy). Interestingly, the training-order encoding does not seem attributable to simple differences in activation magnitudes, losses, or model confidence. Our paper demonstrates that models are capable of differentiating information by its acquisition time, and carries significant implications for how they might manage conflicting data and respond to knowledge modifications.
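The probing setup described above can be sketched on synthetic data. The snippet below is a minimal illustration, not the paper's code: it fabricates activations whose means drift along a made-up "recency" direction across six stages, then trains a logistic-regression probe to separate early (stages 1-3) from late (stages 4-6) items, evaluating on held-out samples.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_per_stage = 64, 200

# Hypothetical setup: activations for entities from six sequential training
# stages drift along a fixed "recency" direction, plus isotropic noise.
recency_dir = rng.normal(size=dim)
recency_dir /= np.linalg.norm(recency_dir)

X, y = [], []
for stage in range(6):
    acts = rng.normal(size=(n_per_stage, dim)) + stage * recency_dir
    X.append(acts)
    y.append(np.full(n_per_stage, stage < 3, dtype=float))  # 1 = "early"
X, y = np.vstack(X), np.concatenate(y)

# Split into probe-training samples and held-out "unseen" samples.
idx = rng.permutation(len(X))
tr, te = idx[:900], idx[900:]

# Logistic-regression probe trained by plain gradient descent.
w, b = np.zeros(dim), 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X[tr] @ w + b)))
    g = p - y[tr]
    w -= 0.1 * (X[tr].T @ g) / len(tr)
    b -= 0.1 * g.mean()

acc = (((X[te] @ w + b) > 0) == (y[te] > 0.5)).mean()
print(f"held-out probe accuracy: {acc:.2f}")
```

With the drift magnitude and noise level chosen here, the held-out accuracy lands in the same high-80s range the paper reports, though that is a property of this synthetic construction rather than evidence about any real model.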
RootletSeg: Deep learning method for spinal rootlets segmentation across MRI contrasts
Katerina Krejci
Jiri Chmelik
Sandrine Bédard
Falk Eippert
Ulrike Horn
Virginie Callot
Purpose: To develop a deep learning method for the automatic segmentation of spinal nerve rootlets on various MRI scans. Material and Methods: This retrospective study included MRI scans from two open-access and one private dataset, consisting of 3D isotropic 3T TSE T2-weighted (T2w) and 7T MP2RAGE (T1-weighted [T1w] INV1 and INV2, and UNIT1) MRI scans. A deep learning model, RootletSeg, was developed to segment C2-T1 dorsal and ventral spinal rootlets. Training was performed on 76 scans and testing on 17 scans. The Dice score was used to compare the model performance with an existing open-source method. Spinal levels derived from RootletSeg segmentations were compared with vertebral levels defined by intervertebral discs using Bland-Altman analysis. Results: The RootletSeg model developed on 93 MRI scans from 50 healthy adults (mean age, 28.70 years …
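The Dice score used for evaluation above is a standard overlap metric for segmentation masks. The following is a generic sketch on toy 1-D masks, not the study's evaluation code:

```python
import numpy as np

def dice_score(pred: np.ndarray, gt: np.ndarray) -> float:
    """Dice similarity coefficient between two binary masks:
    2 * |pred ∩ gt| / (|pred| + |gt|)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    denom = pred.sum() + gt.sum()
    return 1.0 if denom == 0 else 2.0 * inter / denom

# Toy example: prediction overlaps ground truth on 2 of 3 labeled voxels.
gt = np.array([0, 1, 1, 1, 0, 0])
pred = np.array([0, 0, 1, 1, 1, 0])
print(dice_score(pred, gt))  # 2*2/(3+3) ≈ 0.667
```

The same formula extends unchanged to 3-D volumes, since the arrays are flattened implicitly by the sums.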
Combining cortical and spinal stimulation maximizes the improvement of gait after spinal cord injury
Roxanne Drainville
Rose Guay-Hottin
Alexandre Sheasby
Marina Martinez
Most spinal cord injuries (SCI) spare descending motor pathways and sublesional networks, which can be activated through motor cortex and spinal cord stimulation to mitigate locomotor deficits. However, the potential synergy between cortical and spinal stimulation as a neuroprosthetic intervention remains unknown. Here, we first investigated phase-locked electrical stimulation of the motor cortex and lumbar spinal cord at 40 Hz in a rat model of unilateral SCI. Combining cortical and lumbar stimulation around the anticipated lift synergistically enhanced leg movements. When integrated into rehabilitation training, cortical stimulation proved essential for recovery of skilled locomotion. As a further refinement, we next investigated the effects of high-frequency (330 Hz) lumbar and sacral stimulation combined with cortical stimulation. Timely integration during the swing phase showed that cortical and rostral lumbar stimulations enhance the initial and mid-swing phases, while sacral stimulation improves extension velocity in the late swing. These findings indicate that supraspinal and sublesional neuromodulation offer complementary neuroprosthetic effects in targeted SCI gait rehabilitation. Highlights: Cortical and spinal stimulations summate motor outputs via distinct pathways. Each improves gait post-SCI, but combined stimulation maximizes gait improvement. Integrating cortico-spinal stimulation into rehabilitation promotes lasting recovery. EES capabilities are extended using high-frequency lumbosacral protocols.
GROOD: Gradient-Aware Out-of-Distribution Detection
Mostafa ElAraby
Yann Batiste Pequignot
Paul Novello
Reasoning with Preference Constraints: A Benchmark for Language Models in Many-to-One Matching Markets
Recent advances in reasoning with large language models (LLMs) have demonstrated strong performance on complex mathematical tasks, including combinatorial optimization. Techniques such as Chain-of-Thought and In-Context Learning have further enhanced this capability, making LLMs both powerful and accessible tools for a wide range of users, including non-experts. However, applying LLMs to matching problems, which require reasoning under preferential and structural constraints, remains underexplored. To address this gap, we introduce a novel benchmark of 369 instances of the College Admission Problem, a canonical example of a matching problem with preferences, to evaluate LLMs across key dimensions: feasibility, stability, and optimality. We employ this benchmark to assess the performance of several open-weight LLMs. Our results first reveal that while LLMs can satisfy certain constraints, they struggle to meet all evaluation criteria consistently. They also show that reasoning LLMs, like QwQ and GPT-oss, significantly outperform traditional models such as Llama, Qwen, or Mistral, defined here as models used without any dedicated reasoning mechanisms. Moreover, we observed that LLMs reacted differently to the various prompting strategies tested, which include Chain-of-Thought, In-Context Learning, and role-based prompting, with no prompt consistently offering the best performance. Finally, we report the performance from iterative prompting with auto-generated feedback and show that it is not monotonic; it can peak early and then significantly decline in later attempts. Overall, this work offers a new perspective on model reasoning performance and the effectiveness of prompting strategies in combinatorial optimization problems with preferential constraints.
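As a point of reference for what the benchmark asks of a model: the College Admission Problem always admits a stable many-to-one matching, computable by student-proposing deferred acceptance (Gale-Shapley). The sketch below is a generic illustration with made-up preference lists, not the benchmark's actual instances or evaluation harness.

```python
def deferred_acceptance(students, colleges, capacity):
    """Student-proposing deferred acceptance for many-to-one matching.
    students: {student: [colleges in preference order]}
    colleges: {college: [students in preference order]}
    capacity: {college: quota}
    Returns {college: set of admitted students} (a stable matching)."""
    rank = {c: {s: i for i, s in enumerate(prefs)} for c, prefs in colleges.items()}
    next_choice = {s: 0 for s in students}   # index of next college to try
    matched = {c: [] for c in colleges}      # tentatively held students
    free = list(students)
    while free:
        s = free.pop()
        if next_choice[s] >= len(students[s]):
            continue  # s has exhausted their list and stays unmatched
        c = students[s][next_choice[s]]
        next_choice[s] += 1
        matched[c].append(s)
        matched[c].sort(key=lambda x: rank[c][x])
        if len(matched[c]) > capacity[c]:
            free.append(matched[c].pop())  # reject the worst-ranked student
    return {c: set(m) for c, m in matched.items()}

# Toy instance: 3 students, college X (quota 1) and Y (quota 2).
result = deferred_acceptance(
    students={"a": ["X", "Y"], "b": ["X", "Y"], "c": ["X", "Y"]},
    colleges={"X": ["b", "a", "c"], "Y": ["a", "b", "c"]},
    capacity={"X": 1, "Y": 2},
)
print(result)  # {'X': {'b'}, 'Y': {'a', 'c'}}
```

Checking an LLM-produced assignment against the feasibility (quotas respected) and stability (no blocking student-college pair) criteria is mechanical once preferences are parsed, which is what makes this family of problems a clean benchmark target.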
Why all roads don't lead to Rome: Representation geometry varies across the human visual cortical hierarchy
Biological and artificial intelligence systems navigate the fundamental efficiency-robustness tradeoff for optimal encoding, i.e., they must efficiently encode numerous attributes of the input space while also being robust to noise. This challenge is particularly evident in hierarchical processing systems like the human brain. With a view towards understanding how systems navigate the efficiency-robustness tradeoff, we turned to a population geometry framework for analyzing representations in the human visual cortex alongside artificial neural networks (ANNs). In the ventral visual stream, we found general-purpose, scale-free representations characterized by a power law-decaying eigenspectrum in most areas. However, certain higher-order visual areas did not have scale-free representations, indicating that scale-free geometry is not a universal property of the brain. In parallel, ANNs trained with a self-supervised learning objective also exhibited scale-free geometry, but not after fine-tuning on a specific task. Based on these empirical results and our analytical insights, we posit that a system's representation geometry is not a universal property and instead depends upon the computational objective.
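The power law-decaying eigenspectrum that defines scale-free geometry here can be illustrated numerically. The sketch below uses synthetic population responses with a known power-law covariance spectrum (it is not the paper's analysis pipeline) and recovers the exponent from a log-log fit over mid-range eigenvalue ranks, where finite-sample estimates are most reliable:

```python
import numpy as np

rng = np.random.default_rng(1)
n_neurons, n_stim = 500, 5000

# Synthetic responses whose covariance eigenvalues decay as a power law,
# lambda_n ~ n^(-alpha), mimicking a scale-free representation geometry.
alpha_true = 1.0
eigvals = np.arange(1, n_neurons + 1, dtype=float) ** -alpha_true
basis, _ = np.linalg.qr(rng.normal(size=(n_neurons, n_neurons)))
responses = (rng.normal(size=(n_stim, n_neurons)) * np.sqrt(eigvals)) @ basis.T

# Eigenspectrum of the empirical covariance, sorted in decreasing order.
cov = np.cov(responses, rowvar=False)
spec = np.sort(np.linalg.eigvalsh(cov))[::-1]

# Log-log slope over mid-range ranks; the extreme tails are dominated by
# finite-sample noise and are excluded from the fit.
ranks = np.arange(1, len(spec) + 1)
lo, hi = 5, 100
slope, _ = np.polyfit(np.log(ranks[lo:hi]), np.log(spec[lo:hi]), 1)
print(f"estimated power-law exponent: {-slope:.2f}")
```

Naive log-log fits to sample eigenspectra carry finite-sample bias, so the recovered exponent only approximates the true value of 1.0; cross-validated spectral estimators are the standard remedy when precision matters.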
caskade: building Pythonic scientific simulators