Publications

MLOps, LLMOps, FMOps, and Beyond
Chakkrit Kla Tantithamthavorn
Fabio Palomba
Joselito Joey Chua
Morphometric characteristics of tibial nerve and their relationship with age
Shahram Oveisgharan
Jingyun Yang
Sue E. Leurgans
Veronique VanderHorst
David A. Bennett
Osvaldo Delbono
Aron S. Buchman
Peripheral nerve comprises a crucial component of the distributed motor/sensory system. However, there is a paucity of data on peripheral nerve morphology derived from large numbers of older adults. This study aimed to quantify the morphometric characteristics of myelinated nerve fibres of the tibial nerve obtained from deceased community-dwelling older adults and examine their association with age. The tibial nerves were obtained from consecutive autopsies of older adults without a history of diabetes who were participants of the Rush Memory and Aging Project, an ongoing longitudinal clinical-autopsy study. A nerve fascicle, obtained from a fixed popliteal segment of the tibial nerve, was separated from the blood vessels and adipose tissue for postmortem examination under an optical microscope. Morphometric characteristics of the myelinated nerve fibres were automatically segmented and quantified using our open-source software AxonDeepSeg. The participants (N = 140) had a mean age of 92.0 years (SD = 5.4) at death, and 72.1% (N = 101) were women. We examined 754 247 myelinated nerve fibres, with an average 5387 (SD = 3436) nerve fibres per participant. The average diameter of myelinated nerve fibres was 4.9 µm (SD = 3.1), axon diameter was 2.0 µm (SD = 1.4), myelin thickness was 1.4 µm (SD = 0.96) and the g-ratio (ratio of axon diameter to myelinated nerve fibre diameter) was 0.45 (SD = 0.17). The relationship between axon diameter and myelin thickness was nonlinear. Myelin was thicker in larger axons up to a diameter of 8 µm, beyond which myelin thickness plateaued. Older age at death was associated with smaller myelinated nerve fibres, smaller axons and thinner myelin. However, age at death was not correlated with myelinated nerve fibre density and was not associated with the average of g-ratio. The association between older age and smaller myelinated nerve fibres was largely attributable to a lower percentage of myelinated nerve fibres >8 µm.
We conclude that the smaller tibial myelinated nerve fibres observed in older adults may reflect axonal atrophy rather than degeneration and regeneration of the myelinated nerve fibres. Further research is needed to investigate the pathologies and molecular mechanisms underlying these age-related morphometric changes and their clinical implications in older adults.
Most German Speakers Ignore the Cue That Best Predicts Plural Class
Kate McCurdy
Timothy J. O'Donnell
Adam Lopez
Sharon Goldwater
Researchers generally assume that speakers use the linguistic information available to them. For instance, if one grammatical category robustly predicts another grammatical category, we expect speakers to reproduce this conditional relationship during language production. Here, we investigate this assumption for grammatical gender in German. Gender is the single cue which most strongly predicts the plural class of existing German nouns, but behavioral studies with novel nouns have found mixed results regarding the role of gender in plural generalization. Across three experiments, we examine how individual German speakers use grammatical gender when producing plural forms of novel nouns. We find that most speakers effectively ignore gender during plural class production, even under experimental manipulations that encourage them to attend to this cue. These results point toward an underexplored direction in cognitive science: accounting for the linguistic information that speakers do not use.
Multilingual Language Model Pretraining using Machine-translated Data
Jiayi Wang
Maurice Weber
Max Ryabinin
Yihong Chen
Raphael Tang
Pontus Stenetorp
Multimodal and Force-Matched Imitation Learning with a See-Through Visuotactile Sensor
Trevor Ablett
Oliver Limoyo
Adam Sigal
Jonathan Kelly
Francois Hogan
Kinesthetic Teaching is a popular approach to collecting expert robotic demonstrations of contact-rich tasks for imitation learning (IL), but it typically only measures motion, ignoring the force placed on the environment by the robot. Furthermore, contact-rich tasks require accurate sensing of both reaching and touching, which can be difficult to provide with conventional sensing modalities. We address these challenges with a See-Through-your-Skin (STS) visuotactile sensor, using the sensor both (i) as a measurement tool to improve kinesthetic teaching, and (ii) as a policy input in contact-rich door manipulation tasks. An STS sensor can be switched between visual and tactile modes by leveraging a semi-transparent surface and controllable lighting, allowing for both pre-contact visual sensing and during-contact tactile sensing with a single sensor. First, we propose tactile force matching, a methodology that enables a robot to match forces read during kinesthetic teaching using tactile signals. Second, we develop a policy that controls STS mode switching, allowing a policy to learn the appropriate moment to switch an STS from its visual to its tactile mode. Finally, we study multiple observation configurations to compare and contrast the value of visual and tactile data from an STS with visual data from a wrist-mounted eye-in-hand camera. With over 3,000 test episodes from real-world manipulation experiments, we find that the inclusion of force matching raises average policy success rates by 62.5%, STS mode switching by 30.3%, and STS data as a policy input by 42.5%. Our results highlight the utility of see-through tactile sensing for IL, both for data collection to allow force matching, and for policy execution to allow accurate task feedback.
Multi-Robot Decentralized Collaborative SLAM in Planetary Analogue Environments: Dataset, Challenges, and Lessons Learned
Pierre-Yves Lajoie
Karthik Soma
Alice Lemieux-Bourque
Rongge Zhang
Vivek Shankar Varadharajan
Decentralized collaborative simultaneous localization and mapping (C-SLAM) is essential to enable multirobot missions in unknown environments without relying on preexisting localization and communication infrastructure. This technology is anticipated to play a key role in the exploration of the Moon, Mars, and other planets. In this article, we share insights and lessons learned from C-SLAM experiments involving three robots operating on a Mars analogue terrain and communicating over an ad hoc network. We examine the impact of limited and intermittent communication on C-SLAM performance, as well as the unique localization challenges posed by planetary-like environments. Additionally, we introduce a novel dataset collected during our experiments, which includes real-time peer-to-peer inter-robot throughput and latency measurements. This dataset aims to support future research on communication-constrained, decentralized multirobot operations.
NeoBERT: A Next-Generation BERT
Mariam El Mezouar
John Xavier Morris
A. Chandar
Recent innovations in architecture, pre-training, and fine-tuning have led to the remarkable in-context learning and reasoning abilities of large auto-regressive language models such as LLaMA and DeepSeek. In contrast, encoders like BERT and RoBERTa have not seen the same level of progress despite being foundational for many downstream NLP applications. To bridge this gap, we introduce NeoBERT, a next-generation encoder that redefines the capabilities of bidirectional models by integrating state-of-the-art advancements in architecture, modern data, and optimized pre-training methodologies. NeoBERT is designed for seamless adoption: it serves as a plug-and-play replacement for existing base models, relies on an optimal depth-to-width ratio, and leverages an extended context length of 4,096 tokens. Despite its compact 250M parameter footprint, it achieves state-of-the-art results on the massive MTEB benchmark, outperforming BERT large, RoBERTa large, NomicBERT, and ModernBERT under identical fine-tuning conditions. In addition, we rigorously evaluate the impact of each modification on GLUE and design a uniform fine-tuning and evaluation framework for MTEB. We release all code, data, checkpoints, and training scripts to accelerate research and real-world adoption.
Nteasee: A mixed methods study of expert and general population perspectives on deploying AI for health in African countries
Mercy Nyamewaa Asiedu
Iskandar Haykel
Awa Dieng
K. Kauer
Tousif Ahmed
Florence Ofori
Charisma Chan
Stephen R. Pfohl
Katherine Heller
Online Influence Campaigns: Strategies and Vulnerabilities
Ethan Kosak-Hine
Tom Gibbs
U. Montréal
Ivado
M. University
In order to combat the creation and spread of harmful content online, this paper defines and contextualizes the concept of inauthentic, societal-scale manipulation by malicious actors. We review the literature on societally harmful content and how it proliferates to analyze the manipulation strategies used by such actors and the vulnerabilities they target. We also provide an overview of three case studies of extensive manipulation campaigns to emphasize the severity of the problem. We then address the role that Artificial Intelligence plays in the development and dissemination of harmful content, and how its evolution presents new threats to societal cohesion for countries across the globe. Our survey aims to increase our understanding of not just particular aspects of these threats, but also the strategies underlying their deployment, so we can effectively prepare for the evolving cybersecurity landscape.
Open Technical Problems in Open-Weight AI Model Risk Management
Stephen Casper
Kyle O'Brien
Shayne Longpre
Elizabeth Seger
Kevin Klyman
Rishi Bommasani
Aniruddha Nrusimha
Ilia Shumailov
Sören Mindermann
Steven Basart
Frank Rudzicz
Avijit Ghosh
Andrew Strait
Robert Kirk
Dan Hendrycks
J. Zico Kolter
Geoffrey Irving
Yarin Gal
Dylan Hadfield-Menell
OpenFake: An Open Dataset and Platform Toward Real-World Deepfake Detection
Deepfakes, synthetic media created using advanced AI techniques, pose a growing threat to information integrity, particularly in politically sensitive contexts. This challenge is amplified by the increasing realism of modern generative models, which our human perception study confirms are often indistinguishable from real images. Yet, existing deepfake detection benchmarks rely on outdated generators or narrowly scoped datasets (e.g., single-face imagery), limiting their utility for real-world detection. To address these gaps, we present OpenFake, a large politically grounded dataset specifically crafted for benchmarking against modern generative models with high realism, and designed to remain extensible through an innovative crowdsourced adversarial platform that continually integrates new hard examples. OpenFake comprises nearly four million total images: three million real images paired with descriptive captions and almost one million synthetic counterparts from state-of-the-art proprietary and open-source models. Detectors trained on OpenFake achieve near-perfect in-distribution performance, strong generalization to unseen generators, and high accuracy on a curated in-the-wild social media test set, significantly outperforming models trained on existing datasets. Overall, we demonstrate that with high-quality and continually updated benchmarks, automatic deepfake detection is both feasible and effective in real-world settings.
PairBench: Are Vision-Language Models Reliable at Comparing What They See?
Sai Rajeswar
Adriana Romero
Valentina Zantedeschi
Joao Monteiro
Understanding how effectively large vision language models (VLMs) compare visual inputs is crucial across numerous applications, yet this fundamental capability remains insufficiently assessed. While VLMs are increasingly deployed for tasks requiring comparative judgment, including automated evaluation, re-ranking, and retrieval-augmented generation, no systematic framework exists to measure their performance in these scenarios. We present PairBench, a simple framework that evaluates VLMs as customizable similarity tools using widely available image datasets. Our approach introduces four key metrics for reliable comparison: alignment with human annotations, consistency across pair ordering, distribution smoothness, and controllability through prompting. Our analysis reveals that no model consistently excels across all metrics, with each demonstrating distinct strengths and weaknesses. Most concerning is the widespread inability of VLMs to maintain symmetric similarity scores. Interestingly, we demonstrate that performance on our benchmark strongly correlates with popular benchmarks used for more complex tasks, while providing additional metrics into controllability, smoothness and ordering. This makes PairBench a unique and comprehensive framework to evaluate the performance of VLMs for automatic evaluation depending on the task.