Publications

Grounding Computer Use Agents on Human Demonstrations
Xiangru Jian
Kevin Qinghong Lin
Kaixin Li
Johan Obando-Ceron
Juan A. Rodriguez
Adriana Romero-Soriano
Christopher Pal
Sai Rajeswar
Building reliable computer-use agents requires grounding: accurately connecting natural language instructions to the correct on-screen elements. While large datasets exist for web and mobile interactions, high-quality resources for desktop environments are limited. To address this gap, we introduce GroundCUA, a large-scale desktop grounding dataset built from expert human demonstrations. It covers 87 applications across 12 categories and includes 56K screenshots, with every on-screen element carefully annotated for a total of over 3.56M human-verified annotations. From these demonstrations, we generate diverse instructions that capture a wide range of real-world tasks, providing high-quality data for model training. Using GroundCUA, we develop the GroundNext family of models that map instructions to their target UI elements. At both 3B and 7B scales, GroundNext achieves state-of-the-art results across five benchmarks using supervised fine-tuning, while requiring less than one-tenth the training data of prior work. Reinforcement learning post-training further improves performance, and when evaluated in an agentic setting on the OSWorld benchmark using o3 as planner, GroundNext attains comparable or superior results to models trained with substantially more data. These results demonstrate the critical role of high-quality, expert-driven datasets in advancing general-purpose computer-use agents.
Understanding the role of depth in the neural tangent kernel for overparameterized neural networks
Ibom NLP: A Step Toward Inclusive Natural Language Processing for Nigeria's Minority Languages
Oluwadara Kalejaiye
Mmekut-Mfon Gabriel Edet
A. D. Akpan
Eno-Abasi Urua
Anietie U Andy
Blind Strong Gravitational Lensing Inversion: Joint Inference of Source and Lens Mass with Score-Based Models
LLM-as-a-Judge: Toward World Models for Slate Recommendation Systems
Modeling user preferences across domains remains a key challenge in slate recommendation (i.e., recommending an ordered sequence of items) research. We investigate how Large Language Models (LLMs) can effectively act as world models of user preferences through pairwise reasoning over slates. We conduct an empirical study involving several LLMs on three tasks spanning different datasets. Our results reveal relationships between task performance and properties of the preference function captured by LLMs, hinting towards areas for improvement and highlighting the potential of LLMs as world models in recommender systems.
LLMs and Cultural Values: the Impact of Prompt Language and Explicit Cultural Framing
Bram Bulté
Large Language Models (LLMs) are rapidly being adopted by users across the globe, who interact with them in a diverse range of languages. At the same time, there are well-documented imbalances in the training data and optimisation objectives of this technology, raising doubts as to whether LLMs can represent the cultural diversity of their broad user base. In this study, we look at LLMs and cultural values and examine how prompt language and cultural framing influence model responses and their alignment with human values in different countries. We probe 10 LLMs with 63 items from the Hofstede Values Survey Module and World Values Survey, translated into 11 languages, and formulated as prompts with and without different explicit cultural perspectives. Our study confirms that both prompt language and cultural perspective produce variation in LLM outputs, but with an important caveat: While targeted prompting can, to a certain extent, steer LLM responses in the direction of the predominant values of the corresponding countries, it does not overcome the models' systematic bias toward the values associated with a restricted set of countries in our dataset: the Netherlands, Germany, the US, and Japan. All tested models, regardless of their origin, exhibit remarkably similar patterns: They produce fairly neutral responses on most topics, with selective progressive stances on issues such as social tolerance. Alignment with cultural values of human respondents is improved more with an explicit cultural perspective than with a targeted prompt language. Unexpectedly, combining both approaches is no more effective than cultural framing with an English prompt. These findings reveal that LLMs occupy an uncomfortable middle ground: They are responsive enough to changes in prompts to produce variation, but too firmly anchored to specific cultural defaults to adequately represent cultural diversity.
Medication-based mortality prediction in COPD using machine learning and conventional statistical methods.
Ana Paula Pena-Gralle
Amélie Forget
Yohann Chiu
M. Beauchesne
Lucie Blais
Oneiris: An AI-augmented Brain-Computer Interface for Exploring Personal and Collective Dreamscapes
Antoine Bellemare‐Pepin
Karim Jerbi CoCo Lab
Suzanne Kite
RefAgent: A Multi-agent LLM-based Framework for Automatic Software Refactoring
Why Less is More (Sometimes): A Theory of Data Curation
Elvis Dopgima Dohmatob
DeLLMphi: A Multi-Turn Method for Multi-Agent Forecasting
Andrew Robert Williams
Victoria Feere
Nasim Rahaman
The Delphi method is a structured forecasting process that engages experts in iterative prediction and reflection. Each round, experts submit forecasts to a mediator, receive an aggregated and synthesized response highlighting key arguments, and update their forecasts based on collective insight. However, Delphi panels are labour-intensive, slow, and hard to reproduce, requiring diverse knowledgeable participants to engage periodically across weeks or months. To address these constraints, we propose DeLLMphi, a forecasting method that replaces human experts and mediators with LLMs. We show (i) that providing example superforecaster reasoning traces and predictions helps to elicit more accurate forecasts from LLM experts, (ii) that the mediator plays the crucial role of surfacing different lines of reasoning and points of disagreement, and (iii) that multiple rounds and experts lead to better forecasts, showing that multi-turn interaction is key to DeLLMphi.
Longitudinal functional connectivity during rest and task is differentially related to Alzheimer's pathology and episodic memory in older adults
Larissa Fischer
Jenna N. Adams
Eóin N. Molloy
Jennifer Tremblay-Mercier
Jordana Remz
Alexa Pichet Binette
M. Natasha Rajah
Sylvia Villeneuve
Anne Maass
PREVENT-AD Research Group
Changes in functional connectivity (FC) strength involving the medial temporal lobe (MTL) and posteromedial cortex (PMC) are related to early Alzheimer’s pathology and alterations in episodic memory performance in cognitively unimpaired older adults, but their dynamics remain unclear. We examined how longitudinal changes in FC involving MTL and PMC during resting-state, episodic memory encoding, and retrieval relate to subsequent amyloid- and tau-PET burden, longitudinal episodic memory performance, and the APOE4 genotype in 152 cognitively unimpaired older adults from the PREVENT-AD cohort. We found APOE4- and fMRI paradigm-dependent associations of change in FC strength with pathology burden and change in episodic memory performance. Decreasing FC over time, or “hypoconnectivity”, within PMC during rest in APOE4 carriers and during retrieval in APOE4 non-carriers was related to more amyloid and tau, respectively. Conversely, increasing FC over time, or “hyperconnectivity”, within MTL during encoding in APOE4 carriers and between MTL and PMC during retrieval independent of APOE4 status was related to more tau. Further, increasing FC between MTL and PMC during rest, unlike during encoding, was beneficial for episodic memory. Our study highlights that pathology-related episodic memory network changes manifest differently during rest and task and have differential implications for episodic memory trajectories. The online version contains supplementary material available at 10.1038/s41598-025-21596-0.