Publications

A Decision-Theoretic Formalisation of Steganography With Applications to LLM Monitoring
Usman Anwar
Julianna Piskorz
David D. Baek
David Africa
Jim Weatherall
Max Tegmark
Christian Schroeder de Witt
Mihaela van der Schaar
David Krueger
Large language models are beginning to show steganographic capabilities. Such capabilities could allow misaligned models to evade oversight mechanisms. Yet principled methods to detect and quantify such behaviours are lacking. Classical definitions of steganography, and detection methods based on them, require a known reference distribution of non-steganographic signals. For the case of steganographic reasoning in LLMs, knowing such a reference distribution is not feasible; this renders these approaches inapplicable. We propose an alternative, **decision-theoretic view of steganography**. Our central insight is that steganography creates an asymmetry in usable information between agents who can and cannot decode the hidden content (present within a steganographic signal), and this otherwise latent asymmetry can be inferred from the agents’ observable actions. To formalise this perspective, we introduce generalised
Is Depth Heterogeneity a Barrier to Model Merging?
Model merging offers a way to combine the capabilities of several networks at test time without retraining or additional finetuning, but most merging methods assume identical architectures. Depth differences are commonly viewed as a major obstacle because they remove clear layer correspondences. We test this assumption by merging residual networks that differ only in depth, using a simple training-free pipeline based on identity expansion and permutation alignment. Across both same-task and multitask image classification experiments, heterogeneous merges closely match homogeneous ones. The results suggest that, for residual networks, depth mismatch is not the main barrier to effective model merging, and that the main difficulty in model merging comes from aligning independently trained weights in a homogeneous setting.
From Large-Scale Winds to Urban Decision Making: A Cross-Scale Framework for Wind-Aware UAV Navigation
Fuyuan Lyu
Di Zhou
Xue Liu
Xiongye Xiao
Anima Anandkumar
Liangzhu Leon Wang
Large-scale weather and climate models provide reliable wind information at regional scales, yet their outputs are typically too coarse for direct UAV decision making in geometrically complex urban environments. This paper investigates how large-scale atmospheric information can be transformed into city-scale wind representations and utilized for downstream navigation decisions. We propose a cross-scale prediction and decision framework that takes background wind conditions from existing weather or climate models and combines them with detailed 3D urban geometry to predict time-averaged urban wind fields using a 3D neural operator. The predicted wind fields are then incorporated into a wind-aware UAV trajectory optimization problem to minimize energy consumption under kinematic feasibility and safety constraints. By comparing trajectories planned against a wind-agnostic baseline, we demonstrate significant efficiency gains enabled by AI-predicted wind, specifically 10.3% savings in tailwinds, 7.7% in headwinds, and 3.9% in crosswind conditions. These results indicate that learning decision-relevant urban wind representations offers a practical pathway for bridging large-scale atmospheric information and fine-scale urban decision making.
Machine learning–based prediction of Metabolic Syndrome risk in the Quebec population
Shayan Nejadshamsi
Stella S. Daskalopoulou
Samira Abbasgholizadeh Rahimi
Objective: This study evaluates multiple machine learning approaches to predict metabolic syndrome (MetS) risk in the Quebec, Canada population. We further perform explainability analysis to interpret model predictions and identify key features driving risk classification. Methods and analysis: This study followed the Minimum Information about Clinical Artificial Intelligence Modeling (MI-CLAIM) guideline for reporting. We used cross-sectional data from the Canadian Community Health Survey (2015–2018) for the population living in the province of Quebec, which includes 42,279 participants. Partial sampling was used to obtain a balanced dataset for model development. We evaluated seven machine learning models for the defined classification task, including Logistic Regression, XGBoost, LightGBM, TabNet, NODE, 1D-CNN and Regularisation Cocktails. Performance was assessed using accuracy, precision, recall, F1-score, AUROC, and AUPRC, and interpretability was examined using SHAP to identify key predictors of MetS risk. Results: After partial sampling, 7,866 participants (4,856 high-risk and 3,010 low-risk MetS cases) were included in the machine learning analysis. XGBoost and NODE showed the strongest performance. XGBoost achieved the highest accuracy (80.4%) and AUROC (84.1%), while NODE achieved the highest precision (80.1%) and AUPRC (86.0%). Explainability analysis identified age, perceived health, and sex as the most important features contributing to MetS risk predictions. Conclusion: This study shows that machine learning can accurately predict MetS risk using self-reported health survey data from the Quebec population. Comparison of classical and deep learning approaches identified the optimal predictive model, and explainability analyses identified the most important features contributing to the risk predictions, which align with established clinical evidence.
These results support a machine learning–driven initial screening framework for population-level early identification of high-risk individuals, enabling targeted interventions and efficient allocation of healthcare resources.
Objective Misalignment in LLM-based Multi Agent Social Deception Game
Large language model–based multi-agent systems have attracted increasing attention for their strong performance in collaborative tasks and social simulations. However, these interactive settings also introduce vulnerabilities, as a single agent's hidden goals and misaligned behavior can propagate misleading or malicious information throughout the system. In this work, we study these risks in the context of social deception games. We focus on the Werewolf Game, which requires agents to reason, communicate, and collaborate under asymmetric and incomplete information. We modify the individual objectives of some agents to induce benevolent, individualistic, and malevolent strategies that can make agents depart from the objectives of their own team. We evaluate how objective divergence affects game outcomes, collaboration, and goal satisfaction. Misaligned agents often succeed in achieving their own objectives, with effects amplified by role-based power asymmetries. Qualitative analyses further show that agents remain coherent and adaptive, strategically adjusting their reasoning, communication, voting behavior, and influence on group dynamics. These results indicate that risks in LLM-based multi-agent systems extend beyond collaborative task settings and persist even in environments where competition is structurally expected.
Piezoelectric tuning of thermal conductivity in nano-architected gallium nitride metamaterials
Jun Cai
Alireza Seyedkanani
Benyamin Shahryari
Abdolhamid Akbarzadeh
PPO-CIS : A deep reinforcement learning framework for real-time toxicity detection in social media
Arezo Bodaghi
Benjamin C.M. Fung
Ketra A. Schmitt
Scalable Multi-Agent Reinforcement Learning Framework for Multi-Machine Tending
Abdalwhab Abdalwhab
David St-Onge
Robotic manipulators hold significant untapped potential for manufacturing industries, particularly when deployed in multi-robot configurations that can enhance resource utilization, increase throughput, and reduce costs. However, industrial manipulators typically operate in isolated one-robot, one-machine setups, limiting both utilization and scalability. Even mobile robot implementations generally rely on centralized architectures, creating vulnerability to single points of failure and requiring robust communication infrastructure. This paper introduces SMAPPO (Scalable Multi-Agent Proximal Policy Optimization), a scalable input-size invariant multi-agent reinforcement learning model for decentralized multi-robot management in industrial environments. MAPPO (Multi-Agent Proximal Policy Optimization) represents the current state-of-the-art approach. We optimized an existing simulator to handle complex multi-agent reinforcement learning scenarios and designed a new multi-machine tending scenario for evaluation. Our novel observation encoder enables SMAPPO to handle varying numbers of agents, machines, and storage areas with minimal or no retraining. Results demonstrate SMAPPO's superior performance compared to the state-of-the-art MAPPO across multiple conditions: full retraining (up to 61% improvement), curriculum learning (up to 45% increased productivity and up to 49% fewer collisions), zero-shot generalization to significantly different scale scenarios (up to 272% better performance without retraining), and adaptability under extremely low initial training (up to 100% increase in parts delivery).
Semantic Anchor Transport: Robust Test-Time Adaptation for Vision-Language Models
Shambhavi Mishra
Julio Silva-Rodríguez
Ismail Ben Ayed
Jose Dolz
Large pre-trained vision-language models (VLMs) like CLIP exhibit strong zero-shot performance but struggle under distributional shifts. We propose Semantic Anchor Transport (SAT), a method that generates pseudo-labels for test samples by aligning visual embeddings with reliable text-based semantic anchors using Optimal Transport for batch-wise label assignment. These pseudo-labels enable efficient test-time adaptation through principled cross-modal alignment. We further incorporate multi-template distillation to leverage diverse textual clues, replicating multi-view contrastive learning without added computational cost. Extensive experiments demonstrate consistent performance gains over state-of-the-art methods across multiple benchmarks while maintaining computational efficiency.
Street review: A participatory AI-based framework for assessing streetscape inclusivity
Shin Koseki
Urban centers undergo social, demographic, and cultural changes that shape public street use and require systematic evaluation of public spaces. This study presents Street Review, a mixed-methods approach that combines participatory research with AI-based analysis to assess streetscape inclusivity. In Montréal, Canada, 28 residents participated in semi-directed interviews and image evaluations, supported by the analysis of approximately 45,000 street-view images from Mapillary. The approach produced visual analytics, such as heatmaps, to correlate subjective user ratings with physical attributes like sidewalk, maintenance, greenery, and seating. Findings reveal variations in perceptions of inclusivity and accessibility across demographic groups, demonstrating that incorporating diverse user feedback can enhance machine learning models through careful data-labeling and co-production strategies. The Street Review framework offers a systematic method for urban planners and policy analysts to inform planning, policy development, and management of public streets.
A Systematic Literature Review of Automated Feedback Generation in Education
Yajie Song
Yimei Zhang
Feedback that is individualized and immediate is essential to improving learning outcomes, but providing it to every learner is difficult. Automatic feedback generation (AFG) aims to alleviate this problem, especially with technology-enhanced learning environments. This systematic literature review of AFG in education, following the PRISMA framework, examines 34 peer-reviewed publications. The findings revealed that the reviewed studies (1) gained momentum after 2019; (2) often used secondary cognitive data to evaluate AFG approaches; (3) mainly targeted the computer science domain; (4) frequently combined multiple methods to generate feedback; (5) employed multiple performance evaluations; and (6) mostly provided written feedback aimed at correcting student errors. This review also highlighted several gaps, including the lack of (1) in-depth cognitive and affective data from user studies to evaluate feedback and understand how students interpret it; (2) research on feedback use and strategies to close the feedback loop; (3) AFG systems for ill-defined domains with strong transferability; (4) elaborated feedback that scaffolds problem-solving rather than giving answers; (5) feedback using multiple modalities and valences; and (6) integration of learning theories in AFG design. This review advances understanding of current AFG practices, evaluates and extends conceptual frameworks of AFG, and provides insights for future AFG design and evaluation.
Understanding Representation Gaps across Scales in Tropical Tree Species Classification from Drone Imagery
Sulagna Saha
Evan M. Gora
Adriane Esquivel Muelbert
Ian R. McGregor
Cesar Gutierrez
Vanessa E. Rubio
Accurate classification of tropical tree species from unoccupied aerial vehicle (UAV) imagery remains challenging due to high species diversity and strong visual similarity among species at typical image resolutions (centimeters per pixel). In contrast, models trained on close-up citizen science photographs captured with smartphones achieve strong plant species classification performance. Recent advances in UAV data acquisition now enable the collection of close-up images that are spatially registered with top-view aerial imagery and approach the level of visual detail found in smartphone photographs, with the trade-off that such high-resolution photos cannot be acquired for many trees. In this work, we evaluate the performance of existing methods using paired top-view and close-up UAV imagery collected in a species-rich tropical forest. Through fine-tuning experiments, we quantify the performance gap between vision foundation models and in-domain generalist plant recognition models across both image types (high-resolution close-up versus coarser-resolution top-view imagery). We show that classification performance is consistently higher on close-up images than on top-view aerial imagery, and that this performance gap widens for rare species. Finally, we propose that self-supervised representation alignment across these two spatial scales offers a promising approach for integrating fine-grained visual information into canopy-level species classification models based on top-view UAV imagery. Leveraging high-resolution close-up UAV imagery to enhance canopy-level species classification could substantially improve large-scale monitoring of tropical forest biodiversity.