Publications

Empowering 2D neural network for 3D medical image segmentation via neighborhood information fusion

Qiankun Li

Xiaolong Huang

Yani Zhang

Bo Fang

Duo Hong

Junxin Chen

2026-06-30

Pattern Recognition (published)

doi.org

Wagg: Cost-aware Aggregation of Windowing Operators in Stream Processing

Pritish Mishra

Ruoyu Deng

Alexandre da Silva Veith

Oana Balmau

Eyal de Lara

2026-06-19

International Conference on Mobile Systems, Applications and Services (published)

doi.org

The digital heartbeat: a qualitative descriptive study on women's views on preventing cardiovascular disease in primary care

Ilhem Chaima Bousbiat

Samira Abbasgholizadeh Rahimi

Roland Grad

Charo Rodriguez

BACKGROUND: This empirical study aims to explore women's perspectives on cardiovascular disease and the use of digital health interventions (DHIs) for their primary prevention and to gather insights on essential features for developing artificial intelligent-enabled technologies. METHODS: Adopting a qualitative descriptive research design, we conducted 15 semi-structured, in-depth interviews via Zoom with women at higher risk for cardiovascular disease. Participants were women over 40 years old, residing in Quebec, with at least one cardiovascular disease risk factor, and proficient in English. Recruitment was from a McGill University-affiliated clinic. An inductive thematic analysis approach was used for data analysis. RESULTS: Five major themes were identified: (i) understanding cardiovascular disease in a variety of ways, (ii) barriers and challenges to preventing cardiovascular disease in women, (iii) women taking charge of their cardiovascular well-being, (iv) mixed perspectives regarding artificial intelligent-enabled technologies for cardiovascular disease prevention such as Xi-Care, and (v) range of suggestions for the format and design of a prospective artificial intelligent-enabled technologies. CONCLUSIONS: Despite the prevalence of cardiovascular disease, there is a significant knowledge gap among women regarding the chronic nature and manifestations of these diseases. Artificial intelligent-enabled technologies like Xi-Care, with the potential for customization and interactive engagement, could enhance the primary prevention of cardiovascular disease in women, providing valuable insights for the subsequent phases of the project leading to Xi-Care's development.

2026-06-18

Family Practice (published)

doi.org

Engineered Nonheme Iron Enzymes Enable Asymmetric Hydrogenation of Alkenes

Yunfei He

Shuang-Yu Dai

Mei‐Yan Xu

Baixu Ma

Jian Tang

Lizhi Tao

Zhen Liu

Developing biocatalytic systems capable of reducing simple alkenes is highly desirable for synthetic chemistry and biosynthesis, yet existing enzymes remain largely restricted to their ability to convert polarized, electron-deficient substrates. Here, we present a nonheme iron metalloenzyme platform that enables hydrogenation of styrenes, conjugated nitriles and amides, and nonconjugated olefins through a putative iron–hydride mechanism. Starting from the Fe(II)/α-ketoglutarate-dependent dioxygenase GOX, iterative rounds of directed evolution produced an engineered “alkene hydrogenase” (AHase-6) containing 16 mutations and promoting NaBH4-driven reduction across diverse C═C bond motifs. Kinetic analysis indicates that this enzymatic hydrogenation process proceeds via formation of an enzyme–substrate ternary complex through a sequential mechanism. Mechanistic studies further reveal that alkene insertion occurs with regioselectivity governed primarily by substrate electronics and sterics. These findings establish nonheme iron enzymes as an unrecognized scaffold for metal–hydride-based hydrogenation and highlight their potential as sustainable, tunable alternatives to traditional catalytic systems.

2026-06-12

Journal of the American Chemical Society (published)

doi.org

Artificial intelligence-assisted ganglion cell detection in Hirschsprung's disease: A comparative evaluation of two deep learning approaches

E Wang

Karl Grenier

Peter Savadjiev

Dan Poenaru

Background. Definitive diagnosis of Hirschsprung's disease (HD) requires pathological identification of enteric ganglion cells. This process is time-consuming and subject to inter-observer variability. Artificial intelligence (AI) tools have the potential to standardize and accelerate this workflow, but no study has determined which AI approach best serves intraoperative HD pathology diagnostics. Method. This study compared the U-Net and You Only Look Once version 26 (YOLO26) frameworks for ganglion cell detection using a single-centre retrospective dataset of 54 whole-slide images (WSIs) from rectal biopsies. WSIs were tiled into 397,731 image patches (128x128 pixels), further partitioned into training (70%), validation (15%), and testing (15%) sets. Models were evaluated on tile- and patient-level diagnostic metrics and processing latency. Results. The U-Net achieved a tile-level sensitivity of 82.9%, showing no statistically significant difference compared to YOLO26 (79.1%; p = 0.097). However, YOLO26 demonstrated a statistically significant advantage in tile-level specificity (96.1% vs. 93.9%; p < 0.001) and reduced mean inference latency (7.64 ms vs. 11.57 ms/tile). At the patient level, both models achieved 100% diagnostic sensitivity. Despite low patient-level specificity (0.0% U-Net; 11.8% YOLO26), the tissue-level diagnostic burden of false positives was 6.00% for U-Net and 3.50% for YOLO26. Conclusion. The U-Net is preferred when nominal gains in sensitivity are prioritized, while the YOLO26 is an alternative that optimizes efficiency and false positive suppression. Both models serve as robust screening filters to augment the pathologist's workflow and should be selected based on workflow requirements. Prospective validation on larger, multi-centre datasets is required before clinical implementation.

2026-06-11

medRxiv (accepted)

doi.org

Characterizing Cultural Localization in AI-Generated Stories

Shaily Bhatt

Supriti Vijay

Jeremiah Milbauer

Fernando Diaz

The global use of artificial intelligence has increased interest in assessing the ability to generate culturally localized content, including stories. Cultural localization in stories often occurs through either templated localization -- the use of cultural markers (e.g., names, locations) in a generic narrative -- or holistic localization -- the variation of plots, values, and themes, in addition to cultural markers. We propose a method to measure the degree to which content was generated through templated localization. Specifically, we identify the lexical tokens that distinguish stories across nationalities and measure the similarity of the narratives that remain after removing them. In stories generated by five models on 125 topics for 193 nationalities, our method is able to detect that only a small subset (9-17%) of the vocabulary accounts for the variation across nationalities and that the narratives that remain after removing them contain repeated multi-word sequences, suggesting the presence of a shared culturally-agnostic narrative template. Finally, we characterize the cultural markers for their stereotypicality and offensiveness, finding that markers from 19 countries, mostly located in the Global South, are on average offensive.

2026-06-11

arXiv (preprint)

doi.org

arxiv.org

AgentBeats: Agentifying Agent Assessment for Openness, Standardization, and Reproducibility

Xiaoyuan Liu

Jianhong Tu

Yuqi Chen

Siyuan Xie

Sihan Ren

Tianneng Shi

Gal Gantar

Evan Sandoval

D Lee

Daniel Miao

Peter J. Gilbert

Nick Hynes

Mauro Staver

Warren He

David Marn

Andrew Low

Xi Zhang

Elron Bandel

Michal Shmueli-Scheuer

Somasekhar Reddy

Alexandre Drouin

Alexandre Lacoste

R Radha Krishnan

Elham Tabassi

Yu Su

Victor Barres

Chenguang Wang

Wenbo Guo

Dawn Song

Agent systems are advancing quickly across domains, but their evaluation remains fragmented. Most benchmarks rely on fixed, LLM-centric harnesses that require heavy integration, create test-production mismatch, and limit fair comparison across diverse agent designs. The root problem is the lack of an open, agent-agnostic assessment interface. We advocate Agentified Agent Assessment (AAA), where evaluation is performed by judge agents and all participants interact through standardized protocols: A2A for task management and MCP for tool access. Conventional benchmarking defines two separate interfaces, one for the benchmark and one for the agent, while AAA only needs one; this yields a generic, unified framework that separates assessment logic from agent implementation and enables reproducible, interoperable, and multi-agent evaluation. We further introduce AgentBeats as a concrete realization of AAA: we identify five practical operation modes that make standardized assessment compatible with real-world constraints on openness, privacy, and reproducibility. To evaluate our design at scale, we conduct two studies: a five-month open competition that drew 298 judge agents across 12 categories together with 467 subject agents from independent participants, showing that AAA applies across a heterogeneous range of benchmarks; and a case study on coding agents that confirms agentified evaluation preserves fidelity with the public record while surfacing previously missing head-to-head results, yielding research insights about agent design. Combining a community-scale field study and a controlled coding case study, we verify that AAA delivers coverage, practicality, and fidelity across heterogeneous scenarios at scale. Together, AAA and AgentBeats offer a clear path toward open, standardized, and reproducible agent assessment.

2026-06-10

arXiv (preprint)

doi.org

arxiv.org

Data-Driven Stochastic Vehicle Routing Problems with Deadlines Under Decision-Dependent Travel Time

Shanshan Wang

Erick Delage

Leandro C. Coelho

Problem definition: Vehicle routing problems (VRPs) with deadlines have received significant attention around the world. Motivated by a real-world food delivery problem, we assume that the travel time depends on the routing decisions, and we study a data-driven stochastic VRP with deadlines and endogenous uncertainty. Methodology/results: We use the nonparametric approaches, including k-nearest neighbor (kNN) and kernel density estimation (KDE), to estimate the decision-dependent probability distribution of travel time. To solve the resulting problem efficiently, we employ a logic-based Benders decomposition (LBBD) algorithm with several algorithmic enhancements. In particular, we propose a novel family of optimality cuts that includes the expected delay for all the subroutes. Moreover, we solve a total travel cost minimization problem to warm start the algorithm. We also use a local search procedure to improve the current routing decision and propose a machine learning–based lower bound heuristic to efficiently solve problems of realistic size. A practical case study for a food delivery routing problem using real-world data is conducted to show the efficiency of the proposed techniques and the advantage of the data-driven stochastic VRP in reducing the expected delay. Managerial implications: In our case study, we show that incorporating routing decisions into a nonparametric model outperforms a state-of-the-art data-driven parametric model by 23% on average in terms of the expected delay and the order-assignment decisions obtained from a robust model with travel-time predictors by 26% on average. Moreover, compared with the drivers’ actual routes and arc-based VRP models that ignore the endogenous uncertainty, our suggested routes can significantly improve the on-time performance of delivery services. We also quantify the value of the proposed routes with different service deadlines. Funding: S. Wang was partially supported by the Natural Sciences and Engineering Research Council of Canada [Grant RGPIN-2016-05208], IVADO, and a joint project between the Fonds de Recherche du Québec - Société et Culture (FRQSC) and the National Natural Science Foundation of China (NSFC) [Grant 295837]. She is also most recently supported by the National Natural Science Foundation of China [Grants 72501014, 72371022, and 72272014]. Supplemental Material: The online appendix is available at https://doi.org/10.1287/msom.2024.0899.

2026-06-10

Manufacturing & Service Operations Management (published)

doi.org

Feature Geometry of Language Models Transfer Across Modalities to Time Series

Zhenghan Tai

Vasilii Feofanov

Language models transfer to time-series forecasting, but it is unclear whether this reflects reusable internal structure or rapid relearning under a familiar architecture. We study this transfer directly by comparing pretrained and randomly initialized versions of the same model on a forecasting objective whose inputs have little semantic overlap with text but still require autoregressive sequential structure. Across Qwen3-0.6B finetuning experiments, language initialization gives coherent per-example gradients from the first update, while random initialization first passes through a low-alignment warmup phase. Effective-rank and hidden-state analyses show that finetuning selectively reshapes an existing representation geometry rather than constructing the simpler temporal geometry found by models trained from scratch. Cross-domain sparse features and causal ablations then expose candidate transferred primitives, including a Layer 1 head–MLP circuit whose ablation selectively increases loss on periodic forecasting and repetitive language passages. These results support an account of cross-modal transfer in which autoregressive pretraining creates temporal feature geometry that can be selected and specialized outside language.

2026-06-10

ICML.cc/2026/Workshop/Mech_Interp (poster)

openreview.net

Forecasting Emerges from Auto-Regressive Pretraining: Latent Predictive Structure in Language Models

Zhenghan Tai

Vasilii Feofanov

Predicting how a sequence will continue is a basic problem for intelligent systems. We show that large language models contain usable forecasting structure before any explicit time-series supervision. A single linear readout from frozen Qwen3-0.6B hidden states maps ordinary text sequences to numerical trajectories that resemble real time series, and those trajectories can be used for straightforward forecasts. The distribution over output tokens also gives coherent, non-crossing probabilistic forecasts in a single forward pass. After time-series specialization, pretrained models show aligned gradients and improve immediately, whereas randomly initialized models spend early training in a destructive-interference regime. These findings suggest that auto-regressive pretraining already shapes representations around temporal continuation; and finetuning adapts that structure to numerical forecasting rather than creating it from scratch.

2026-06-10

ICML.cc/2026/Workshop/Forecast (oral)

openreview.net

A Functional Approach to Synthesizing Routable Programmable Accelerators for Neural Networks

Tzung-Han Juang

Paul Teng

Christophe Dubach

T V Paul

Producing optimized accelerators is tedious, as even modern HDLs (Hardware Description Languages) such as Chisel require reasoning about low-level concepts. Recent functional approaches, such as Aetherling and SHIR, treat hardware as a composition of pure operators. This raises the abstraction level, allowing for systematic optimizations through rewrite rules for FPGAs (Field Programmable Gate Arrays).

2026-06-10

Proceedings of the 27th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, and Tools for Embedded Systems (published)

doi.org

Hidden-State Similarity Predicts Re-Elicitation After Inoculation Prompting

Fine-tuning on narrow harmful tasks can cause emergent misalignment, where models generalize harmful behavior beyond the training distribution. Inoculation prompting can reduce this effect by explicitly eliciting the undesired behavior during training, but recent work shows that the behavior can reappear when evaluation prompts contain cues from the training context. We study what makes such prompts effective triggers. We find that textual similarity to the inoculation prompt is an incomplete predictor: prompts are more likely to re-elicit suppressed behavior when they induce activation states similar to those produced by the inoculation context. These findings advance our understanding of how inoculation prompting modulates conditional misalignment, and suggest that activation-space analysis can help identify when suppressed behaviors remain accessible under eval-time prompts.

2026-06-10

ICML.cc/2026/Workshop/Mech_Interp (poster)

openreview.net
