Publications

Robust and Interpretable Relational Reasoning with Large Language Models and Symbolic Solvers

Ge Zhang

Mohammad Alomrani

Hongjian Gu

Jiaming Zhou

Yaochen Hu

B. Wang

Qun Liu

Mark Coates

Yingxue Zhang

Jianye HAO

Large language models (LLMs) possess vast semantic knowledge but often struggle with complex reasoning tasks, particularly relational reasoning problems such as kinship or spatial reasoning. In this paper, we present Path-of-Thoughts (PoT), a novel framework that tackles relational reasoning by decomposing the task into three key stages: graph extraction, path identification, and reasoning. Unlike previous approaches, PoT efficiently extracts a task-agnostic graph that identifies crucial entities, relations, and attributes within the problem context. PoT then identifies the reasoning chains within the graph that are relevant to the posed question, facilitating inference of potential answers. Experimental evaluations on four benchmark datasets that demand long reasoning chains show that PoT surpasses state-of-the-art baselines by a significant margin (up to 21.3%) without requiring fine-tuning or extensive LLM calls. Furthermore, unlike prior neuro-symbolic methods, PoT is more resilient to LLM errors because it leverages the compositional nature of graphs.

2025-06-30

ICML.cc/2025/Workshop/R2-FM (poster)

openreview.net
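The PoT abstract describes a path-identification stage that finds the reasoning chain in an extracted graph before reasoning over it. The sketch below illustrates that idea for a kinship question under stated assumptions: a hand-written list of (head, relation, tail) triples stands in for LLM-extracted output, the chain is found by breadth-first search, and it is resolved with a small symbolic composition table. `find_relation_path`, `compose_chain`, and `COMPOSE` are hypothetical names; this is not the authors' implementation.

```python
from collections import deque

# Hypothetical kinship composition table (illustration only, not from the paper):
# (r1, r2) -> r means "if Y is X's r1 and Z is Y's r2, then Z is X's r".
COMPOSE = {
    ("father", "father"): "grandfather",
    ("mother", "father"): "grandfather",
    ("father", "mother"): "grandmother",
    ("mother", "mother"): "grandmother",
    ("father", "brother"): "uncle",
    ("mother", "brother"): "uncle",
    ("father", "sister"): "aunt",
    ("mother", "sister"): "aunt",
}


def find_relation_path(edges, source, target):
    """Breadth-first search for the shortest relation chain from source to target.

    edges: iterable of (head, relation, tail) triples, read as "tail is head's relation".
    Returns the list of relations along the path, or None if target is unreachable.
    """
    adjacency = {}
    for head, relation, tail in edges:
        adjacency.setdefault(head, []).append((relation, tail))
    queue = deque([(source, [])])
    seen = {source}
    while queue:
        node, relations = queue.popleft()
        if node == target:
            return relations
        for relation, neighbour in adjacency.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, relations + [relation]))
    return None


def compose_chain(relations):
    """Fold a relation chain left to right with the symbolic table; None if undefined."""
    if not relations:
        return None
    result = relations[0]
    for relation in relations[1:]:
        result = COMPOSE.get((result, relation))
        if result is None:
            return None
    return result


if __name__ == "__main__":
    graph = [
        ("Ann", "father", "Bob"),
        ("Bob", "brother", "Carl"),
        ("Ann", "mother", "Dana"),
        ("Dana", "mother", "Eve"),
    ]
    path = find_relation_path(graph, "Ann", "Carl")
    print(path, compose_chain(path))  # ['father', 'brother'] uncle
```

Breadth-first search returns the shortest chain, and a chain whose composition is missing from the table yields None instead of a guess, so any error stays confined to one explicit step.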

Cervical Spinal Cord Magnetization Transfer Ratio and Its Relationship With Clinical Outcomes in Multiple Sclerosis

Lisa Eunyoung Lee

Julien Cohen-Adad

Irene M. Vavasour

Melanie Guenette

Katherine Sawicka

Neda Rashidi‐Ranjbar

Nathan Churchill

Akash Chopra

Adelia Adelia

Pierre-Louis Benveniste

Anthony Traboulsee

Nathalie Arbour

Fabrizio Giuliani

Larry D. Lynd

Scott B. Patten

Alexandre Prat

Alice Schabas

Penelope Smyth

Roger Tam

Yunyan Zhang

Simon J. Graham

Mojgan Hodaie

Anthony Feinstein

Shannon Kolind

Tom A. Schweizer

Jiwon Oh

Objective: The cervical spinal cord (cSC) is highly relevant to clinical dysfunction in multiple sclerosis (MS) but remains understudied using quantitative magnetic resonance imaging (MRI). We assessed magnetization transfer ratio (MTR), a semi‐quantitative MRI measure sensitive to MS‐related tissue microstructural changes, in the cSC and its relationship with clinical outcomes in radiologically isolated syndrome (RIS) and MS. Methods: MTR data were acquired from 52 RIS, 201 relapsing–remitting MS (RRMS), 47 primary progressive MS (PPMS), and 43 control (CON) participants across four sites in the Canadian Prospective Cohort Study to Understand Progression in MS (CanProCo) using 3.0 T MRI systems. Mean MTR was compared between groups in the whole cSC and in sub‐regions between C2 and C4. Multiple linear regression was used to evaluate relationships between MTR and clinical outcomes, including the Expanded Disability Status Scale (EDSS), walking speed test (WST), and manual dexterity test (MDT). Results: There were consistent group differences in MTR, which were most pronounced between PPMS and CON (−5.8% to −3.7%, p ≤ 0.01). In PPMS, lower MTR was associated with greater disability as measured by EDSS (β = −0.3 to −0.1, p ≤ 0.03), WST (β = −0.9 to −0.5, p ≤ 0.04), and MDT (β = −0.6 and −0.5, p = 0.04). In RRMS, MTR was associated only with EDSS (β = −0.1, p ≤ 0.03). Interpretation: In this large sample of RIS and MS, cSC MTR was lowest in PPMS, with associations between MTR and clinical outcomes in MS but not RIS. These findings suggest that MTR provides important information about the underlying tissue microstructural integrity of the cSC relevant to clinical disability in established MS.

2025-06-29

Annals of Clinical and Translational Neurology (published)

doi.org

www.ncbi.nlm.nih.gov
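For readers unfamiliar with the measure: MTR is conventionally computed from two acquisitions, one without (M0) and one with (Msat) an off-resonance saturation pulse, as MTR = 100 × (M0 − Msat) / M0 in percent units. The sketch below encodes that standard definition and a region-of-interest mean. The abstract does not say how its percentage group differences were computed, so `percent_difference` (relative to the control mean) is an assumption, and all function names are hypothetical.

```python
from typing import Iterable, Tuple


def magnetization_transfer_ratio(m0: float, msat: float) -> float:
    """Standard MTR in percent units: 100 * (M0 - Msat) / M0.

    m0:   signal acquired without the off-resonance saturation pulse
    msat: signal acquired with the saturation pulse
    """
    if m0 <= 0:
        raise ValueError("m0 must be positive")
    return 100.0 * (m0 - msat) / m0


def mean_mtr(voxels: Iterable[Tuple[float, float]]) -> float:
    """Mean MTR over (m0, msat) voxel pairs in a region of interest (hypothetical helper)."""
    values = [magnetization_transfer_ratio(m0, msat) for m0, msat in voxels]
    return sum(values) / len(values)


def percent_difference(group_mean: float, control_mean: float) -> float:
    """Assumed definition: group mean relative to the control mean, in percent."""
    return 100.0 * (group_mean - control_mean) / control_mean


if __name__ == "__main__":
    print(mean_mtr([(1000.0, 600.0), (1000.0, 700.0)]))  # 35.0
    print(percent_difference(38.0, 40.0))                 # -5.0
```

Because MTR is a ratio of two signals from the same scan session, it partly cancels scanner gain, which is one reason it is described as semi-quantitative rather than fully quantitative.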

Exposing and Mitigating Calibration Biases and Demographic Unfairness in MLLM Few-Shot In-Context Learning for Medical Image Classification

Xing Shen

Justin Szeto

Mingyang Li

Hengguan Huang

Tal Arbel

Multimodal large language models (MLLMs) have enormous potential to perform few-shot in-context learning for medical image analysis. However, safe deployment of these models in real-world clinical practice requires an in-depth analysis of the accuracy of their predictions and their associated calibration errors, particularly across demographic subgroups. In this work, we present the first investigation into the calibration biases and demographic unfairness of MLLMs' predictions and confidence scores in few-shot in-context learning for medical image classification. We introduce CALIN, an inference-time calibration method designed to mitigate these biases. Specifically, CALIN estimates the amount of calibration needed, represented by calibration matrices, using a bi-level procedure that progresses from the population level to the subgroup level prior to inference. It then applies this estimate to calibrate the predicted confidence scores during inference. Experimental results on three medical imaging datasets (PAPILA for fundus image classification, HAM10000 for skin cancer classification, and MIMIC-CXR for chest X-ray classification) demonstrate CALIN's effectiveness at ensuring fair confidence calibration in its predictions, while improving overall prediction accuracy and exhibiting a minimal fairness-utility trade-off. Our codebase can be found at https://github.com/xingbpshen/medical-calibration-fairness-mllm.

2025-06-29

ArXiv (preprint)

doi.org

arxiv.org
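The CALIN abstract describes calibration matrices estimated first at the population level and then at the subgroup level, and applied to confidence scores at inference time. The abstract does not give CALIN's estimator, so the sketch below swaps in a simple contextual-calibration-style assumption: a diagonal matrix whose entries are the inverse of each class's mean predicted probability on an estimation set (which presumes balanced classes in that set). It only makes the population-then-subgroup ordering concrete; `fit_bilevel`, `calibrate`, and the other names are hypothetical.

```python
from collections import defaultdict


def mean_vector(rows):
    """Column-wise mean of a list of equal-length probability vectors."""
    return [sum(column) / len(rows) for column in zip(*rows)]


def apply_diagonal(weights, probs):
    """Multiply by a diagonal calibration matrix, then renormalize to sum to 1."""
    scaled = [w * p for w, p in zip(weights, probs)]
    total = sum(scaled)
    return [s / total for s in scaled]


def estimate_diagonal(prob_rows):
    """Assumed estimator: inverse of each class's mean predicted probability (must be > 0)."""
    return [1.0 / m for m in mean_vector(prob_rows)]


def fit_bilevel(prob_rows, groups):
    """Population-level weights first, then subgroup weights on population-corrected scores."""
    population = estimate_diagonal(prob_rows)
    corrected = defaultdict(list)
    for probs, group in zip(prob_rows, groups):
        corrected[group].append(apply_diagonal(population, probs))
    subgroup = {g: estimate_diagonal(rows) for g, rows in corrected.items()}
    return population, subgroup


def calibrate(probs, group, population, subgroup):
    """Inference-time step: apply the population matrix, then the subgroup matrix."""
    return apply_diagonal(subgroup[group], apply_diagonal(population, probs))


if __name__ == "__main__":
    rows = [[0.8, 0.2], [0.8, 0.2], [0.3, 0.7]]
    groups = ["a", "a", "b"]
    population, subgroup = fit_bilevel(rows, groups)
    print(calibrate([0.8, 0.2], "a", population, subgroup))
```

In this toy, a subgroup whose estimation-set predictions are uniformly skewed toward one class is mapped back to a uniform mean, which is the kind of subgroup-specific bias the population-level matrix alone cannot remove.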

A Novel Sequential Framework for Transmission Network Expansion Planning: Benders Decomposition Preceding Semidefinite Programming

Elmira Fathipasandideh

Hussein Suprême

Hanane Dagdougui

Dalal Asber

The transmission network expansion planning (TNEP) problem is inherently complex because of its nonlinear and nonconvex nature, which arises from the inclusion of AC power flow constraints, discrete investment decisions, and multiple operating scenarios. These characteristics make the problem computationally challenging, particularly when scaling to larger systems with multistage planning horizons. Addressing this complexity requires advanced methodologies that balance solution accuracy and computational efficiency. This paper presents a novel two-step framework for TNEP that first applies Benders decomposition to separate investment and operational decisions and then uses semidefinite linearization to reformulate the operational subproblems. The proposed approach improves solution quality by ensuring convexity in the subproblems and improves computational efficiency through decomposition. Numerical results for 6-, 10-, and 24-bus test systems show that the proposed method outperforms existing approaches in both solution accuracy and computational efficiency.

2025-06-29

2025 IEEE Kiel PowerTech (published)

doi.org
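The decomposition step in the TNEP abstract separates investment decisions (a master problem) from operational decisions (subproblems), which return cuts to the master. The toy below shows that loop with Benders optimality cuts on a hypothetical single-bus problem: build an integer number of lines y, and pay a penalty for unserved load. Its operational LP and dual are solvable in closed form, so the master is solved by enumeration. AC power flow and the semidefinite reformulation are omitted entirely; the data, names, and closed-form dual are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass
class ToyExpansionProblem:
    """Hypothetical single-bus toy: build integer lines y to serve demand."""
    line_cost: float = 3.0        # investment cost per line
    line_capacity: float = 10.0   # MW carried per built line
    demand: float = 25.0          # MW to serve
    shed_penalty: float = 1.0     # cost per MW of unserved load
    max_lines: int = 5


def solve_subproblem(m, y):
    """Operational LP: min penalty*s  s.t.  s >= demand - capacity*y, s >= 0.

    Returns (operating cost, optimal dual of the demand constraint). The dual
    max lam*(demand - capacity*y) over 0 <= lam <= penalty is solved in closed form.
    """
    shortfall = m.demand - m.line_capacity * y
    if shortfall > 0:
        return m.shed_penalty * shortfall, m.shed_penalty
    return 0.0, 0.0


def solve_master(m, cut_duals):
    """Master over integer y by enumeration; returns (lower bound, y).

    Each stored dual lam gives the cut theta >= lam*(demand - capacity*y);
    theta >= 0 holds because the operating cost is never negative.
    """
    best_value, best_y = float("inf"), None
    for y in range(m.max_lines + 1):
        theta = max([0.0] + [lam * (m.demand - m.line_capacity * y) for lam in cut_duals])
        value = m.line_cost * y + theta
        if value < best_value:
            best_value, best_y = value, y
    return best_value, best_y


def benders(m, tol=1e-9, max_iter=50):
    """Alternate master and subproblem until the lower and upper bounds meet."""
    cut_duals, upper, incumbent = [], float("inf"), None
    for _ in range(max_iter):
        lower, y = solve_master(m, cut_duals)
        op_cost, lam = solve_subproblem(m, y)
        total = m.line_cost * y + op_cost
        if total < upper:
            upper, incumbent = total, y
        if upper - lower <= tol:
            break
        cut_duals.append(lam)
    return incumbent, upper


if __name__ == "__main__":
    print(benders(ToyExpansionProblem()))  # (3, 9.0)
```

The master supplies a lower bound and each evaluated plan an upper bound; here the loop converges in two iterations to 3 lines at cost 9, which matches brute force over y = 0..5.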

Small Encoders Can Rival Large Decoders in Detecting Groundedness

Istabrak Abbes

Gabriele Prato

Quentin Fournier

Fernando Rodriguez

Alaa Boukhary

Adam Elwood

Sarath Chandar

Augmenting large language models (LLMs) with external context significantly improves their performance in natural language processing (NLP) tasks. However, LLMs struggle to answer queries reliably when the provided context lacks information, often resorting to ungrounded speculation or internal knowledge. Groundedness, that is, generating responses strictly supported by the context, is essential for ensuring factual consistency and trustworthiness. This study focuses on detecting whether a given query is grounded in a document provided in context before the LLM performs costly answer generation. Such a detection mechanism can significantly reduce both inference time and resource consumption. We show that lightweight, task-specific encoder models such as RoBERTa and NomicBERT, fine-tuned on curated datasets, can achieve accuracy comparable to state-of-the-art LLMs, such as Llama3 8B and GPT4o, in groundedness detection while reducing inference latency by orders of magnitude. The code is available at: https://github.com/chandarlab/Hallucinate-less

2025-06-26

ArXiv (preprint)

doi.org

arxiv.org
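The groundedness abstract motivates running a cheap detector before the expensive generator. The sketch below shows only that gating logic. `token_overlap_detector` is a hypothetical lexical stand-in for the fine-tuned RoBERTa or NomicBERT classifiers the paper describes, and `answer_if_grounded`, the 0.5 threshold, and the callable signatures are assumptions.

```python
import re
from typing import Callable, Optional

Detector = Callable[[str, str], float]   # (query, document) -> score that the query is grounded
Generator = Callable[[str, str], str]    # (query, document) -> answer text


def token_overlap_detector(query: str, document: str) -> float:
    """Hypothetical stand-in for a fine-tuned encoder: share of query words found in the document."""
    query_words = set(re.findall(r"\w+", query.lower()))
    doc_words = set(re.findall(r"\w+", document.lower()))
    if not query_words:
        return 0.0
    return len(query_words & doc_words) / len(query_words)


def answer_if_grounded(query: str, document: str, detector: Detector,
                       generator: Generator, threshold: float = 0.5) -> Optional[str]:
    """Call the (expensive) generator only when the detector judges the query grounded."""
    if detector(query, document) < threshold:
        return None  # abstain without spending an LLM call
    return generator(query, document)
```

Any real deployment would replace the lexical detector with a trained classifier; the point of the pattern is that ungrounded queries never reach the generator, so their cost is only the detector's forward pass.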

T-GRAB: A Synthetic Diagnostic Benchmark for Learning on Temporal Graphs

Alireza Dizaji

Benedict Aaron Tjandra

Mehrab Hamidi

Shenyang Huang

Guillaume Rabusseau

Dynamic graph learning methods have recently emerged as powerful tools for modelling relational data evolving through time. However, despite extensive benchmarking efforts, it remains unclear whether current Temporal Graph Neural Networks (TGNNs) effectively capture core temporal patterns such as periodicity, cause-and-effect, and long-range dependencies. In this work, we introduce the Temporal Graph Reasoning Benchmark (T-GRAB), a comprehensive set of synthetic tasks designed to systematically probe the capabilities of TGNNs to reason across time. T-GRAB provides controlled, interpretable tasks that isolate key temporal skills: counting/memorizing periodic repetitions, inferring delayed causal effects, and capturing long-range dependencies over both spatial and temporal dimensions. We evaluate 11 temporal graph learning methods on these tasks, revealing fundamental shortcomings in their ability to generalize temporal patterns. Our findings offer actionable insights into the limitations of current models, highlight challenges hidden by traditional real-world benchmarks, and motivate the development of architectures with stronger temporal reasoning abilities. The code for T-GRAB can be found at: https://github.com/alirezadizaji/T-GRAB.

2025-06-26

KDD.org/2025/Workshop/MLoG-GenAI (oral)

doi.org

openreview.net
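To make two of the task families in the T-GRAB abstract more concrete, the sketch below generates hypothetical synthetic temporal-graph sequences as lists of edge-set snapshots: a periodic sequence and a delayed cause-and-effect sequence, plus a baseline that copies the snapshot one period back. These generators and names are illustrative assumptions, not the benchmark's actual task definitions, which live in the linked repository.

```python
from typing import FrozenSet, List, Tuple

Edge = Tuple[int, int]
Snapshot = FrozenSet[Edge]


def periodic_snapshots(pattern: List[Snapshot], num_steps: int) -> List[Snapshot]:
    """Hypothetical periodic task: the snapshot at step t is pattern[t mod k]."""
    return [pattern[t % len(pattern)] for t in range(num_steps)]


def delayed_effect_snapshots(causes: List[Snapshot], delay: int, effect: Edge) -> List[Snapshot]:
    """Hypothetical cause-and-effect task: any activity at step t makes `effect` appear at t + delay."""
    out = [set(s) for s in causes] + [set() for _ in range(delay)]
    for t, snapshot in enumerate(causes):
        if snapshot:
            out[t + delay].add(effect)
    return [frozenset(s) for s in out]


def copy_one_period_back(history: List[Snapshot], period: int) -> Snapshot:
    """Baseline that predicts the next snapshot by copying the one `period` steps earlier."""
    return history[-period]


if __name__ == "__main__":
    pattern = [frozenset({(0, 1)}), frozenset({(1, 2)}), frozenset({(2, 0), (0, 1)})]
    seq = periodic_snapshots(pattern, 9)
    hits = sum(copy_one_period_back(seq[:t], 3) == seq[t] for t in range(3, 9))
    print(f"{hits}/6 snapshots predicted exactly")  # 6/6 snapshots predicted exactly
```

The copy baseline is exact only when its look-back matches the true period, so a model that cannot retain information for at least one full period cannot solve the periodic task.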

Diffusion Tree Sampling: Scalable inference-time alignment of diffusion models

Adapting a pretrained diffusion model to new objectives at inference time remains an open problem in generative modeling. Existing steering methods suffer from inaccurate value estimation, especially at high noise levels, which biases guidance. Moreover, information from past runs is not reused to improve sample quality, resulting in inefficient use of compute. Inspired by the success of Monte Carlo Tree Search, we address these limitations by casting inference-time alignment as a search problem that reuses past computations. We introduce a tree-based approach that samples from the reward-aligned target density by propagating terminal rewards back through the diffusion chain and iteratively refining value estimates with each additional generation. Our proposed method, Diffusion Tree Sampling (DTS), produces asymptotically exact samples from the target distribution in the limit of infinite rollouts, and its greedy variant, Diffusion Tree Search (DTS

2025-06-25

ArXiv (preprint)

doi.org

arxiv.org
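The DTS abstract describes propagating terminal rewards back through the chain and refining value estimates with each generation. The sketch below is a discrete toy of that idea, not the authors' algorithm: a depth-3 binary tree stands in for the diffusion chain, each child has a hypothetical base probability `P0 = 0.5`, values are backed up with a soft (log-mean-exp) Bellman update, and children are selected with probability proportional to P0 · exp(α · V). Once every leaf has been visited, the root value equals (1/α) log E[exp(α r)] under the base process, and top-down selection samples leaves from the reward-tilted density p0(x) exp(α r(x)) / Z. `Node`, `rollout`, `reward`, and the `forced` argument are hypothetical.

```python
import itertools
import math
import random

P0 = 0.5  # hypothetical base-process probability of each binary child


def reward(bits):
    """Hypothetical terminal reward: the number of ones in the finished sample."""
    return float(sum(bits))


class Node:
    def __init__(self, bits):
        self.bits = bits      # choices made so far along the toy "denoising chain"
        self.children = {}    # bit -> Node
        self.value = None     # soft value estimate; None until a rollout reaches this node

    def soft_backup(self, alpha):
        """V = (1/alpha) * log(mean of exp(alpha * V_child) over visited children).

        With equal P0 for every child this equals sum P0*exp(alpha*V) / sum P0.
        """
        values = [c.value for c in self.children.values() if c.value is not None]
        top = max(values)  # subtract the max for numerical stability
        mean_exp = sum(math.exp(alpha * (v - top)) for v in values) / len(values)
        self.value = top + math.log(mean_exp) / alpha

    def child_probs(self, alpha):
        """Selection probabilities proportional to P0 * exp(alpha * V_child)."""
        top = max(c.value for c in self.children.values())
        weights = {b: P0 * math.exp(alpha * (c.value - top)) for b, c in self.children.items()}
        total = sum(weights.values())
        return {b: w / total for b, w in weights.items()}


def rollout(root, depth, alpha, rng, forced=None):
    """One generation: descend the tree, score the leaf, back the reward up to the root."""
    path, node = [root], root
    for step in range(depth):
        for b in (0, 1):
            node.children.setdefault(b, Node(node.bits + (b,)))
        unvisited = [b for b, c in node.children.items() if c.value is None]
        if forced is not None:
            bit = forced[step]          # lets a caller enumerate every leaf
        elif unvisited:
            bit = unvisited[0]          # expand unexplored children first
        else:
            bit = 0 if rng.random() < node.child_probs(alpha)[0] else 1
        node = node.children[bit]
        path.append(node)
    node.value = reward(node.bits)      # terminal reward at the leaf
    for ancestor in reversed(path[:-1]):
        ancestor.soft_backup(alpha)     # propagate toward the root


def exact_root_value(depth, alpha):
    """(1/alpha) * log E_p0[exp(alpha * r)], computed by enumerating all leaves."""
    z = sum(P0 ** depth * math.exp(alpha * reward(b))
            for b in itertools.product((0, 1), repeat=depth))
    return math.log(z) / alpha


if __name__ == "__main__":
    rng = random.Random(0)
    root = Node(())
    for _ in range(200):
        rollout(root, depth=3, alpha=1.0, rng=rng)
    print(root.value, exact_root_value(3, 1.0))
```

Each rollout reuses every value stored along earlier paths, which is the "reuse past computations" property the abstract emphasizes; in a real diffusion model the children would be sampled from the denoising kernel rather than enumerated.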
