Publications

Using machine learning algorithms to predict students' general self-efficacy in PISA 2018

Bin Tan

Hao-Yue Jin

Maria Cutumisu

2025-06-30

Journal of Applied Developmental Psychology (published)

doi.org

What Matters for Maximizing Data Reuse In Value-based Deep Reinforcement Learning

Roger Creus Castanyer

Glen Berseth

Pablo Samuel Castro

A key ingredient for successfully applying deep reinforcement learning to challenging tasks is the effective use of data at scale. Although deep RL algorithms originally achieved this by storing past experiences collected from a synchronous actor in an external replay memory [DQN; Mnih et al., 2013], follow-up works scaled training by collecting data asynchronously through distributed actors [R2D2; Kapturowski et al., 2018], and more recently by GPU-optimized parallelization [PQN; Gallici et al., 2024]. We argue that DQN, PQN, and R2D2 constitute a group of value-based methods for parallel training and study them to shed light on the dynamics induced by varying data collection schemes. We conduct a thorough empirical study to better understand these dynamics, and propose the Data Replay Ratio as a novel metric for quantifying data reuse. Our findings suggest that maximizing data reuse involves directly addressing the deadly triad: Q-lambda rollouts for reducing the bias from bootstrapping, the use of LayerNorm for stabilizing function approximation, and parallelized data collection for mitigating off-policy divergence.

2025-06-30

rl-conference.cc/RLC/2025/Workshop/Finding_the_Frame (published)

openreview.net

Zero-Shot Constraint Satisfaction with Forward-Backward Representations

Adriana Hugessen

Harley Wiltzer

Cyrus Neary

Amy Zhang

Glen Berseth

Traditionally, constrained policy optimization with Reinforcement Learning (RL) requires learning a new policy from scratch for any new environment, goal or cost function, with limited generalization to new tasks and constraints. Given the sample inefficiency of many common deep RL methods, this procedure can be impractical for many real-world scenarios, particularly when constraints or tasks are changing. As an alternative, in the unconstrained setting, various works have sought to pre-train representations from offline datasets to accelerate policy optimization upon specification of a reward. Such methods can permit faster adaptation to new tasks in a given environment, dramatically improving sample efficiency. Recently, zero-shot policy optimization has been explored by leveraging a particular

2025-06-30

rl-conference.cc/RLC/2025/Workshop/RLBrew (published)

openreview.net

Circuit Discovery Helps To Detect LLM Jailbreaking

Despite extensive safety alignment, large language models (LLMs) remain vulnerable to jailbreak attacks that bypass safeguards to elicit harmful content. While prior work attributes this vulnerability to safety training limitations, the internal mechanisms by which LLMs process adversarial prompts remain poorly understood. We present a mechanistic analysis of the jailbreaking behavior in a large-scale, safety-aligned LLM, focusing on LLaMA-2-7B-chat-hf. Leveraging edge attribution patching and subnetwork probing, we systematically identify computational circuits responsible for generating affirmative responses to jailbreak prompts. Ablating these circuits during the first token prediction can reduce attack success rates by up to 80%, demonstrating their critical role in safety bypass. Our analysis uncovers key attention heads and MLP pathways that mediate adversarial prompt exploitation, revealing how important tokens propagate through these components to override safety constraints. These findings advance the understanding of adversarial vulnerabilities in aligned LLMs and pave the way for targeted, interpretable defense mechanisms based on mechanistic interpretability.

2025-06-29

ICML.cc/2025/Workshop/R2-FM (poster)

openreview.net

Landscape of Thoughts: Visualizing the Reasoning Process of Large Language Models

Zhanke Zhou

Zhaocheng Zhu

Xuan Li

Mikhail Galkin

Xiao Feng

Sanmi Koyejo

Jian Tang

Bo Han

2025-06-29

ICML.cc/2025/Workshop/R2-FM (poster)

doi.org

openreview.net

Natural language processing for African languages

David Ifeoluwa Adelani

Recent advances in word embeddings and language models use large-scale, unlabelled data and self-supervised learning to boost NLP performance. Multilingual models, often trained on web-sourced data like Wikipedia, face challenges: few low-resource languages are included, their data is often noisy, and lack of labeled datasets makes it hard to evaluate performance outside high-resource languages like English. In this dissertation, we focus on languages spoken in Sub-Saharan Africa where all the indigenous languages in this region can be regarded as low-resourced in terms of the availability of labelled data for NLP tasks and unlabelled data found on the web. We analyse the noise in the publicly available corpora, and curate a high-quality corpus, demonstrating that the quality of semantic representations learned in word embeddings does not only depend on the amount of data but on the quality of pre-training data. We demonstrate empirically the limitations of word embeddings, and the opportunities the multilingual pre-trained language model (PLM) offers especially for languages unseen during pre-training and low-resource scenarios. We further study how to adapt and specialize multilingual PLMs to unseen African languages using a small amount of monolingual texts. To address the under-representation of the African languages in NLP research, we developed large scale human-annotated labelled datasets for 21 African languages in two impactful NLP tasks: named entity recognition and machine translation. We conduct an extensive empirical evaluation using state-of-the-art methods across supervised, weakly-supervised, and transfer learning settings.

2025-06-29

ArXiv (preprint)

doi.org

arxiv.org

Robust and Interpretable Relational Reasoning with Large Language Models and Symbolic Solvers

Ge Zhang

Mohammad Alomrani

Hongjian Gu

Jiaming Zhou

Yaochen Hu

B. Wang

Qun Liu

Mark J. Coates

Yingxue Zhang

Jianye HAO

Large language models (LLMs) possess vast semantic knowledge but often struggle with complex reasoning tasks, particularly in relational reasoning problems such as kinship or spatial reasoning. In this paper, we present Path-of-Thoughts (PoT), a novel framework designed to tackle relation reasoning by decomposing the task into three key stages: graph extraction, path identification, and reasoning. Unlike previous approaches, PoT efficiently extracts a task-agnostic graph that identifies crucial entities, relations, and attributes within the problem context. Subsequently, PoT identifies relevant reasoning chains within the graph corresponding to the posed question, facilitating inference of potential answers. Experimental evaluations on four benchmark datasets, demanding long reasoning chains, demonstrate that PoT surpasses state-of-the-art baselines by a significant margin (maximum 21.3%) without necessitating fine-tuning or extensive LLM calls. Furthermore, as opposed to prior neuro-symbolic methods, PoT exhibits improved resilience against LLM errors by leveraging the compositional nature of graphs.

2025-06-29

ICML.cc/2025/Workshop/R2-FM (poster)

openreview.net

Cervical Spinal Cord Magnetization Transfer Ratio and Its Relationship With Clinical Outcomes in Multiple Sclerosis

Lisa Eunyoung Lee

Julien Cohen‐Adad

Irene M. Vavasour

Melanie Guenette

Katherine Sawicka

Neda Rashidi‐Ranjbar

Nathan Churchill

Akash Chopra

Adelia Adelia

Pierre‐Louis Benveniste

Anthony Traboulsee

Nathalie Arbour

Fabrizio Giuliani

Larry D. Lynd

Scott B. Patten

Alexandre Prat

Alice Schabas

Penelope Smyth

Roger Tam

Yunyan Zhang

Simon J. Graham

Mojgan Hodaie

Anthony Feinstein

Shannon Kolind

Tom A. Schweizer

Jiwon Oh

The cervical spinal cord (cSC) is highly relevant to clinical dysfunction in multiple sclerosis (MS) but remains understudied using quantitative magnetic resonance imaging (MRI). We assessed magnetization transfer ratio (MTR), a semi‐quantitative MRI measure sensitive to MS‐related tissue microstructural changes, in the cSC and its relationship with clinical outcomes in radiologically isolated syndrome (RIS) and MS. MTR data were acquired from 52 RIS, 201 relapsing–remitting MS (RRMS), 47 primary progressive MS (PPMS), and 43 control (CON) participants across four sites in the Canadian Prospective Cohort Study to Understand Progression in MS (CanProCo) using 3.0 T MRI systems. Mean MTR was compared between groups in whole cSC and sub‐regions between C2‐C4. Multiple linear regression was used to evaluate relationships between MTR and clinical outcomes, including the expanded disability status scale (EDSS), walking speed test (WST), and manual dexterity test (MDT). There were consistent group differences in MTR, which were most pronounced between PPMS and CON (−5.8% to −3.7%, p ≤ 0.01). In PPMS, lower MTR was associated with greater disability as measured by EDSS (β = −0.3 to −0.1, p ≤ 0.03), WST (β = −0.9 to −0.5, p ≤ 0.04), and MDT (β = −0.6 and −0.5, p = 0.04). In RRMS, MTR was associated with only EDSS (β = −0.1, p ≤ 0.03). In this large sample of RIS and MS, cSC MTR was lowest in PPMS, with associations between MTR and clinical outcomes in MS but not RIS. These findings suggest that MTR provides important information about the underlying tissue microstructural integrity of the cSC relevant to clinical disability in established MS.

2025-06-28

Annals of Clinical and Translational Neurology (published)

doi.org

Latency-Aware Pruning and Quantization of Self-Supervised Speech Transformers for Edge Devices

Seyed Milad Ebrahimipour

Seyyed Hasan Mozafari

James J. Clark

Warren J. Gross

Brett H. Meyer

The growing adoption of self-supervised learning transformers for speech (speech SSL) is constrained by their significant computational and memory demands, making deployment on resource-constrained edge devices challenging. We propose a latency-aware compression framework that integrates structured pruning and quantization to address these challenges. Guided by a latency model that considers the combined effects of pruning and quantization, our method dynamically identifies and removes less critical blocks while maintaining task performance, avoiding the inefficiencies of over-pruning and under-pruning seen in prior approaches. Unlike prior methods specialized in either post-training compression without fine-tuning data or in cases where fine-tuning data is available, our method is effective in both settings. Experimental results show that, in task-agnostic compression, our method achieves a 4.2× speedup on the Hikey970 edge development platform, outperforming previous task-agnostic pruning methods in most tasks, while requiring only 21–24 GPU hours, a 3× reduction compared to prior methods. Additionally, our method achieves a lower word error rate of 7.8% using task-specific pruning, while reducing computational overhead by approximately 19.4% in terms of GFLOPs compared to previous task-specific methods. Finally, our method consistently achieves higher accuracy than the state-of-the-art post-training compression approach across various latency speedup constraints, even without fine-tuning data.

2025-06-27

ACM Transactions on Embedded Computing Systems (published)

doi.org

Small Encoders Can Rival Large Decoders in Detecting Groundedness

Istabrak Abbes

Gabriele Prato

Quentin Fournier

Fernando Rodriguez

Alaa Boukhary

Adam Elwood

A. Chandar

Augmenting large language models (LLMs) with external context significantly improves their performance in natural language processing (NLP) tasks. However, LLMs struggle to answer queries reliably when the provided context lacks information, often resorting to ungrounded speculation or internal knowledge. Groundedness, that is, generating responses strictly supported by the context, is essential for ensuring factual consistency and trustworthiness. This study focuses on detecting whether a given query is grounded in a document provided in context before the costly answer generation by LLMs. Such a detection mechanism can significantly reduce both inference time and resource consumption. We show that lightweight, task-specific encoder models such as RoBERTa and NomicBERT, fine-tuned on curated datasets, can achieve accuracy comparable to state-of-the-art LLMs, such as Llama3 8B and GPT4o, in groundedness detection while reducing inference latency by orders of magnitude. The code is available at: https://github.com/chandarlab/Hallucinate-less

2025-06-25

ArXiv (preprint)

doi.org

arxiv.org

T-GRAB: A Synthetic Diagnostic Benchmark for Learning on Temporal Graphs

Alireza Dizaji

Benedict Aaron Tjandra

Mehrab Hamidi

Shenyang Huang

Guillaume Rabusseau

Dynamic graph learning methods have recently emerged as powerful tools for modelling relational data evolving through time. However, despite extensive benchmarking efforts, it remains unclear whether current Temporal Graph Neural Networks (TGNNs) effectively capture core temporal patterns such as periodicity, cause-and-effect, and long-range dependencies. In this work, we introduce the Temporal Graph Reasoning Benchmark (T-GRAB), a comprehensive set of synthetic tasks designed to systematically probe the capabilities of TGNNs to reason across time. T-GRAB provides controlled, interpretable tasks that isolate key temporal skills: counting/memorizing periodic repetitions, inferring delayed causal effects, and capturing long-range dependencies over both spatial and temporal dimensions. We evaluate 11 temporal graph learning methods on these tasks, revealing fundamental shortcomings in their ability to generalize temporal patterns. Our findings offer actionable insights into the limitations of current models, highlight challenges hidden by traditional real-world benchmarks, and motivate the development of architectures with stronger temporal reasoning abilities. The code for T-GRAB can be found at: https://github.com/alirezadizaji/T-GRAB.

2025-06-25

MLoG-GenAI @ ACM SIGKDD Conference on Knowledge Discovery and Data Mining (oral)

doi.org

openreview.net

Cross-Layer Discrete Concept Discovery for Interpreting Language Models

Ankur Garg

Xuemin Yu

Hassan Sajjad

S Ebrahimi Kahou

Uncovering emergent concepts across transformer layers remains a significant challenge because the residual stream linearly mixes and duplicates information, obscuring how features evolve within large language models. Current research efforts primarily inspect neural representations at single layers, thereby overlooking this cross-layer superposition and the redundancy it introduces. These representations are typically either analyzed directly for activation patterns or passed to probing classifiers that map them to a limited set of predefined concepts. To address these limitations, we propose CLVQ-VAE, a framework that uses vector quantization to map representations across layers and in the process collapse duplicated residual-stream features into compact, interpretable concept vectors. Our approach uniquely combines top-

2025-06-23

ArXiv (preprint)

doi.org

arxiv.org
