Publications

Optimizing User Profiles via Contextual Bandits for Retrieval-Augmented LLM Personalization

Linfeng Du

Ye Yuan

Zichen Zhao

Fuyuan Lyu

Xiuying Chen

Jikun Kang

Xue Liu

Large Language Models (LLMs) excel at general-purpose tasks, yet adapting their responses to individual users remains challenging. Retrieval augmentation provides a lightweight alternative to fine-tuning by conditioning LLMs on user history records, and existing approaches typically select these records based on semantic relevance. We argue that relevance serves as an unreliable proxy for utility: a record may be semantically similar to a query yet fail to improve generation quality or even degrade it due to redundancy or conflicting information. To bridge this gap, we propose PURPLE, a contextual bandit framework that oPtimizes UseR Profiles for Llm pErsonalization. In contrast to a greedy selection of the most relevant records, PURPLE treats profile construction as a set generation process and utilizes a Plackett-Luce ranking model to capture complex inter-record dependencies. By training with dense feedback provided by the likelihood of the reference response, our method aligns retrieval directly with generation quality. Extensive experiments on nine personalization tasks demonstrate that PURPLE consistently outperforms strong heuristic and retrieval-augmented baselines in both effectiveness and efficiency, establishing a principled and scalable solution for optimizing user profiles.

2025-12-31

arXiv (preprint)

doi.org

arxiv.org

PAC-X: Fuzzy Explainable AI for Multi-Class Malware Detection

Mohd Saqib

Benjamin C. M. Fung

Philippe Charland

2025-12-31

IEEE Trans. Fuzzy Syst. (published)

doi.org

PGT: Procedurally Generated Tasks for improving fine-grained understanding in MLLMs

Rim Assouel

Amir Bar

Michal Drozdzal

Adriana Romero-Soriano

Despite remarkable progress in Multimodal Large Language Models (MLLMs), these models still struggle with fine-grained understanding tasks. In this work, we propose **Procedurally Generated Tasks (PGT)**, a simple data-driven framework that serves a dual purpose: inducing fine-grained visual understanding and acting as a low-cost diagnostic tool to identify the source of perception failures. By overlaying unambiguous geometric primitives on images, PGT generates additional dense supervision that disentangles visual grounding capability from semantic priors. Extensive experiments on relational, quantitative, and 3D/depth understanding benchmarks show that PGT yields remarkable gains across diverse architectures. Instruction tuning MLLMs on LLaVA-v1.5-Instruct augmented with PGT data results in improvements of up to +20% on the What’sUp benchmark and +13.3% on CV-Bench-2D, while maintaining general perception capabilities. Moreover, finetuning state-of-the-art MLLMs on PGT data leads to boosts of up to +5.5% on What’sUp and +8.3% on CV-Bench-2D. These findings demonstrate that PGT effectively addresses the bottleneck of fine-grained perception, revealing that many spatial reasoning deficits stem from inadequate supervision signals rather than inherent architectural or resolution limitations.

2025-12-31

International Conference on Machine Learning (Accept (regular))

openreview.net

PheCode-guided multi-modal topic modeling of electronic health records improves disease incidence prediction and GWAS discovery from UK Biobank

Ziqi Yang

Ziyang Song

Shadi Zabad

Marc-André Legault

Yue Li

Phenome-wide association studies rely on disease definitions derived from diagnostic codes, often failing to leverage the full richness of electronic health records (EHR). We present MixEHR-SAGE, a PheCode-guided multi-modal topic model that integrates diagnoses, procedures, and medications to enhance phenotyping from large-scale EHRs. By combining expert-informed priors with probabilistic inference, MixEHR-SAGE identifies over 1000 interpretable phenotype topics from UK Biobank data. Applied to 350 000 individuals with high-quality genetic data, MixEHR-SAGE-derived risk scores accurately predict incident type 2 diabetes (T2D) and leukemia diagnoses. Subsequent genome-wide association studies using these continuous risk scores uncovered novel disease-associated loci, including PPP1R15A for T2D and JMJD6/SRSF2 for leukemia, that were missed by traditional binary case definitions. These results highlight the potential of probabilistic phenotyping from multi-modal EHRs to improve genetic discovery. The MixEHR-SAGE software is publicly available at: https://github.com/li-lab-mcgill/MixEHR-SAGE.

2025-12-31

Briefings in Bioinformatics (published)

doi.org

Position: Agentic AI Systems should be making Bayes-Consistent Decisions

Theodore Papamarkou

Pierre Alquier

Matthias Bauer

Wray Buntine

A. Davison

Gintare Karolina Dziugaite

Maurizio Filippone

Andrew Y. K. Foong

Vincent Fortuin

Dimitris Fouskakis

Eyke Hüllermeier

Theofanis Karaletsos

Mohammad Emtiyaz Khan

Nikita Kotelevskii

S. Lahlou

Yingzhen Li

F Liu

Clare Lyle

Thomas Möllenhoff

Konstantina Palla

Maxim Panov

Yusuf Sale

Kajetan Schweighofer

Artem Shelmanov

Siddharth Swaroop

Martin Trapp

Willem Waegeman

Andrew Gordon Wilson

Alexey Zaytsev

2025-12-31

SSRN Electronic Journal (accepted)

doi.org

Position: Causality is Key for Interpretability Claims to Generalise

Shruti Joshi

Aaron Mueller

David Klindt

Wieland Brendel

Dhanya Sridhar

Patrik Reizinger

Interpretability research on large language models (LLMs) has produced methods that align model components to high-level concepts, yet their use has been accompanied by recurring failures: findings that do not generalise, and causal language that outruns the evidence. Our position is that Pearl’s causal hierarchy formally defines what constitutes a good alignment, what data or assumptions it requires, and what inferences it supports. Specifically, observations of model behaviour support only associational claims; interventions enable cause-effect claims, but not necessarily predictions of model behaviour; counterfactuals, or predictions of behaviour on unseen examples, are often unverifiable in current studies. We show how interpretability research can benefit from causal representation learning (CRL), which provides tools for provably extracting semantic variables and their relationships from activations, and outline practical requirements for generalisable insights: robustness to distribution shifts, sensitivity to assumptions, and compositionality of interventions. Our diagnostic framework helps practitioners select appropriate methods and mitigate failures to ensure that claims match evidence and findings generalise.

2025-12-31

International Conference on Machine Learning (Accept (regular))

openreview.net

Position: Collusion Risks Among AI Reasoning Agents Justify Certification Requirements for Making Market Decisions

Matthew Riemer

Tommaso Tosato

Maximilian Puelma Touzel

This position paper argues that AI agents with chain-of-thought reasoning capabilities are predisposed to exhibit collusive behavior and should be required to obtain behavioral certification before making decisions that affect economic markets. This is because integrating these agents into society could collapse the legal evidentiary distinction between competition and collusion among independent firms without eroding the economic harm distinction. Experiments with DeepSeek-R1 agents in the Bertrand oligopoly pricing domain reveal a tendency towards tacit collusion that persists even when humans prompt the agents not to collude. We further show that the chain-of-thought of these agents can be steered toward either extremely collusive or highly competitive behavior in a way that is not semantically detectable by another LLM analyzing the reasoning traces. As a result, deploying reasoning agents for market decisions leads to collusive economic outcomes without any evidence of conspiracy or intent. Thus, certification based on observed behavior in representative situations is necessary to prevent collusion. We provide preliminary evidence that such agents can be steered in a generalizable way toward efficient competitive equilibria. However, developing a comprehensive behavioral certification will be required before these models can be deployed in real-world markets while ensuring their stability and efficiency.

2025-12-31

International Conference on Machine Learning (Accept (regular))

openreview.net

Position: Irresponsible AI: big tech’s influence on AI research and associated impacts

Alex Hernández-García

Alexandra Volokhova

Ezekiel Williams

Dounia Kabakibo

Mélisande Teng

The accelerated development, deployment and adoption of artificial intelligence systems has been fuelled by the increasing presence of big tech in the AI field. This trend has been accompanied by growing ethical concerns and intensified societal and environmental impacts. This position paper argues that irresponsible AI development is strongly driven by big tech's influence and involvement in the field. We develop this argument by laying out the factors through which this influence leads to irresponsible AI. First, we examine the growing and disproportionate influence of big tech in AI research and argue that its drive for scaling and general-purpose systems is fundamentally at odds with the responsible, ethical, and sustainable development of AI. Second, we review key current environmental and societal negative impacts of AI and trace their connections to big tech's influence. Third, we discuss the underlying economic forces driving big tech's actions. Finally, as a call to action, we highlight the need for AI researchers to counter big tech's influence, and review and propose strategies that build on the responsibility of implicated actors and collective action.

2025-12-31

International Conference on Machine Learning (Accept (spotlight))

openreview.net

Position: LLM-Safety Evaluations Lack Robustness

Tim Beyer

Sophie Xhonneux

Simon Geisler

Gauthier Gidel

Leo Schwinn

Stephan Günnemann

In this position paper, we argue that current safety alignment research efforts for large language models are hindered by many intertwined sources of noise, such as small datasets, methodological inconsistencies, and unreliable evaluation setups. This can, at times, make it impossible to evaluate and compare attacks and defenses fairly, thereby slowing research progress. We systematically analyze the LLM safety evaluation pipeline, covering dataset curation, optimization strategies for automated red-teaming, response generation, and response evaluation using LLM judges. At each stage, we identify key issues and highlight their practical impact. We also propose a set of guidelines for reducing noise and bias in evaluations of future attack and defense papers. Lastly, we offer an opposing perspective, highlighting practical reasons for existing limitations. We believe that addressing the outlined problems in future research will improve the field’s ability to generate easily comparable results and make measurable progress.

2025-12-31

International Conference on Machine Learning (Accept (regular))

openreview.net

Position: Modular Memory is the Key to Continual Learning Agents

Vaggelis Dorovatas

Malte Schwerin

Andrew Bagdanov

Lucas Caccia

Antonio Carta

Laurent Charlin

Barbara Hammer

Tyler Hayes

Timm Hess

Christopher Kanan

Dhireesha Kudithipudi

Xialei Liu

Vincenzo Lomonaco

Jorge Mendez-Mendez

Darshan Patil

Ameya Pandurang Prabhu

Elisa Ricci

Tinne Tuytelaars

Gido van de Ven

Liyuan Wang

Joost van de Weijer

Jonghyun Choi

Martin Mundt

Rahaf Aljundi

Foundation models have transformed machine learning through large-scale pretraining, massive parameterization, and increased test-time compute. Despite surpassing human performance in several domains, these models remain fundamentally limited in continuous operation, experience accumulation, and personalization, capabilities that are central to adaptive intelligence. While continual learning research has long targeted these goals, its historical focus on in-weight learning, i.e., updating a single model’s parameters to absorb new knowledge, has rendered catastrophic forgetting a persistent challenge. **Our position is that combining the strengths of In-Weight Learning (IWL) and the newly emerged capabilities of In-Context Learning (ICL) through the design of modular memory is the missing piece for continual adaptation at scale.** We outline a conceptual framework for modular memory-centric architectures that leverage ICL for rapid adaptation and knowledge accumulation, and IWL for stable updates to model capabilities, thereby mitigating catastrophic forgetting and charting a practical roadmap toward continually learning agents.

2025-12-31

International Conference on Machine Learning (Accept (spotlight))

openreview.net

Position: Time to Close The Validation Gap in LLM Social Simulations

Maximilian Puelma Touzel

Sneheel Sarangi

Aurélien Bück-Kaeffer

Zachary Yang

Jean-François Godbout

Reihaneh Rabbany

LLM-based social simulations, in which many language model agents interact over multiple turns, are rapidly proliferating across policy analysis, epidemiology, and computational social science. Yet the field lacks consensus on how to validate these simulations, with evaluation methods that are sparse, inconsistent, and rarely shared across disciplinary silos. We argue this creates a serious risk: premature deployment of unvalidated simulators in high-stakes domains. Our position is that the field must pivot from expansion to consolidation, prioritizing methodological standardization: shared benchmarks, open data, and reproducible evaluation protocols grounded in social science and complex systems research. We outline a concrete research program organized around specific learning problems/benchmarks, providing a path toward answering the fundamental question: when are LLM social simulations useful modelling objects?

2025-12-31

International Conference on Machine Learning (Accept (regular))

openreview.net

Practical Solutions to Volt-var Optimization under Uncertainty via Blackbox Optimization

Feng Li

Ilhan Kocar

Antoine Lesage-Landry

In this work, we propose an optimal reactive power dispatch (ORPD) stochastic program for volt-var optimization (VVO) of power distribution networks. The formulation considers not only control settings of conventional VVO devices, e.g., voltage regulators, capacitor banks, and on-load tap changers, but also optimal settings for volt-var droop curves of distributed energy resources (DERs), compliant with the IEEE 1547-2018 standard. Instead of including the power flow equations in the optimization problem, which makes it nonlinear and nonconvex, a power flow solver is utilized and the problem is solved by blackbox optimization (BBO). The feasibility of the derived solution is improved by using unbalanced power flow simulations. The solution is effective under various demand and DER generation scenarios such that device settings are not frequently changed, making it practical for in-field implementations. Through numerical simulations on IEEE test feeders, we illustrate the performance of the solutions of our proposed approach on both in-sample and out-of-sample scenarios. We show that our approach outperforms a benchmark reinforcement learning method, and is also scalable to large-scale distribution networks.

2025-12-31

IEEE Transactions on Power Delivery (published)

doi.org
