Publications

Learning Heuristics for Transit Network Design and Improvement with Deep Reinforcement Learning

Andrew Holliday

Ahmed El-Geneidy

Gregory Dudek

2025-10-16

Transportmetrica B: Transport Dynamics (published)

doi.org

arxiv.org

Neural Incremental Dynamic Inversion Control of a Multirotor Robotic Airship

Ely Carneiro de Paiva

José Raul Azinheira

Rafael de Angelis Cordeiro

José Reginaldo H. Carvalho

Apolo Marton

Giovanni Beltrame

2025-10-16

International Journal of Intelligent Systems (published)

doi.org

Tracking the Evolving Role of Artificial Intelligence in Implementation Science: Protocol for a Living Scoping Review of Applications, Evaluation Approaches and Outcomes

Guillaume Fontaine

Olivia Di Lalla

Susan Michie

Byron J. Powell

Vivian Welch

James Thomas

Jeffery Chan

Samira Abbasgholizadeh-Rahimi

France Légaré

Janna Hastings

Sylvie D. Lambert

Justin Presseau

Sharon E. Straus

Ian D. Graham

Ruopeng An

Daniel N. Elakpa

Meagan Mooney

Alenda Dwiadila Matra Putra

Rachael Laritz

Natalie Taylor

Background: Artificial intelligence (AI) offers significant opportunities to improve the field of implementation science by supporting key activities such as evidence synthesis, contextual analysis, and decision-making to promote the adoption and sustainability of evidence-based practices. This living scoping review aims to: (1) map applications of AI in implementation research and practice; (2) identify evaluation approaches, reported outcomes, and potential risks; and (3) synthesize reported research gaps and opportunities for advancing the use of AI in implementation science. Methods: This scoping review will follow the Joanna Briggs Institute (JBI) methodology and the Cochrane guidance for living systematic reviews. A living scoping review is warranted to keep up with the rapid changes in AI and its growing use in implementation science. We will include empirical studies, systematic reviews, grey literature, and policy documents that describe or evaluate applications of AI to support implementation science across the steps of the Knowledge-to-Action (KTA) Model. AI methods and models of interest include machine learning, deep learning, natural language processing, large language models, and related technologies and approaches. A search strategy will be applied to bibliographic databases (MEDLINE, Embase, CINAHL, PsycINFO, IEEE Xplore, Web of Science), relevant journals, conference proceedings, and preprint servers. Two reviewers will independently screen studies and extract data on AI characteristics, specific implementation task according to the KTA Model, evaluation methods, outcome domains, risks, and research gaps. Extracted data will be analyzed descriptively and synthesized narratively using a mapping approach aligned with the KTA Model. Discussion: This living review will consolidate the evidence base on how AI is applied across the spectrum of implementation science. It will inform researchers, policymakers, and practitioners seeking to harness AI to improve the adoption, scale-up, and sustainability of evidence-based interventions, while identifying areas for methodological advancement and risk mitigation. Review registration: Open Science Framework, May 2025: https://doi.org/10.17605/OSF.IO/2Q5DV

2025-10-16

F1000Research (published)

doi.org

AugmenToxic: Leveraging Reinforcement Learning to Optimize LLM Instruction Fine-Tuning for Data Augmentation to Enhance Toxicity Detection

Arezo Bodaghi

Benjamin C. M. Fung

Ketra A. Schmitt

Addressing the challenge of toxic language in online discussions is crucial for the development of effective toxicity detection models. This pioneering work focuses on addressing imbalanced datasets in toxicity detection by introducing a novel approach to augment toxic language data. We create a balanced dataset through instruction fine-tuning of Large Language Models (LLMs) using Reinforcement Learning with Human Feedback (RLHF). Recognizing the challenges in collecting sufficient toxic samples from social media platforms for building a balanced dataset, our methodology involves sentence-level text data augmentation through paraphrasing existing samples using optimized generative LLMs. Leveraging a generative LLM, we use Proximal Policy Optimization (PPO) as the RL algorithm to fine-tune the model further and align it with human feedback. In other words, we start by fine-tuning an LLM using an instruction dataset, specifically tailored for the task of paraphrasing while maintaining semantic consistency. Next, we apply PPO and a reward function to further fine-tune (optimize) the instruction-tuned LLM. This RL process guides the model in generating toxic responses. We utilize the Google Perspective API as a toxicity evaluator to assess generated responses and assign rewards/penalties accordingly. This approach guides LLMs through PPO and the reward function, transforming minority class samples into augmented versions. The primary goal of our methodology is to create a balanced and diverse dataset to enhance the accuracy and performance of classifiers in identifying instances from the minority class. Utilizing two publicly available toxic datasets, we compared various techniques with our proposed method for generating toxic samples, demonstrating that our approach outperforms all others in producing a higher number of toxic samples. Starting with an initial 16,225 toxic prompts, our method successfully generated 122,951 toxic samples with a toxicity score exceeding 30%. Subsequently, we developed various classifiers using the generated balanced datasets and applied a cost-sensitive learning approach to the original imbalanced dataset. The findings highlight the superior performance of classifiers trained on data generated using our proposed method. These results highlight the importance of employing RL and a data-agnostic model as a reward mechanism for augmenting toxic data, thereby enhancing the robustness of toxicity detection models.

2025-10-15

ACM Transactions on the Web (published)

doi.org

Nested-ReFT: Efficient Reinforcement Learning for Large Language Model Fine-Tuning via Off-Policy Rollouts

Maxime Heuillet

Yufei Cui

Boxing Chen

Audrey Durand

Prasanna Parthasarathi

2025-10-15

NeurIPS.cc/2025/Workshop/ER (accepted)

doi.org

openreview.net

'Ohhh, he's the boss!': Unpacking Power Dynamics Among Developers, Designers, and End-Users in FLOSS Usability

Jazlyn Hellman

Itai Epstein

Jinghui Cheng

Jin L.C. Guo

Addressing usability in free, libre, and open-source software (FLOSS) is a challenging issue, particularly due to a long-existing "by developer, for developer" mentality. Engaging designers and end-users to work with developers can help improve its usability, but unequal power dynamics among those stakeholder roles must be mitigated. To explore how the power of different FLOSS stakeholders manifests and can be mediated during collaboration, we conducted eight design workshops with different combinations of key FLOSS stakeholders (i.e., developers, designers, and end-users). Leveraging existing theories on Dimensions of Power, we revealed how participants navigate existing role-based power structures through resource utilization, knowledge gap management, and experience referencing. We also observed that participants exhibited diverse behaviors confirming and challenging the status quo of FLOSS usability. Overall, our results contribute to a comprehensive understanding of the power dynamics among FLOSS stakeholders, providing valuable insights into ways to balance their power to improve FLOSS usability. Our work also serves as an exemplar of using design workshops as a research method to study power dynamics during collaboration that are usually hidden in the field.

2025-10-15

Proceedings of the ACM on Human-Computer Interaction (published)

doi.org

arxiv.org

Predicting the Subhalo Mass Functions in Simulations from Galaxy Images

Andreas Filipp

Tri Nguyen

Laurence Perreault-Levasseur

J. Rose

Chris Lovell

Nicolas Payot

Francisco Villaescusa-Navarro

Yashar Hezaveh

2025-10-15

ArXiv (preprint)

arxiv.org

It Takes Two: Your GRPO Is Secretly DPO

Yihong Wu

Liheng Ma

Lei Ding

Muzhi Li

Xinyu Wang

Kejia Chen

Zhan Su

Zhanguang Zhang

Chenyang Huang

Yingxue Zhang

Mark J. Coates

Jian-Yun Nie

Group Relative Policy Optimization (GRPO) is a prominent reinforcement learning algorithm for post-training Large Language Models (LLMs). It is commonly believed that GRPO necessitates a large group size to ensure stable training via precise statistical estimation, which incurs substantial computational overhead. In this work, we challenge this assumption by reframing GRPO as a form of contrastive learning, which reveals a fundamental connection to Direct Preference Optimization (DPO). Motivated by DPO's empirical success, we investigate the minimal two-rollout case (2-GRPO), a configuration previously deemed infeasible. We provide a rigorous theoretical analysis to validate 2-GRPO and demonstrate empirically that it achieves performance on par with 16-GRPO, despite using only

2025-10-15

NeurIPS.cc/2025/Workshop/ER (spotlight)

openreview.net

A Comprehensive Review of Transmission and Distribution Optimal Power Flow Problems for the Integration of Distributed Energy Resources

Samuel M. Muhindo

Hanane Dagdougui

Antoine Lesage-Landry

Hussein Suprême

This paper presents a comprehensive review of coordination methods for addressing large-scale transmission and distribution optimal power flow (TDOPF) problems involving distributed energy resources. With distinct objectives, each transmission and distribution system operator (TSO/DSO) independently seeks to solve its own optimal power flow (OPF) instance. First, iterative methods are reviewed, in which the central OPF is solved recursively by decomposing the full problem into smaller, more manageable sub-problems or by replacing peripheral portions of the network within the central OPF with reduced equivalent grids. Generally, the convergence to an optimal solution of the full problem when all sub-OPFs are coordinated is not guaranteed, as iterative methods repeat procedures until the changes in control variables of the central OPF are minimal. Second, sequential methods are reviewed, in which the central OPF is solved sequentially in a fixed, nonrepeating procedure by considering previous results. Achieving a fair balance between TSO and DSO interests in sequential methods might adversely affect the performance of a large-scale central OPF. The advantages and the limitations of the two coordination methods are presented based on the operation mode of the TSO-DSO network. Future research opportunities for coordination methods of the TSO-DSO network are drawn using the Kron reduction method and mean-field games.

2025-10-14

2025 IEEE Electrical Power and Energy Conference (EPEC) (published)

doi.org

Co-Producing AI: Toward an Augmented, Participatory Lifecycle

Rashid Mushkani

Hugo Berard

Toumadher Ammar

Cassandre Chatonnier

Shin Koseki

Despite efforts to mitigate the inherent risks and biases of artificial intelligence (AI) algorithms, these algorithms can disproportionately impact culturally marginalized groups. A range of approaches has been proposed to address or reduce these risks, including the development of ethical guidelines and principles for responsible AI, as well as technical solutions that promote algorithmic fairness. Drawing on design justice, expansive learning theory, and recent empirical work on participatory AI, we argue that mitigating these harms requires a fundamental re-architecture of the AI production pipeline. This re-design should center co-production, diversity, equity, inclusion (DEI), and multidisciplinary collaboration. We introduce an augmented AI lifecycle consisting of five interconnected phases: co-framing, co-design, co-implementation, co-deployment, and co-maintenance. The lifecycle is informed by four multidisciplinary workshops and grounded in themes of distributed authority and iterative knowledge exchange. Finally, we relate the proposed lifecycle to several leading ethical frameworks and outline key research questions that remain for scaling participatory governance.

2025-10-14

Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society (published)

doi.org

arxiv.org

Fairness in Federated Learning: Fairness for Whom?

Afaf Taïk

Khaoula Chehbouni

Golnoosh Farnadi

Fairness in federated learning (FL) has emerged as a rapidly growing area of research, with numerous works proposing formal definitions and algorithmic interventions. Yet, despite this technical progress, fairness in FL is often defined and evaluated in ways that abstract away from the sociotechnical contexts in which these systems are deployed. In this paper, we argue that existing approaches tend to optimize narrow system-level metrics, such as performance parity or contribution-based rewards, while overlooking how harms arise throughout the FL lifecycle and how they impact diverse stakeholders. We support this claim through a critical analysis of the literature, based on a systematic annotation of papers for their fairness definitions, design decisions, evaluation practices, and motivating use cases. Our analysis reveals five recurring pitfalls: 1) fairness framed solely through the lens of server-client architecture, 2) a mismatch between simulations and motivating use cases and contexts, 3) definitions that conflate protecting the system with protecting its users, 4) interventions that target isolated stages of the lifecycle while neglecting upstream and downstream effects, and 5) a lack of multi-stakeholder alignment where multiple fairness definitions can be relevant at once. Building on these insights, we propose a harm-centered framework that links fairness definitions to concrete risks and stakeholder vulnerabilities. We conclude with recommendations for more holistic, context-aware, and accountable fairness research in FL.

2025-10-14

Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society (published)

doi.org

arxiv.org

From Efficiency to Equity: Measuring Fairness in Preference Learning

S. Gowaikar

Hugo Berard

Rashid A. Mushkani

Shin Koseki

As AI systems, particularly generative models, increasingly influence decision-making, ensuring that they are able to fairly represent diverse human preferences becomes crucial. This paper introduces a novel framework for evaluating epistemic fairness in preference learning models inspired by economic theories of inequality and Rawlsian justice. We propose metrics adapted from the Gini Coefficient, Atkinson Index, and Kuznets Ratio to quantify fairness in these models. We validate our approach using two datasets: a custom visual preference dataset (AI-EDI-Space) and the Jester Jokes dataset. Our analysis reveals variations in model performance across users, highlighting potential epistemic injustices. We explore pre-processing and in-processing techniques to mitigate these inequalities, demonstrating a complex relationship between model efficiency and fairness. This work contributes to AI ethics by providing a framework for evaluating and improving epistemic fairness in preference learning models, offering insights for developing more inclusive AI systems in contexts where diverse human preferences are crucial.

2025-10-14

Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society (published)

doi.org

arxiv.org
