Publications

Learning Heuristics for Transit Network Design and Improvement with Deep Reinforcement Learning
Andrew Holliday
Ahmed El-Geneidy
Neural Incremental Dynamic Inversion Control of a Multirotor Robotic Airship
Ely Carneiro de Paiva
José Raul Azinheira
Rafael de Angelis Cordeiro
José Reginaldo H. Carvalho
Apolo Marton
Tracking the Evolving Role of Artificial Intelligence in Implementation Science: Protocol for a Living Scoping Review of Applications, Evaluation Approaches and Outcomes
Guillaume Fontaine
Olivia Di Lalla
Susan Michie
Byron J. Powell
Vivian Welch
James Thomas
Jeffery Chan
France Légaré
Janna Hastings
Sylvie D. Lambert
Justin Presseau
Sharon E. Straus
Ian D. Graham
Ruopeng An
Daniel N. Elakpa
Meagan Mooney
Alenda Dwiadila Matra Putra
Rachael Laritz
Natalie Taylor
Background: Artificial intelligence (AI) offers significant opportunities to improve the field of implementation science by supporting key activities such as evidence synthesis, contextual analysis, and decision-making to promote the adoption and sustainability of evidence-based practices. This living scoping review aims to: (1) map applications of AI in implementation research and practice; (2) identify evaluation approaches, reported outcomes, and potential risks; and (3) synthesize reported research gaps and opportunities for advancing the use of AI in implementation science. Methods: This scoping review will follow the Joanna Briggs Institute (JBI) methodology and the Cochrane guidance for living systematic reviews. A living scoping review is warranted to keep up with the rapid changes in AI and its growing use in implementation science. We will include empirical studies, systematic reviews, grey literature, and policy documents that describe or evaluate applications of AI to support implementation science across the steps of the Knowledge-to-Action (KTA) Model. AI methods and models of interest include machine learning, deep learning, natural language processing, large language models, and related technologies and approaches. A search strategy will be applied to bibliographic databases (MEDLINE, Embase, CINAHL, PsycINFO, IEEE Xplore, Web of Science), relevant journals, conference proceedings, and preprint servers. Two reviewers will independently screen studies and extract data on AI characteristics, specific implementation task according to the KTA Model, evaluation methods, outcome domains, risks, and research gaps. Extracted data will be analyzed descriptively and synthesized narratively using a mapping approach aligned with the KTA Model. Discussion: This living review will consolidate the evidence base on how AI is applied across the spectrum of implementation science. It will inform researchers, policymakers, and practitioners seeking to harness AI to improve the adoption, scale-up, and sustainability of evidence-based interventions, while identifying areas for methodological advancement and risk mitigation. Review registration: Open Science Framework, May 2025: https://doi.org/10.17605/OSF.IO/2Q5DV
AugmenToxic: Leveraging Reinforcement Learning to Optimize LLM Instruction Fine-Tuning for Data Augmentation to Enhance Toxicity Detection.
Arezo Bodaghi
Benjamin C. M. Fung
Ketra A. Schmitt
Addressing the challenge of toxic language in online discussions is crucial for the development of effective toxicity detection models. This pioneering work focuses on addressing imbalanced datasets in toxicity detection by introducing a novel approach to augment toxic language data. We create a balanced dataset by instruction fine-tuning Large Language Models (LLMs) using Reinforcement Learning from Human Feedback (RLHF). Recognizing the challenges in collecting sufficient toxic samples from social media platforms for building a balanced dataset, our methodology involves sentence-level text data augmentation through paraphrasing existing samples using optimized generative LLMs. Leveraging a generative LLM, we utilize Proximal Policy Optimization (PPO) as the RL algorithm to fine-tune the model further and align it with human feedback. In other words, we start by fine-tuning an LLM using an instruction dataset specifically tailored for the task of paraphrasing while maintaining semantic consistency. Next, we apply PPO and a reward function to further fine-tune (optimize) the instruction-tuned LLM. This RL process guides the model in generating toxic responses. We utilize the Google Perspective API as a toxicity evaluator to assess generated responses and assign rewards/penalties accordingly. This approach guides LLMs through PPO and the reward function, transforming minority class samples into augmented versions. The primary goal of our methodology is to create a balanced and diverse dataset to enhance the accuracy and performance of classifiers in identifying instances from the minority class. Utilizing two publicly available toxic datasets, we compared various techniques with our proposed method for generating toxic samples, demonstrating that our approach outperforms all others in producing a higher number of toxic samples.
Starting with an initial 16,225 toxic prompts, our method successfully generated 122,951 toxic samples with a toxicity score exceeding 30%. Subsequently, we developed various classifiers using the generated balanced datasets and applied a cost-sensitive learning approach to the original imbalanced dataset. The findings highlight the superior performance of classifiers trained on data generated using our proposed method. These results highlight the importance of employing RL and a data-agnostic model as a reward mechanism for augmenting toxic data, thereby enhancing the robustness of toxicity detection models.
Nested-ReFT: Efficient Reinforcement Learning for Large Language Model Fine-Tuning via Off-Policy Rollouts
'Ohhh, he's the boss!': Unpacking Power Dynamics Among Developers, Designers, and End-Users in FLOSS Usability
Jazlyn Hellman
Itai Epstein
Jinghui Cheng
Jin L.C. Guo
Addressing usability in free, libre, and open-source software (FLOSS) is a challenging issue, particularly due to a long-existing "by developer, for developer" mentality. Engaging designers and end-users to work with developers can help improve its usability, but unequal power dynamics among those stakeholder roles must be mitigated. To explore how the power of different FLOSS stakeholders manifests and can be mediated during collaboration, we conducted eight design workshops with different combinations of key FLOSS stakeholders (i.e., developers, designers, and end-users). Leveraging existing theories on Dimensions of Power, we revealed how participants navigate existing role-based power structures through resource utilization, knowledge gap management, and experience referencing. We also observed that participants exhibited diverse behaviors confirming and challenging the status quo of FLOSS usability. Overall, our results contribute to a comprehensive understanding of the power dynamics among FLOSS stakeholders, providing valuable insights into ways to balance their power to improve FLOSS usability. Our work also serves as an exemplar of using design workshops as a research method to study power dynamics during collaboration that are usually hidden in the field.
Predicting the Subhalo Mass Functions in Simulations from Galaxy Images
Tri Nguyen
J. Rose
Chris Lovell
Francisco Villaescusa-Navarro
It Takes Two: Your GRPO Is Secretly DPO
Yihong Wu
Lei Ding
Muzhi Li
Xinyu Wang
Kejia Chen
Zhanguang Zhang
Chenyang Huang
Yingxue Zhang
Mark J. Coates
Jian-Yun Nie
Group Relative Policy Optimization (GRPO) is a prominent reinforcement learning algorithm for post-training Large Language Models (LLMs). It is commonly believed that GRPO necessitates a large group size to ensure stable training via precise statistical estimation, which incurs substantial computational overhead. In this work, we challenge this assumption by reframing GRPO as a form of contrastive learning, which reveals a fundamental connection to Direct Preference Optimization (DPO). Motivated by DPO's empirical success, we investigate the minimal two-rollout case (2-GRPO), a configuration previously deemed infeasible. We provide a rigorous theoretical analysis to validate 2-GRPO and demonstrate empirically that it achieves performance on par with 16-GRPO, despite using only
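As a rough illustration of why the two-rollout case resembles a preference pair, the sketch below computes group-relative advantages in the way GRPO is commonly described: each rollout's reward is normalized by the mean and standard deviation of its group. This is not the authors' code. The function name `group_relative_advantages`, the use of the population standard deviation, and the `eps` stabilizer are assumptions made for illustration, and implementations differ on these details.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-8) -> List[float]:
    """Normalize each reward by its group's mean and (population) std.

    Hypothetical helper for illustration only; `eps` avoids division by
    zero when every rollout in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Group of two rollouts with different rewards: the better rollout gets an
# advantage close to +1 and the worse one close to -1, much like a
# (chosen, rejected) pair in preference optimization.
pair = group_relative_advantages([1.0, 0.0])

# Group of two rollouts with identical rewards: both advantages are zero,
# so this group contributes no learning signal.
tie = group_relative_advantages([1.0, 1.0])

print(pair, tie)
```

Under these assumptions, a group of size two carries only the sign of the reward difference, which is one way to read the contrastive framing in the abstract.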
A Comprehensive Review of Transmission and Distribution Optimal Power Flow Problems for the Integration of Distributed Energy Resources
Samuel M. Muhindo
Hussein Suprême
This paper presents a comprehensive review of coordination methods for addressing large-scale transmission and distribution optimal power flow (TDOPF) problems involving distributed energy resources. With distinct objectives, each transmission and distribution system operator (TSO/DSO) independently seeks to solve its own optimal power flow (OPF) instance. First, iterative methods are reviewed, in which the central OPF is solved recursively by decomposing the full problem into smaller, more manageable sub-problems or by replacing peripheral portions of the network within the central OPF with reduced equivalent grids. Generally, the convergence to an optimal solution of the full problem when all sub-OPFs are coordinated is not guaranteed, as iterative methods repeat procedures until the changes in control variables of the central OPF are minimal. Second, sequential methods are reviewed, in which the central OPF is solved sequentially in a fixed, nonrepeating procedure by considering previous results. Achieving a fair balance between TSO and DSO interests in sequential methods might adversely affect the performance of a large-scale central OPF. The advantages and the limitations of the two coordination methods are presented based on the operation mode of the TSO-DSO network. Future research opportunities for coordination methods of the TSO-DSO network are drawn using the Kron reduction method and mean-field games.
Co-Producing AI: Toward an Augmented, Participatory Lifecycle
Toumadher Ammar
Cassandre Chatonnier
Shin Koseki
Despite efforts to mitigate the inherent risks and biases of artificial intelligence (AI) algorithms, these algorithms can disproportionately impact culturally marginalized groups. A range of approaches has been proposed to address or reduce these risks, including the development of ethical guidelines and principles for responsible AI, as well as technical solutions that promote algorithmic fairness. Drawing on design justice, expansive learning theory, and recent empirical work on participatory AI, we argue that mitigating these harms requires a fundamental re-architecture of the AI production pipeline. This re-design should center co-production, diversity, equity, inclusion (DEI), and multidisciplinary collaboration. We introduce an augmented AI lifecycle consisting of five interconnected phases: co-framing, co-design, co-implementation, co-deployment, and co-maintenance. The lifecycle is informed by four multidisciplinary workshops and grounded in themes of distributed authority and iterative knowledge exchange. Finally, we relate the proposed lifecycle to several leading ethical frameworks and outline key research questions that remain for scaling participatory governance.
Fairness in Federated Learning: Fairness for Whom?
Fairness in federated learning has emerged as a rapidly growing area of research, with numerous works proposing formal definitions and algorithmic interventions. Yet, despite this technical progress, fairness in FL is often defined and evaluated in ways that abstract away from the sociotechnical contexts in which these systems are deployed. In this paper, we argue that existing approaches tend to optimize narrow system-level metrics, such as performance parity or contribution-based rewards, while overlooking how harms arise throughout the FL lifecycle and how they impact diverse stakeholders. We support this claim through a critical analysis of the literature, based on a systematic annotation of papers for their fairness definitions, design decisions, evaluation practices, and motivating use cases. Our analysis reveals five recurring pitfalls: 1) fairness framed solely through the lens of server-client architecture, 2) a mismatch between simulations and motivating use cases and contexts, 3) definitions that conflate protecting the system with protecting its users, 4) interventions that target isolated stages of the lifecycle while neglecting upstream and downstream effects, and 5) a lack of multi-stakeholder alignment where multiple fairness definitions can be relevant at once. Building on these insights, we propose a harm-centered framework that links fairness definitions to concrete risks and stakeholder vulnerabilities. We conclude with recommendations for more holistic, context-aware, and accountable fairness research in FL.
From Efficiency to Equity: Measuring Fairness in Preference Learning
S. Gowaikar
Rashid A. Mushkani
Shin Koseki
As AI systems, particularly generative models, increasingly influence decision-making, ensuring that they are able to fairly represent diverse human preferences becomes crucial. This paper introduces a novel framework for evaluating epistemic fairness in preference learning models inspired by economic theories of inequality and Rawlsian justice. We propose metrics adapted from the Gini Coefficient, Atkinson Index, and Kuznets Ratio to quantify fairness in these models. We validate our approach using two datasets: a custom visual preference dataset (AI-EDI-Space) and the Jester Jokes dataset. Our analysis reveals variations in model performance across users, highlighting potential epistemic injustices. We explore pre-processing and in-processing techniques to mitigate these inequalities, demonstrating a complex relationship between model efficiency and fairness. This work contributes to AI ethics by providing a framework for evaluating and improving epistemic fairness in preference learning models, offering insights for developing more inclusive AI systems in contexts where diverse human preferences are crucial.
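To make the inequality metrics concrete, the sketch below computes the textbook Gini coefficient and Atkinson index over a list of per-user scores. It does not reproduce the paper's adapted metrics, which the abstract does not specify. The per-user accuracy values, the function names, and the choice of `epsilon = 0.5` are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def gini(values: Sequence[float]) -> float:
    """Textbook Gini coefficient of non-negative values (0 means perfect equality)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Sorted-data form: G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1) / n


def atkinson(values: Sequence[float], epsilon: float = 0.5) -> float:
    """Textbook Atkinson index for an inequality aversion 0 < epsilon < 1."""
    if not 0.0 < epsilon < 1.0:
        raise ValueError("this sketch only covers 0 < epsilon < 1")
    n = len(values)
    mu = sum(values) / n
    if mu == 0:
        return 0.0
    # A = 1 - (1 / mu) * (mean(x ** (1 - eps))) ** (1 / (1 - eps))
    power_mean = sum(x ** (1.0 - epsilon) for x in values) / n
    return 1.0 - (power_mean ** (1.0 / (1.0 - epsilon))) / mu


# Hypothetical per-user accuracies of a preference model (made-up numbers).
per_user_accuracy = [0.90, 0.85, 0.60, 0.40]
print(gini(per_user_accuracy), atkinson(per_user_accuracy))
```

Both measures are zero when every user is served equally well and grow as performance concentrates on a subset of users, which is the sense in which they can serve as fairness indicators here.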