Publications

Correction: AI content detection in the emerging information ecosystem: new obligations for media and tech companies
Alistair Knott
Dino Pedreschi
Toshiya Jitsuzumi
Susan Leavy
David Eyers
Tapabrata Chakraborti
Andrew Trotman
Sundar Sundareswaran
Ricardo Baeza-Yates
Przemyslaw Biecek
Adrian Weller
Paul D. Teal
Subhadip Basu
Mehmet Haklidir
Virginia Morini
Stuart Russell
FairLoRA: Unpacking Bias Mitigation in Vision Models with Fairness-Driven Low-Rank Adaptation
Fine-Tuning Web Agents: It Works, But It's Trickier Than You Think
Recent advancements in large language models (LLMs) have sparked interest in developing autonomous web agents capable of performing digital tasks through web interfaces in a human-like manner. However, even the strongest closed-source models often struggle to achieve robust results on several benchmarks, while a notable performance gap exists between them and open-source counterparts. This study investigates the potential of fine-tuning to enhance the performance of a smaller, lower-performing but cost-efficient LLM by leveraging successful traces from stronger LLMs, referred to as experts. We outline a comprehensive pipeline for data collection, filtering, and supervised fine-tuning and explore various behavior cloning parameters. Our experiments provide key insights into the challenges of fine-tuning LLMs into web agents on benchmarks like MiniWoB and WorkArena. Notably, we find that the fine-tuned agents' ability to predict expert trajectories does not consistently lead to improved downstream task performance. This raises issues such as off-policy bias and the loss of reasoning abilities during fine-tuning. We discuss potential solutions to these challenges and make both the codebase and a dataset of 140M tokens open-source for the community to build upon.
Graph Knowledge Distillation to Mixture of Experts
Pavel Rumiantsev
Mark J. Coates
Health satisfaction outcome from integrated autonomous mobile clinics
Yuzhang Huang
Shaoshan Liu
Zhongying Pan
Carl Wu
Herng-Chia Chiu
Xue Liu
Leiyu Shi
GFlowNets for Hamiltonian decomposition in groups of compatible operators
Isaac L. Huidobro-Meezs
R. A. Vargas-Hernández
Quantum computing presents a promising alternative for the direct simulation of quantum systems with the potential to explore chemical problems beyond the capabilities of classical methods. However, current quantum algorithms are constrained by hardware limitations and the increased number of measurements required to achieve chemical accuracy. To address the measurement challenge, techniques for grouping commuting and anti-commuting terms, driven by heuristics, have been developed to reduce the number of measurements needed in quantum algorithms on near-term quantum devices. In this work, we propose a probabilistic framework using GFlowNets to group fully (FC) or qubit-wise commuting (QWC) terms within a given Hamiltonian. The significance of this approach is demonstrated by the reduced number of measurements for the found groupings; 51% and 67% reduction factors respectively for FC and QWC partitionings with respect to greedy coloring algorithms, highlighting the potential of GFlowNets for future applications in the measurement problem. Furthermore, the flexibility of our algorithm extends its applicability to other resource optimization problems in Hamiltonian simulation, such as circuit design.
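For readers unfamiliar with the grouping problem this abstract refers to, the following is a minimal illustrative sketch of qubit-wise commutation (QWC) and a simple first-fit greedy grouping. It is not the GFlowNet method and not necessarily the greedy coloring baseline used by the authors. The function names and the representation of Hamiltonian terms as Pauli strings such as "XIZ" are assumptions made for illustration.

```python
from typing import List


def qubit_wise_commute(p: str, q: str) -> bool:
    """Return True if Pauli strings p and q commute qubit by qubit.

    Two strings qubit-wise commute when, at every position, the two
    single-qubit operators are equal or at least one is the identity "I".
    """
    if len(p) != len(q):
        raise ValueError("Pauli strings must act on the same number of qubits")
    return all(a == b or a == "I" or b == "I" for a, b in zip(p, q))


def greedy_qwc_groups(terms: List[str]) -> List[List[str]]:
    """First-fit greedy grouping (hypothetical helper, for illustration only).

    Each term joins the first existing group whose members all qubit-wise
    commute with it; otherwise it opens a new group. Fewer groups means
    fewer distinct measurement settings.
    """
    groups: List[List[str]] = []
    for term in terms:
        for group in groups:
            if all(qubit_wise_commute(term, other) for other in group):
                group.append(term)
                break
        else:
            # No compatible group was found, so start a new one.
            groups.append([term])
    return groups


if __name__ == "__main__":
    example_terms = ["ZI", "IZ", "ZZ", "XX", "XI", "IX"]
    for index, group in enumerate(greedy_qwc_groups(example_terms)):
        print(index, group)
```

The result of a first-fit pass depends on the order in which terms are visited, which is one reason heuristic groupings can be far from optimal.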
Generating Tabular Data Using Heterogeneous Sequential Feature Forest Flow Matching
Circulating IL-17F, but not IL-17A, is elevated in severe COVID-19 and leads to an ERK1/2 and p38 MAPK-dependent increase in ICAM-1 cell surface expression and neutrophil adhesion on endothelial cells
Jérôme Bédard-Matteau
Katelyn Yixiu Liu
Lyvia Fourcade
Douglas D. Fraser
Simon Rousseau
Severe COVID-19 is associated with neutrophilic inflammation and immunothrombosis. Several members of the IL-17 cytokine family have been associated with neutrophilic inflammation and activation of the endothelium. Therefore, we investigated whether these cytokines were associated with COVID-19. We investigated the association between COVID-19 and circulating plasma levels of IL-17 cytokine family members in participants in the Biobanque québécoise de la COVID-19 (BQC19), a prospective observational cohort, and in an independent cohort from Western University (London, Ontario). We measured the in vitro impact of IL-17F on intercellular adhesion molecule 1 (ICAM-1) cell surface expression and neutrophil adhesion on endothelial cells in culture. The contribution of two mitogen-activated protein kinase (MAPK) pathways was determined using the small-molecule inhibitors PD184352 (an MKK1/MKK2 inhibitor) and BIRB0796 (a p38 MAPK inhibitor). We found increased IL-17D and IL-17F plasma levels when comparing SARS-CoV-2-positive vs negative hospitalized participants. Moreover, increased plasma levels of IL-17D, IL-17E and IL-17F were noted when comparing severe versus mild COVID-19. IL-17F, but not IL-17A, was significantly elevated in people with COVID-19 compared to healthy controls and with more severe disease. In vitro work on endothelial cells treated with IL-17F for 24 h showed an increased cell surface expression of ICAM-1 accompanied by neutrophil adhesion. The introduction of two MAPK inhibitors significantly reduced the binding of neutrophils while also reducing ICAM-1 expression at the surface level of endothelial cells, but not its intracellular expression. Overall, these results have identified an association between two cytokines of the IL-17 family (IL-17D and IL-17F) with COVID-19 and disease severity.
Considering that IL-17F stimulation promotes neutrophil adhesion to the endothelium in a MAPK-dependent manner, it is attractive to speculate that this pathway may contribute to pathogenic immunothrombosis in concert with other molecular effectors.
A Complexity-Based Theory of Compositionality
Convergence of Manifold Filter-Combine Networks
David R. Johnson
Joyce Chew
Edward De Brouwer
Deanna Needell
Michael Perlmutter
In order to better understand manifold neural networks (MNNs), we introduce Manifold Filter-Combine Networks (MFCNs). The filter-combine framework parallels the popular aggregate-combine paradigm for graph neural networks (GNNs) and naturally suggests many interesting families of MNNs which can be interpreted as the manifold analog of various popular GNNs. We then propose a method for implementing MFCNs on high-dimensional point clouds that relies on approximating the manifold by a sparse graph. We prove that our method is consistent in the sense that it converges to a continuum limit as the number of data points tends to infinity.
Assessment of the Climate Trace global powerplant CO2 emissions
Kevin R. Gurney
Bilal Aslam
Pawlok Dass
Lech Gawuc
Jarrett J Barber
Anna Kato
Accurate estimation of planetary greenhouse gas (GHG) emissions at the scale of individual emitting activities is a critical need for both scientific and policy applications. Powerplants represent the single largest and most concentrated form of global GHG emissions. Climate Trace, co-founded and promoted by former U.S. Vice President Al Gore, is a new effort using, in part, artificial intelligence (AI) approaches to estimate asset-scale GHG emissions. Climate Trace recently released a database of global powerplant CO2 emissions at the facility-scale that uses both AI and non-AI estimation approaches. However, no independent peer-reviewed assessment has been made of this important global emissions database. Here, we compare the Climate Trace powerplant CO2 emissions to an atmospherically calibrated, multi-constraint estimate of powerplant CO2 emissions in the United States. The 3.7% (65) of compared facilities that used an AI-based approach show a mean relative difference (MRD) of −1.1% (SD: 46.4%) in the year 2019. The 96.3% (1726) of the facilities that used a non-AI-based approach show an MRD of −50.0% (SD: 117.7%). Of the non-AI estimated facilities, 151 (8.7%) facilities agree to within ±20%. The large differences between Climate Trace and Vulcan-power emission estimates for these facilities are primarily caused by Climate Trace's use of a national-mean power plant capacity factor (CF), which is a poor representation of the reported power plant CFs of individual US facilities and leads to very large errors at those same 1726 facilities.
A Simulation System Towards Solving Societal-Scale Manipulation
Austin Welch
Gayatri K
Dan Zhao
Hao Yu
Ethan Kosak-Hine
Tom Gibbs
Busra Tugce Gurbuz
The rise of AI-driven manipulation poses significant risks to societal trust and democratic processes. Yet, studying these effects in real-world settings at scale is ethically and logistically impractical, highlighting a need for simulation tools that can model these dynamics in controlled settings to enable experimentation with possible defenses. We present a simulation environment designed to address this. We elaborate upon the Concordia framework that simulates offline, 'real life' activity by adding online interactions to the simulation through social media with the integration of a Mastodon server. We improve simulation efficiency and information flow, and add a set of measurement tools, particularly longitudinal surveys. We demonstrate the simulator with a tailored example in which we track agents' political positions and show how partisan manipulation of agents can affect election results.