A Robot Walks into a Bar: Can Language Models Serve as Creativity SupportTools for Comedy? An Evaluation of LLMs' Humour Alignment with Comedians
Piotr Mirowski
Juliette Love
Shakir Mohamed
Shedding Light on Large Generative Networks: Estimating Epistemic Uncertainty in Diffusion Models
Lucas Berry
Axel Brando
Generative diffusion models, notable for their large parameter count (exceeding 100 million) and operation within high-dimensional image spa… (see more)ces, pose significant challenges for traditional uncertainty estimation methods due to computational demands. In this work, we introduce an innovative framework, Diffusion Ensembles for Capturing Uncertainty (DECU), designed for estimating epistemic uncertainty for diffusion models. The DECU framework introduces a novel method that efficiently trains ensembles of conditional diffusion models by incorporating a static set of pre-trained parameters, drastically reducing the computational burden and the number of parameters that require training. Additionally, DECU employs Pairwise-Distance Estimators (PaiDEs) to accurately measure epistemic uncertainty by evaluating the mutual information between model outputs and weights in high-dimensional spaces. The effectiveness of this framework is demonstrated through experiments on the ImageNet dataset, highlighting its capability to capture epistemic uncertainty, specifically in under-sampled image classes.
Temporal trends in disparities in COVID-19 seropositivity among Canadian blood donors
Yuan Yu
Matthew J Knight
Diana Gibson
Sheila F O’Brien
W Alton Russell
Abstract Background In Canada’s largest COVID-19 serological study, SARS-CoV-2 antibodies in blood donors have been monitored since 2020. … (see more)No study has analysed changes in the association between anti-N seropositivity (a marker of recent infection) and geographic and sociodemographic characteristics over the pandemic. Methods Using Bayesian multi-level models with spatial effects at the census division level, we analysed changes in correlates of SARS-CoV-2 anti-N seropositivity across three periods in which different variants predominated (pre-Delta, Delta and Omicron). We analysed disparities by geographic area, individual traits (age, sex, race) and neighbourhood factors (urbanicity, material deprivation and social deprivation). Data were from 420 319 blood donations across four regions (Ontario, British Columbia [BC], the Prairies and the Atlantic region) from December 2020 to November 2022. Results Seropositivity was higher for racialized minorities, males and individuals in more materially deprived neighbourhoods in the pre-Delta and Delta waves. These subgroup differences dissipated in the Omicron wave as large swaths of the population became infected. Across all waves, seropositivity was higher in younger individuals and those with lower neighbourhood social deprivation. Rural residents had high seropositivity in the Prairies, but not other regions. Compared to generalized linear models, multi-level models with spatial effects had better fit and lower error when predicting SARS-CoV-2 anti-N seropositivity by geographic region. Conclusions Correlates of recent COVID-19 infection have evolved over the pandemic. Many disparities lessened during the Omicron wave, but public health intervention may be warranted to address persistently higher burden among young people and those with less social deprivation.
Towards Geographic Inclusion in the Evaluation of Text-to-Image Models
Melissa Hall
Samuel J. Bell
Candace Ross
Adina Williams
Michal Drozdzal
Rapid progress in text-to-image generative models coupled with their deployment for visual content creation has magnified the importance of … (see more)thoroughly evaluating their performance and identifying potential biases. In pursuit of models that generate images that are realistic, diverse, visually appealing, and consistent with the given prompt, researchers and practitioners often turn to automated metrics to facilitate scalable and cost-effective performance profiling. However, commonly-used metrics often fail to account for the full diversity of human preference; often even in-depth human evaluations face challenges with subjectivity, especially as interpretations of evaluation criteria vary across regions and cultures. In this work, we conduct a large, cross-cultural study to study how much annotators in Africa, Europe, and Southeast Asia vary in their perception of geographic representation, visual appeal, and consistency in real and generated images from state-of-the art public APIs. We collect over 65,000 image annotations and 20 survey responses. We contrast human annotations with common automated metrics, finding that human preferences vary notably across geographic location and that current metrics do not fully account for this diversity. For example, annotators in different locations often disagree on whether exaggerated, stereotypical depictions of a region are considered geographically representative. In addition, the utility of automatic evaluations is dependent on assumptions about their set-up, such as the alignment of feature extractors with human perception of object similarity or the definition of"appeal"captured in reference datasets used to ground evaluations. We recommend steps for improved automatic and human evaluations.
Visibility into AI Agents
Alan Chan
Carson Ezell
Max Kaufmann
Kevin Wei
Lewis Hammond
Herbie Bradley
Emma Bluemke
Nitarshan Rajkumar
Noam Kolt
Lennart Heim
Markus Anderljung
Efficient Leverage Score Sampling for Tensor Train Decomposition
Vivek Bharadwaj
Beheshteh T. Rakhshan
Osman Asif Malik
Milnor-Myerson Games and The Principles of Artificial Principal-Agent Problems
Manfred Diaz
Joel Z Leibo
In this paper, we introduce Milnor-Myerson games, a multiplayer interaction structure at the core of machine learning (ML), to shed light on… (see more) the fundamental principles and implications the artificial principal-agent problem has had in landmark ML results like AlphaGo and large language models (LLMs).
PETRA: Parallel End-to-end Training with Reversible Architectures
Stephane Rivaud
Louis Fournier
Thomas Pumir
Michael Eickenberg
Edouard Oyallon
Reversible architectures have been shown to be capable of performing on par with their non-reversible architectures, being applied in deep l… (see more)earning for memory savings and generative modeling. In this work, we show how reversible architectures can solve challenges in parallelizing deep model training. We introduce PETRA, a novel alternative to backpropagation for parallelizing gradient computations. PETRA facilitates effective model parallelism by enabling stages (i.e., a set of layers) to compute independently on different devices, while only needing to communicate activations and gradients between each other. By decoupling the forward and backward passes and keeping a single updated version of the parameters, the need for weight stashing is also removed. We develop a custom autograd-like training framework for PETRA, and we demonstrate its effectiveness on CIFAR-10, ImageNet32, and ImageNet, achieving competitive accuracies comparable to backpropagation using ResNet-18, ResNet-34, and ResNet-50 models.
Random Policy Evaluation Uncovers Policies of Generative Flow Networks
Haoran He
Qingpeng Cai 0001
Ling Pan
The Generative Flow Network (GFlowNet) is a probabilistic framework in which an agent learns a stochastic policy and flow functions to sampl… (see more)e objects with probability proportional to an unnormalized reward function. GFlowNets share a strong connection with reinforcement learning (RL) that typically aims to maximize reward. A number of recent works explored connections between GFlowNets and maximum entropy (MaxEnt) RL, which incorporates entropy regularization into the standard RL objective. However, the relationship between GFlowNets and standard RL remains largely unexplored, despite the inherent similarities in their sequential decision-making nature. While GFlowNets can discover diverse solutions through specialized flow-matching objectives, connecting them to standard RL can simplify their implementation through well-established RL principles and also improve RL's capabilities in diverse solution discovery (a critical requirement in many real-world applications), and bridging this gap can further unlock the potential of both fields. In this paper, we bridge this gap by revealing a fundamental connection between GFlowNets and one of the most basic components of RL -- policy evaluation. Surprisingly, we find that the value function obtained from evaluating a uniform policy is closely associated with the flow functions in GFlowNets. Building upon these insights, we introduce a rectified random policy evaluation (RPE) algorithm, which achieves the same reward-matching effect as GFlowNets based on simply evaluating a fixed random policy, offering a new perspective. Empirical results across extensive benchmarks demonstrate that RPE achieves competitive results compared to previous approaches, shedding light on the previously overlooked connection between (non-MaxEnt) RL and GFlowNets.
Random Policy Evaluation Uncovers Policies of Generative Flow Networks
Haoran He
Qingpeng Cai 0001
Ling Pan
Sequential predictive learning is a unifying theory for hippocampal representation and replay
Daniel Levenstein
Aleksei Efremov
Roy Henha Eyono
Adrien Peyrache
The mammalian hippocampus contains a cognitive map that represents an animal’s position in the environment 1 and generates offline “repl… (see more)ay” 2,3 for the purposes of recall 4, planning 5,6, and forming long term memories 7. Recently, it’s been found that artificial neural networks trained to predict sensory inputs develop spatially tuned cells 8, aligning with predictive theories of hippocampal function 9–11. However, whether predictive learning can also account for the ability to produce offline replay is unknown. Here, we find that spatially-tuned cells, which robustly emerge from all forms of predictive learning, do not guarantee the presence of a cognitive map with the ability to generate replay. Offline simulations only emerged in networks that used recurrent connections and head-direction information to predict multi-step observation sequences, which promoted the formation of a continuous attractor reflecting the geometry of the environment. These offline trajectories were able to show wake-like statistics, autonomously replay recently experienced locations, and could be directed by a virtual head direction signal. Further, we found that networks trained to make cyclical predictions of future observation sequences were able to rapidly learn a cognitive map and produced sweeping representations of future positions reminiscent of hippocampal theta sweeps 12. These results demonstrate how hippocampal-like representation and replay can emerge in neural networks engaged in predictive learning, and suggest that hippocampal theta sequences reflect a circuit that implements a data-efficient algorithm for sequential predictive learning. Together, this framework provides a unifying theory for hippocampal functions and hippocampal-inspired approaches to artificial intelligence.
ACCO: Accumulate while you Communicate, Hiding Communications in Distributed LLM Training
Adel Nabli
Louis Fournier
Pierre ERBACHER
Louis Serrano
Edouard Oyallon
Training Large Language Models (LLMs) relies heavily on distributed implementations, employing multiple GPUs to compute stochastic gradients… (see more) on model replicas in parallel. However, synchronizing gradients in data parallel settings induces a communication overhead increasing with the number of distributed workers, which can impede the efficiency gains of parallelization. To address this challenge, optimization algorithms reducing inter-worker communication have emerged, such as local optimization methods used in Federated Learning. While effective in minimizing communication overhead, these methods incur significant memory costs, hindering scalability: in addition to extra momentum variables, if communications are only allowed between multiple local optimization steps, then the optimizer's states cannot be sharded among workers. In response, we propose