Towards General-Purpose Model-Free Reinforcement Learning
Scott Fujimoto
Pierluca D'Oro
Amy Zhang
Yuandong Tian
Reinforcement learning (RL) promises a framework for near-universal problem-solving. In practice, however, RL algorithms are often tailored to specific benchmarks, relying on carefully tuned hyperparameters and algorithmic choices. Recently, powerful model-based RL methods have shown impressive general results across benchmarks, but they come at the cost of increased complexity and slow run times, limiting their broader applicability. In this paper, we attempt to find a unifying model-free deep RL algorithm that can address a diverse class of domains and problem settings. To achieve this, we leverage model-based representations that approximately linearize the value function, taking advantage of the denser task objectives used by model-based RL while avoiding the costs associated with planning or simulated trajectories. We evaluate our algorithm, MR.Q, on a variety of common RL benchmarks with a single set of hyperparameters and show competitive performance against domain-specific and general baselines, providing a concrete step towards building general-purpose model-free deep RL algorithms.
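As a rough illustration of the central idea, the sketch below shows a state-action embedding trained with dense model-based objectives (reward and latent-dynamics prediction), on top of which the value function is a single linear head. The architecture sizes, loss weighting, and names here are our assumptions for illustration, not the authors' implementation.

```python
# Illustrative sketch only: an embedding z = f(s, a) shaped by dense
# model-based losses, with Q(s, a) a single linear head so the value
# function is approximately linear in z.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LinearizedQ(nn.Module):
    def __init__(self, state_dim, action_dim, embed_dim=256):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(state_dim + action_dim, 512), nn.ReLU(),
            nn.Linear(512, embed_dim), nn.ReLU(),
        )
        self.q_head = nn.Linear(embed_dim, 1)            # Q(s, a) = w^T z + b
        self.reward_head = nn.Linear(embed_dim, 1)       # predict r from z
        self.dyn_head = nn.Linear(embed_dim, embed_dim)  # predict next z

    def forward(self, state, action):
        z = self.encoder(torch.cat([state, action], dim=-1))
        return self.q_head(z), self.reward_head(z), self.dyn_head(z)

def total_loss(model, s, a, r, td_target, next_z_target):
    # The TD error trains the linear value head; the model-based terms
    # give the encoder a denser learning signal than reward alone.
    q, r_pred, z_next_pred = model(s, a)
    return (F.mse_loss(q, td_target)
            + F.mse_loss(r_pred, r)
            + F.mse_loss(z_next_pred, next_z_target))

model = LinearizedQ(state_dim=17, action_dim=6)
q, _, _ = model(torch.randn(4, 17), torch.randn(4, 6))   # smoke test
```

Because Q is linear in the learned features, the encoder absorbs the nonlinearity, which is plausibly what lets a single model-free update rule transfer across domains without planning or simulated rollouts.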
Towards Improving Exploration through Sibling Augmented GFlowNets
Kanika Madan
Alex Lamb
Exploration is a key factor for the success of an active learning agent, especially when dealing with sparse extrinsic terminal rewards and long trajectories. We introduce Sibling Augmented Generative Flow Networks (SA-GFN), a novel framework designed to enhance the exploration and training efficiency of Generative Flow Networks (GFlowNets). SA-GFN uses a decoupled dual-network architecture, comprising a main Behavior Network and an exploratory Sibling Network, to enable diverse exploration of the underlying distribution using intrinsic rewards. Inspired by ideas on exploration from reinforcement learning, SA-GFN provides a general-purpose exploration and learning paradigm that integrates with multiple GFlowNet training objectives and is especially helpful for exploration over a wide range of sparse or low-reward distributions and task structures. An extensive set of experiments across a diverse range of tasks, reward structures, and trajectory lengths, along with a thorough set of ablations, demonstrates the superior performance of SA-GFN in terms of exploration efficacy and convergence speed compared to existing methods. In addition, SA-GFN's versatility and compatibility with different GFlowNet training objectives and intrinsic reward methods underscore its broad applicability across problem domains.
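The decoupled dual-network idea can be sketched as a training step like the one below. Everything here is an assumed interface for illustration, not the paper's code: the method names, the GFlowNet `update` call, the novelty-style intrinsic bonus, and in particular the guess that the Behavior Network learns off-policy from the Sibling's trajectories.

```python
# Schematic sketch of a decoupled dual-network GFlowNet step; all
# interfaces (sample_trajectory, update, novelty_bonus) are assumptions.
def sa_gfn_step(behavior_net, sibling_net, env, intrinsic_model):
    # The exploratory Sibling Network trains on extrinsic plus intrinsic
    # reward, pushing it toward under-visited terminal states.
    tau_sib = sibling_net.sample_trajectory(env)
    r_ext = env.reward(tau_sib.terminal_state)
    r_int = intrinsic_model.novelty_bonus(tau_sib.terminal_state)
    sibling_net.update(tau_sib, reward=r_ext + r_int)

    # The main Behavior Network trains on extrinsic reward only, but also
    # learns from the sibling's discoveries, so the intrinsic bonus never
    # distorts the distribution it is ultimately meant to match.
    tau_beh = behavior_net.sample_trajectory(env)
    behavior_net.update(tau_beh, reward=env.reward(tau_beh.terminal_state))
    behavior_net.update(tau_sib, reward=r_ext)
```

One plausible reading of the decoupling is that folding an intrinsic bonus directly into a single GFlowNet's reward would change the target distribution it samples from; confining the bonus to a sibling keeps that distortion on the exploration side.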
Towards Interpreting Visual Information Processing in Vision-Language Models
Clement Neo
Luke Ong
Philip Torr
Mor Geva
Fazl Barez
Towards whole-genome inference of polygenic scores with fast and memory-efficient algorithms
Shadi Zabad
Chirayu Anant Haryan
Simon Gravel
Sanchit Misra
A Unifying Framework for Action-Conditional Self-Predictive Reinforcement Learning
Zhaohan Daniel Guo
Bernardo Avila Pires
Yunhao Tang
Clare Lyle
Mark Rowland
Nicolas Heess
Diana Borsa
Arthur Guez
Will Dabney
VCR: Pixel-Level Complex Reasoning by Restoring Occluded Text
Tianyu Zhang
Suyuchen Wang
Lu Liu
Ge Zhang
Perouz Taslakian
Sai Rajeswar
Jie Fu
We introduce Visual Caption Restoration (VCR), a novel vision-language task that challenges models to accurately restore partially obscured text using pixel-level hints within images through complex reasoning. This task stems from the observation that text embedded in images intrinsically differs from common visual elements and natural language due to the need to align the modalities of vision, text, and text embedded in images. While many works incorporate text into images for visual question answering, they mostly rely on OCR or masked language modeling, reducing the task to text-based processing. However, text-based processing becomes ineffective in VCR, as accurate text restoration depends on the combined information from the provided images, the context, and subtle cues from the tiny exposed areas of masked text. We develop a pipeline to generate synthetic images for the VCR task using image-caption pairs, with adjustable caption visibility to control task difficulty. With this pipeline, we construct VCR-WIKI from Wikipedia images with captions, comprising 2.11M English and 346K Chinese training entities, plus 5K validation and 5K test entities in both languages, each in easy and hard configurations. We also provide a hidden test set, VCR-HIDDEN, to guard against potential overfitting on VCR-WIKI. Our results reveal that current vision-language models significantly lag behind human performance on the VCR task, and merely fine-tuning the models on our dataset does not lead to notable improvements. We release VCR-WIKI and the data construction code to facilitate future research.
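To make the occlusion idea concrete, here is a minimal, hedged sketch of the kind of synthesis step such a pipeline could use. It is PIL-based; the layout, the masking geometry, and the `visible_frac` difficulty knob are our assumptions, not the released code.

```python
# Hedged sketch of a VCR-style synthesis step: render a caption below an
# image, then occlude each text line except a thin horizontal strip whose
# height (visible_frac) controls difficulty. Not the released pipeline.
from PIL import Image, ImageDraw, ImageFont

def render_occluded_caption(image, caption, visible_frac=0.5,
                            line_height=24, margin=8):
    font = ImageFont.load_default()
    canvas = Image.new(
        "RGB", (image.width, image.height + line_height + 2 * margin),
        "white")
    canvas.paste(image, (0, 0))
    draw = ImageDraw.Draw(canvas)
    y = image.height + margin
    draw.text((margin, y), caption, fill="black", font=font)
    # White boxes cover the top and bottom of the text line, leaving only
    # a strip of height visible_frac * line_height as a pixel-level hint.
    strip = int(line_height * visible_frac)
    top = y + (line_height - strip) // 2
    draw.rectangle([0, y, canvas.width, top], fill="white")
    draw.rectangle([0, top + strip, canvas.width, y + line_height],
                   fill="white")
    return canvas

img = Image.new("RGB", (256, 128), "gray")            # stand-in image
easy = render_occluded_caption(img, "A cat on a mat.", visible_frac=0.5)
hard = render_occluded_caption(img, "A cat on a mat.", visible_frac=0.15)
```

Shrinking `visible_frac` yields the hard configuration: less exposed text forces the model to combine the image, the surrounding context, and the remaining pixel hints rather than fall back on OCR-style reading.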
What Secrets Do Your Manifolds Hold? Understanding the Local Geometry of Generative Models
Ahmed Imtiaz Humayun
Ibtihel Amara
Cristina Nader Vasconcelos
Deepak Ramachandran
Candice Schumann
Junfeng He
Katherine A Heller
Mohammad Havaei
Deep generative models are frequently used to learn continuous representations of complex data distributions from a finite number of samples. For any generative model, including pre-trained foundation models with GAN, Transformer, or Diffusion architectures, generation performance can vary significantly depending on which part of the learned data manifold is sampled. In this paper, we study the post-training local geometry of the learned manifold and its relationship to generation outcomes for models ranging from toy settings to the latent decoder of the near state-of-the-art Stable Diffusion 1.4 text-to-image model. Building on the theory of continuous piecewise-linear (CPWL) generators, we characterize the local geometry in terms of three geometric descriptors - scaling (
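As a minimal illustration of how such local descriptors can be probed, the sketch below differentiates a toy CPWL generator and reads two natural quantities, a volume-scaling term and a numerical rank, off the Jacobian's singular values. This is our construction for intuition only; whether these match the paper's exact descriptor definitions is an assumption on our part.

```python
# Minimal sketch (our construction, not the paper's code): a CPWL
# generator is affine on each input region, so the Jacobian at a latent
# point summarizes the local geometry of the learned manifold there.
import torch

gen = torch.nn.Sequential(                  # toy CPWL generator R^8 -> R^32
    torch.nn.Linear(8, 64), torch.nn.ReLU(), torch.nn.Linear(64, 32))

def local_descriptors(z, rank_tol=1e-3):
    J = torch.autograd.functional.jacobian(gen, z)    # shape (32, 8)
    s = torch.linalg.svdvals(J)
    scaling = torch.log(s.clamp_min(1e-12)).sum()     # log sqrt det(J^T J)
    rank = int((s > rank_tol * s.max()).sum())        # effective local rank
    return scaling.item(), rank

for _ in range(3):
    print(local_descriptors(torch.randn(8)))
```

Large scaling means the generator stretches latent volume around that point (sparser samples), while a drop in effective rank means the output locally collapses onto a thinner slice of the manifold.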
The Journey Matters: Average Parameter Count over Pre-training Unifies Sparse and Dense Scaling Laws
Tian Jin
Ahmed Imtiaz Humayun
Utku Evci
Suvinay Subramanian
Amir Yazdanbakhsh
Dan Alistarh
Pruning eliminates unnecessary parameters in neural networks, offering a promising solution to the growing computational demands of large language models (LLMs). While much prior work focuses on post-training pruning, sparse pre-training, which combines pruning and pre-training into a single phase, provides a simpler alternative. In this work, we present the first systematic exploration of optimal sparse pre-training configurations for LLMs through an examination of 80 unique pruning schedules across different sparsity levels and training durations. We find that initiating pruning at 25% of total training compute and concluding at 75% achieves near-optimal final evaluation loss. These findings provide valuable insights for efficient and effective sparse pre-training of LLMs. Furthermore, we propose a new scaling law that modifies the Chinchilla scaling law to use the average parameter count over pre-training. Through empirical and theoretical validation, we demonstrate that this modified scaling law accurately models evaluation loss for both sparsely and densely pre-trained LLMs, unifying scaling laws across pre-training paradigms. Our findings indicate that while sparse pre-training achieves the same final model quality as dense pre-training for an equivalent compute budget, it yields a substantially smaller final model, enabling significant computational savings during inference.
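A back-of-the-envelope sketch of the proposed substitution: keep a Chinchilla-form loss L(N, D) = E + A/N^alpha + B/D^beta, but replace N with the parameter count averaged over the pruning schedule. The schedule below (dense until 25% of training, pruned linearly until 75%, then held sparse) mirrors the near-optimal window reported above; the loss constants are Hoffmann et al.'s published Chinchilla fit, used purely as placeholders rather than this paper's own coefficients.

```python
# Hedged sketch: average parameter count under a linear pruning schedule,
# plugged into a Chinchilla-form loss with N replaced by the average.

def avg_param_count(n_dense, sparsity, start=0.25, end=0.75):
    """Dense until `start` of training, pruned linearly to `sparsity`
    by `end`, then held sparse for the remainder."""
    n_sparse = n_dense * (1 - sparsity)
    n_mid = (n_dense + n_sparse) / 2          # mean during the linear ramp
    return start * n_dense + (end - start) * n_mid + (1 - end) * n_sparse

def chinchilla_loss(n_avg, d, E=1.69, A=406.4, alpha=0.34,
                    B=410.7, beta=0.28):      # placeholder constants
    return E + A / n_avg**alpha + B / d**beta

n_avg = avg_param_count(n_dense=1e9, sparsity=0.5)   # 1B pruned to 500M
print(f"{n_avg:.3e}", chinchilla_loss(n_avg, d=20e9))  # avg N = 7.5e8
```

The point of the substitution is that a sparsely pre-trained model spends part of training at the dense size, so its effective capacity sits between the dense and final sparse counts; averaging captures that and lets one law cover both paradigms.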
Artificial Neural Networks for Magnetoencephalography: A review of an emerging field
Arthur Dehgan
Hamza Abdelhedi
Vanessa Hadid
Magnetoencephalography (MEG) is a cutting-edge neuroimaging technique that measures the intricate brain dynamics underlying cognitive processes with an unparalleled combination of high temporal and spatial precision. MEG data analytics has long relied on advanced signal processing and mathematical and statistical tools for tasks ranging from data cleaning to probing the signals' rich dynamics and estimating the neural sources underlying the surface-level recordings. As in most domains, the surge in Artificial Intelligence (AI) has led to the increased use of Machine Learning (ML) methods for MEG data classification. More recently, an emerging trend in this field is the use of Artificial Neural Networks (ANNs) to address many MEG-related tasks. This review provides a comprehensive overview of how ANNs are being used with MEG data from three vantage points. First, we review work that employs ANNs for MEG signal classification, i.e., for brain decoding. Second, we report on work that has used ANNs as putative models of information processing in the human brain. Third, we examine studies that use ANNs as techniques to tackle methodological questions in MEG, including artifact correction and source estimation. We then assess the current strengths and limitations of using ANNs with MEG and discuss future challenges and opportunities in this field. Finally, by establishing a detailed portrait of the field and providing practical recommendations, this review seeks to serve as a helpful reference for both seasoned MEG researchers and newcomers interested in using ANNs to enhance the exploration of the complex dynamics of the human brain with MEG.
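For readers new to the first vantage point, here is a toy example of what ANN-based MEG brain decoding can look like in code. The data is entirely synthetic, and the sensor count, epoch length, and architecture are illustrative assumptions chosen for brevity, not recommendations from the review.

```python
# Toy illustration of MEG brain decoding: classify epochs of shape
# (sensors, time samples) with a small 1D CNN. Synthetic data only.
import torch
import torch.nn as nn

n_sensors, n_times, n_classes = 275, 600, 2   # e.g., ~275-sensor system, 1 s at 600 Hz
decoder = nn.Sequential(
    nn.Conv1d(n_sensors, 32, kernel_size=7),  # temporal filters over sensors
    nn.ReLU(),
    nn.AdaptiveAvgPool1d(1), nn.Flatten(),
    nn.Linear(32, n_classes))

epochs = torch.randn(16, n_sensors, n_times)  # 16 synthetic MEG epochs
logits = decoder(epochs)                      # (16, n_classes)
```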